
A practical use for SSDs – that won’t break your budget (think caching…..)

January 23, 2009

In my last post I (briefly – too briefly?) discussed some of the basic principles of solid state drives (SSDs).  I want to dig a bit deeper in this post and focus on the enterprise space only, and suggest an example of where current SSDs that you can buy today are a great option to increase performance of an otherwise pokey storage array.

First off – what do I mean by “enterprise products”?  If you’ve used that phrase in a sentence in the last week, feel free to skip this section.  For this discussion, “enterprise” storage means disks (rotating or SSD) that are installed in server or storage systems, need to provide fast response time to data requests, and are typically used in systems shared by multiple users and accessed by multiple clients.

A couple of typical enterprise applications would be e-mail and databases.  Regardless of whether the storage is inside the server (for smaller installations) or centralized in a storage area network (SAN), enterprise drives need to provide fast access, store large amounts of data, and must do so reliably.

For this post, let’s just limit the discussion to a SAN.  Typically, SANs are built around multiple storage appliances running some sort of data provisioning application and providing one or more remote systems with access to their storage.  The SAN can use Fibre Channel (FC) or iSCSI (IP) – when considering drives, this part doesn’t matter.   Also – don’t confuse FC drives with FC SANs – they are not the same thing.

So – suppose we have an IP SAN that is built around a few storage appliances that are all interconnected and work cooperatively to share the storage load.  Suppose that on that SAN we want to run a few instances of SQL, a couple of Exchange message stores, and say, a few Exchange logs as well.  We’d also like to provision some of that SAN for general purpose use by our network users (via some sort of file server head in front of the SAN).

In the case of the database instances, the Exchange servers, and the log files – performance is king.  Performance can be discussed primarily in two terms: input/output operations per second (IOPs) and total bytes transferred per second (throughput).  Typically most database admins are looking for IOPs first, throughput second (not always, but usually).
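To put some rough numbers on the IOPs vs. throughput distinction, here’s a quick back-of-the-envelope calculation in Python.  The seek time and block size are typical figures I’m assuming for illustration, not specs from any particular drive:

# Rough estimate of per-drive random IOPs and the throughput that implies.
# The seek and block-size figures below are assumptions, not vendor specs.

avg_seek_ms = 3.5                          # assumed average seek time for a 15K SAS drive
rotational_latency_ms = 60000 / 15000 / 2  # half a revolution at 15K RPM = 2 ms

service_time_ms = avg_seek_ms + rotational_latency_ms
iops = 1000 / service_time_ms              # random operations the drive can service per second

block_size_kb = 8                          # a common database page size (assumed)
throughput_mb_s = iops * block_size_kb / 1024

print(f"~{iops:.0f} random IOPs, ~{throughput_mb_s:.1f} MB/s at {block_size_kb} KB blocks")

Run that and you get something in the neighborhood of 180 random IOPs and only a megabyte or two per second – which is exactly why random workloads are an IOPs problem, not a throughput problem.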

So – if we are looking to maximize IOPs, how do we do that now?  What are considerations when designing a SAN for maximum IOPs?

Look at the worst case scenario – random IOPs.  Many database records are accessed randomly; there is no set pattern of access.  Just because a sector is read at a point in time, there is no assurance that the next sector that needs to be read will be the next sector on the disk.  In fact, just the opposite is often the case.  Sector access is typically random.  Let’s assume nearly 100% random access.

Now, suppose our SAN is built of 4 storage units, each with 12 x 3.5″ drives for a total of 48 hard drives.  Seem like a lot?  It isn’t.  How do we get the best possible performance out of this SAN?  Simple: we load every drive slot with the fastest drive available – right now that would be a SAS drive running at 15K RPM.  To further increase performance, we would limit the amount of data on each SAS drive to some small percentage of its total capacity.  We do this because hard drives read data starting at the outside of the platter.  The outside of the platter has a larger circumference than the inside, and since we have a fixed spindle speed of 15K RPM, the outside of the disk is the fastest part.
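That practice of only using the fast outer portion of each drive is often called short-stroking, and it isn’t free – you trade capacity for speed.  Here’s a tiny sketch of the trade-off, using a drive size and fill percentage I picked purely for illustration:

# What short-stroking costs in capacity.  Figures are illustrative assumptions.

drives = 48                 # 4 shelves x 12 drives, as in the example above
raw_capacity_gb = 450       # hypothetical 15K RPM SAS drive size
fill_fraction = 0.25        # keep data on roughly the outer quarter of each drive

usable_per_drive_gb = raw_capacity_gb * fill_fraction
pool_gb = drives * usable_per_drive_gb

print(f"{usable_per_drive_gb:.0f} GB usable per drive, "
      f"{pool_gb / 1024:.1f} TB of raw pool before RAID")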

Now we have our SAN built, our RAID arrays set up, and we’ve limited the amount of data stored on each SAS disk to ensure we get maximum performance.

What happens if we deploy this system, add some real data, and find out that it just isn’t fast enough?  Now what do we do?  Since we’ve optimized the speed of each drive and since we have set up our RAID protection to be speedy as well (say, a RAID 10) we have one choice – add more disks.  Adding more disks to the arrays should improve performance, but at what cost?  We’ll need at least one more storage shelf (ours are full), we’ll need more power, we’ll need to plan for more down time (more disks = more chances of failure).  Or at least, that’s what we used to have to do.
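To see roughly what “add more disks” actually buys, here’s a simple estimate of aggregate random IOPs under RAID 10.  The per-drive figure and the read/write mix are assumptions, not measurements:

# Rough aggregate random IOPs for the pool under RAID 10, and what one more
# 12-drive shelf buys.  Per-drive IOPs and the read/write mix are assumptions.

per_drive_iops = 180        # rough figure for a 15K RPM SAS drive (see earlier estimate)
read_fraction = 0.7         # assumed 70/30 read/write workload
raid10_write_penalty = 2    # each logical write lands on two mirrored drives

def pool_iops(drive_count):
    raw = drive_count * per_drive_iops
    # Effective host IOPs once the RAID 10 write penalty is applied to the write share.
    return raw / (read_fraction + (1 - read_fraction) * raid10_write_penalty)

print(f"48 drives: ~{pool_iops(48):.0f} host IOPs")
print(f"60 drives (one more shelf): ~{pool_iops(60):.0f} host IOPs")

Another full shelf of 15K drives – plus the power, rack space, and failure exposure that comes with it – buys you maybe 25% more random IOPs.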

Many SAN software packages support data tiering – that is, some sort of internal mechanism that will exploit faster arrays to act as cache devices for slower arrays, in effect “fronting” the slow disks with fast ones.  Fast disks are expensive, so you will typically see some sort of mix of fast (SAS, 15K RPM) and slow (SATA, 7.2K RPM) disks in the same SAN.  To dig a bit deeper into this point, we’ll look at one such SAN software product – FalconStor’s Network Storage Server (NSS).  NSS is just one example; there are many others that offer a similar mechanism.

NSS has two built-in features that can help improve disk array performance when “fronting” slower drives with faster ones.  These features are called “Safe Cache” and “Hot Zone.”  I’ll dig into each a bit more below.

So, back to our SAN – we’ve got all our drive slots filled with the fastest drives we could buy and we still don’t get the speed our applications require.  Now what?  Buy more disks?  Buy more storage units?  Budget for more power to run them?  No.  Instead – consider an SSD.

“But SSDs are too expensive – I can’t afford to fill a 48 drive SAN with all SSDs!!!”  I know, but that’s not what I’m suggesting.

Instead of doing a fork lift upgrade and replacing all of the SAS drives in your SAN, consider adding only one or two SSDs and using them as a front end cache for the rotating drives.  This won’t break your budget, is an easy install, and will improve the performance of the entire array.  How?

I mentioned the “Safe Cache” feature of NSS – let’s start there.  Safe Cache lets the SAN administrator designate a LUN to be a cache device.  That LUN should be the fastest LUN on the SAN and will (typically) be relatively small.  Safe Cache will ensure that all incoming disk writes – regardless of the destination LUN – get written to the Safe Cache LUN first; in this case, a LUN comprised of SSDs.  The write sequence would look something like this:

data sent to SAN –> data written to SSD Safe Cache LUN –> ack sent back to application sending the data –> data migrated automatically from the Safe Cache LUN to the rotating drives
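If you prefer code to arrows, here’s a minimal sketch of that write-back flow in Python.  The class and method names are mine, invented for illustration – this is not FalconStor’s actual API, and a real product obviously does far more (ordering, coalescing, failure handling):

# Illustrative write-back cache flow: acknowledge the write once it hits the
# fast (SSD) LUN, then destage to the slower rotating LUN in the background.
# All names here are invented for the sketch, not FalconStor's API.

import queue
import threading

class ToyLUN:
    """Stand-in for a block device; just keeps blocks in a dict."""
    def __init__(self):
        self.blocks = {}
    def write(self, lba, data):
        self.blocks[lba] = data

class SafeCacheSketch:
    """Write-back flow: ack once the SSD LUN has the data, destage later."""
    def __init__(self, ssd_lun, hdd_lun):
        self.ssd_lun = ssd_lun
        self.hdd_lun = hdd_lun
        self._pending = queue.Queue()
        threading.Thread(target=self._destage, daemon=True).start()

    def write(self, lba, data):
        self.ssd_lun.write(lba, data)      # 1. data lands on the fast SSD LUN
        self._pending.put((lba, data))     # 2. queue it for migration
        return "ack"                       # 3. ack goes back to the application right away

    def _destage(self):
        while True:
            lba, data = self._pending.get()
            self.hdd_lun.write(lba, data)  # 4. data migrates to the rotating drives,
                                           #    freeing the cache for the next write

cache = SafeCacheSketch(ToyLUN(), ToyLUN())
cache.write(0, b"a page of database data")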

Because the ack is sent back to the system sending the data as soon as it is written to the SSD, the response time of the SAN improves substantially.   Because the data is written to an SSD, it isn’t nearly as vulnerable to loss from a power failure as it would be on a RAM drive used as a Safe Cache LUN (FWIW, NSS supports RAM drives as Safe Cache LUNs too, but because they are so volatile, I would not suggest doing that).  By automatically moving the data from the SSD Safe Cache LUN to the rotating disks, the Safe Cache is then cleared for the next incoming data write and the process starts all over.  Think that’s cool?  Check out “Hot Zone” below.

“Hot Zone” is another feature of NSS that seems to have been built for SSD use.  Hot Zone also requires the use of a fast LUN as a cache, but it uses that cache differently.  Hot Zone works great for random access database deployments.

The Hot Zone feature monitors the disk access patterns of the data stored on the rotating drives.  The user sets a threshold that says (basically) – when a sector is accessed this many times, automatically copy that data from the rotating disk array (the slow array) into the Hot Zone array, the fast SSD array.  That way, as data blocks are accessed frequently, they are automatically brought into the SSD cache such that the next time the data is served, it comes from the much faster SSD array, not the slower rotating array.

What happens to the data in the SSD cache array if it isn’t accessed?  Suppose a given record in a database were accessed frequently (copying it into the SSD cache), then for some reason that frequent access stopped.  Then what?

Hot Zone would sense this and, based on user-set thresholds, would migrate the data back into the (slower) rotating disk array, freeing up space in the Hot Zone for other frequently accessed data to be automatically brought in.
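Here’s a toy sketch of that promote/demote behavior.  Again, the names and thresholds are mine, invented for illustration – they don’t reflect NSS internals:

# Toy model of threshold-based tiering: blocks read often enough get copied to
# the SSD tier; blocks that cool off get demoted back.  Names and thresholds
# are invented for illustration only.

class HotZoneSketch:
    def __init__(self, promote_after=100, demote_below=10):
        self.promote_after = promote_after   # reads per interval to become "hot"
        self.demote_below = demote_below     # reads per interval needed to stay hot
        self.read_counts = {}                # lba -> reads seen this interval
        self.hot = set()                     # lbas currently cached on the SSD array

    def serve_read(self, lba):
        self.read_counts[lba] = self.read_counts.get(lba, 0) + 1
        if lba not in self.hot and self.read_counts[lba] >= self.promote_after:
            self.hot.add(lba)                # copy the block up to the SSD tier
        return "SSD" if lba in self.hot else "HDD"

    def end_of_interval(self):
        for lba in list(self.hot):
            if self.read_counts.get(lba, 0) < self.demote_below:
                self.hot.discard(lba)        # migrate back to the rotating array
        self.read_counts.clear()             # start counting fresh

hz = HotZoneSketch(promote_after=3)
for _ in range(4):
    tier = hz.serve_read(42)                 # once the block is "hot", reads come from the SSD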

I won’t get into all the tuning details of Hot Zone (maybe in another post), but FalconStor’s NSS is just a single example of storage provisioning software that can – out of the box – easily exploit SSD performance to substantially improve SAN performance without huge expense or a forklift upgrade.