Oct 08, 2011 16:39
Given that the various high-end SAN vendors all receive good reviews for their performance, let's see what some of these performance criteria are:
a. Excellent disk I/O!
b. Excellent network I/O!
c. Excellent caching mechanisms!
d. Optimized network stacks for various protocols!
e. Fancy reporting for the management!
There are also other interesting features such as:
a. Being able to use multiple connectivity media (1G and 10G Ethernet, Fibre Channel).
b. Phone home, where the device diagnoses issues, if any, and sends the SAN company a message so that they can take pre-emptive action.
Here are some things that we liked about the Oracle SAN (the others have some of these, but not all).
1. A full-blown, enterprise-grade operating system.
This SAN runs a version of OpenSolaris, based on the same enterprise-grade OS that powers many of the world's performance-critical environments. As screenshots in further blog posts will show, this SAN gives great performance + analytics while using negligible amounts of CPU.
2. An excellent network stack.
Unlike some earlier versions of Solaris, which were nicknamed "Slowlaris", Solaris 10 received a TCP/IP stack rewrite. This rewrite was nicknamed "Fire Engine". Later, the developers went on to improve network performance in many other ways too, with many of those benefits making their way into this SAN.
3. The ZFS file system.
This filesystem was designed by carefully questioning the lessons learned in the past and asking whether they still need to be applied today. ZFS has long been acknowledged as a really superlative filesystem, with ports to some other operating systems as well. There are efforts to write equivalents for platforms such as Linux, to which ZFS cannot be ported for legal (licensing) reasons.
Some interesting features here are the modified elevator-seek mechanism, near-platter-speed access rates, and end-to-end checksumming to provide you with greater reliability.
4. Caching.
One interesting benefit of having ZFS around is the improved caching. ZFS lets you specify read cache devices and write cache devices. These cache devices can be ordinary disks, but in practice everyone uses SSDs for read and write caching. This means you can start off, as we did, with just one cache device, and then use the Analytics to size your requirements. In terms of storage, this means you can use Analytics to determine whether you see more read or more write operations, whether you need a read or a write cache, how large it should be, and so on.
Since ZFS is part of the kernel, it can play with unused RAM and "soft allocate" some of it as a read cache for very frequently accessed blocks of data (in the case of LUNs) or file blocks or even entire files (in the case of NFS and CIFS). What this means is that most of your frequently accessed data will reside in RAM and be served from there. If the kernel needs to allocate some memory, it'll take it back from ZFS's RAM cache on a need basis.
So a RAM cache, coupled with a read and/or write cache depending upon your requirements, can do wonders for your performance, over and above what ZFS access speeds themselves provide.
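To make the tiering concrete, here's a minimal Python sketch of a two-tier read cache (a RAM tier in front of an SSD tier) with the lookup order described above: RAM first, then the SSD cache, then the platters. This is just a conceptual illustration under simple LRU assumptions, not how ZFS actually implements its caching.

```python
from collections import OrderedDict

class TieredReadCache:
    """Two-tier read cache: a small RAM tier in front of a larger SSD tier.
    Conceptual sketch only; NOT ZFS's actual cache implementation."""

    def __init__(self, ram_slots, ssd_slots, read_from_disk):
        self.ram = OrderedDict()              # hottest blocks, kept in LRU order
        self.ssd = OrderedDict()              # blocks demoted from RAM
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.read_from_disk = read_from_disk  # callable: block_id -> bytes

    def read(self, block_id):
        if block_id in self.ram:              # tier 1 hit: serve from RAM
            self.ram.move_to_end(block_id)
            return self.ram[block_id]
        if block_id in self.ssd:              # tier 2 hit: promote back to RAM
            data = self.ssd.pop(block_id)
        else:                                 # miss on both tiers: hit the platters
            data = self.read_from_disk(block_id)
        self._cache_in_ram(block_id, data)
        return data

    def _cache_in_ram(self, block_id, data):
        self.ram[block_id] = data
        if len(self.ram) > self.ram_slots:    # demote the coldest RAM block to SSD
            cold_id, cold_data = self.ram.popitem(last=False)
            self.ssd[cold_id] = cold_data
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)  # oldest block falls out entirely

    def release_ram(self, n):
        """Mimic the kernel reclaiming memory: give up n RAM slots on demand."""
        self.ram_slots = max(0, self.ram_slots - n)
        while len(self.ram) > self.ram_slots:
            cold_id, cold_data = self.ram.popitem(last=False)
            self.ssd[cold_id] = cold_data
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)
```

The release_ram() hook mirrors the point above: the RAM tier grows into unused memory and shrinks when the kernel wants that memory back, while the SSD tier keeps serving what was demoted.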
5. End-to-end checksums in the file system
This one deserves a point by itself. One problem that hit me twice with the MD3000i SANs was that my applications complained about data errors. This was really dangerous for us at work, since it happened with active source code once, and with a VM in another case. The SAN faithfully reported that it had no corruption in its data, and yet I could see with my own eyes that I was in a bit of a mess, having to restore from backups and rebuild from individual commits made during the day.
ZFS has this notion of an end-to-end checksum: a block of data is checksummed and then sent to the storage sub-system for a write, and the block that references it (its parent) stores that checksum, with the grandparent block in turn containing the parent's checksum, and so on. The intent is to be able to check whether a retrieved block generates the same checksum that was recorded when it was written. In case there's a checksum mismatch (say, due to a media error), ZFS knows to retrieve a redundant copy of the block and help you get back your data. There's documentation + diagrams out there that explain this better than I have.
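To illustrate the idea, here's a toy Python sketch (my own illustration, not ZFS's actual on-disk format): each block's checksum is stored with the pointer to it, i.e. in its parent, so a corrupted block can never vouch for itself, and a mirrored copy can be used to recover the data.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class MirroredBlockStore:
    """Toy model: a block's checksum lives with the pointer to it (in its
    parent), never with the block itself. Not ZFS's real on-disk layout."""

    def __init__(self):
        self.mirror_a = {}    # two copies of every block, as on a mirrored vdev
        self.mirror_b = {}
        self.pointers = {}    # block_id -> checksum, held by the "parent"

    def write(self, block_id, data: bytes):
        self.mirror_a[block_id] = data
        self.mirror_b[block_id] = data
        self.pointers[block_id] = checksum(data)   # parent records the checksum

    def read(self, block_id) -> bytes:
        expected = self.pointers[block_id]
        for copy in (self.mirror_a, self.mirror_b):
            data = copy[block_id]
            if checksum(data) == expected:   # verified against the parent's record
                return data
            # mismatch: this copy rotted on the media, try the other one
        raise IOError(f"both copies of block {block_id} failed their checksum")

# Simulate silent corruption on one mirror and watch the read survive it:
store = MirroredBlockStore()
store.write("blk0", b"active source code")
store.mirror_a["blk0"] = b"actiXe source code"       # bit rot on mirror A
assert store.read("blk0") == b"active source code"   # served from mirror B
```

The real thing goes further: on a mismatch, ZFS also rewrites the bad copy with the good data ("self-healing"), which this sketch omits.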
Now, end-to-end checksums in a SAN do not guarantee at all that your server (the SAN's consumer/customer) will always get back the data that it had sent to the SAN. There are many places where the data could have been corrupted on the way to or from the SAN: the server's RAM (this is why you need ECC!), the network device driver or the OS's TCP stack may have a bug, the NIC may be faulty, the network cable and switch may have problems of their own, etc. You get the point.
But what end-to-end checksumming _will_ assure you of is this: once the data reaches the Oracle SAN's network driver, it is checksummed from that point until it gets written to the storage medium, and on the way back out, that checksum is used to validate the data just before it's dispatched to the SAN's NIC.
For ideal quality, you should run ZFS on your server and have ECC RAM, but for those of us with other use cases, like having to run VMWare, you can at least rest assured that once your data is written to that pool, you can, in even most of the worst cases, get some or all of it back.
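If you're paranoid about the path between your application and the SAN, one belt-and-braces option is to checksum your data at the application level before it ever leaves the server, and verify it on read-back. A hypothetical Python sketch (my own, not a feature of the SAN or of VMWare):

```python
import hashlib
import os

def write_with_digest(path: str, payload: bytes) -> str:
    """Checksum the payload while it's still in our hands, before it crosses
    the NIC, cable and switch, then write it out and flush to storage."""
    digest = hashlib.sha256(payload).hexdigest()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())   # push the write through to the storage
    return digest

def verify_readback(path: str, expected_digest: str) -> bool:
    """Read the data back and confirm it matches what we originally sent."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected_digest
```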
6. Excellent reporting.
You need to see this for yourself, lest you consider me biased. Most other SAN devices provide what I call "defensive reporting", where the Storage Administrator gets to show that his SAN is delivering great disk I/O. If someone were to ask the admin of a non-Oracle iSCSI SAN, "please tell me which of my VMWare servers is accessing which of the SAN's LUNs", the storage admin would likely thrust some more defensive reporting in their face and ask them to get lost. If pushed to the wall, he'll simply call the SAN vendor's tech support team, and they'll of course come scrambling to his aid to throw more jargon.
Quite apart from the fact that such an attitude doesn't take anyone anywhere, better or alternative reporting simply doesn't exist on these other SANs.
With the Oracle SAN, you'll find that you're able to design and sketch interactive, drill-down reports on demand. Drill-down reports are awesome. Here's an everyday scenario: "Hi, the VMs are slow, could you tell us what's wrong?" "Sure... hmm... VMWare's CPU and RAM utilization continues to remain low, let me check the SAN." (By now, I trust the Oracle SAN's reporting, since it's way more insightful.) "Ok, I see high write operations from two VMWare servers; they're accessing LUN_EnvironmentD and Lun_EnvironmentJ. What're you guys doing on those?" "Aah, there's a deploy going on." "Well, I see that the storage is well within its IOPS threshold." "Must be those script changes that we put in. Anyway, thanks for helping us arrive at this so soon." Apart from the time required to log on to the VMWare management console and the SAN's own web console, the SAN analytics literally takes as much time as it'd take to speak the conversation I've listed.
7. Direct support from Oracle.
Apparently, before Oracle acquired Sun, support was available via channel partners too. Post the acquisition, Oracle now handles all support cases directly. There are pros and cons to this, I feel. There may be channel partners and their team members who wanted a career configuring SAN storage while continuing to play the role of a generalist. Now, they need to be lucky enough to get into a technologically diverse company like mine, or join Oracle!
On the other hand, since Oracle have their reputation on the line, you get to speak to people who have access to all manner of skillsets within the company. When crisis strikes and your customers are screaming for escalation, you can now reply that this is the highest you can escalate (see below).
8. Phone home.
Since I've seen this on NetApp and EMC too, I presume that all higher-end storage vendors have this feature today. I know that even within a company such as Thoughtworks, not everyone is or will be as familiar with ZFS and related topics as I am. So, higher-end SAN devices today run a number of self-diagnostics, and if they find any errors, they send some diagnostic data back to the SAN support team. That team makes a judgement call on what needs to be done, gets in touch with the customer, and sets fixes in motion. These could be pre-emptively replacing disks, asking the customer to add more cache, recommending a reconfiguration, etc.
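Conceptually, a phone-home loop is simple. Here's a minimal Python sketch where the health indicators, the endpoint URL, and the payload format are all made up for illustration; the actual protocol any given vendor uses is their own.

```python
import json
import urllib.request

# Hypothetical support endpoint; real vendors use their own protocols.
SUPPORT_ENDPOINT = "https://support.example.com/phone-home"

def run_self_diagnostics() -> dict:
    """Stand-in for the appliance's self-checks; a real device would pull
    these from SMART counters, fault management, pool health, and so on."""
    return {
        "pool_state": "DEGRADED",        # example finding
        "failed_disks": ["c0t3d0"],      # hypothetical device name
        "cache_hit_ratio": 0.62,
    }

def phone_home(serial_number: str) -> None:
    report = run_self_diagnostics()
    if report["failed_disks"] or report["pool_state"] != "ONLINE":
        body = json.dumps({"serial": serial_number, "report": report}).encode()
        req = urllib.request.Request(
            SUPPORT_ENDPOINT,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)   # support staff triage and call the customer
```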
Alright, more in another blog post!
opensolaris, belenix