Windows Azure IaaS Performance – SQL and IOps

There has been a lot of buzz in the technology space around the word “Cloud” lately, yet most people seem to think that developing software the old way and simply hosting it on someone else’s hardware equals cloud greatness. In truth, the “cloud” is a new beast – a completely different tool. One of the greatest examples of this confusion is the newly announced Windows Azure IaaS (Infrastructure as a Service).

This post will answer the following questions:

  • Am I just renting rack space from a Microsoft data center?
  • What is “horizontal” vs “vertical” scaling?
  • What speed is the hard-drive on an IaaS VM?
  • Does mounting more drives give more performance?
  • How does the Azure Storage SLA (service level agreement) get involved?
  • How does SQL Server perform in the IaaS environment?
  • What needs to be improved before I should use IaaS in Azure?

Am I just renting rack space from a Microsoft data center?

This first question is highly important to help you understand the “cloud” better. Renting your own “dedicated server”, whether virtual or physical, has been possible for many years. So you may be tempted to think of IaaS as the same thing – just provisioning a new instance of Windows Server or Linux in a data center that is protected by guards on Microsoft’s payroll. That is the worst way of looking at IaaS.

Microsoft itself has been marketing IaaS as an ‘on-ramp to PaaS (platform as a service)’. Meaning, you should be using PaaS for your new designs, but Microsoft realizes that some systems (such as Active Directory, SharePoint, or SQL Server Enterprise edition) might be mission critical, and you may not want to punch a hole in your firewall… in fact, you may want your entire data center to be hosted by someone else. To this end, IaaS is an on-ramp.

I personally think that viewing IaaS as simply an on-ramp to PaaS is the second worst way of looking at it. Why? Because the beauty of IaaS is not in the “hosting of a VM somewhere”… but in the “turn a dial, and now you’ve multiplied your horsepower by 1,000” – meaning, using it as an [infrastructure as a] SERVICE. This is known as “horizontal” scaling.

Quick definition: What is horizontal versus vertical scaling? Imagine you have a SQL Server box with 16GB of RAM and 4 CPUs that handles all of your business needs. Suddenly your business increases tenfold! So, you “vertically” scale your SQL box to have 160GB of RAM and 40 CPUs! However, with the cloud, you would instead “horizontally” scale by having 10 separate SQL boxes with 16GB of RAM and 4 CPUs each. Makes sense, right?
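To make the arithmetic concrete, here is a minimal Python sketch of the two approaches. The `Box` class and both function names are made up for illustration; only the numbers (16GB, 4 CPUs, a factor of 10) come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A hypothetical server described only by its RAM and CPU count."""
    ram_gb: int
    cpus: int


def scale_vertically(box, factor):
    # One bigger box: multiply the resources of the single machine.
    return Box(ram_gb=box.ram_gb * factor, cpus=box.cpus * factor)


def scale_horizontally(box, factor):
    # Many identical boxes: keep the machine size, multiply the count.
    return [box] * factor


baseline = Box(ram_gb=16, cpus=4)
big_box = scale_vertically(baseline, 10)
fleet = scale_horizontally(baseline, 10)

print(big_box)  # Box(ram_gb=160, cpus=40)
print(len(fleet), sum(b.ram_gb for b in fleet), sum(b.cpus for b in fleet))  # 10 160 40
```

Both paths end up with the same total RAM and CPU on paper. The difference is that the fleet can grow or shrink one box at a time.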

What speed is the hard-drive on an Azure IaaS VM?

This question may seem random, but it’s not. In fact, it’s very important when you are considering IaaS. In the example above about vertical vs horizontal scaling, I mentioned RAM and CPU power… but if you know anything about administering SQL Servers, your #1 hang-up is actually going to be IOps (Input/Output Operations per Second). If you only have a single 15K RPM spindle hard drive, then not even 4TB of RAM and 400 CPUs will improve your performance one bit.

Because we could argue all day about spindles vs SSDs (solid-state drives), virtual vs physical, etc., we will just talk in terms of IOps when it comes to disk performance. At the time of writing this post, one 15k spindle is roughly 200 IOps. Currently, a single VHD (Virtual Hard Disk – aka, the thing that Windows Azure uses as a hard drive) yields up to 500 IOps. So, think of a VHD as roughly a pair of 15k spindles.

With Windows Azure Virtual Machines, you can have multiple VHDs for a single VM. You will have to pay more for them, by bumping up the VM “size” from small to large, extra large, etc. With an extra large VM, you can have up to 17 disks (1 for the main drive, and 16 mountable VHDs).
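As a back-of-the-envelope check, the figures above can be turned into a few lines of Python. This is only a sketch: the constants are the rough numbers quoted in this post, the function names are invented, and the result is a nominal ceiling that ignores every other limit in the system.

```python
SPINDLE_15K_IOPS = 200  # rough figure for one 15k spindle, from this post
VHD_MAX_IOPS = 500      # "up to" figure for one VHD, from this post
XL_DATA_DISKS = 16      # mountable VHDs on an extra large VM


def spindle_equivalents(iops):
    """How many 15k spindles would be needed to match `iops`."""
    return iops / SPINDLE_15K_IOPS


def nominal_iops(disk_count):
    """Best-case sum if every VHD delivered its maximum at once."""
    return disk_count * VHD_MAX_IOPS


print(spindle_equivalents(VHD_MAX_IOPS))                 # 2.5
print(nominal_iops(XL_DATA_DISKS))                       # 8000
print(spindle_equivalents(nominal_iops(XL_DATA_DISKS)))  # 40.0
```

So on paper, 16 data disks look like about 40 spindles. Whether you actually see that number is a separate question.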

Does mounting more drives give more performance?

I’m so glad you asked! The short answer is “no for right now, yes in the near future”. Actual practice and stress testing show that you don’t (currently) get (much) more IOps by going from 1 disk to 16 disks. The reason is a current limitation in IaaS. To me, it’s the biggest limitation, and it’s the number 1 issue that must be improved before I would personally recommend IaaS to larger organizations.

The issue comes down to the current “Azure Storage SLA”. According to the service level agreement for Azure storage, a single account can only expect up to 5,000 IOps. Now, keep in mind the following points:

  1. You may likely be able to get more IOps than 5k… but Microsoft is not contractually obligated to give you any more than that.
  2. IaaS is currently in “preview” – and since the storage SLA was made long before IaaS, it is clearly incompatible.

I have spoken with a few people at Microsoft about it, and Brad Calder, the General Manager of Windows Azure Storage, said:

“We plan to increase these by GA, but for where we are right now for the preview, … plan on the above, and use multiple disks and storage accounts as appropriate.”
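Here is a rough Python model of why “multiple disks and storage accounts” matters. It assumes, purely for illustration, that the 5,000 IOps SLA figure behaves like a hard cap per storage account and that each VHD can deliver its full 500 IOps. Neither assumption is guaranteed by anything in this post, and the function names are made up.

```python
import math

VHD_MAX_IOPS = 500          # "up to" figure for one VHD, from this post
ACCOUNT_IOPS_TARGET = 5000  # per storage account figure, from the SLA


def effective_iops(disks_per_account):
    """Sum the IOps of each account, capping every account at the SLA figure.

    `disks_per_account` is a list such as [8, 8]: one entry per storage
    account, holding the number of VHDs placed in that account.
    """
    return sum(
        min(disks * VHD_MAX_IOPS, ACCOUNT_IOPS_TARGET)
        for disks in disks_per_account
    )


def accounts_needed(disk_count):
    """Fewest accounts that keep `disk_count` VHDs under the cap (at least 1)."""
    return max(1, math.ceil(disk_count * VHD_MAX_IOPS / ACCOUNT_IOPS_TARGET))


print(effective_iops([16]))    # 5000: one account caps all 16 disks
print(effective_iops([8, 8]))  # 8000: the same disks split over two accounts
print(accounts_needed(16))     # 2
```

Under these assumptions, the same 16 disks go from 5,000 to 8,000 IOps just by splitting them across two storage accounts.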

I threatened to hug him – and a few other people – for that statement. This leads us to our last question:

How does SQL Server perform in the IaaS environment?

Well basically, SQL performance can be easily predicted in any environment where you know the following 3 pieces of information:

  1. What is the CPU power?
  2. How much RAM does SQL Server get?
  3. What does the storage IOps look like?

Since we know the answer to all three of those questions, we know that SQL Server in an IaaS environment will perform pretty well – like a “good” box for a small company. I recommend that you mount 16 drives, stripe 12 of them together and use that volume as the MDF (data) file location, then stripe the last 4 together and use that volume as the LDF (log) file location.
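The 12 + 4 split can be written down as a small planning sketch in Python. The disk numbers are just positions in a list, not real Azure LUNs, the `plan_layout` function is hypothetical, and the IOps values are nominal “up to” sums rather than measured results.

```python
VHD_MAX_IOPS = 500  # "up to" figure for one VHD, from this post


def plan_layout(data_disks=16, mdf_disks=12):
    """Split the data disks into one striped volume for MDF and one for LDF."""
    if not 0 < mdf_disks < data_disks:
        raise ValueError("both volumes need at least one disk")
    disks = list(range(data_disks))  # illustrative disk indexes only
    return {"MDF": disks[:mdf_disks], "LDF": disks[mdf_disks:]}


layout = plan_layout()
for role, disks in layout.items():
    print(f"{role}: {len(disks)} disks, nominal {len(disks) * VHD_MAX_IOPS} IOps")
# MDF: 12 disks, nominal 6000 IOps
# LDF: 4 disks, nominal 2000 IOps
```

The point of the split is simply that data file traffic and log file traffic land on separate stripes instead of competing for the same disks.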

Here are a couple of screenshots to help prove the point. The first is a single “Small” VM running SQL Server. The test is a console app that opens 100 connections and just hammers the server with as many insert statements as it can. The box was not able to handle much more than 100 connections (due to CPU/RAM power).
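For anyone who wants to try something similar, the shape of such a test can be sketched in Python. This is not the console app used for the screenshots. It is a hypothetical harness where `insert_once` stands in for “run one INSERT on this worker’s connection”, and the example below plugs in a fake in-memory insert so the sketch runs without a database.

```python
import threading
import time


def hammer(insert_once, workers=100, seconds=1.0):
    """Call `insert_once(worker_id)` in a tight loop from `workers` threads.

    Returns the total number of completed calls across all workers.
    """
    counts = [0] * workers  # one slot per worker, so no lock is needed here
    deadline = time.monotonic() + seconds

    def worker(worker_id):
        while time.monotonic() < deadline:
            insert_once(worker_id)
            counts[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(counts)


# Stand-in for a real database insert: append to a list under a lock.
rows = []
rows_lock = threading.Lock()


def fake_insert(worker_id):
    with rows_lock:
        rows.append(worker_id)


total = hammer(fake_insert, workers=100, seconds=0.2)
print(f"{total} inserts in 0.2s ({total / 0.2:.0f} per second)")
```

In a real run, each worker would open its own connection up front and `insert_once` would execute a single insert on it. The number to compare between VM sizes is the inserts per second.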

Now let’s see that same test (on that same VM), but this time bump up the size to “Extra Large”, split out the MDF file onto a 12-striped VHD volume and split the LDF file onto a 4-striped VHD volume. Being an XL box, we get 8 times more CPU and RAM.

In conclusion – IaaS is already a good tool for typical use… it is a great and brand new tool when it comes to being able to quickly ramp up or down… and (if Microsoft follows through – as they likely will), it will be even better for the general availability release!