Greatly Increase the Performance of Azure Storage CloudBlobClient

Windows Azure Storage boasts some very impressive transactions-per-second and general throughput numbers. But in your own applications you may find that blob storage, tables and queues all perform much slower than you’d like. This post will teach you one simple trick that literally increased the throughput of my application over 50 times.

The fix is very simple, and only a few lines of code – but I’m not just going to give it away so easily. You need to understand why this is a “fix”. You need to understand what is happening under the hood when you are using anything to do with the Windows Azure API calls. And finally, you need to suffer a little pain like I did – so that’s the primary reason why I’m making you wait. :)

The Problem – Windows Azure uses a REST based API

At first glance, this may not seem like a throughput problem. In fact, if you’re a purist, you likely have already judged me a fool for the above statement. But hear me out on this one. If someone made a REST-based API, then it is very likely that a web-browser would be a client application that would consume this service. Now, what is one issue that web-browsers have by default when it comes to consuming web services from a single domain?

“Ah!” If you are a strong web developer – or you architect many web-based solutions – you have probably just figured out the issue and are no longer reading this blog post. However, for the sake of completeness, I will continue.

Seeing that uploading a stream to a blob is a semi-lengthy and IO-bound procedure, I thought to just bump up the number of threads. The performance increased only a little, and that led me to my next question.
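For anyone who wants to reproduce this kind of experiment, here is a minimal sketch of a multi-threaded throughput test. It is an illustration under assumptions, not the exact harness from this post: `ThroughputHarness`, `MeasureOpsPerSecond` and the `fakeUpload` stand-in are hypothetical names, and the simulated 100 ms delay merely takes the place of a real blob upload.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ThroughputHarness
{
    // Runs `work` in a loop on `threadCount` threads for roughly `duration`
    // and returns the number of completed calls per second.
    public static double MeasureOpsPerSecond(Action work, int threadCount, TimeSpan duration)
    {
        long completed = 0;
        using (var cts = new CancellationTokenSource())
        {
            var threads = new Thread[threadCount];
            var stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < threadCount; i++)
            {
                threads[i] = new Thread(() =>
                {
                    while (!cts.IsCancellationRequested)
                    {
                        work();
                        Interlocked.Increment(ref completed);
                    }
                });
                threads[i].Start();
            }

            // Let the workers run, then signal them to stop and wait for them.
            Thread.Sleep(duration);
            cts.Cancel();
            foreach (var thread in threads)
            {
                thread.Join();
            }
            stopwatch.Stop();
            return completed / stopwatch.Elapsed.TotalSeconds;
        }
    }

    public static void Main()
    {
        // Hypothetical stand-in for a blob upload: about 100 ms of simulated I/O.
        Action fakeUpload = () => Thread.Sleep(100);
        foreach (int threadCount in new[] { 1, 4, 16 })
        {
            double rate = MeasureOpsPerSecond(fakeUpload, threadCount, TimeSpan.FromSeconds(2));
            Console.WriteLine("{0} threads: {1:F1} ops/sec", threadCount, rate);
        }
    }
}
```

With the simulated delay, throughput should grow roughly in line with the thread count. Replacing `fakeUpload` with a real upload is what exposes a bottleneck that extra threads cannot get past.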

Why is the CloudBlobClient slow even if I increase threads?

At first I assumed that I had simply hit the throughput limit of an Azure Blob Container. I was getting about 10 blobs per second, and thought that I probably just needed to create more containers – “perhaps it’s a partitioning issue.”

This didn’t feel right because Azure Blobs are supposed to partition based on “container + blob-name”, and not just on container alone… but I was desperate. So, I created 10 containers and ran the test again. This time more threads, more containers… the result? Zero improvement. The throughput was exactly the same.

Then it hit me. I decided to do a test that “shouldn’t” make a difference – but it’s one that I’ve done in the past to prove that I’m not crazy (or in some cases, to prove that I am). I ran several instances of my console app at the same time. The results were strange. One instance on its own was getting about 10 inserts per second – but 3 instances running side by side were getting about 10 each. This means that my computer, my network and the Azure Storage Service were able to process far more than my one console application was doing!

This proved my hunch that “something” was throttling my application. But what could it be? My code was super simple:

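// Requires: using System; using System.IO; using System.Text; using System.Threading;
while (true)
{
	// Create a random blob name.
	string blobName = string.Format("test-{0}.txt", Guid.NewGuid());
	// Get a reference to the blob storage system.
	var blobReference = blobContainer.GetBlockBlobReference(blobName);
	// Upload the word "hello" from a Memory Stream.
	using (var stream = new MemoryStream(Encoding.UTF8.GetBytes("hello")))
	{
		blobReference.UploadFromStream(stream);
	}
	// Increment my stat-counter.
	Interlocked.Increment(ref count);
}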

That’s when it hit me! My code is simple because I’m relying on other people who wrote code, in this case the Windows Azure Storage team! They, in turn, are relying on other people who wrote code… in their case the .NET Framework team!

So you might ask, “What functionality are they using that is so significant to the performance of their API?” That question leads us to our final segment.

Putting it All Together – Getting More Throughput in Azure Storage

As was mentioned before, the Azure Storage system uses a REST (HTTP-based) API. As was also mentioned, the developers on the storage team used functionality that already existed in the .NET Framework to create web requests to call their API. That class – WebRequest (or HttpWebRequest, in particular) – was where our performance throttling was happening.

By default, a web browser – or in this case any .NET application that uses the System.Net.WebRequest class – will only allow up to 2 simultaneous connections per host domain.
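One way to see the limit for yourself is to ask the framework what it will apply to a given host. The sketch below uses `ServicePointManager.FindServicePoint`, which looks up the connection settings for a URI without sending a request. The `ConnectionLimitProbe` class and the `myaccount` storage URL are hypothetical placeholders, and the value printed depends on your runtime and configuration.

```csharp
using System;
using System.Net;

public static class ConnectionLimitProbe
{
    // Returns the connection limit that WebRequest-based calls to `endpoint` are subject to.
    public static int GetLimitFor(Uri endpoint)
    {
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(endpoint);
        return servicePoint.ConnectionLimit;
    }

    public static void Main()
    {
        // Hypothetical storage endpoint; substitute your own account's blob URL.
        var endpoint = new Uri("https://myaccount.blob.core.windows.net/");
        Console.WriteLine("Default limit for new hosts: {0}", ServicePointManager.DefaultConnectionLimit);
        Console.WriteLine("Limit for {0}: {1}", endpoint.Host, GetLimitFor(endpoint));
    }
}
```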

So no matter how many threads I added in my application code, ultimately I was being funneled back into a 2-connection-maximum bottleneck. Once I proved that out, all I had to do was add this simple configuration bit to my App.config file:

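<?xml version="1.0" encoding="utf-8" ?>
<configuration>
	<system.net>
		<connectionManagement>
			<add address="*" maxconnection="1000" />
		</connectionManagement>
	</system.net>
</configuration>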
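If you would rather not touch App.config, the same limit can be raised from code through `ServicePointManager.DefaultConnectionLimit`. This is a sketch of that alternative rather than what I used above: the `ConnectionLimitSetup` class name is hypothetical, and the value 1000 simply mirrors the configuration setting. The new default only applies to hosts contacted after it is set, so it should run at startup, before any storage client makes its first request.

```csharp
using System;
using System.Net;

public static class ConnectionLimitSetup
{
    // Raises the default per-host connection limit for hosts contacted from now on.
    public static void Raise(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
    }

    public static void Main()
    {
        // Call this first, before creating any storage client or issuing any request.
        Raise(1000);
        Console.WriteLine("Default connection limit: {0}", ServicePointManager.DefaultConnectionLimit);
    }
}
```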

Now my application inserts 50 times more blobs per second than it used to.