Tableau Server on Windows Azure – WWRBD? He’d want to go fast. Let’s see how to get there on Windows Azure.
First, I’m assuming you’ve worked through both of the following articles which focus on Tableau and EC2. I’m too lazy to re-cap and you need the basics:
Some hard cold facts
The next few paragraphs may make you believe I’m an EC2 fan-boy. Not so. Only interested in going fast.
Much of the testing I’ve done on EC2 was with C3 XLarge and 2XLarge instances. These offer e5-2680 processors @ 2.8 GHZ.
Both Azure A4 & A6 instances will be deployed with either AMD 4171 HE cores @ 2.09 GHZ, or Xeon e5-2660 cores @ 2.2 GHZ. Just stopping and starting your machine can cause different brands/speeds of cores to appear in machine properties.
Only the A8 and A9 Azure instances run higher-end processors: e5-2670s @ 2.6 GHZ. These instances cost 3x (plus) as much as the A4 and A6 instances. I wanna go fast, but I can’t bring myself to pay $2.5/hour when I can get something on EC2 with “good enough” RAM and MORE CPU (e5-2680s at 2.8 GHZ) at between $0.76 – $1.06.
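To put those hourly rates in perspective, here is a quick back-of-the-envelope sketch. It assumes a machine that runs 24x7 for a 30-day month (my assumption, not a billing rule) and uses only the hourly prices quoted above.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: always-on machine, 30-day month


def monthly_cost(hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Cost in USD of running one instance for `hours` at the given hourly rate."""
    return round(hourly_rate_usd * hours, 2)


# Hourly rates quoted in the paragraph above.
for label, rate in [("Azure A8/A9", 2.50), ("EC2 (low end)", 0.76), ("EC2 (high end)", 1.06)]:
    print(f"{label}: ${monthly_cost(rate):,.2f} per month")
```

Under those assumptions the A8/A9 rate works out to roughly $1,800 a month, against something in the $550 to $760 range on EC2.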
The A8 and A9 instances are currently only available via the Western Europe region of Azure AND they haven’t been rolled out to support Virtual Machines yet. In the short term you can’t have the “best stuff” in terms of CPU.
Having enough IOPS is critical to good performance on disk-centric workloads. This is pretty simple to do in EC2 with EBS volumes and provisioned IOPS. Azure approaches this problem differently. From what I can see, most Azure storage gives you 500 IOPS per-disk, period. To get the throughput you need, you must create striped volumes across many “disks”. You can do the same thing in EC2, of course – but I found it often wasn’t necessary. In Azure you don’t have a choice.
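The striping arithmetic is simple enough to sketch. This is a theoretical ceiling only: it assumes IOPS add up perfectly across stripe members and ignores host caching, per-disk bandwidth caps, and anything else the platform does behind the scenes. The 64 KB block size in the example is just an illustration, not something I measured.

```python
def stripe_iops(disks, iops_per_disk=500):
    """Theoretical IOPS ceiling for a striped set: per-disk IOPS simply add up."""
    return disks * iops_per_disk


def throughput_mb_s(iops, block_kb):
    """Throughput in MB/sec if every I/O moves `block_kb` kilobytes."""
    return iops * block_kb / 1024


# Example: 4 disks at 500 IOPS each, with a hypothetical 64 KB block size.
iops = stripe_iops(4)
print(iops, "IOPS,", throughput_mb_s(iops, 64), "MB/sec at 64 KB blocks")
```

Real numbers will come in under this ceiling; the point is only that on Azure the way to a bigger number is more disks in the stripe.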
I tested Tableau Server 8.1.6 and 8.2 Beta 3 on the following instances:
- A4 (8 cores, 14 GB RAM)
- A6 (4 cores, 28 GB RAM)
You can read about these instances here. I ran Windows Server 2008 R2 SP1 across the board because I’m sick of looking at Metro tiles right now.
Most of the work I did revolved around disk configuration:
- Tableau on the System Disk (don’t do this!)
- Tableau on a single dedicated disk
- Tableau on a 2-disk striped volume (read / write caching on)
- Tableau on a 4-disk striped volume (read / write caching on)
- Tableau on a 12-disk striped volume (caching off)
Let’s begin with the stink. Earlier, I complained about how slow Tableau Server was on a 300 IOP EC2 machine. The throughput looked like this:
Here’s what the 500 IOP disk performance on Azure looks like:
These numbers look pretty good in theory, but in practice they weren’t. The great read speeds are there because read / write host caching is automatically “on” for the system disk in Azure (You can read about host caching here under the section “Windows Azure Virtual Machine Disk Cache”).
Back in the real world, I couldn’t successfully publish a large (3 GB) Tableau extract to Tableau Server running on this machine because the disk response time was so poor. At one point I had 32,000 ms response times (yes, 32+ second waits for a request to the disk).
In my opinion, installing Tableau Server to C: is only good if you have very low expectations and/or small-ish extracts. Otherwise…
So, let’s try a stand-alone disk. Here’s a volume made up of a single disk on the same machine:
Caching is not on so the R/W rates look low compared to C:. In the real world, I saw little-to-no improvement in rendering speeds. (I’ll show you the numbers later). I didn’t bother to turn caching “on” for this drive since I saw relatively poor performance on C: with the same configuration. Probably should have tried it though. Sorry.
Final judgement on a single, uncached drive?:
Next, let’s stripe our E: volume across two disks. We’ll also turn Read / Write host caching on:
These numbers look similar to what we saw on the single, cached system partition (C:), but render performance was much improved.
- Turning on caching: Good
- Striping: Good
If two disks are good, four disks must be very good, right?
Again, this looks pretty similar to what we’ve seen on C: and 2-disk striped E. However, our write is better across more block sizes. There was also a moderately positive difference in rendering speed (more on that later).
…and what about 12 disks?
If 2 disks are good and 4 disks are very good, 12 disks must be super-duper awesome!
I tried 8 disks as well and saw the same thing, which surprised me to no end. With both 8 and 12 disks, throughput dropped and render time increased.
It appears that the magic here is a combination of striped disks and host caching working together. Azure only allows you to turn caching on for a maximum of 4 disks on your VM. Adding non-caching disks to the striped set brought the performance of the whole volume down to that of the worst performer. Makes sense, I guess – it’s a striped set, after all – one or more poorly performing disks makes everyone wait.
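Here is a toy model of that "everyone waits for the slowest disk" guess. It is my simplification, not a description of how Azure storage actually behaves: it assumes every request is split evenly across all members, so the whole stripe moves at the pace of its slowest member.

```python
def striped_volume_rate(member_rates_mb_s):
    """Toy model: a stripe runs at (number of members) x (slowest member's rate)."""
    if not member_rates_mb_s:
        raise ValueError("a striped set needs at least one disk")
    return len(member_rates_mb_s) * min(member_rates_mb_s)
```

If a cached disk delivers c MB/sec and an uncached one delivers u MB/sec, this model puts the 4-disk cached stripe at 4c and a 12-disk stripe (4 cached, 8 uncached) at 12u. The bigger stripe loses whenever c is more than 3 times u, which is at least consistent with what I saw.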
I also ran a little experiment in which I turned caching OFF on my 4-disk stripe. Performance immediately dropped to what you see above, with render times rising. When I turned it back again, render times dropped, throughput increased.
Big takeaway: While “massive disk striping” may work to give applications like SQL Server better IO, the same technique doesn’t help Tableau Server. Don’t know why. Stick to 4 striped disks as a single volume with caching turned on.
And the temp disk…
Finally, there’s the temporary storage that you get on D: – this is a disk that is completely wiped anytime you restart or shut down your machine…but it’s FAST:
Read and write are both awesome here. This volume is used to host your OS’s page file, among other things. As a matter of course I switched the following system & user environment variables to point to this drive:
- TEMP
- TMP
- APPDATA (Where Tableau Desktop unzips twbx files and drops temp files – Must restart the OS before this takes effect)
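After changing the variables (and restarting), it is worth confirming they really point where you think they do. A minimal sketch that checks TEMP, TMP and APPDATA, the three variables named in the summary at the end of this post. The `vars_on_drive` helper is my own name, not anything Tableau or Windows ships.

```python
import ntpath
import os


def vars_on_drive(names, drive="D:", env=None):
    """Map each variable name to True if its value is a path on `drive`."""
    env = os.environ if env is None else env
    result = {}
    for name in names:
        value = env.get(name, "")
        # ntpath parses Windows-style paths regardless of the OS running this.
        result[name] = ntpath.splitdrive(value)[0].upper() == drive.upper()
    return result


print(vars_on_drive(["TEMP", "TMP", "APPDATA"]))
```

Run it as the same user that runs Tableau, since user-level variables differ per account. Anything that comes back False is still living on another drive (or isn't set at all).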
You’re not going to want to install Tableau Server to this drive in a production environment, but use it as much as you can for other stuff. For kicks I actually measured render performance on this drive, which you can explore below.
SHOW ME THE NUMBERS!!!
Here’s a viz which combines earlier work I did with EC2, new Azure report runs, and a few more EC2 runs I did for giggles:
- I tested by cycling the OS itself (to unload extracts from RAM), then ran a report (“initial load”). Afterwards I logged in as a different user and ran the same report (“cached report”), then restarted Tableau (but not the OS) and ran the same report a 3rd time (“uncached report”). I repeated this process 2-3 times for each report / machine / disk config combination.
- 10,000 IOPS is an arbitrary number I stuck in as a placeholder for AWS instance and Azure temporary storage. It’s meaningless other than the fact it is “big”.
- Even when I striped disks, I kept the IOP values column in the report above the same. For example, striping 4 x 500 IOP disks should give us a 2000 IOP volume (in theory). I ignored this and recorded the IOPS that the individual disk delivers (500, in this case).
- I mentioned that Azure somewhat arbitrarily can load up your instance with either AMD or Intel Xeon cores at slightly different speeds each time you stop/start the machine. Because of that the screenshots of the vizzes below may not actually completely match the “live” viz above. Sorry.
The first thing I noticed is the (fairly wide) variance between render times for the same report on the same hardware inside an Azure VM. The initial load of the extract into memory takes longer on Azure than on EC2 storage, even when we should be getting about the same number of IOPS.
I wonder if this has something to do with it? (From the white paper, Performance Guide for SQL Server in Windows Azure – page 7):
With Windows Azure disks, we have observed a “warm-up effect” that can result in a reduced rate of throughput and bandwidth for a short period of time. In situations where a data disk is not accessed for a period of time (approximately 20 minutes), adaptive partitioning and load balancing mechanisms kick in. If the disk is accessed while these algorithms are active, you may notice some degradation in throughput and bandwidth for a short period of time (approximately 10 minutes), after which they return to their normal levels. This warm-up effect happens because of the adaptive partitioning and load balancing mechanism of Windows Azure, which dynamically adjusts to workload changes in a multi-tenant storage environment.
This appears to be a “use it or lose it” mechanism for disk. Perhaps it is getting in the way of delivering high IO consistently?
The variance might also have to do with the fact that sometimes Azure gave me AMD 2.09 GHZ cores on my test rigs and other times I got Xeon 2.2 GHZ cores.
That said, nearly all of the right-most outliers represent long loads of the extract into RAM at the front end of the rendering process. These only occur during “initial load” report runs. If we were somehow able to make those go away, things start to look a bit tighter. Try it yourself by lasso-selecting the 114 second+ render marks and excluding them. You’ll see that EC2 still has less variance, but Azure is starting to pull even.
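If you would rather do the same exclusion outside of the viz, the idea can be sketched in a few lines. The function name and the input list are hypothetical; you would feed it your own render times in seconds, one list per platform.

```python
from statistics import mean, pstdev


def summarize(render_seconds, cutoff=None):
    """Return (count, mean, std dev) of render times.

    When `cutoff` is given, marks at or above it are dropped first,
    mirroring the "114 second+" exclusion described above.
    """
    kept = [t for t in render_seconds if cutoff is None or t < cutoff]
    if not kept:
        return 0, None, None
    return len(kept), mean(kept), pstdev(kept)
```

Call it once without a cutoff and once with `cutoff=114` for each platform, then compare the standard deviations to see how much of the spread comes from those long initial loads.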
Don’t get too excited about those nice clumps of tight report renders when using the temporary disk on Azure, by the way:
When using the Temporary disks, one can’t restart the OS without losing everything on the disk. As a result, I could never test a “virgin load” of the extract from disk as in the other scenarios. If I could, we’d see some longer load times here, too.
Here’s an interesting find. Note how each time we add better disk throughput, render time goes down…and when we have the same disk but use more cores, render time goes down. This makes sense to me:
This pattern breaks down when looking at 8 cores on Server 8.1, however. Those E: renders should occur faster than C:, yet they don’t. As expected, a striped E: volume with two disks performs better than the single E: drive (and the 4-drive striped E: tested under 8.1 does better than the 2-drive striped E: I tested in 8.2 beta 3).
I couldn’t help myself…let’s do a shoot-out.
I also thought it would be fun to try and compare Tableau on hardware that was as similar as possible. In this case, it was an EC2 C3.2XLarge instance (8-core e5-2670 v2 2.5 GHZ processor, 30 GB of RAM) – it was the closest match for the Azure A4 in terms of CPU (8 cores at between 2.09 and 2.2 GHZ, with 14 GB of RAM). The EC2 machine has more RAM, but both machines had tons of headroom when it came to free memory, so I’m not worrying about it.
Here’s what things look like at a high level:
I watched reads very closely during the “initial load” runs of the big report. Here’s what I saw as the extract was being loaded into RAM:
Azure: ~15 MB/sec peak, ~ 11 MB/sec sustained read
EC2: ~60 MB/sec peak, ~50 MB/sec sustained read
With numbers like that, it’s no wonder we see EC2 delivering the report much more quickly – we’re able to get the TDEngine working much sooner on EC2.
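A rough worked example shows how much those read rates matter. It assumes a 3 GB extract (the size of the big one I mentioned earlier) that has to be read from disk in full at the sustained rate. Both are simplifications: I haven't verified that the engine reads the entire file before it starts working.

```python
def load_seconds(extract_gb, sustained_mb_s):
    """Seconds to read `extract_gb` gigabytes at a sustained rate in MB/sec."""
    return extract_gb * 1024 / sustained_mb_s


# Sustained read rates observed above; the 3 GB extract size is an assumption.
for platform, rate in [("Azure", 11), ("EC2", 50)]:
    print(f"{platform}: ~{load_seconds(3, rate):.0f} seconds")
```

Under those assumptions that's roughly 279 seconds on Azure versus about 61 seconds on EC2 before the extract is even in memory.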
On a lark, I also turned disk caching off on Azure to see what would happen. Sustained read dropped to between 3-4 MB/sec. I didn’t include those runs in my results as they were similar to running on C: or unstriped E:.
What happens if you remove those initial “slow loading” reports, however?
Now things begin to look somewhat similar. EC2 is faster, but Azure is looking good once you get past the first load of the extract from disk.
When all is said and done, Azure can be a good alternative to EC2 once you get around the issues with “first load”. If you put a little bit more skin in the game, you can probably come up with a mechanism that runs certain reports (or just loads certain extracts into RAM) ahead of time…then problem solved.
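One possible shape for that mechanism is a small script that requests a list of views once, so the slow first load happens before a real user shows up. This is only a sketch. The URLs would be your own, authentication isn't handled here, and I haven't confirmed that a plain HTTP request is enough to pull an extract into RAM.

```python
import time
import urllib.request


def warm_up(urls, fetch=None, timeout=600):
    """Request each URL once; return {url: elapsed seconds, or None on failure}."""
    if fetch is None:
        def fetch(url):
            # Default fetch: a plain GET with no authentication (an assumption).
            with urllib.request.urlopen(url, timeout=timeout) as response:
                response.read()
    timings = {}
    for url in urls:
        start = time.perf_counter()
        try:
            fetch(url)
            timings[url] = time.perf_counter() - start
        except Exception:
            # Any failure is recorded as None so one bad view doesn't stop the rest.
            timings[url] = None
    return timings


# Hypothetical usage, e.g. from a scheduled task after each restart:
# warm_up(["http://tableau.example.com/views/SomeWorkbook/SomeView"])
```

The `fetch` parameter is there so you can swap in something that logs in first, or test the loop without touching a server.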
So, let’s summarize:
- Currently, you can’t get the highest-of-high compute for VM roles on Azure
- EC2 storage is generally easier to configure and returns better results for Tableau Server
- Use disk striping and host caching on Azure to bump up your performance
- Adding more than 4 disks to a stripe appears to be counter-productive, at least where Tableau Server is concerned.
- Move your TEMP, TMP and APPDATA folders to Azure’s temporary drive
To close, here’s another way to look at the data above. Because, Tableau: