The Intro
The AWS EC2 c4 instance type has been around for a while now, and I finally got some time to compare the performance of Tableau Server running on a c3 vs. c4.
Of course, I used this as an excuse to leverage TabJolt – I have a huge TabJolt crush right now.
I’m not going to write tons about high-level differences between the c3 and c4 because you can do this sort of due diligence stuff yourself.
That said, if you want to “go deep”, here are two great reads:
- http://research.gigaom.com/report/generational-performance-comparison-amazon-ec2s-c3-and-c4-families/
- http://blog.cloudability.com/aws-c4-power-worth-price/
From a Tableau Server point of view, here are the big benefits of the c4 as I see them:
- Better/faster CPU – Tableau Server bottlenecks on CPU, so this is important
- EBS Optimized by default
EBS optimization gives your machine dedicated bandwidth to its EBS volumes (including Provisioned IOPS volumes), and guaranteed disk throughput is a good thing when it comes to Tableau Server. On a c3 you pay a bit extra per hour to add this capability:
By the time you pay extra for EBS optimization on a c3, you’re already paying more per hour than you would on a c4…so this is a no-brainer as far as I’m concerned.
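If you want to sanity-check that math for yourself, here's a back-of-envelope sketch in Python. All three dollar figures below are hypothetical placeholders, not real AWS rates – plug in the current On-Demand prices for your region and instance size:

```python
# Hourly cost: c3 + EBS-optimization surcharge vs. c4 (EBS optimized free).
# All prices are HYPOTHETICAL placeholders -- check current AWS pricing.
C3_HOURLY = 1.680        # assumed c3 On-Demand rate
C3_EBS_SURCHARGE = 0.100 # assumed per-hour fee to add EBS optimization
C4_HOURLY = 1.675        # assumed c4 rate; EBS optimization is included

c3_total = C3_HOURLY + C3_EBS_SURCHARGE
print(f"c3 + EBS optimized: ${c3_total:.3f}/hr")
print(f"c4 (EBS included):  ${C4_HOURLY:.3f}/hr")
print(f"c4 saves:           ${c3_total - C4_HOURLY:.3f}/hr")
```

The point isn't the exact numbers – it's that the surcharge pushes the c3 past the c4's price while buying you a slower machine.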
The test I couldn’t do
The very first thing I wanted to do was test each of the vizzes in my workload (defined here) by running them one at a time on a “16 core” c3 or c4 machine. I wanted to see what came back faster.
Tests I’ve done in the past suggest that with a relatively low number of concurrent users, “the bigger the box, the faster the viz returns”. Beyond about 30-40 concurrent users this is no longer the case.
Running this test proved to be impossible. The c3.8xlarge gives me the equivalent of 16 physical cores, but the c4.8xlarge delivers 36 vCPUs…or about 18 cores of goodness. There was no way I could do an apples-to-apples comparison on a “big box”.
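The core math here is just the standard hyper-threading conversion – on these instance families an EC2 vCPU is a hyperthread, so divide by two to approximate physical cores. A quick sketch:

```python
def physical_cores(vcpus: int, threads_per_core: int = 2) -> int:
    """Approximate physical cores from EC2 vCPUs (vCPUs are hyperthreads)."""
    return vcpus // threads_per_core

print(physical_cores(32))  # c3.8xlarge: 32 vCPUs -> 16 cores
print(physical_cores(36))  # c4.8xlarge: 36 vCPUs -> 18 cores
```

Since 16 != 18, the two “big boxes” can never be licensed and loaded identically.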
So, I scrapped the “How fast can this viz run in a vacuum” angle and instead just started doing load testing with two configurations:
- 2 Instances with 8 cores each = 16 cores
- 4 Instances with 4 cores each = 16 cores
These two configs match the 2 x (8 Cores) v2 and 4 x (4 Cores) v1 implementations I detailed at the end of this blog entry.
For each of the two setups above I executed a light and a heavy load test, ramping from 10 to 260 concurrent users by adding 10 users every 10 minutes…just like I’ve been doing with TabJolt over the past month or so.
All I did during my tests was switch the instance type from c3 to c4 and back again. I used the exact same disk volumes with the same EBS provisioned IOPS (1500).
Top Line Results
(Standard disclaimer: These are my results with my specific workload. Yours will be different. Use this information to begin informing your decisions…but ultimately you need to test this stuff yourself.)
No surprises: c4 better. c3 worser.
Based on the type of workload and the configuration I was running, I saw a minimum 10% increase in tests per second (TPS) versus the c3. At the high end, my c4s delivered 22%+ more TPS. I observed a lower overall error rate in my tests on the c4 configurations as well.
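For reference, the TPS deltas quoted throughout this post are plain percentage gains of c4 throughput over c3. A trivial helper (with made-up inputs, not my actual measurements) shows the arithmetic:

```python
def pct_gain(new_tps: float, old_tps: float) -> float:
    """Percentage improvement of new throughput over old."""
    return (new_tps - old_tps) / old_tps * 100.0

# Hypothetical TPS numbers purely to demonstrate the formula:
print(round(pct_gain(1.22, 1.00), 1))  # a 22% bump prints 22.0
```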
Let’s go to the tape and start with the light load tests.
In the images below, I refer to the c4 machines by their processor family – the Haswell V3. So, V3 = a c4 instance type and V2 = the older Ivy Bridge V2 processors used by a c3 instance. Got it?
As you can see, the V3 (c4) configurations delivered more successful TPS at scale regardless of how I split my 16 cores up. Both the 4 x (4 Core) and 2 x (8 Core) configurations beat their older cousins.
Next, let’s compare the 2 x (8 Core) configurations head-to-head:
That big spike in TPS at the end of the V3 test looks suspicious to me – especially taking into account the error rate spiking to about 0.5%.
Note the big difference in error rate between these two.
How about the 4 x (4 Core) light configurations?
Again, V3 / c4 wins – this time with an even lower error rate than we saw on the 2 x (8 Core) configuration. No ugly spikes, just a nice clean even line.
Moving along, let’s focus on the “Interact” workload that TabJolt executes. Each time a test is fired, there are multiple steps – like rendering the viz, then applying several filters…or selecting some marks.
There’s a distinct “shift up” on the V3 (c4) tests in terms of TPS. That’s more tests being run successfully. Note that all tests are completing “to the left” of the 15 second line. Not so for the V2 / c3 tests.
Confusion!
This next bit is interesting. As usual, I also ran a limited number of “read-only” tests – TabJolt simply renders a report and goes away, with no follow-up transactions fired. We therefore hit the cache a lot more and get higher throughput.
For reasons completely unknown to me, the V2 / c3 machines actually out-performed the V3 / c4 configurations in terms of TPS. These tests also completed faster.
No clue why this is occurring. Since this is a head-scratcher, I’d normally re-test to confirm. But I didn’t. Sorry.
When directly comparing the 2 x (8 Core) configurations, you see more samples executed at a lower response time and error rate pretty much across the board.
Same thing holds true for 4 x (4 Core) machines, except the spread looks slightly wider.
This next bunch is the good stuff. We can easily see the (increased) TPS the V3 / c4 delivers versus the V2 / c3 on a per-concurrent user basis. About 10.5% of awesome for the 2 x (8 Core):
…and 11.27% for the 4 x (4 Core) rigs:
Let’s get Heavy
Let’s repeat, but with our heavy load. We see some differences in behavior here.
For example, V3 / c4s don’t hold the “top two positions” anymore in terms of TPS and low response time:
The V2 / c3 – 2 x (8 Core) isn’t hot compared to everything else:
…but it’s pretty clear that at least one of the V2 / c3 configurations is beating the V3 / c4 – 2 x (8 Core) rig.
The c4 – 4 x (4 Core) is our champ, but the c3 – 4 x (4 Core) comes in second.
Our V3 / c4 boxes both outperform overall – note the upshift on “Interact” tests vs. the c3:
It appears that because of the lower error rate on the V3 / c4, we’re able to do better on the “read only” test than we did under the light load…but the V3 isn’t significantly outperforming the older V2 / c3 here, either.
No surprises when we directly compare configurations, either. 2 x (8 Core):
However, the V3 – 4 x (4 Core) error rate looks marginally higher.
I bet that if I re-ran this I might see a cleaner result. But…well, you know. I didn’t. This is quick and dirty testing, after all.
And finally, “return of the good stuff”. Here we have the c4 2 x (8 Core) out-performing the c3 by 16%:
The 4 x (4 Core) does even better – nearly 23% better:
That’s it, race fans. Buy those c4 instances!