Click Bait
Well…not really an Easter egg per se; you just have to know it’s there. But I made you click. I’ve been hanging around Marketing too much.
I’m a bit of a curmudgeon when it comes to running Tableau Server on a machine without good enough disk. If the Server is given some lousy 200-300 IOPS storage and is then slow at scale, it’s not Server’s fault, it’s yours. Don’t be cheap. Sorry.
It’s not that difficult to track this stuff, but you have to think about it ahead of time and set up a tool like Perfmon or TabMon. You also have to know what to monitor.
In order to automatically and easily track what the “disk situation” looks like, Tableau has wisely added some instrumentation to cluster logging that collects this information in Tableau Server 10. All you have to do is know it’s there, and then use it.
So, let’s run a test…
I’m running a TabJolt load test on a home machine right now. The test has a single user calling vizzes which utilize long-running live queries. Some of these vizzes come back in a “mere” twenty seconds, some take upwards of two minutes. In essence, we’re not stressing the server – we’re just waiting on a database to answer my questions.
As you can see, there’s nothing going on as far as my disk is concerned.
Next, we’re going to use FIO to blast my disk. FIO is a great open source disk testing tool. It is going to lay down five or six big files and then begin reading/writing from/to them. You can see that change immediately:
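For reference, an fio job along these lines will happily hammer a disk. This is just a sketch of the kind of test described above; the block size, file sizes, queue depth, and runtime here are assumptions, not the exact parameters I ran:

```ini
; Sketch of an fio job resembling the test described (all values are assumptions)
[global]
rw=randrw        ; mixed random reads and writes
bs=4k            ; small blocks to drive up queue depth and latency
size=2G          ; each job lays down its own big file
iodepth=16       ; keep plenty of I/O in flight
direct=1         ; bypass the OS cache so the disk takes the hit
runtime=300
time_based=1

[pound-the-disk]
numjobs=5        ; five jobs, five big files, as described above
```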
Both the disk queue and disk latency (Avg. Disk sec/Transfer) jump to well beyond acceptable levels. My average disk queue is about 8 right now, and that includes the dead time before I started the test:
After the test ends (leaving the disk well and truly pounded), everything returns to normal. You can see this below:
Find the Easter Egg!
Now, it’s time to find our Easter Egg. Locate your cluster controller log:
Crack it open and note the disk performance metrics goodness:
Of course, we have this tool called “Tableau”, so reading the numbers is not good enough for us. Let’s make a viz.
First, make a copy of the log:
If you look inside the file, you’ll see that the first 60-70 lines don’t really concern our “disk spelunking” activities. They will actively confuse Tableau’s text parsing logic, so delete them so that the first line of the copied log file is one that contains disk stats.
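If you’d rather script that cleanup than hand-edit the copy, here’s a rough Python sketch. The `reads:` marker it keys on is an assumption about what a metric line contains, and the sample lines are hypothetical stand-ins for real log rows:

```python
# Sketch: drop the leading lines of a cluster controller log copy that carry
# no disk metrics, so Tableau's text parser starts on a metric row.
# NOTE: the "reads:" marker is an assumption about the log's line format.

def trim_log(lines, marker="reads:"):
    """Return the lines starting from the first one containing the marker."""
    for i, line in enumerate(lines):
        if marker in line:
            return lines[i:]
    return []  # no metric lines found

# Hypothetical example input:
sample = [
    "2016-08-01 10:00:00.000 +0800 INFO  : ClusterController started",
    "2016-08-01 10:00:05.000 +0800 INFO  : config loaded",
    "2016-08-01 10:00:10.000 +0800 INFO  : disk C: reads:12 readBytes:4096 writes:3 writeBytes:1024",
]
trimmed = trim_log(sample)
print(trimmed[0])  # the first line that reports disk stats
```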
Use Tableau to open the modified log as a Text File, and you’ll see something like this:
We have some cleaning up to do, but that’s not too difficult since we can take advantage of Tableau’s Custom Split functionality. Let’s split out the date in field F1 by keying on the +0800 timezone offset value in our string. Note I include a leading space:
We’ll name the new field Date. We’ll still need to convert it TO a date later, but let’s move on.
Next, you’ll perform 5 more custom splits against the other fields. Key on the : symbol and choose to pull the Last Column.
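The two kinds of Custom Split above boil down to simple string operations. A minimal Python equivalent, just to make the logic concrete (the sample line is a hypothetical stand-in for a real log row):

```python
# Mirror the two Tableau Custom Splits in Python.
line = "2016-08-01 10:00:10.000 +0800 INFO  : disk C: reads:12 readBytes:4096"

# Split 1: key on " +0800" (note the leading space) and keep the first
# piece, which is the date portion of the string.
date_part = line.split(" +0800")[0]

# Split 2: key on ":" and keep the Last Column, i.e. the metric value.
reads_field = "reads:12"
reads_value = reads_field.rsplit(":", 1)[-1]

print(date_part)    # 2016-08-01 10:00:10.000
print(reads_value)  # 12
```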
Below I’ve just created the Reads column, and now need to do the same thing for readBytes, writes, etc.
After you’re all done, I’d suggest you hide the old “F” fields and add a data source filter to remove the NULL-valued rows. They’ll only make you suffer. Switch over to the Data pane and convert your “Date” field to Date/Time. Convert your other metrics to Number (decimal). Drag the metrics down to the Measures area of the Data pane. You’re now ready to roll!
After creating a quick viz, you can very clearly see Tableau reporting “trouble at sea” with regard to disk during the time when I was running my disk test. Take a look at the data below, then scroll back up to eyeball the Perfmon screencaps and you’ll see they correlate perfectly.
Nice stuff!
(I did see something a little strange by the way – some of the writeByte numbers were actually recorded as negative values inside the cluster log. I have no idea why, and I’ll check it out next week, but I just made a quick fix for now:)
So, now you have an always-available, running record of whether your disk is happy or sad. This is a good thing, because a happy disk means a happy Server. And a happy Server means happy users. And happy users…they are everything.
This is a pretty nice little technique, and a great way to take advantage of the log rollover scheme.