The Challenge
For those running cloud databases in AWS, a common enough design pattern is to deploy the database to a private subnet of your VPC or set the Publicly Available option to “No”.
When you go this route, life gets a little bit more difficult from a Tableau point of view. A standard solution for connectivity is to use a combination of SSH tunneling and port forwarding to connect to that private database of yours. A good example of this technique can be found on the Periscope blog.
Publishing can be somewhat tricky too. If you “own” the Tableau Server and can run a permanent-ish SSH session against a bastion in your public subnet, you’ll be in good shape. What about Tableau Online, though?
No joy. Tableau Online can’t do SSH tunneling to an arbitrary SSH server on your behalf.
The Solution
So what’s a guy to do?
If you want to connect to your “private” data sources with Online, you’ll need to lean on a proxy. What follows is a pretty straight-forward walk through of connecting to an RDS MySQL database sitting on a private subnet. I’ll be using an open source proxy server called HAProxy to do so. Here’s what we’re going to build:
I’m going to assume a couple things:
- You know a little bit about AWS
- If you’re already using private subnets, then you know what a public subnet is and how to route traffic between them using a NAT and route tables
- You’re not afraid of a tiny bit of Linux
As I mentioned, we’re not going to build out your network infrastructure here. You should have a public subnet and a private subnet, and resources in each should be able to talk.
In my scenario, I’m running a small instance of MySQL in a private subnet, and it is not publicly accessible.
Steps
We need a instance running Linux – it’ll host HAProxy:
Remember that network throughput is fairly important here. Make sure you choose an instance with good enough network performance. I’m frankly not sure if a t2 qualifies as “good enough” since I don’t know your needs. You’ll need to experiment a bit. I went with a t2.medium for no real reason.
Yes, I just did the opposite of what I advised you to by not using a size with Moderate and/or High network performance. Do as I say, not as a I do.
As I configure the machine, note that I’m using a custom VPC and that I’m choosing a public subnet and auto-assigning a public ip:
Since this instance is going to act as a proxy for MySQL, I’ll need to be able to receive requests on port 3306. If you use a different database, you’ll replace 3306 with the appropriate port.
Modify your security group and open up that port. Below I’m being very lazy by choosing “Anywhere” (0.0.0.0/0). You should choose to accept connections only from Tableau Online and other trusted clients…so be a bit more selective than I am.
Laziness Alert #1
Note that I haven’t bothered to create and associate an Elastic IP Address with this instance. If I were going to use this solution long term, I probably would since the public IP address (and therefore the public DNS hostname) could change if I stop/start the instance. That would play havoc with my Tableau data source as I’d essentially be changing the name of the database host.
After the machine is live, we’ll connect to it via SSH:
Next, update and upgrade:
Now, we install HAProxy:
By default, the proxy capability of this service is disabled. We need to enable it. To do so, we’ll use the nano text editor to modify haproxy and change ENABLED from 0 to 1:
After you’ve made your changes CTRL-X to save.
Next, we need to edit the config file for HAProxy. Open /etc/haproxy/haproxy.cfg with nano.
The unmodified file will look like what you see below. Delete everything you see and add the text under the screenshot.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
global | |
log /dev/log local0 | |
log /dev/log local1 notice | |
chroot /var/lib/haproxy | |
maxconn 4000 | |
user haproxy | |
group haproxy | |
daemon | |
stats socket /var/lib/haproxy/stats mode 777 | |
listen MySQL 0.0.0.0:3306 | |
timeout connect 10s | |
timeout client 1m | |
timeout server 1m | |
mode tcp | |
server singleMySQL proxymenow.redacted.us-east-1.rds.amazonaws.com:3306 |
The config file instructs HAProxy to listen for connections on 3306 and proxy them to the MySQL endpoint. You will modify the the port to listen on (line 12) and the endpoint:port to forward to (line 17) based on the database type you’re dealing with and the values you see in your RDS/Redshift console.
CTRL-X to save.
After making configuration changes, we need to restart the service. Pay no attention to me forgetting to use sudo 🙂
Laziness Alert #2
Haproxy will not autostart if the machine is bounced. I should probably follow some guidance from here to make sure that happens. But since this is just a POC, I won’t bother.
So now the proxy is listening and should forward to our MySQL instance. Before we try using Tableau over the public internet, we should probably make sure that we can even connect to MySQL from this box. Install the MySQL Client:
Then, let’s point the MySQL client directly at MySQL:
Works. If I wanted to, I could now test things by trying to connect to the proxy and make sure the request gets forwarded…but I want to swing for the fences.
Now, for the fun bit. Let’s point Tableau directly at the DNS name of the EC2 instance running HAProxy:
We’ll treat this instance essentially like it’s actually running MySQL itself:
Laziness Alert #3
SSL.. We should be using it here (after all, the database is probably private for a reason and we’re giving up our tunnel). I’m not because threw up my RDS without thinking.
…and voila:
…we have data. We can create a viz, push it to Tableau Online, and get a live connection to our “private” RDS instance:
Here’s my connection sitting in Online, all ready to be used 🙂