Jon Roberts on the use cases for Greenplum and HAWQ, both technologies offered by Pivotal:
Greenplum is a robust MPP database that works very well for Data Marts and Enterprise Data Warehouses that tackles historical Business Intelligence reporting as well as predictive analytical use cases. HAWQ provides the most robust SQL interface for Hadoop and can tackle data exploration and transformation in HDFS.
The first questions that popped into my mind:
why isn’t HAWQ good for reporting?
why isn’t HAWQ good for predictive analytics?
I don’t have a good answer for either of these. For the first, I assume the implied answer is Hadoop’s latency. On the other hand, Microsoft and Hortonworks are trying to bring Hadoop data into Excel with HDInsight. That is not traditional reporting, but if the latency is acceptable there, I’m not sure why it wouldn’t work for reporting too.
For the second question, Hadoop and the tools built around it are well known for predictive analytics, so maybe this limitation is specific to HAWQ. Another explanation could be product positioning.
The latter seems to be confirmed by the rest of the post, which makes the point that data stored in HDFS is temporary: once it is processed with HAWQ, it is moved into Greenplum.
In other words, HAWQ is just for ETL/ELT on Hadoop.
✚ I’m pretty sure that many traditional data warehouse companies, forced to come up with coherent architectures combining their core products and Hadoop, are facing the same positioning problem: it’s difficult to admit in front of customers that Hadoop might be capable of replacing core functionality of the products you are selling.
What is the best answer to this positioning dilemma?
Find a spot for Hadoop that doesn’t hurt your core products. Let’s say ETL.
Propose an architecture where your core products and Hadoop fully complement and interact with each other.