Common theme: moving time-, space-, or processor-intensive processing to Hadoop.
Flume provides ingestion of streaming data (e.g. logs) into Hadoop.
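As a minimal sketch of how such ingestion is wired up, here is a hypothetical Flume agent that tails an application log into HDFS; the agent name, file paths, and HDFS URL are illustrative assumptions, not details from the talk.

```
# agent1: exec source -> memory channel -> HDFS sink
agent1.sources = logsrc
agent1.channels = memch
agent1.sinks = hdfssink

# Source: tail a local application log
agent1.sources.logsrc.type = exec
agent1.sources.logsrc.command = tail -F /var/log/app/app.log
agent1.sources.logsrc.channels = memch

# Channel: buffer events in memory between source and sink
agent1.channels.memch.type = memory
agent1.channels.memch.capacity = 10000

# Sink: write events into date-partitioned HDFS directories
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.channel = memch
agent1.sinks.hdfssink.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
agent1.sinks.hdfssink.hdfs.fileType = DataStream
```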
The Sqoop workflow:
- The client executes a Sqoop job.
- Sqoop interrogates the database for column names, types, and other metadata.
- Based on the extracted metadata, Sqoop generates source code for a table class and then kicks off a MapReduce job. The generated table class can be used for processing the extracted records.
- By default, Sqoop guesses at a column for splitting the data for distribution across the cluster; the split column can also be specified by the client.
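The steps above correspond to a single `sqoop import` invocation; this sketch is hypothetical (the JDBC URL, credentials, table, and split column are assumptions for illustration).

```shell
# Import the "orders" table into HDFS; --split-by overrides
# Sqoop's default guess at a split column, and --num-mappers
# controls how the work is distributed across the cluster.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --target-dir /user/etl/orders
```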
Pentaho also has integration with NoSQL DBs (Mongo, Cassandra, etc.)
Most of these tools integrate with existing data stores using the ODBC standard.
MSTR and Tableau are now tested and certified with the Cloudera driver, but other standard ODBC-based tools should also work, and more integrations will be supported soon.
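For a concrete picture of what that ODBC integration looks like, here is a sketch of an `odbc.ini` DSN entry for the Cloudera driver; the driver path, host, and port are assumptions for illustration, not details from the talk.

```
# DSN that BI tools (Tableau, MSTR, etc.) can point at
[ClouderaHive]
Description = Hive via the Cloudera ODBC driver
Driver      = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
Host        = hiveserver.example.com
Port        = 10000
```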
Also, Cloudera has implemented a multi-user solution, which will soon support authentication as well.
An in-memory model supports low-latency queries.
Integrating Hadoop Into the Enterprise
Jonathan Seidman
Hadoop Summit 2012, June 14th, 2012