3. In The Old Days …
[Diagram: several databases and external data connected to a data warehouse through ETL and data integration tools, with separate MDM and data governance tools.]
4. One Size Does Not Fit All
Transactional Databases
– Single-digit millisecond latencies and high throughput
– Store data in rows
– Heavy on flash and main memory
– Indexing is very important
– High availability extremely important
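The row layout and the role of indexing can be made concrete with a minimal Python sketch. It is not taken from any particular system. Each record lives together as one tuple, and a hash index on a hypothetical `order_id` key turns a point lookup or a single-record update into a constant-time operation instead of a scan. The `RowStore` class and its `(order_id, customer, amount)` schema are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema for illustration: (order_id, customer, amount)
Row = Tuple[int, str, float]


class RowStore:
    """Toy row-oriented table: each record is stored together as one tuple."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        # Primary-key index: order_id -> position in self.rows
        self.index: Dict[int, int] = {}

    def insert(self, row: Row) -> None:
        key = row[0]
        if key in self.index:
            raise KeyError(f"duplicate key {key}")
        self.index[key] = len(self.rows)
        self.rows.append(row)

    def get(self, key: int) -> Optional[Row]:
        # Point lookup through the index: no scan over the other rows.
        pos = self.index.get(key)
        return None if pos is None else self.rows[pos]

    def update_amount(self, key: int, amount: float) -> bool:
        pos = self.index.get(key)
        if pos is None:
            return False
        order_id, customer, _ = self.rows[pos]
        # The whole record is rewritten in one place.
        self.rows[pos] = (order_id, customer, amount)
        return True


store = RowStore()
store.insert((1, "alice", 20.0))
store.insert((2, "bob", 35.5))
store.update_amount(2, 40.0)
print(store.get(2))  # (2, 'bob', 40.0)
```

Short transactions like these touch one whole record at a time, which is why storing each record contiguously and reaching it through an index suits this workload.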
5. One Size Does Not Fit All
Analytical Databases
– Single-digit second latencies (and higher)
– Store data in columns
– Scale out on commodity hardware
– Still need magnetic disk
– Indexing less important
– High availability less important
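The columnar alternative can be sketched the same way, again as a toy Python illustration with assumed names. The table is stored as one list per attribute, aligned by position, so an aggregate over a hypothetical `amount` attribute reads only that column. Real column stores add compression and vectorized execution, which this sketch omits.

```python
from typing import Dict, List

# Hypothetical columnar layout for an (order_id, customer, amount) table:
# one list per attribute, aligned by position.
columns: Dict[str, List] = {
    "order_id": [1, 2, 3, 4],
    "customer": ["alice", "bob", "alice", "carol"],
    "amount": [20.0, 35.5, 12.5, 8.0],
}


def total(column: str) -> float:
    # An aggregate over one attribute reads only that attribute's list;
    # the other columns are never touched.
    return sum(columns[column])


def total_where(column: str, filter_col: str, value) -> float:
    # A filtered aggregate reads two columns and still skips the rest.
    return sum(v for v, f in zip(columns[column], columns[filter_col]) if f == value)


print(total("amount"))                             # 76.0
print(total_where("amount", "customer", "alice"))  # 32.5
```

Analytical queries tend to scan many rows but few attributes, so reading whole columns sequentially makes fine-grained indexing less important than it is for the row store.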
6. One Size Does Not Fit All
Streaming Databases
– Continuous queries
– Data flows through the system
– Network latencies are paramount
– Drop data to deal with load
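A continuous query with load shedding can be illustrated with a small Python sketch. It is a hypothetical toy, not how any particular streaming engine works. The query stays registered while data flows through it, emitting the average of each tumbling window of `window` tuples. A bounded input buffer of size `capacity` sheds load by dropping new arrivals when it is full. That is the simplest possible drop policy; real systems can choose more carefully which tuples to discard.

```python
from collections import deque
from typing import Deque, List


class ContinuousAverage:
    """Toy continuous query: averages each tumbling window of `window` tuples,
    with a bounded input buffer that sheds load when it is full."""

    def __init__(self, window: int, capacity: int) -> None:
        self.window = window
        self.capacity = capacity
        self.buffer: Deque[float] = deque()
        self.dropped = 0

    def offer(self, value: float) -> None:
        # Load shedding: if the buffer is full, drop the incoming tuple.
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
        else:
            self.buffer.append(value)

    def process(self) -> List[float]:
        # Consume complete windows; a partial window stays buffered
        # until more data arrives, since the query never "finishes".
        results = []
        while len(self.buffer) >= self.window:
            batch = [self.buffer.popleft() for _ in range(self.window)]
            results.append(sum(batch) / self.window)
        return results


q = ContinuousAverage(window=2, capacity=4)
for v in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:  # a burst arrives faster than it is processed
    q.offer(v)
results = q.process()
print(results, q.dropped)  # [1.5, 3.5] 2
```

The last two tuples of the burst are lost, trading completeness for bounded latency, which is the trade-off the slide describes.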
7. Therefore, in my PhD years alone …
– Aurora and Borealis projects became StreamBase
– C-Store project became Vertica
– H-Store project became VoltDB
9. What We Have Now …
[Diagram: many separate systems, including transactional DBMSs, a datamart, an OLAP database, analytical DBMSs, Hadoop, reporting and dashboarding, a data warehouse, a high-performance column-store, a streaming DBMS, web logs, NoSQL, NewSQL, and a web DBMS (like MySQL).]
12. What This Leads To…
Very little data provenance
Data silos
Non-identical data copies
Not even close to a single version of the truth
13. A Potential Way Towards a Solution
Hadoop as a common platform for:
– Data analysis DBMS (Hive, Hadapt)
– Data streaming (HStreaming, Flume)
– NoSQL, simple transactions, and short request processing (HBase, Brisk)
14. What This Has the Potential to Enable
Fewer data silos
Increased data provenance
Reduced systems management overhead
Better resource utilization and management
15. But we still need
Hadoop-based data integration tools
MDM and data governance tools for Hadoop
Data provenance tracking across Hadoop projects
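Data provenance tracking can be pictured as a lineage graph. The minimal Python sketch below is a hypothetical illustration; it is not the API of any Hadoop project. Each derived dataset records the datasets it was built from and the job that produced it, so any output can be traced back to its original sources. The `Dataset` and `ProvenanceRegistry` names, the dataset names, and the job labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Dataset:
    name: str
    # Hypothetical lineage metadata: the datasets this one was derived from,
    # and the job that produced it. Source datasets have no parents.
    parents: List[str] = field(default_factory=list)
    produced_by: str = "ingest"


class ProvenanceRegistry:
    def __init__(self) -> None:
        self.datasets: Dict[str, Dataset] = {}

    def register(self, ds: Dataset) -> None:
        # Refuse lineage that points at datasets nobody has registered.
        for p in ds.parents:
            if p not in self.datasets:
                raise KeyError(f"unknown parent dataset {p}")
        self.datasets[ds.name] = ds

    def sources(self, name: str) -> Set[str]:
        # Walk the lineage graph back to the original (parentless) datasets.
        ds = self.datasets[name]
        if not ds.parents:
            return {name}
        result: Set[str] = set()
        for p in ds.parents:
            result |= self.sources(p)
        return result


reg = ProvenanceRegistry()
reg.register(Dataset("web_logs"))
reg.register(Dataset("orders"))
reg.register(Dataset("sessions", ["web_logs"], "sessionize_job"))
reg.register(Dataset("revenue_report", ["sessions", "orders"], "hive_query"))
print(sorted(reg.sources("revenue_report")))  # ['orders', 'web_logs']
```

Because every project would register its outputs in one shared registry, a report built in one tool could be traced through datasets produced by another.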