
Boston Hadoop Meetup, April 26 2012


Daniel Abadi presentation at the Boston Hadoop Meetup held on April 26, 2012.


  1. The Proliferation of Database Systems and the Data Silo Problem. @daniel_abadi, Yale University / Hadapt, April 26th, 2012
  2. In The Old Days … Database
  3. In The Old Days … [Diagram labels: Database (three instances), ETL Tools, Data Warehouse, Data Integration Tools, External Data, MDM Tools, Data Governance Tools]
  4. One Size Does Not Fit All: Transactional Databases
     – Single-digit millisecond latencies and high throughput
     – Store data in rows
     – Heavy on flash and main memory
     – Indexing is very important
     – High availability is extremely important
  5. One Size Does Not Fit All: Analytical Databases
     – Single-digit second latencies (and higher)
     – Store data in columns
     – Scale out on commodity hardware
     – Still need magnetic disk
     – Indexing is less important
     – High availability is less important
  6. One Size Does Not Fit All: Streaming Databases
     – Continuous queries
     – Data flows through the system
     – Network latencies are paramount
     – Drop data to deal with load
  7. Therefore, in my PhD years alone …
     – The Aurora and Borealis projects became StreamBase
     – The C-Store project became Vertica
     – The H-Store project became VoltDB
  8. Right Tool for the Job
  9. What We Have Now … (slides 9 to 11 repeat the same diagram) [Diagram of many coexisting systems, with labels including Analytical DBMS, Transactional DBMS, Datamart, OLAP Database, Hadoop, Reporting and Dashboarding DBMS, Data Warehouse, Streaming DBMS, Column-Store Analytical DBMS, Web DBMS (like MySQL), Web Logs, NoSQL, and NewSQL]
  12. What This Leads To …
     – Very little data provenance
     – Data silos
     – Non-identical data copies
     – Not even close to a single version of the truth
  13. A Potential Way Towards a Solution [Diagram: Hadoop, together with Data Analysis DBMS (Hive, Hadapt); Data Streaming (HStreaming, Flume); NoSQL & Simple Transactions & Short Request Processing (HBase, Brisk)]
  14. What This Has the Potential to Enable
     – Fewer data silos
     – Increased data provenance
     – Reduced systems management overhead
     – Better resource utilization and management
  15. But We Still Need
     – Hadoop-based data integration tools
     – MDM and data governance tools for Hadoop
     – Data provenance tracking across Hadoop projects
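Slides 4 and 5 contrast transactional databases (rows, heavy reliance on indexing, point operations) with analytical databases (columns, large scans). The Python sketch below is a toy illustration of that contrast under stated assumptions: the `Order` record, the `pk_index` dictionary, and the `lookup`, `total_amount`, and `total_for` helpers are hypothetical and do not reflect the storage format of any system named in the talk.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical record type used only for this illustration."""
    order_id: int
    customer: str
    amount: float


# Row layout: each record is kept together, which suits transactional
# workloads that read or write whole records by key.
rows = [
    Order(1, "alice", 30.0),
    Order(2, "bob", 12.5),
    Order(3, "alice", 7.5),
]

# Hypothetical primary-key index: maps order_id to a position in `rows`,
# so a point lookup touches one record instead of scanning the table.
pk_index = {order.order_id: pos for pos, order in enumerate(rows)}


def lookup(order_id: int) -> Order:
    """Transactional-style point query: one index probe, one row read."""
    return rows[pk_index[order_id]]


# Column layout: each attribute is kept as its own array, so an
# analytical query reads only the columns it actually needs.
columns = {
    "order_id": [o.order_id for o in rows],
    "customer": [o.customer for o in rows],
    "amount": [o.amount for o in rows],
}


def total_amount() -> float:
    """Analytical-style aggregate: scans only the `amount` column."""
    return sum(columns["amount"])


def total_for(customer: str) -> float:
    """Aggregate that reads only the `customer` and `amount` columns."""
    return sum(
        amt
        for cust, amt in zip(columns["customer"], columns["amount"])
        if cust == customer
    )
```

For example, `lookup(2)` returns the `bob` order through a single index probe, while `total_amount()` returns 50.0 and `total_for("alice")` returns 37.5 without ever reading the `order_id` column. In a real column store the benefit comes from reading fewer bytes from disk and compressing similar values together; a Python list only hints at that.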
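Slide 6 describes streaming databases as running continuous queries over data that flows through the system, and dropping data to deal with load. A minimal Python sketch of both ideas, assuming a hypothetical `ContinuousAverage` operator that maintains a sliding-window average and sheds load by discarding incoming tuples when a bounded buffer is full (tail drop). Tail drop is just one simple shedding policy chosen for illustration; it is not taken from any of the systems named in the talk.

```python
from collections import deque
from typing import Deque, List


class ContinuousAverage:
    """Hypothetical streaming operator: a continuous query that reports the
    average of the most recent `window` admitted values, fed through a
    bounded input buffer that sheds load when it fills up."""

    def __init__(self, window: int, buffer_capacity: int) -> None:
        self.recent: Deque[float] = deque(maxlen=window)  # sliding window
        self.buffer: Deque[float] = deque()  # admitted tuples awaiting processing
        self.capacity = buffer_capacity
        self.dropped = 0  # count of tuples discarded by load shedding

    def arrive(self, value: float) -> None:
        """Accept a tuple from the stream, or drop it if the buffer is full."""
        if len(self.buffer) >= self.capacity:
            # Tail drop: discard the newest tuple rather than let the queue
            # (and therefore result latency) grow without bound.
            self.dropped += 1
        else:
            self.buffer.append(value)

    def process(self, budget: int) -> List[float]:
        """Process up to `budget` buffered tuples, emitting an updated result
        after each one; the query keeps running as new data arrives."""
        results: List[float] = []
        for _ in range(min(budget, len(self.buffer))):
            self.recent.append(self.buffer.popleft())
            results.append(sum(self.recent) / len(self.recent))
        return results
```

With `window=2` and `buffer_capacity=3`, a burst of five arrivals (1.0 through 5.0) admits the first three and drops two; `process(budget=3)` then emits `[1.0, 1.5, 2.5]`, one updated result per admitted tuple. Dropping data keeps latency bounded at the cost of answering over a sample of the stream, which is the trade-off the slide points to.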
