2013 march 26_thug_etl_cdc_talking_points

Some diagrams for our roundtable on modern ETL/CDC with Hadoop and other new technologies

  1. Data Integration in 2013: A working session
     Adam Muise, March 26, 2013
     Note: This deck is purposely sparse. Want value? Join the conversation in the Toronto Hadoop User Group: http://www.meetup.com/TorontoHUG/
     © Hortonworks Inc. 2012
  2. Proposed Agenda
     • Introductions
     • Discuss common Data Integration Patterns
     • Round-table of User Group Member CDC/ETL Use Cases
     • New Data Integration Solutions: A change from the Old Guard:
       – Hadoop and the Data Lake
       – Streaming (+ Hadoop)
       – Data Lake Governance / Management (InfoTrellis)
       – Databus (LinkedIn)
  3. Introductions
     Who let you in?
  4. General Data Integration Patterns
     • Enterprise Application Integration*
       – Metadata lookup
       – Validation
       – Extra-app communication
     • Enterprise Service Bus (SOA, Message Bus/Hub)*
     • Federation*
       – Bridging multiple databases with a query layer
       – E.g., Composite
     • Extract, Transform, Load (ETL)*
       – Collection
       – Aggregation
       – Format/schema transformation
     • Data Lake
       – Landing zone for multiple datasets in one store
       – Mixed schema, often raw structured/unstructured data
       – E.g., Hadoop
     * Source: Anthony David Giordano, Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture, IBM Press, 2010.
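The three ETL sub-steps listed above (collection, aggregation, format/schema transformation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the approach from the cited book: the two source feeds, their field names, and the target table `daily_sales` are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source A: a CSV export with its own column names, amounts in cents.
SOURCE_A_CSV = """store,day,amount_cents
TOR-1,2013-03-25,1250
TOR-1,2013-03-25,750
TOR-2,2013-03-25,300
"""

# Hypothetical source B: records from another system, amounts in dollars.
SOURCE_B_RECORDS = [
    {"location": "TOR-2", "date": "2013-03-25", "total": 4.00},
    {"location": "TOR-1", "date": "2013-03-26", "total": 10.50},
]


def collect():
    """Collection: pull raw rows from every source, tagged with their origin."""
    for row in csv.DictReader(io.StringIO(SOURCE_A_CSV)):
        yield "A", row
    for rec in SOURCE_B_RECORDS:
        yield "B", rec


def transform(origin, raw):
    """Format/schema transformation: map each source onto one (store, day, cents) schema."""
    if origin == "A":
        return raw["store"], raw["day"], int(raw["amount_cents"])
    # Source B reports dollars; convert to integer cents.
    return raw["location"], raw["date"], round(raw["total"] * 100)


def aggregate(rows):
    """Aggregation: total cents per (store, day)."""
    totals = defaultdict(int)
    for store, day, cents in rows:
        totals[(store, day)] += cents
    return totals


def load(totals, conn):
    """Load: write the aggregated rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(store TEXT, day TEXT, cents INTEGER, PRIMARY KEY (store, day))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)",
        [(store, day, cents) for (store, day), cents in totals.items()],
    )
    conn.commit()


def run_etl(conn):
    load(aggregate(transform(o, r) for o, r in collect()), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    for row in conn.execute("SELECT * FROM daily_sales ORDER BY store, day"):
        print(row)
```

Running it prints `('TOR-1', '2013-03-25', 2000)`, `('TOR-1', '2013-03-26', 1050)` and `('TOR-2', '2013-03-25', 700)`. Keeping each step a separate function mirrors the slide's breakdown and lets a new source be added by extending only `collect` and `transform`.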
  5. Use Case Roundtable
     Data that’s keeping you up at night…
  6. Scotia iTrade: Geoffrey Li
  7. New Data Integration Solutions
     Fresh ideas for new and old problems…
  8. Hadoop: The Data Lake
     [Diagram of data-lake activities; extracted labels: Extract & Load, Data Transformation, Model / Transform & Apply Metadata, Aggregate, Explore, Analyze, Visualize, Report, Publish Event Signal, Publish Exchange]
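A data lake lands raw data first and applies structure later, an approach often called schema-on-read. The sketch below assumes a hypothetical landing zone holding raw JSON lines of mixed quality and a small metadata catalog that assigns a schema at read time; the dataset `web_events`, its fields, and the helpers `read_with_schema` and `average_latency_by_page` are illustrative only and are not Hadoop APIs.

```python
import json

# Hypothetical landing zone: raw records kept exactly as they arrived.
# Shapes are mixed: one line lacks a field, one is not valid JSON.
LANDING_ZONE = {
    "web_events": [
        '{"user": "u1", "page": "/quote", "ms": "120"}',
        '{"user": "u2", "page": "/trade"}',
        'not-json garbage from a misbehaving feed',
        '{"user": "u1", "page": "/trade", "ms": 95, "extra": true}',
    ],
}

# Hypothetical metadata catalog: the schema is applied when data is read,
# not enforced when it lands.
CATALOG = {
    "web_events": {"user": str, "page": str, "ms": int},
}


def read_with_schema(dataset):
    """Yield (record, None) for rows that fit the schema, (None, raw) for rejects."""
    schema = CATALOG[dataset]
    for raw in LANDING_ZONE[dataset]:
        try:
            obj = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
            if not isinstance(obj, dict):
                raise ValueError("not a JSON object")
            # Project onto the declared columns and coerce types; missing -> None.
            record = {
                col: (typ(obj[col]) if obj.get(col) is not None else None)
                for col, typ in schema.items()
            }
        except (ValueError, TypeError):
            yield None, raw
            continue
        yield record, None


def average_latency_by_page(dataset="web_events"):
    """Explore/aggregate over conforming rows; rejects stay in the lake untouched."""
    sums, counts = {}, {}
    for record, _ in read_with_schema(dataset):
        if record is None or record["ms"] is None:
            continue
        page = record["page"]
        sums[page] = sums.get(page, 0) + record["ms"]
        counts[page] = counts.get(page, 0) + 1
    return {page: sums[page] / counts[page] for page in sums}


if __name__ == "__main__":
    print(average_latency_by_page())
```

Because the raw lines are never rewritten, a corrected schema can later be re-applied to the same landing-zone data; here the garbage line is reported as a reject and the result is `{'/quote': 120.0, '/trade': 95.0}`.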
  9. Streaming & Hadoop
     http://developer.yahoo.com/blogs/ydn/posts/2013/02/storm-and-hadoop-convergence-of-big-data-and-low-latency-processing/
  10. Streaming & Hadoop
      http://developer.yahoo.com/blogs/ydn/posts/2013/02/storm-and-hadoop-convergence-of-big-data-and-low-latency-processing/
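The two Streaming & Hadoop slides point to Yahoo!'s post on combining Storm with Hadoop. One commonly described way to combine batch and stream processing (often called the Lambda Architecture) keeps a batch view recomputed over all history, a real-time view updated per event for data that arrived since the last batch run, and merges the two at query time. The sketch below is an assumption about that general pattern, not code from the linked post, and the class `LambdaCounts` and its methods are hypothetical.

```python
from collections import Counter


class LambdaCounts:
    """Page-view counts served from a batch view plus a real-time view (hypothetical)."""

    def __init__(self):
        self.master_log = []             # append-only history (the batch/Hadoop side)
        self.batch_view = Counter()      # recomputed from scratch by run_batch()
        self.realtime_view = Counter()   # updated per event (the streaming side)

    def ingest(self, page):
        """A new event is appended to history and counted in the real-time view."""
        self.master_log.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        """Recompute the batch view from the full history, as a batch job would."""
        self.batch_view = Counter(self.master_log)
        # Everything ingested so far is now covered by the batch view. In a real
        # deployment the batch job runs while events keep arriving, so only the
        # covered events would be expired from the real-time view; this
        # single-threaded sketch can simply clear it.
        self.realtime_view = Counter()

    def query(self, page):
        """Merge the complete-but-stale batch view with the fresh real-time view."""
        return self.batch_view[page] + self.realtime_view[page]


if __name__ == "__main__":
    lc = LambdaCounts()
    for p in ["/quote", "/trade", "/quote"]:
        lc.ingest(p)
    lc.run_batch()
    lc.ingest("/quote")
    print(lc.query("/quote"))
```

After the batch run covers the first three events, a fourth `/quote` event is served from the real-time view, so `query("/quote")` returns 3 (2 from the batch view plus 1 from the real-time view).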
  11. DataBus (LinkedIn)
      Databus is a low-latency change capture system which has become an integral part of LinkedIn’s data processing pipeline. Databus addresses a fundamental requirement to reliably capture, flow and process primary data changes. Databus provides the following features:
      1. Isolation between sources and consumers
      2. Guaranteed in-order and at-least-once delivery with high availability
      3. Consumption from an arbitrary time point in the change stream, including full bootstrap capability of the entire data
      4. Partitioned consumption
      5. Source consistency preservation
      https://github.com/linkedin/databus/wiki
  12. DataBus (LinkedIn)
      https://github.com/linkedin/databus/wiki
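Features 2 to 4 on slide 11 can be illustrated with a toy change-data-capture consumer. This is a minimal sketch under stated assumptions and not Databus's client API: it assumes every change event carries a monotonically increasing sequence number `seq` assigned by the source, and the names `ChangeEvent`, `partition_of`, and `PartitionedConsumer` are hypothetical. It shows in-order application with a per-partition checkpoint, so a consumer can start from an arbitrary point in the stream and skip redelivered events (which makes at-least-once delivery safe), plus partitioned consumption by hashing each row's key. Isolation, source consistency, and full bootstrap are not modelled.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    seq: int        # position in the source's change stream (assumed monotonic)
    key: str        # primary key of the changed row
    value: object   # new row value, or None for a delete


def partition_of(key, num_partitions):
    """Stable hash partitioning (crc32 is deterministic, unlike Python's str hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedConsumer:
    """Applies one partition's changes in order and tolerates duplicate delivery."""

    def __init__(self, partition, num_partitions, checkpoint=0):
        self.partition = partition
        self.num_partitions = num_partitions
        self.checkpoint = checkpoint   # highest seq already applied (or start point)
        self.state = {}                # this consumer's materialized view

    def consume(self, stream):
        for event in stream:
            if partition_of(event.key, self.num_partitions) != self.partition:
                continue               # another consumer owns this key
            if event.seq <= self.checkpoint:
                continue               # redelivered, or before our start point: skip
            if event.value is None:
                self.state.pop(event.key, None)
            else:
                self.state[event.key] = event.value
            self.checkpoint = event.seq  # would be persisted to resume after a crash


if __name__ == "__main__":
    events = [
        ChangeEvent(1, "acct-1", 100),
        ChangeEvent(2, "acct-2", 50),
        ChangeEvent(3, "acct-1", 120),
        ChangeEvent(4, "acct-2", None),
    ]
    consumer = PartitionedConsumer(partition=0, num_partitions=1)
    consumer.consume(events + events[2:])  # events 3 and 4 delivered twice
    print(consumer.state, consumer.checkpoint)
```

With a single consumer and events 3 and 4 redelivered, the final state is `{'acct-1': 120}` with checkpoint 4. Splitting the same stream across two partitions yields the same merged state, because each key is owned by exactly one partition and each partition still sees its own events in order.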
