Tackling big data with hadoop and open source integration

Published in: Technology

Transcript of "Tackling big data with hadoop and open source integration"

  1. Tackling Big Data with Hadoop and Open Source Integration. Ciaran Dynes, Remy Dubois
  2. Agenda: (1) Talend’s Goal: Democratizing Integration; (2) What is Big Data (integration)?; (3) Big Data for the Masses: Talend’s strategy and vision. © Talend 2011
  3. Our goal
  4. Talend – The Market Leading Unified Integration Platform
     •  Talend Enterprise (Data Quality, Data Integration, MDM, ESB, BPM): commercial license, subscription model
     •  Platform components: Studio, Repository, Deployment, Execution, Monitoring
     •  Talend Open Studio for Data Quality, Data Integration, MDM, ESB: open source license, free of charge, optional support
     •  Recognized as the open source leader in each of its market categories by all industry analysts
  5. Who uses Talend? A high adoption rate
     •  20 million downloads
     •  950,000 users
     •  3,500 customers
     •  1 product download every 30 seconds
     •  150 new customers per month
  6. Trying to get from this… © Talend 2011 – Strictly Private & Confidential
  7. …to this. Why Talend? Only Talend generates code that is executed within MapReduce. This open approach removes the limitation of a proprietary “engine” to provide a truly unique and powerful set of tools for big data.
  8. Big data is… Hans Rosling uses big data to analyze world health trends. Key Takeaway #1: transactions, interactions, observations
  9. Big Data = Transactions + Interactions + Observations (Source: Hortonworks). The chart plots data volume (megabytes, gigabytes, terabytes, petabytes) against increasing data variety and complexity:
     •  ERP: purchase detail, purchase record, payment record
     •  CRM: segmentation, offer details, customer touchpoints, support contacts
     •  Web: web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels
     •  Big Data: sensors/RFID/devices, user generated content, sentiment, social interactions & feeds, mobile web, spatial & GPS coordinates, user clicks, external demographics, business data feeds, video/audio/images, SMS/MMS
  10. What is Big Data integration?
  11. Traditional Data Flows: CRM, ERP and Finance feed ETL and Data Quality, which load normalized data into a traditional data warehouse used by business analysts, business users, warehouse administrators and executives.
     •  Scheduled daily or weekly, sometimes more frequently
     •  Volumes rarely exceed terabytes
  12. The new world of big data: Social Networking, CRM, ERP and Finance feed Big Data
  13. The new world of big data: Social Networking, CRM, Mobile Devices, ERP and Finance feed Big Data
  14. The new world of big data: Social Networking, CRM, Mobile Devices, ERP, Transactions and Finance feed Big Data
  15. The new world of big data: Social Networking, CRM, Mobile Devices, ERP, Transactions, Finance, Network Devices and Sensors feed Big Data
  16. Key Takeaway #2: Forces us to think differently
  17. But for Talend… big data is… everything that is old is new again!
  18. Data driven business: data governance enables information, information supports decisions, and decisions drive your business. “Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs.” Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  19. BIG data driven business: BIG data governance enables BIG information, BIG information supports BIG decisions, and BIG decisions drive BIG business. “Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs.” Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  20. “Big Data for the Masses”
  21. Goal: Democratize Big Data. Talend Open Studio for Big Data …an open source ecosystem
     •  “Big Data for the Masses”
     •  Improves efficiency of big data job design with a graphical interface
     •  Abstracts and generates code
     •  Runs transforms inside Hadoop
     •  Native support for HDFS, Pig, HBase, Sqoop and Hive
     •  Apache License 2.0
     •  Embedded in Hortonworks Data Platform
  22. Let us show you… © Talend 2012
  23. Where to next?
  24. How is big data integration being used? Use Cases
     •  Recommendation Engine
     •  Sentiment Analysis
     •  Risk Modeling
     •  Fraud Detection
     •  Marketing Campaign Analysis
     •  Customer Churn Analysis
     •  Social Graph Analysis
     •  Customer Experience Analytics
     •  Network Monitoring
     •  Research and Development
     BUT: to what level is data quality (DQ) required for your use case?
  25. Poor Data Quality + Big Data = Big Problems. Poor Data Quality * Big Data = Big Problems^2. Key Takeaway #3: In big data… poor data quality can be magnified at huge scale
  26. Two methods for inserting data quality into a big data job: (1) Pipelining: as part of the load process; (2) Load the cluster, then implement and execute a data quality MapReduce job
  27. E-T-L: Extract – Transform – Load
  28. E-DQ-L: Extract – Improve/Cleanse – Load
  29. Pipelining: data quality with big data. CRM, ERP, Finance, Social Networking and Mobile Devices pass through DQ on their way into Big Data.
     •  Use traditional data quality tools
     •  No new programming, no PhDs
     •  Once and done
  30. Big data alternative: Load and improve within the cluster. CRM, ERP, Finance, Social Networking and Mobile Devices are loaded into Big Data, and DQ runs inside the cluster.
     •  Load first, improve later
     •  Really complex to build, limited tools
     •  Constantly on, runs in increments
     •  Insane performance
  31. Big data timeline (2012, now, Q4, 2013): Talend Open Studio for Big Data
     •  Packaged within Hortonworks Data Platform …Eclipse tools for Hive, HDFS, Pig, Sqoop …supports Oozie, HCatalog, Kerberos
     •  Free to download and use under the Apache license …democratizing big data through intuitive tools
  32. Thanks for attending
  33. Sessions will resume at 11:25am
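
The two data quality methods from slides 26 to 30 (pipelining during the load, versus loading first and cleansing inside the cluster) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code Talend generates: the sample records, the `cleanse` rule, and the function names `pipelined_load` and `load_then_improve` are all hypothetical, and an in-memory list stands in for the Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
RAW_RECORDS = [
    {"source": "CRM", "email": " Alice@Example.com "},
    {"source": "ERP", "email": "alice@example.com"},
    {"source": "Finance", "email": "not-an-email"},
    {"source": "CRM", "email": "bob@example.com"},
]


def cleanse(record):
    """Normalize the email; return None when the record fails validation."""
    email = record["email"].strip().lower()
    if "@" not in email:
        return None
    return {**record, "email": email}


def pipelined_load(records):
    """Method 1 (E-DQ-L): cleanse each record before it reaches the cluster."""
    cluster = []  # stand-in for the cluster's storage
    for record in records:
        cleaned = cleanse(record)
        if cleaned is not None:
            cluster.append(cleaned)
    return cluster


def load_then_improve(records):
    """Method 2: load everything as-is, then run a map/reduce-style DQ job."""
    cluster = list(records)  # load first, with no checks

    # Map phase: emit (key, cleaned record) pairs, dropping invalid rows.
    mapped = []
    for record in cluster:
        cleaned = cleanse(record)
        if cleaned is not None:
            mapped.append((cleaned["email"], cleaned))

    # Shuffle phase: group the mapped values by key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce phase: keep one record per email (simple deduplication).
    return [values[0] for _, values in sorted(grouped.items())]


if __name__ == "__main__":
    print(pipelined_load(RAW_RECORDS))
    print(load_then_improve(RAW_RECORDS))
```

In this sketch the pipelined path only applies per-record rules, while the in-cluster job also deduplicates, because grouping by key needs a view of the whole data set. That split is a choice made for the example rather than something the slides state.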
