Tackling Big Data with Hadoop and Open Source Integration

Transcript

  • 1. Tackling Big Data with Hadoop and Open Source Integration. Ciaran Dynes, Remy Dubois
  • 2. Agenda: 1. Talend’s Goal: Democratizing Integration 2. What is Big Data (integration)? 3. Big Data for the Masses: Talend’s strategy and vision © Talend 2011
  • 3. Our goal
  • 4. Talend – The Market Leading Unified Integration Platform. Talend Enterprise (Data Quality, Data Integration, MDM, ESB, BPM): • Commercial license • Subscription model. Shared platform: Studio, Repository, Deployment, Execution, Monitoring. Talend Open Studio for Data Quality, Data Integration, MDM, ESB: • Open source license • Free of charge • Optional support. Recognized as the open source leader in each of its market categories by all industry analysts.
  • 5. Who uses Talend? A high adoption rate: • 20 million downloads • 950,000 users • 3,500 customers • 1 product download every 30 seconds • 150 new customers per month
  • 6. Trying to get from this… © Talend 2011 – Strictly Private & Confidential
  • 7. to this… Why Talend… ONLY Talend generates code that is executed within MapReduce. This open approach removes the limitation of a proprietary “engine” to provide a truly unique and powerful set of tools for big data.
  • 8. Big data is… Hans Rosling uses big data to analyze world health trends. Key Takeaway #1: transactions, interactions, observations
  • 9. Big Data = Transactions + Interactions + Observations. Tier labels: ERP, CRM, WEB, Big Data. Size axis: Mega, Giga, Tera, Peta bytes. Other axis: Increasing Data Variety and Complexity. Data types shown: Sensors/RFID/Devices, User Generated Content, Sentiment, Social Interactions & Feeds, Mobile Web, Spatial & GPS coordinates, User Clicks, External Demographics, Web logs, Business Data Feeds, Offer history, A/B testing, Video/Audio/Images, Dynamic pricing, SMS/MMS, Segmentation, Affiliate Networks, Search Marketing, Offer details, Purchase detail, Customer Touchpoints, Behavioral Targeting, Purchase record, Support Contacts, Dynamic Funnels, Payment record. Source: Hortonworks
  • 10. What is Big Data integration?
  • 11. Traditional Data Flows. Diagram labels: CRM, ERP, Finance, ETL, Data Quality, Normalized Data, Traditional Data Warehouse, Business Analyst, Business User, Warehouse Administrator, Executives. • Scheduled daily or weekly, sometimes more frequently • Volumes rarely exceed terabytes
  • 12. The new world of big data. Diagram labels: Social Networking, CRM, ERP, Finance, Big Data
  • 13. The new world of big data. Diagram labels: Social Networking, CRM, Mobile Devices, ERP, Finance, Big Data
  • 14. The new world of big data. Diagram labels: Social Networking, CRM, Mobile Devices, ERP, Transactions, Finance, Big Data
  • 15. The new world of big data. Diagram labels: Social Networking, CRM, Mobile Devices, ERP, Transactions, Finance, Network Devices, Sensors, Big Data
  • 16. Key Takeaway #2: Forces us to think differently
  • 17. But for Talend… Big data is… everything that is old is new again!
  • 18. Data driven business. Diagram: data governance, information, decisions, your business (linked by: enables, supports, drives). Information provides value to the business. If you can't rely on your information, then the result can be missed opportunities or higher costs. Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • 19. BIG data driven business. Diagram: BIG data governance, BIG information, BIG decisions, BIG business (linked by: enables, supports, drives). Information provides value to the business. If you can't rely on your information, then the result can be missed opportunities or higher costs. Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • 20. “Big Data for the Masses”
  • 21. Goal: Democratize Big Data. Talend Open Studio for Big Data: • “Big Data for the Masses” • Improves efficiency of big data job design with graphic interface • Abstracts and generates code • Run transforms inside Hadoop • Native support for HDFS, Pig, HBase, Sqoop and Hive • Apache License 2.0 • Embedded in Hortonworks Data Platform …an open source ecosystem
  • 22. Let us show you… © Talend 2012
  • 23. Where to next?
  • 24. How is big data integration being used? Use Cases: • Recommendation Engine • Sentiment Analysis • Risk Modeling • Fraud Detection • Marketing Campaign Analysis • Customer Churn Analysis • Social Graph Analysis • Customer Experience Analytics • Network Monitoring • Research And Development. BUT: to what level is data quality (DQ) required for your use case?
  • 25. Poor Data Quality + Big Data = Big Problems. Poor Data Quality * Big Data = Big Problems^2. Key Takeaway #3: In big data… poor data quality can be magnified at huge scale
  • 26. Two methods for inserting data quality into a big data job: 1. Pipelining: as part of the load process 2. Load the cluster, then implement and execute a data quality MapReduce job
  • 27. E-T-L: Extract – Transform – Load
  • 28. E-DQ-L: Extract – Improve/Cleanse – Load
  • 29. Pipelining: data quality with big data. Diagram labels: CRM, ERP, Finance, Social Networking, Mobile Devices, DQ, Big Data. • Use traditional data quality tools • No new programming, no PhDs • Once and done
  • 30. Big data alternative: Load and improve within the cluster. Diagram labels: CRM, ERP, Finance, Social Networking, Mobile Devices, DQ, Big Data. • Load first, improve later • Really complex to build, limited tools • Constant on, increments • Insane performance
  • 31. big data. Timeline labels: 2012, now, Q4, 2013. Talend Open Studio for Big Data: • Packaged within Hortonworks Data Platform …Eclipse tools for Hive, HDFS, Pig, Sqoop …supports Oozie, HCatalog, Kerberos • Free to download and use under the Apache license …democratizing big data through intuitive tools
  • 32. Thanks for attending
  • 33. Sessions will resume at 11:25am
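
The two data quality approaches from slides 26 to 30 (pipelining during the load, versus loading first and improving inside the cluster) can be sketched in plain Python. This is only an illustration under stated assumptions, not Talend's generated code and not a Hadoop job: the `cleanse` rule, the record fields `id` and `email`, and the in-memory lists standing in for source systems and the cluster are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Record = Dict[str, str]


def cleanse(record: Record) -> Optional[Record]:
    """Hypothetical data quality rule: trim whitespace, lowercase the
    email, and reject any record that has no id."""
    cleaned = {key: value.strip() for key, value in record.items()}
    if not cleaned.get("id"):
        return None
    cleaned["email"] = cleaned.get("email", "").lower()
    return cleaned


# Method 1 (E-DQ-L): cleanse each record as part of the load process.
def pipeline_load(source: Iterable[Record]) -> List[Record]:
    store: List[Record] = []  # stands in for the cluster
    for record in source:
        cleaned = cleanse(record)
        if cleaned is not None:
            store.append(cleaned)
    return store


# Method 2: load raw data first, then run a map/reduce-style quality job.
def map_phase(raw_store: Iterable[Record]) -> Iterator[Tuple[str, Record]]:
    """Emit (id, cleansed record) pairs, skipping rejected records."""
    for record in raw_store:
        cleaned = cleanse(record)
        if cleaned is not None:
            yield cleaned["id"], cleaned


def reduce_phase(pairs: Iterable[Tuple[str, Record]]) -> List[Record]:
    """Group by id and keep the last record seen per id (deduplication)."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return [grouped[key][-1] for key in sorted(grouped)]


def load_then_improve(raw_store: Iterable[Record]) -> List[Record]:
    return reduce_phase(map_phase(raw_store))


if __name__ == "__main__":
    raw = [
        {"id": " 1 ", "email": " A@X.COM "},
        {"id": "", "email": "b@x.com"},
        {"id": "1", "email": "a@x.com"},
    ]
    print(pipeline_load(raw))
    print(load_then_improve(raw))
```

In this sketch the pipelined load applies row-level rules only, while the second method can also apply rules that need to see the whole data set, such as deduplication by key. That difference is one reason a team might accept the extra build complexity noted on slide 30.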