
Talend Open Studio and Hortonworks Data Platform



Data integration is a key step in a Hadoop solution architecture. It is the first obstacle you encounter once your cluster is up and running. OK, I have a cluster… now what? Complex scripts? For wide-scale adoption of Apache Hadoop, an intuitive set of tools that abstracts away the complexity of integration is necessary.



  1. Big Data Integration: Talend Open Studio & Hortonworks Data Platform. Ciaran Dynes, Senior Director, Product Marketing, Talend. Jim Walker, Director, Product Marketing, Hortonworks. August 8, 2012. © Hortonworks Inc. 2012
  2. Your Presenters: Ciaran Dynes, Senior Director, Product Marketing; Jim Walker, Director, Product Marketing
  3. Talend: The Market-Leading Unified Integration Platform. Talend Enterprise (Data Integration, Data Quality, MDM, ESB, BPM): • Commercial license • Subscription model. Shared layers: Studio, Repository, Deployment, Execution, Monitoring. Talend Open Studio for Data Integration, Data Quality, MDM and ESB: • Open source license • Free of charge • Optional support. Recognized by all industry analysts as the open source leader in each of its market categories. © Talend 2011
  4. Hortonworks Snapshot: the industry-leading and only 100% open source Apache Hadoop distribution. • Headquarters: Sunnyvale, CA • 90+ employees • Formed with the core Apache Hadoop engineering team from Yahoo! • 35 engineers and architects, including 25+ Hadoop committers. Most experienced open source leadership team: Rob Bearden, CEO (JBoss, SpringSource, i2, Oracle); Shaun Connolly, VP Strategy (VMware, SpringSource, Red Hat, JBoss); John Kreisa, VP Marketing (Red Hat, Cloudera, MarkLogic, Business Objects); Ari Zilka, CPO (Terracotta, Accenture); Greg Pavlik, VP Engineering (Oracle SOA & Integration platform). Business model focused on customer success: Hadoop support, services & training, with subscription support for Hortonworks Data Platform and a training business offering private and public classes for developers & administrators.
  5. Next-gen data architecture drivers. Business drivers: • Enable new business models & drive faster growth (20%+) • Find insights for competitive advantage & optimal returns. Technical drivers: • Data continues to grow exponentially • Data is increasingly everywhere and in many formats • Legacy solutions are unfit for new requirements and growth. Financial drivers: • Cost of data systems, as % of IT spend, continues to grow • Cost advantages of commodity hardware & open source
  6. Big data changes the game. [Figure: Transactions + Interactions + Observations = BIG DATA. Data volume (megabytes to petabytes) plotted against increasing data variety and complexity, growing from ERP (purchase detail, purchase and payment records) through CRM (segmentation, offer details and history, customer touches, support contacts) and Web (web logs, A/B testing, behavioral targeting, dynamic pricing, search marketing, affiliate networks, dynamic funnels) to Big Data (mobile web, sentiment, SMS/MMS, speech to text, social interactions & feeds, spatial & GPS coordinates, sensors/RFID/devices, user click stream, external demographics, business data feeds, user generated content, HD video, audio and images, product/service logs).]
  7. Use cases: optimize outcomes at scale. • Media: optimize content • Intelligence: optimize detection • Investment: optimize algorithms • Advertising: optimize performance • Fraud: optimize prevention • Regulation: optimize compliance • Retail/Wholesale: optimize inventory turns • Manufacturing: optimize supply chains • Healthcare: optimize patient outcomes • Education: optimize learning outcomes • Government: optimize citizen services. Source: Geoffrey Moore, Hadoop Summit 2012 keynote presentation.
  8. Hortonworks Data Platform: the only 100% open source data platform for Apache Hadoop. Delivers enterprise-grade functionality on a proven Apache Hadoop distribution to ease management, simplify use and ease integration into the enterprise. • Simplify deployment to get started quickly and easily • Monitor and manage any size cluster with familiar console and tools • Only platform to include data integration services to interact with any data source • Metadata services open the platform for integration with existing applications • Dependable high availability architecture
  9. Data Integration Services: bridge the gap between legacy data & Hadoop; simplify and speed development. • Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig • Oozie scheduling allows you to manage and stage jobs • Connectors for any database, business application or system • Integrated HCatalog storage
  10. What is Big Data integration?
  11. Trying to get from this… Strictly Private & Confidential
  12. …to this. Why Talend? Only Talend generates code that is executed within MapReduce. This open approach removes the limitations of a proprietary “engine” and provides a uniquely powerful set of tools for big data.
  13. Key Takeaway #2: Forces us to think differently
  14. But for Talend… big data is… everything that is old is new again!
  15. Data-driven business. [Diagram: data, governance, information, decisions and your business, linked by "enables", "supports" and "drives".] Information provides value to the business: if you can't rely on your information, the result can be missed opportunities or higher costs. Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  16. BIG-data-driven business. [Diagram: the same chain scaled up, with BIG data, BIG information, BIG decisions and a BIG business.] If you can't rely on your information, the result can be missed opportunities or higher costs, at BIG scale (West and Fowler, 1999).
  17. Let us show you… © Talend 2012
  18. Putting Web Logs to Use. Scenario: • ACME Web Inc. has thousands of customers and millions of daily page hits on its e-commerce website • ACME believes it could sell more if it could figure out buying trends • ACME turns to big data to get a handle on the volume of data it needs to manage
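The core of the ACME scenario is turning raw web server logs into per-product hit counts. The Python sketch below shows that logic on a handful of lines; it assumes Apache Common Log Format and a hypothetical `/product/<id>` URL scheme, and `product_hits` is an illustrative helper, not Talend-generated code. At ACME's volume the same aggregation would run inside Hadoop rather than as a single-machine script.

```python
import re
from collections import Counter

# Apache Common Log Format, e.g.:
# 127.0.0.1 - - [08/Aug/2012:10:00:00 -0700] "GET /product/42 HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical URL scheme: product pages live under /product/<id>
PRODUCT_PATH = re.compile(r'^/product/(?P<product_id>[^/?#]+)')


def product_hits(lines):
    """Count successful (2xx) GETs per product page; return (counts, rejected)."""
    counts = Counter()
    rejected = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            rejected += 1  # malformed line: keep a tally instead of failing the job
            continue
        if match["method"] != "GET" or not match["status"].startswith("2"):
            continue  # ignore non-GET requests and errors such as 404
        product = PRODUCT_PATH.match(match["path"])
        if product:
            counts[product["product_id"]] += 1
    return counts, rejected


sample = [
    '10.0.0.1 - - [08/Aug/2012:10:00:00 -0700] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.2 - - [08/Aug/2012:10:00:01 -0700] "GET /product/42?ref=ad HTTP/1.1" 200 498',
    '10.0.0.3 - - [08/Aug/2012:10:00:02 -0700] "GET /product/7 HTTP/1.1" 404 0',
    '10.0.0.4 - - [08/Aug/2012:10:00:03 -0700] "GET /cart HTTP/1.1" 200 128',
    'garbled line without the expected fields',
]

counts, rejected = product_hits(sample)
print(counts.most_common(), rejected)  # [('42', 2)] 1
```

Tallying rejected lines instead of discarding them silently gives a first, crude data-quality signal before the counts feed any buying-trend analysis.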
  19. Poor Data Quality + Big Data = Big Problems. Poor Data Quality × Big Data = Big Problems². Key Takeaway #3: In big data, poor data quality can be magnified at huge scale.
  20. Metadata Services: Apache HCatalog provides flexible metadata services across tools and for external access. • Consistency: metadata and data models are consistent across tools (MapReduce, Pig, HBase and Hive) • Accessibility: share data as tables in and out of HDFS • Availability: enables flexible, thin-client access via a REST API. [Diagram: without HCatalog, raw Hadoop data has inconsistent, unknown metadata and tool-specific access; with HCatalog, shared table and schema management provides aligned metadata, table access and a REST API that opens the platform.]
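The REST access mentioned above is served in HCatalog by WebHCat (formerly Templeton). The sketch below only builds the URLs a thin client would call; it assumes a WebHCat server on its default port 50111, and the host name, user and `web_logs` table are hypothetical. `fetch_json` is a plain standard-library GET helper, not part of any HCatalog client library.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: a WebHCat (Templeton) server, HCatalog's REST front end,
# listening on its default port 50111. Host and user below are hypothetical.
WEBHCAT_BASE = "http://hcat-host.example.com:50111/templeton/v1"


def table_url(database, table=None, user="hive"):
    """Build the WebHCat DDL URL that lists tables, or describes one table."""
    path = f"{WEBHCAT_BASE}/ddl/database/{quote(database)}/table"
    if table is not None:
        path += f"/{quote(table)}"
    # WebHCat identifies the caller via the user.name query parameter
    return f"{path}?{urlencode({'user.name': user})}"


def fetch_json(url):
    """GET a WebHCat resource and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


print(table_url("default"))
# http://hcat-host.example.com:50111/templeton/v1/ddl/database/default/table?user.name=hive
print(table_url("default", "web_logs"))
```

Against a running server, `fetch_json(table_url("default"))` would return the server's JSON listing for that database; keeping URL construction separate from the HTTP call makes the request shape easy to check without a cluster.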
  21. Talend Open Studio for Big Data: Democratize Big Data. • Improves efficiency of big data job design with a graphical interface • Generates Hadoop code • Runs transforms inside Hadoop • Native support for HDFS, Pig, HBase, Sqoop and Hive • Apache License • Available at • Distribution with Hadoop vendors coming. …an open source ecosystem
  22. Talend Platform for Big Data: Make Faster and More Informed Decisions. • Builds on Talend Open Studio for Big Data • Adds data quality, advanced scalability and management functions • MapReduce massively parallel data processing • Shared repository and remote deployment • Data quality and profiling • Data cleansing • Reporting and dashboards • Commercial support, warranty/IP indemnity under a subscription license. …an open source ecosystem
  23. Why HDP? Only Hortonworks Data Platform provides: • Tight alignment with the core Apache Hadoop development line, which reduces risk for customers who may add custom code or projects • Enterprise integration: HCatalog provides a scalable, extensible integration point to Hadoop data • The most reliable Hadoop distribution: full-stack high availability on v1 delivers the strongest SLA guarantees • Multi-tenant scheduling and resource management: capacity and fair scheduling optimize cluster resources • Integration with operations that eases cluster management: Ambari is the most open and complete operations platform for Hadoop clusters
  24. What next? • Download Hortonworks Data Platform & Talend Open Studio, or use the getting started guide. Learn more: • Expert role-based training • Courses for admins, developers and operators • Certification program • Custom onsite options. Get support (Hortonworks Support): • Full lifecycle technical support across four service levels • Delivered by Apache Hadoop experts/committers • Forward-compatible
  25. Questions & Answers • TRY: download at / download at • LEARN: Hortonworks University • FOLLOW: Twitter @hortonworks, Facebook: • MORE EVENTS • Further questions & comments: