Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Usage Rights: CC Attribution-NonCommercial-NoDerivs License

Presentation Transcript

  • Hadoop & Hortonworks: Open Source Wild Fire. November 2012, OW2 Con. © Hortonworks Inc. 2012
  • Big data changes the game: Transactions + Interactions + Observations = BIG DATA
    [Chart: data volume, from megabytes to petabytes, against increasing data variety and complexity]
    – ERP (megabytes): purchase detail, purchase record, payment record
    – CRM (gigabytes): segmentation, offer details, customer touches, support contacts
    – Web (terabytes): web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels
    – Big data (petabytes): mobile web, sentiment, SMS/MMS, speech to text, social interactions & feeds, spatial & GPS coordinates, sensors / RFID / devices, user click stream, HD video, audio, images, product/service logs, external demographics, business data feeds, user generated content
  • Big Data: Optimize Outcomes at Scale
    – Sports: optimize championships
    – Intelligence: optimize detection
    – Finance: optimize algorithms
    – Advertising: optimize performance
    – Fraud: optimize prevention
    – Retail / Wholesale: optimize inventory turns
    – Manufacturing: optimize supply chains
    – Healthcare: optimize patient outcomes
    – Education: optimize learning outcomes
    – Government: optimize citizen services
    Source: Geoffrey Moore, Hadoop Summit 2012 keynote presentation.
  • Apache Hadoop: open source data management with scale-out storage & distributed processing
    – Storage (HDFS): distributed across "nodes"; natively redundant; the NameNode tracks locations
    – Processing (MapReduce): splits a task across processors "near" the data and assembles the results
    – Self-healing, high-bandwidth clustered storage
    Key characteristics:
    – Scalable: efficiently store and process petabytes of data; linear scale driven by additional processing and storage
    – Reliable: redundant storage; failover across nodes and racks
    – Flexible: store all types of data in any format; apply schema on analysis and sharing of the data
    – Economical: use commodity hardware; open source software guards against vendor lock-in
  • What is a Hadoop "Distribution"? A complementary set of open source technologies that make up a complete data platform: Talend, WebHDFS, Sqoop, Flume, HCatalog, HBase, Pig, Hive, MapReduce, HDFS, Ambari, Oozie, HA, ZooKeeper
    – Tested and pre-packaged to ease installation and usage
    – Collects the right versions of the components, which all have different release cycles, and ensures they work together
  • Hadoop in Enterprise Data Architectures
    [Diagram: the Hadoop platform sits between big data sources and the consuming systems]
    – Existing business infrastructure: IDE & dev tools, ODS & datamarts, applications & spreadsheets, visualization & intelligence, EDW, custom, existing
    – Web: web applications
    – New tech: Datameer, Tableau, Karmasphere, Splunk; operations, discovery tools, low latency/NoSQL
    – Hadoop platform: Templeton, WebHDFS, Sqoop, Flume, HCatalog, HBase, Pig, Hive, MapReduce, HDFS, Ambari, Oozie, HA, ZooKeeper
    – Big data sources (transactions, observations, interactions): social media, exhaust data, logs, files, CRM, ERP, financials
  • Apache Hadoop & Big Data Use Cases: Refine, Explore, Enrich
    [Diagram: big data (transactions, interactions, observations) flows through the three use cases to the business case]
  • Operational Data Refinery: Hadoop as a platform for ETL modernization
    [Diagram: unstructured data, log files and DB data → capture and archive → parse & cleanse → structure and join → upload → Enterprise Data Warehouse]
    – Capture: capture new unstructured data along with log files, alongside existing sources; retain inputs in raw form for audit and continuity purposes
    – Process: parse the data & cleanse; apply structure and definition; join datasets together across disparate data sources
    – Exchange: push to the existing data warehouse for downstream consumption; feeds operational reporting and online systems
  • Big Data Exploration & Visualization: Hadoop as an agile, ad-hoc data mart
    [Diagram: unstructured data, log files and DB data → capture and archive → structure and join → categorize into tables → upload (JDBC / ODBC) → explore → visualization tools; optionally EDW / datamart]
    – Capture: capture multi-structured data and retain inputs in raw form for iterative analysis
    – Process: parse the data into a queryable format; explore & analyze using Hive, Pig, Mahout and other tools to discover value; label data and type information for compatibility and later discovery; pre-compute stats, groupings and patterns in the data to accelerate analysis
    – Exchange: use visualization tools to facilitate exploration and find key insights; optionally move actionable insights into the EDW or a datamart
  • Application Enrichment: deliver Hadoop analysis to online apps
    [Diagram: unstructured data, log files and DB data → capture → parse → derive/filter → enrich (scheduled & near real time) → NoSQL, HBase (low latency) → online applications]
    – Capture: capture data that was once too bulky and unmanageable
    – Process: uncover aggregate characteristics across data; use Hive, Pig and MapReduce to identify patterns; filter useful data from mass streams (Pig); micro or macro batch-oriented schedules
    – Exchange: push results to HBase or another NoSQL alternative for real-time delivery; use patterns to deliver the right content/offer to the right person at the right time
  • Balancing Innovation & Stability
    – Hadoop is "pre-chasm"
    – The ecosystem is still evolving
    – Enterprises endure a 1-3 year adoption cycle
    [Chart: technology adoption life cycle, customers (relative %) over time: innovators (technology enthusiasts), early adopters (visionaries), THE CHASM, early majority (pragmatists), late majority (conservatives), laggards (skeptics). Before the chasm, customers want technology & performance; after it, customers want solutions & convenience.]
    Source: Geoffrey Moore, Crossing the Chasm.
  • What Hortonworks does… "We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop."
    Strategy: invest in Apache Hadoop to make it "The enterprise big data platform".
    – Distribution: Hortonworks Data Platform (HDP); enterprise ready, stable, reliable, tested; 100% open source; built by the architects, builders and operators of Apache Hadoop
    – Ecosystem: enable an ecosystem of big data apps; our goal is to make sure all your tools work WITH Hadoop; HDP is Hadoop for Microsoft and Teradata
    – Support: deliver the highest quality support and expertise; access to Apache Hadoop experts; Hadoop training and certification by the Hadoop experts (web, public, private)
  • AMSTERDAM, March 20-21, 2013: Enabling the Next Generation Enterprise Data Platform
    – LEARN: dozens of sessions
    – INTERACT: community-focused event
    Register today @ hadoopsummit.org
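The Apache Hadoop slide says MapReduce "splits a task across processors 'near' the data & assembles results". A minimal sketch of that map, shuffle and reduce flow, assuming a word count job and written as plain single-process Python rather than against the Hadoop MapReduce API; the function names are hypothetical and each input string stands in for one HDFS block:

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: sum the counts for each key to assemble the final result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # On a cluster each split would be mapped on the node holding that block;
    # in this sketch the splits are simply processed one after another.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle(intermediate))


if __name__ == "__main__":
    print(word_count(["Big data changes the game", "the data is big"]))
    # {'big': 2, 'data': 2, 'changes': 1, 'the': 2, 'game': 1, 'is': 1}
```

Because the map step only looks at its own split, the splits can be processed independently, which is what lets the real framework ship the computation to the nodes that store the data instead of moving the data.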
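The Operational Data Refinery slide lists "parse & cleanse" followed by "structure and join" across disparate sources. A minimal sketch of those two steps, assuming a hypothetical tab-separated click log (user_id, url) and a Python dict standing in for an existing CRM source; this is illustrative Python, not Hive or Pig code:

```python
from dataclasses import dataclass


@dataclass
class Click:
    user_id: str
    url: str


def parse_and_cleanse(raw_lines):
    # Parse raw lines in a hypothetical "user_id<TAB>url" format and drop
    # malformed records instead of failing the whole batch.
    clicks = []
    for line in raw_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2 or not parts[0].strip():
            continue
        clicks.append(Click(user_id=parts[0].strip(), url=parts[1].strip()))
    return clicks


def join_with_crm(clicks, crm):
    # Inner join on user_id against an existing source; the dict stands in
    # for CRM data. Clicks from users unknown to the CRM are left out.
    return [(c.user_id, crm[c.user_id], c.url) for c in clicks if c.user_id in crm]


if __name__ == "__main__":
    raw = ["u1\t/home\n", "not a valid record", "u2\t/cart", "u9\t/help"]
    crm = {"u1": "Alice", "u2": "Bob"}
    print(join_with_crm(parse_and_cleanse(raw), crm))
    # [('u1', 'Alice', '/home'), ('u2', 'Bob', '/cart')]
```

The raw lines themselves are not modified, which matches the slide's advice to retain inputs in raw form for audit and continuity; only the joined output would be pushed on to the warehouse.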
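The Application Enrichment slide separates a batch step (derive patterns with Hive, Pig or MapReduce) from an online step (serve results from HBase or another NoSQL store). A minimal sketch of that split, assuming hypothetical (user_id, category) events and a Python dict standing in for the low-latency store; the names and the default offer are illustrative only:

```python
from collections import Counter, defaultdict


def derive_top_category(events):
    # Batch step: count category views per user across the event stream,
    # then keep each user's most frequent category (ties: first seen wins).
    per_user = defaultdict(Counter)
    for user_id, category in events:
        per_user[user_id][category] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


def recommend(store, user_id, default="bestsellers"):
    # Online step: one key lookup, standing in for a low-latency read
    # from HBase or another NoSQL store; unknown users get a default offer.
    return store.get(user_id, default)


if __name__ == "__main__":
    events = [("u1", "books"), ("u1", "music"), ("u1", "books"), ("u2", "games")]
    store = derive_top_category(events)  # would be recomputed on a batch schedule
    print(recommend(store, "u1"), recommend(store, "u3"))  # books bestsellers
```

The expensive aggregation runs on a micro or macro batch schedule, while the online application only performs a key lookup, which is why the slide pairs Hadoop processing with a low-latency store for delivery.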