Hadoop: What It Is and What It's Not
The Briefing Room with Mark Madsen and Hortonworks
Slides from the Live Webcast on Oct. 16, 2012

The power of Hadoop cannot be denied, as evidenced by the fact that all the biggest closed-source vendors in the world of data management have embraced this open-source project with virtually open arms. But Hadoop is not a data warehouse, nor is it ever likely to be. Rather, its ideal role for now is to augment traditional data warehousing and business intelligence. As an adjunct, Hadoop provides an amazing mechanism for storing and analyzing Big Data. The key is to manage expectations and move forward carefully.

Check out this episode of The Briefing Room to hear veteran analyst Mark Madsen of Third Nature, who will explain how, where, when and why to leverage the open-source elephant in the enterprise. He'll be briefed by Jim Walker of Hortonworks, who will tout his company's vision for the future of Big Data management. Walker will provide details on the Hortonworks data platform and how it can be used to complete the picture of information management. He'll also discuss how the Hortonworks partner network can help companies get big value from Big Data.

Visit: http://www.insideanalysis.com


  1. Eric.kavanagh@bloorgroup.com. Twitter Tag: #briefr. The Briefing Room
  2. • Reveal the essential characteristics of enterprise software, good and bad
     • Provide a forum for detailed analysis of today's innovative technologies
     • Give vendors a chance to explain their product to savvy analysts
     • Allow audience members to pose serious questions... and get answers!
  3. • November: Cloud
     • December: Innovators
     • January: Big Data
     • February: Performance
     • March: Integration
  4. • The Data Warehouse was once considered the Holy Grail of Business Intelligence, but as data volumes increase exponentially, we’re finding that data warehousing cannot be all things for all users.
     • Hadoop was initially developed at Yahoo! to support a search engine project and has since turned into the poster child for open source Big Data processing.
     • While Hadoop is not a data warehouse, its capabilities can help organizations store and analyze huge volumes of data.
  5. Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor at Forbes Online and Information Management. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
  6. • Hortonworks is an enterprise software company that focuses on the development and support of Apache Hadoop.
     • Its product is the Hortonworks Data Platform, an open source platform for storing, processing and analyzing large volumes of data from many sources and in a variety of formats.
     • Hortonworks recently introduced its Hive ODBC Driver 1.0, which allows users to integrate its Hadoop platform with the BI apps running on top.
  7. Jim is the Director of Product Marketing at Hortonworks. He is a recovering developer, professional marketer and amateur photographer with nearly twenty years' experience building products and developing emerging technologies. During his career, he has brought multiple products to market in a variety of fields, including data loss prevention, master data management and now big data. At Hortonworks, Jim is focused on accelerating the development and adoption of Apache Hadoop.
  8. Hadoop: What It Is & Isn’t. October 2012. Jim Walker, Director, Product Marketing, Hortonworks. © Hortonworks Inc. 2012
  9. Big Data: Organizational Game Changer
     Transactions + Interactions + Observations = BIG DATA
     [Figure: data grows from megabytes (ERP) through gigabytes (CRM) and terabytes (WEB) to petabytes (BIG DATA) along an axis of increasing data variety and complexity. Example data types shown: Purchase detail; Purchase record; Payment record; Segmentation; Offer details; Offer history; Customer Touches; Support Contacts; Web logs; A/B testing; Behavioral Targeting; Dynamic Pricing; Search Marketing; Affiliate Networks; Dynamic Funnels; Mobile Web; Sentiment; User Click Stream; SMS/MMS; Speech to Text; Social Interactions & Feeds; Spatial & GPS Coordinates; Sensors / RFID / Devices; Business Data Feeds; External Demographics; User Generated Content; HD Video, Audio, Images; Product/Service Logs]
  10. What is a Data Driven Business?
      • DEFINITION: Better use of available data in the decision making process
      • RULE: Key metrics derived from data should be tied to goals
      • PROVEN RESULTS: Firms that adopt Data-Driven Decision Making have output and productivity that is 5-6% higher than what would be expected given their investments and usage of information technology*
      * “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Brynjolfsson, Hitt and Kim (April 22, 2011)
  11. Big Data: Optimize Outcomes at Scale
      • Media: optimize Content
      • Intelligence: optimize Detection
      • Finance: optimize Algorithms
      • Advertising: optimize Performance
      • Fraud: optimize Prevention
      • Retail / Wholesale: optimize Inventory turns
      • Manufacturing: optimize Supply chains
      • Healthcare: optimize Patient outcomes
      • Education: optimize Learning outcomes
      • Government: optimize Citizen services
      Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.
  12. Enterprise Big Data Flows
      [Figure labels: Unstructured Data (Log files, Exhaust Data, Social Media, Sensors, devices); Business Transactions & Interactions (Web, Mobile, CRM, ERP, Point of sale); Classic Data Integration & ETL; DB data; Big Data Platform; Business Intelligence & Analytics (Dashboards, Reports, Visualization, …)]
      1. Capture Big Data: Collect data from all sources, structured & unstructured
      2. Process: Transform, refine, aggregate, analyze, report
      3. Distribute Results: Interoperate and share data with applications/analytics
      4. Feedback: Use operational data w/in big data platform, preserve data
  13. Data Platform Requirements for Big Data
      • Capture: Collect data from all sources - structured and unstructured data; all speeds batch, async, streaming, real-time
      • Process: Transform, refine, aggregate, analyze, report
      • Exchange: Deliver data with enterprise data systems; share data with analytic applications and processing
      • Operate: Provision, monitor, diagnose, manage at scale; reliability, availability, affordability, scalability, interoperability
      • Across all deployment models: Operating Systems, Virtual Platforms, Cloud Platforms, Big Data Appliances
  14. Apache Hadoop & Big Data Use Cases
      Big Data: Transactions, Interactions, Observations. Refine, Explore, Enrich. Business Case.
  15. Operational Data Refinery: Hadoop as platform for ETL modernization
      [Figure labels: Refine, Explore, Enrich; Unstructured, Log files, DB data; Capture and archive; Parse & Cleanse; Structure and join; Upload; Refinery; Enterprise Data Warehouse]
      • Capture: Capture new unstructured data along with log files all alongside existing sources; retain inputs in raw form for audit and continuity purposes
      • Process: Parse the data & cleanse; apply structure and definition; join datasets together across disparate data sources
      • Exchange: Push to existing data warehouse for downstream consumption; feeds operational reporting and online systems
  16. Big Data Exploration & Visualization: Hadoop as agile, ad-hoc data mart
      [Figure labels: Refine, Explore, Enrich; Unstructured, Log files, DB data; Capture and archive; Structure and join; Categorize into tables; upload; JDBC / ODBC; Explore; Optional; Visualization Tools; EDW / Datamart]
      • Capture: Capture multi-structured data and retain inputs in raw form for iterative analysis
      • Process: Parse the data into queryable format; explore & analyze using Hive, Pig, Mahout and other tools to discover value; label data and type information for compatibility and later discovery; pre-compute stats, groupings, patterns in data to accelerate analysis
      • Exchange: Use visualization tools to facilitate exploration and find key insights; optionally move actionable insights into EDW or datamart
  17. Application Enrichment: Deliver Hadoop analysis to online apps
      [Figure labels: Refine, Explore, Enrich; Unstructured, Log files, DB data; Capture; Parse; Derive/Filter; Enrich; Scheduled & near real time; NoSQL, HBase; Low Latency; Online Applications]
      • Capture: Capture data that was once too bulky and unmanageable
      • Process: Uncover aggregate characteristics across data; use Hive, Pig and MapReduce to identify patterns; filter useful data from mass streams (Pig); micro or macro batch oriented schedules
      • Exchange: Push results to HBase or other NoSQL alternative for real time delivery; use patterns to deliver right content/offer to the right person at the right time
  18. Hadoop in Enterprise Data Architectures
      [Figure labels:
      • Existing Business Infrastructure: IDE & Dev Tools; ODS & Datamarts; Applications & Spreadsheets; Visualization & Intelligence; Web Applications; Operations; Discovery Tools; EDW; Low Latency/NoSQL; Custom; Existing
      • New Tech: Datameer, Tableau, Karmasphere, Splunk
      • Components shown: Templeton, WebHDFS, Sqoop, Flume, HCatalog, HBase, Pig, Hive, MapReduce, HDFS, Ambari, Oozie, HA, ZooKeeper
      • Big Data Sources (transactions, observations, interactions): Social Media, Exhaust Data, logs, files, CRM, ERP, financials]
  19. Where Does It Fit into Your Business?
      • Retail & Web. Refine: Log Analysis/Site Optimization. Explore: Social Network Analysis. Enrich: Dynamic Pricing; Session & Content Optimization.
      • Retail. Refine: Loyalty Program Optimization. Explore: Brand and Sentiment Analysis. Enrich: Dynamic Pricing/Targeted Offer.
      • Intelligence. Refine: Threat Identification. Explore: Person of Interest Discovery. Enrich: Cross Jurisdiction Queries.
      • Finance. Refine: Risk Modeling & Fraud Identification; Trade Performance Analytics. Explore: Surveillance and Fraud Detection; Customer Risk Analysis. Enrich: Real-time upsell, cross sales marketing offers.
      • Energy. Refine: Smart Grid: Production Optimization. Explore: Grid Failure Prevention; Smart Meters. Enrich: Individual Power Grid.
      • Manufacturing. Refine: Supply Chain Optimization. Explore: Customer Churn Analysis. Enrich: Dynamic Delivery; Replacement parts.
      • Healthcare & Payer. Refine: Electronic Medical Records (EMPI). Explore: Clinical Trials Analysis. Enrich: Insurance Premium Determination.
  20. Hortonworks Vision & Leadership
      We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
      • Trusted: Stewards of core Hadoop; original builders and operators of Hadoop; 100+ years Hadoop development experience; managed every viable, stable Hadoop release; HDP built on Hadoop 1.0
      • Open: 100% open platform; no POS holdback; open to the Hadoop community; open to the Hadoop ecosystem; closely aligned to Hadoop core
      • Innovative: Innovating current platform with HCatalog, Ambari, HA; innovating future platform with YARN, HA; complete vision for Hadoop-based platform; enable the Hadoop ecosystem
  21. Hortonworks Data Platform
      • Simplify deployment to get started quickly and easily
      • Monitor, manage any size cluster with familiar console and tools
      • Only platform to include data integration services to interact with any data
      • Metadata services open the platform for integration with existing applications
      • Dependable high availability architecture
      • Tested at scale to future proof your cluster growth
      ✓ Reduce risks and cost of adoption
      ✓ Lower the total cost to administer and provision
      ✓ Integrate with your existing ecosystem
  22. Twitter Tag: #briefr The Briefing Room
  23. “In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.” Grace Hopper. © Third Nature Inc.
  24. What’s different today?
      • We’re not getting more CPU speed, but more CPU cycles.
      • There are too many CPUs relative to other resources, creating an imbalance in hardware platforms.
      • We therefore use nodes to aggregate memory, network bandwidth and IOPs.
      • Most software is designed for a single worker, not high degrees of parallelism, and won’t scale well.
  25. Data volume is the oldest, easiest problem [image label: Teradata]
  26. Analytics makes the data volume problem bigger
      Many of the processing problems are O(n^2) or worse, so moderate data can be a problem for DW architectures
  27. “I need that data now.” “It will take 6 months.” “It would be logical to keep all the data in one place.”
      A common problem with new projects or unexpected business problems…
  28. The proposed solution? Load Hadoop and analyze
  29. Welcome to the Hadoop schema!
      Why soft / no schema can be good:
      • Easier programming
      • Easier modeling since you don’t have to be perfect in advance, and it’s change-resilient
      • Join elimination = I/O savings (if no updates)
  30. Whether to switch from a DB isn’t the right discussion [image labels: SQL? Hadoop SQL! SQL SQL...]
  31. Strategy: There’s a pony in there somewhere
  32. …but you need a unicorn to find the pony
  33. Questions for discussion
      1. Is scale of data really that much of a problem for most organizations?
      2. Hadoop is designed for batch work – how good is it for interactive use? Real-time use cases?
      3. How do you define “platform”?
      4. ETL modernization is mentioned, but isn’t this a reversion to manual coding?
      5. How do you design for long-term use rather than one-off analysis projects?
      6. Does open source really matter for this part of the stack?
  34. CC Image Attributions
      Thanks to the people who supplied the creative commons licensed images used in this presentation:
      • Phone dump - Richard Barnes
      • ponies in field.jpg - http://www.flickr.com/photos/bulle_de/352732514/
  35. Twitter Tag: #briefr The Briefing Room
  36. • This Month: Database
      • November: Cloud
      • December: Innovators
      • January: Big Data
      • 2013 Editorial Calendar (www.insideanalysis.com)
  37. Twitter Tag: #briefr The Briefing Room
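
The refinery flow on slide 15 (capture, parse and cleanse, structure and join, exchange) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the log format, the CRM rows and every function name below are hypothetical, and in-memory lists stand in for what would be HDFS files and MapReduce, Pig or Hive jobs on a real cluster.

```python
import re
from collections import defaultdict

# Hypothetical raw inputs: web log lines (unstructured) and CRM rows (structured).
RAW_LOGS = [
    "2012-10-16T10:00:01 user=42 GET /products/7",
    "2012-10-16T10:00:05 user=42 GET /cart",
    "corrupted line ###",
    "2012-10-16T10:01:12 user=77 GET /products/7",
]
CRM_ROWS = [{"user_id": 42, "segment": "gold"}, {"user_id": 77, "segment": "new"}]

# Assumed log layout: timestamp, user=<id>, HTTP method, path.
LOG_PATTERN = re.compile(r"^(\S+) user=(\d+) (\w+) (\S+)$")


def capture(lines):
    """Capture: keep every input line untouched for audit and replay."""
    return list(lines)


def parse_and_cleanse(lines):
    """Process: parse lines into records and drop lines that do not match."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            ts, user, method, path = match.groups()
            records.append(
                {"ts": ts, "user_id": int(user), "method": method, "path": path}
            )
    return records


def structure_and_join(records, crm_rows):
    """Process: join log records to CRM rows and count views per segment and path."""
    segment_by_user = {row["user_id"]: row["segment"] for row in crm_rows}
    views = defaultdict(int)
    for rec in records:
        segment = segment_by_user.get(rec["user_id"], "unknown")
        views[(segment, rec["path"])] += 1
    return views


def exchange(views):
    """Exchange: emit flat, sorted rows ready to load into a warehouse table."""
    return sorted((segment, path, n) for (segment, path), n in views.items())


def run_refinery(raw_logs, crm_rows):
    """Run all stages; return the raw archive and the warehouse-ready rows."""
    archive = capture(raw_logs)
    records = parse_and_cleanse(archive)
    return archive, exchange(structure_and_join(records, crm_rows))


if __name__ == "__main__":
    archive, rows = run_refinery(RAW_LOGS, CRM_ROWS)
    print(f"archived {len(archive)} raw lines")
    for row in rows:
        print(row)
```

The raw archive keeps the corrupted line even though the cleansing step drops it, which mirrors the slide's point about retaining inputs in raw form for audit and continuity.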

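Slide 29's argument for a soft schema can be illustrated in the same spirit. The sketch below assumes events stored as JSON lines and a hypothetical `read_with_schema` helper (neither appears in the slides). The raw records are written once, and each reader chooses which fields to project at read time, so a field that shows up later does not force older data to be rewritten.

```python
import json

# Hypothetical raw events, stored as written. The second record carries a
# field ("referrer") that did not exist when the first one was captured.
RAW_EVENTS = [
    '{"user": 42, "page": "/cart"}',
    '{"user": 77, "page": "/home", "referrer": "search"}',
]


def read_with_schema(raw_lines, fields, default=None):
    """Apply a schema at read time: project each raw event onto `fields`.

    Missing fields are filled with `default` instead of failing the read.
    """
    for line in raw_lines:
        event = json.loads(line)
        yield tuple(event.get(name, default) for name in fields)


if __name__ == "__main__":
    # An older reader that only knows two fields still works.
    print(list(read_with_schema(RAW_EVENTS, ["user", "page"])))
    # A newer reader adds a column without any change to stored data.
    print(list(read_with_schema(RAW_EVENTS, ["user", "page", "referrer"])))
```

The trade-off is that every reader now carries the parsing and defaulting logic that a fixed schema would otherwise enforce once at load time.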