2014 july 24_what_ishadoop

Presentation for Silicon Peel group at Microsoft Canada HQ, July 24, 2014

Published in: Software

  1. 1. EVERYONE LIKES ELEPHANTS. Adam Muise (amuise@hortonworks.com), Principal Architect, Hortonworks
  2. 2. Who am I?
  3. 3. Who is Hortonworks?
  4. 4. We do Hadoop. The leaders of Hadoop’s development. Community driven, Enterprise Focused. Drive Innovation in the platform – We lead the roadmap. 100% Open Source – Democratized Access to Data.
  5. 5. We do Hadoop successfully: Support, Professional Services, Training.
  6. 6. What is Hadoop? What is everyone talking about?
  7. 7. “Big Data” is the marketing term of the decade in IT
  8. 8. What lurks behind the hype is the democratization of Data, a move to aggregate disparate data silos into one shiny pile of analytic gold
  9. 9. So what are the problems with Big Data?
  10. 10. Let’s talk challenges…
  11. 11. Volume Volume Volume Volume
  12. 12. Volume Volume Volume Volume Volume Volume Volume Volume Volume Volume Volume Volume Volume Volume Volume Volume Volume
  13. 13. Volume Volume Volume … [the word “Volume” repeated across the whole slide]
  14. 14. Volume Volume Volume … [the word “Volume” repeated even more densely across the whole slide]
  15. 15. Storage, Management, Processing all become challenges with Data at Volume
  16. 16. Traditional technologies adopt a divide, drop, and conquer approach
  17. 17. The solution? [Diagram: data blocks held in separate stores labeled EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]
  18. 18. Ummm…you dropped something. [Diagram: a large number of unlabeled data blocks, plus the stores EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]
  19. 19. Analyzing the data usually raises more interesting questions…
  20. 20. …which leads to more data
  21. 21. Wait, you’ve seen this before. [Diagram: data blocks, an “Analytics Sausage Factory”, and more data blocks]
  22. 22. Data begets Data.
  23. 23. What keeps us from our Data?
  24. 24. “Prices, Stupid passwords, and Boring Statistics.” - Hans Rosling http://www.youtube.com/watch?v=hVimVzgtD6w
  25. 25. Your data silos are lonely places. [Diagram: data silos labeled EDW, Accounts, Customers, Web Properties]
  26. 26. … Data likes to be together. [Diagram: data silos labeled EDW, Accounts, Customers, Web Properties]
  27. 27. Data likes to socialize too. [Diagram: data silos labeled EDW, Accounts, Customers, Web Properties, Machine Data, Twitter, Facebook, CDR, Weather Data]
  28. 28. New types of data don’t quite fit into your pristine view of the world. [Diagram: “My Little Data Empire” alongside Logs and Machine Data, marked with question marks]
  29. 29. To resolve this, some people take hints from Lord Of The Rings...
  30. 30. …and create One-Schema-To-Rule-Them-All… [Diagram: an EDW holding data under a single Schema]
  31. 31. …but that has its problems too. [Diagram: EDWs, each with a single Schema, fed by many ETL jobs]
  32. 32. What if the data was processed and stored centrally? What if you didn’t need to force it into a single schema? We call it a Data Lake. [Diagram: Data Sources feeding a Data Lake that holds data under multiple Schemas and Process steps, which in turn feeds an EDW and BI & Analytics]
  33. 33. A Data Lake Architecture enables:
      - Landing data without forcing a single schema
      - Landing a variety and large volume of data efficiently
      - Retaining data for a long period of time with a very low $/TB
      - A platform to feed other Analytical DBs
      - A platform to execute next gen data analytics and processing applications (SAS, Informatica, Graph Analytics, Machine Learning, SAP, etc…)
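The first point on slide 33, landing data without forcing a single schema, can be illustrated with a minimal sketch. This is plain Python, not a Hadoop API: the records, the field names, and the `read_with_schema` helper are all hypothetical, and a list of strings stands in for the files a real data lake would hold.

```python
import json

# Hypothetical raw records from different sources, landed as-is.
# No shared schema is enforced when the data arrives.
RAW_LANDING_ZONE = [
    '{"source": "web", "user": "u1", "url": "/home", "ms": 120}',
    '{"source": "cdr", "caller": "555-0100", "seconds": 42}',
    '{"source": "web", "user": "u2", "url": "/cart", "ms": 340}',
]


def read_with_schema(raw_lines, source, fields):
    """Apply a schema at read time: keep records from `source`, project `fields`."""
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") == source:
            yield {name: record.get(name) for name in fields}


# Two consumers read the same landed data through different schemas.
web_view = list(read_with_schema(RAW_LANDING_ZONE, "web", ["user", "ms"]))
cdr_view = list(read_with_schema(RAW_LANDING_ZONE, "cdr", ["caller", "seconds"]))
print(web_view)
print(cdr_view)
```

Each consumer decides which fields matter when it reads, so adding a new source does not require reshaping the data that is already landed.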
  34. 34. In most cases, more data is better. Work with the population, not just a sample.
  35. 35. Your view of a client today: Male/Female; Age: 25-30; Town/City; Middle Income Band; Product Category Preferences
  36. 36. Your view with more data: Male/Female; Age: 27 but feels old; GPS coordinates; $65-68k per year; Product recommendations; Tea Party Hippie; Looking to start a business; Walking into Starbucks right now…; A depressed Toronto Maple Leafs fan; Products left in basket indicate drunk Amazon shopper; Gene Expression for Risk Taker; Thinking about a new house; Unhappy with his cell phone plan; Pregnant; Spent 25 minutes looking at tea cozies
  37. 37. So what is the answer?
  38. 38. Enter the Hadoop. http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
  39. 39. Hadoop was created because traditional technologies never cut it for the Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn
  40. 40. Traditional architecture didn’t scale enough… [Diagram: three identical stacks, each with App servers, DB servers, and a SAN]
  41. 41. Traditional architectures cost too much at that volume… $/TB $pecial Hardware $upercomputing
  42. 42. So what is the answer?
  43. 43. If you could design a system that would handle this, what would it look like?
  44. 44. It would probably need a highly resilient, self-healing, cost-efficient, distributed file system… [Diagram: nine Storage blocks]
  45. 45. It would probably need a completely parallel processing framework that took tasks to the data… [Diagram: nine Storage blocks and nine Processing blocks]
  46. 46. It would probably run on commodity hardware, virtualized machines, and common OS platforms. [Diagram: nine Storage blocks and nine Processing blocks]
  47. 47. It would probably be open source so innovation could happen as quickly as possible
  48. 48. It would need a critical mass of users
  49. 49. {Processing + Storage} = {YARN + HDFS}
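The idea from slides 45 and 49, a processing framework that takes tasks to the data, can be sketched in a few lines. This is a single-process simulation in plain Python, not YARN, HDFS, or the MapReduce API: the `BLOCKS` dictionary and the `map_task` and `reduce_task` names are hypothetical stand-ins for blocks stored on separate nodes and the work scheduled next to them.

```python
from collections import Counter
from functools import reduce

# Hypothetical "storage nodes": each one holds its own block of a larger file.
BLOCKS = {
    "node1": ["big data big", "elephants"],
    "node2": ["data likes data"],
    "node3": ["big elephants"],
}


def map_task(lines):
    """Runs where the block lives: count words locally, return a small summary."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_task(left, right):
    """Merge two partial word counts into one."""
    return left + right


# Only the small per-node summaries travel; the raw blocks stay where they are.
partials = [map_task(lines) for lines in BLOCKS.values()]
totals = reduce(reduce_task, partials, Counter())
print(dict(totals))
```

In a real cluster the map tasks would run in parallel on the machines that store each block, which is what keeps large data from having to cross the network.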
  50. 50. Want to get your hands dirty?
  51. 51. To do this, we need to install Hadoop, right?
  52. 52. Nope.
  53. 53. Enter the Sandbox.
  54. 54. The Sandbox is ‘Hadoop in a Can’. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single virtual node. [Diagram: the Storage and Processing blocks of a cluster next to a single Linux VM holding one Processing block and one Storage block]
  55. 55. Getting started with Sandbox VM:
      - Pick your flavor of VM at… http://www.hortonworks.com/sandbox
      - Start the sandbox VM
      - Find the IP displayed
      - Go to… http://172.16.130.137
      - Register
      - Click on ‘Start Tutorials’
      - On the left hand nav, click on ‘HCatalog, Basic Pig & Hive Commands’
  56. 56. http://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/ In this tutorial you can…
      - Land files in HDFS
      - Assign metadata with HCatalog
      - Use SQL with Hive
      - Learn to process data with Pig
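The flow the tutorial walks through (land a file, describe it with metadata, then query it with SQL) can be tried out without a cluster. The sketch below is only an analogy: it uses Python's built-in `sqlite3` and `csv` modules as stand-ins and does not use HDFS, HCatalog, or Hive. The table name, column names, and sample rows are made up for illustration.

```python
import csv
import io
import sqlite3

# Step 1 (stand-in for landing a file in HDFS): a raw CSV file with no header.
landed_file = io.StringIO("u1,/home,120\nu2,/cart,340\nu1,/cart,95\n")

# Step 2 (stand-in for HCatalog metadata): column names and types are
# described separately from the raw data.
table_name = "page_views"
columns = [("user_id", "TEXT"), ("url", "TEXT"), ("latency_ms", "INTEGER")]

conn = sqlite3.connect(":memory:")
column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
conn.execute(f"CREATE TABLE {table_name} ({column_sql})")
conn.executemany(
    f"INSERT INTO {table_name} VALUES (?, ?, ?)",
    csv.reader(landed_file),
)

# Step 3 (stand-in for a Hive query): plain SQL over the described data.
rows = conn.execute(
    f"SELECT user_id, COUNT(*), SUM(latency_ms) FROM {table_name} "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)
```

The point of the analogy is the separation of concerns: the file is landed untouched, the metadata is declared once, and any SQL-speaking tool can then use it.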
  57. 57. Hadoop has other open source projects…
  58. 58. Apache Hadoop: Flume, Ambari, HBase, Falcon, MapReduce, HDFS, Sqoop, HCatalog, Pig, Hive, Storm, YARN, Knox, Tez
  59. 59. Hortonworks Data Platform: Flume, Ambari, HBase, Falcon, MapReduce, HDFS, Sqoop, HCatalog, Pig, Hive, Storm, YARN, Knox, Tez
  60. 60. What else are we working on? hortonworks.com/labs/
  61. 61. There is NO second place. Hortonworks: We do Hadoop. © Hortonworks Inc. 2012