Making Sense of Big data with Hadoop

Published in: Technology

  1. Making Sense of BIG DATA with Hadoop
  2. ● 13 years with a pager ● Oracle ACE Director ● Oak Table member ● Senior consultant for Pythian ● @gwenshap ● http://www.pythian.com/news/author/shapira/ ● shapira@pythian.com © 2012 Pythian
  3. Pythian. Recognized Leader: • Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server • Works with over 165 multinational companies such as LinkShare Corporation, IGN Entertainment, CrowdTwist, TinyCo and Western Union to help manage their complex IT deployments. Expertise: • One of the world’s largest concentrations of dedicated, full-time DBA expertise. Employs 7 Oracle ACEs/ACE Directors. Heavily involved in the MySQL community, driving the MySQL Professionals Group and sitting on the IOUG Advisory Board for MySQL. • Holds 7 Specializations under the Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC. Global Reach & Scalability: • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
  4. What is Big Data?
  5. MORE DATA THAN YOU CAN HANDLE
  6. MORE DATA THAN RELATIONAL DATABASES CAN HANDLE
  7. MORE DATA THAN RELATIONAL DATABASES CAN HANDLE CHEAPLY
  8. Data Arriving at Fast Rates • Typically Unstructured • Stored without Aggregation • Analyzed in Real Time • For Reasonable Cost
  9. Complex Data Architecture
  10. Your Data is NOT as BIG as you think
  11. Why Big Data? Why Hadoop?
  12. BECAUSE WE CAN
  13. More Data Beats Smarter Algorithms
  14. Email • Photos • Job postings • Tweets • Video • Medical imaging • Sensors • Blog posts • Tags • Scanned docs
  15. Data is Messy
  16. An Imperial College team found: • 3,000 patients under 19 were treated in geriatric clinics • between 15,000 and 20,000 men have been admitted to obstetric wards • and almost 10,000 to gynecology wards http://www.straightstatistics.org/blog/2012/04/06/why-are-so-many-men-pregnant
  17. Unstructured Eventually Structured Data
  18. Scalable Storage + Massive Parallel Processing + Reasonable Cost
  19. Hadoop: Platform for distributed computing
  20. Hadoop is Scalable. But not fast.
  21. Much Ado about Hadoop
  22. Assumptions: • Lots of data • Large Files • Unstructured • Scan entire files • Unreliable Hardware • Adding servers = increase capacity
  23. Principles: • Bring Code to Data • Share Nothing
  24. HDFS: • Distributed • Replicated • Big Files • Write Once • Read Entire File
  25. /users/shapira/log-1, blocks {1,4,5}; /users/shapira/log-2, blocks {2,3,6}. [Diagram: blocks 1 through 6, each stored three times across the data nodes]
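The block diagram on slide 25 can be illustrated with a small sketch. This is not HDFS's actual placement policy; it is a simple round-robin stand-in, and the node names, the five-node cluster size, and both function names are hypothetical. It only shows the idea that each block lives on three different nodes, so losing one node loses no data.

```python
from collections import defaultdict


def place_replicas(block_ids, node_names, replication=3):
    """Assign every block to `replication` distinct nodes, round-robin.

    Illustrative only: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(node_names):
        raise ValueError("need at least as many nodes as replicas")
    placement = defaultdict(list)
    for i, block in enumerate(block_ids):
        for r in range(replication):
            # Consecutive offsets modulo the node count give distinct nodes.
            node = node_names[(i + r) % len(node_names)]
            placement[node].append(block)
    return dict(placement)


def readable_blocks(placement, failed_node):
    """Return the set of blocks still available if one node fails."""
    return {
        block
        for node, blocks in placement.items()
        if node != failed_node
        for block in blocks
    }


# File-to-block mapping taken from the slide.
files = {
    "/users/shapira/log-1": [1, 4, 5],
    "/users/shapira/log-2": [2, 3, 6],
}
all_blocks = sorted(b for blocks in files.values() for b in blocks)

# Hypothetical five-node cluster.
nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
placement = place_replicas(all_blocks, nodes)

for node in nodes:
    print(node, placement.get(node, []))
```

Calling `readable_blocks(placement, "node-1")` returns all six blocks, which is the property the replicated layout is meant to provide.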
  26. [Diagram: a Hadoop job as a chain of MapReduce jobs (Job 1, Job 2, …). Each job runs from Start to Stop through Map tasks, a Combine step, and an optional Reduce step ("Reduce?"), producing the Hadoop job results]
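The map and reduce steps on slide 26 can be sketched in plain Python. This is an in-memory simulation of the programming model, not the Hadoop API; the log format, the sample lines, and all function names are assumptions made for illustration. It counts log lines per severity level.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit (log_level, 1) for one log line.

    Assumes a hypothetical format: "<date> <LEVEL> <message...>".
    """
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle(pairs):
    """Group mapped values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    mapped = (pair for line in lines for pair in map_line(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Made-up sample input.
sample_log = [
    "2012-04-06 ERROR disk full",
    "2012-04-06 INFO job started",
    "2012-04-07 ERROR connection timeout",
]
level_counts = run_job(sample_log)
print(level_counts)
```

On a real cluster the map calls would run on the nodes that hold each file block, which is the "bring code to data" principle from slide 23.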
  27. Implementation: • Balance disks, cores and RAM • High Bandwidth • More nodes or better nodes?
  28. It’s about the Ecosystem: • Sqoop • Flume • Hive • Pig • HBase
  29. Use Cases
  30. Use Case: Log processing
  31. Use Case: ETL (diagram labels: BI, OLTP, DWH)
  32. Use Case: Recommendations
  33. Use Case: Listening to the crowd
  34. Our customers use Hadoop for: • Storing lots of pre-processed data • Merging different data types • Scalable data processing • Advanced data processing
  35. Big Data in your Company
  36. Easy case: Your CTO heard about Big Data and is eager to invest. You have a Big Budget.
  37. Require • Measure • Acquire • Serve • Organize • Analyze
  38. [Diagram: the Require / Measure cycle annotated with tools: Hadoop, NoSQL, OLTP, BI, RDBMS, Oracle, R]
  39. Data Scientist = Sneaky BI • Disregards Silos • Cool Toys
  40. Mining Tools: • Machine Learning • Cluster Detection • Regression • Graph Analysis • Visualization
  41. http://nicolasrapp.com/?p=1118
  42. http://www.orgnet.com/slumlords.html
  43. Want to do more with your data? Don’t know where to start? No budget? No problem!
  44. Sneak Hadoop into Your Business: • Find an important business problem • Acquire data (be sneaky!) • Get the tools: R, Hadoop, Tableau • Laptops, desktops, test servers • Analyze data • Make pretty charts • Get business used to it • Wait for an Outage • PROFIT!
  45. Oracle Big Data: The “ETL Machine”
  46. Hardware: • 18 servers • 216 cores • 864G RAM • 648T disks • InfiniBand
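To put the hardware totals on slide 46 in perspective, here is a rough per-server breakdown. It assumes the resources are spread evenly across the 18 servers, that "G" and "T" mean gigabytes and terabytes, and that data is stored with three-way HDFS replication; none of these assumptions is stated on the slide.

```python
# Totals quoted on the slide.
cluster = {"servers": 18, "cores": 216, "ram_gb": 864, "disk_tb": 648}

# Assumption: HDFS-style replication factor of 3 (not stated on the slide).
ASSUMED_REPLICATION = 3

# Assumption: resources are evenly distributed across servers.
per_server = {
    name: total / cluster["servers"]
    for name, total in cluster.items()
    if name != "servers"
}

# Raw disk divided by the replication factor gives a rough upper bound
# on the amount of distinct data the cluster could hold.
usable_disk_tb = cluster["disk_tb"] / ASSUMED_REPLICATION

print("Per server:", per_server)
print("Approx. usable disk (TB):", usable_disk_tb)
```

Under these assumptions each server has 12 cores, 48 GB of RAM and 36 TB of disk, and the cluster holds roughly 216 TB of replicated data.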
  47. Software: • Oracle NoSQL • Cloudera Hadoop Distribution • Oracle Loader for Hadoop • Data Integrator for Hadoop • Direct Connector for Hadoop • Oracle Connector for R
  48. Cores, Storage, InfiniBand and Software Make Oracle Big Data The Ultimate ETL Machine
  49. Thank you & Q&A. To contact us: sales@pythian.com, 1-866-PYTHIAN. To follow us: http://www.pythian.com/news/ http://www.facebook.com/pages/The-Pythian-Group/ http://twitter.com/pythian http://www.linkedin.com/company/pythian
