Big data in the enterprise: When to use what?

1,468 views
1,324 views

Published on

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,468
On SlideShare
0
From Embeds
0
Number of Embeds
28
Actions
Shares
0
Downloads
73
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Big data in the enterprise: When to use what?

  1. 1. Big Data in the Enterprise. When to Use What? Jesus Rodriguez, Tellago, KidoZen, Inc
  2. 2. Agenda • Big Data principles • The Hadoop ecosystem • Other big data technologies
  3. 3. About Me • Co-Founder Tellago, Inc • Co-Founder KidoZen, Inc • Microsoft MVP • Architect Advisor • Investor • Speaker, Author • http://jrodthoughts.com • http://weblogs.asp.net/gsusx • http://kidozen.com
  4. 4. About Tellago • Application development firm focused on big enterprise trends (launched 2008) • Enterprise mobility, cloud computing, augmented reality, modern BI & big data • Advisor to software companies such as Microsoft or Oracle • American Business Awards(2011) “Best Overall Company of the Year < 100” • American Business Awards(2012) Silver: “Best Computer Services Company of the Year < 100”, Silver: Best Computer Services Executive of the Year • Inc 500 (114) & other industry awards
  5. 5. Some Housekeeping Rules • Tellago Technology Updates focused on modern enterprise software trends • Real world stories • No sales pitch • Leverage GTW to ask questions
  6. 6. We Love Data!
  7. 7. Where all Started?
  8. 8. CAP Theorem
  9. 9. Big Data Opportunity
  10. 10. The Landscape
  11. 11. Or a Bit More Crowded
  12. 12. Or Worse
  13. 13. Hadoop Led the Way
  14. 14. Hadoop Design Principles • System Shall Manage and Heal Itself • Performance Shall Scale Linearly • Compute Shall Move to Data • Simple Core, Modular and Extensible
  15. 15. The Solution: HDFS + Map Reduce
  16. 16. Mapping
  17. 17. Hadoop Ecosystem HDFS (Hadoop Distributed File System) HBase (key-value store) MapReduce (Job Scheduling/Execution System) Pig (Data Flow) Hive (SQL) BI ReportingETL Tools Avro(Serialization) Zookeepr(Coordination) Sqoop RDBMS (Streaming/Pipes APIs)
  18. 18. Reducing
  19. 19. HDFS
  20. 20. Map-Reduce
  21. 21. Relational vs. Hadoop
  22. 22. The Hadoop Ecosystem
  23. 23. WebHDFS
  24. 24. Sqoop
  25. 25. Flume
  26. 26. HBase
  27. 27. Pig
  28. 28. HCatalog
  29. 29. Hive
  30. 30. Ambari
  31. 31. Oozie
  32. 32. Zookeeper
  33. 33. Hadoop Enterprise Architecture
  34. 34. Hadoop is not a silver bullet...
  35. 35. Some Challenges • Hadoop doesn’t power big data applications • Not a transactional datastore. Slosh back and forth via ETL • Processing latency • Non-incremental, must re-slurp entire dataset every pass • Ad-Hoc queries • Bare metal interface, data import • Graphs • Only a handful of graph problems amenable to MR
  36. 36. Beyond Hadoop • Percolator(incremental processing) http://research.google.com/pubs/pub36726.html • Dremel(ad-hoc analysis queries) http://research.google.com/pubs/pub36632.html • Pregel (Big graphs) http://dl.acm.org/citation.cfm?id=1807184
  37. 37. Important Big Data Technologies in the Enterprise
  38. 38. Real Time Analytics
  39. 39. Real Time Analytics • Storm • Hstreaming • StreamBase • IBM Streams • Microsoft StreamInsight
  40. 40. MPP: Massively Parallel Processing
  41. 41. MPP Columnar Stores • Oracle Exadata • IBM Netezza • Teradata • EMC Greenplum • HP Vertica • ParAccel • Microsoft SQL Server PDW
  42. 42. Oracle & Big Data
  43. 43. Microsoft & Big Data
  44. 44. NoSQL DBs
  45. 45. NoSQL DBs
  46. 46. NewSQL DBs
  47. 47. New SQL / Cloud DB • VoltDB • NimbusDB • SimpleDB • NuoDB • Clustrix • Totutek
  48. 48. Traditional BI Suites
  49. 49. New SQL / Cloud DB • Hadoop Support In: • Microsoft SSIS • Informatica Datastage • Talend • Pentaho • Microstrategy , SaaS • Tableau, Qlikview
  50. 50. Big Data & Cloud
  51. 51. Big Data & Cloud • Hadoop distributions (AWS, Microsoft HDInsight, Cloud Foundry) • Data marketplaces (Factual, Infochimps) • Data visualization (WibiData) • NOSQL as a Service (MongoHQ)
  52. 52. If you are interested on evaluating Big Data in your organization
  53. 53. Tellago Big Data Strategy Session • 1 day strategy session • Start with a real world scenario • Explore various big data technology vendors • Present a potential technology roadmap • Free • Emails us at info@tellago.com
  54. 54. Summary • The big data ecosystem is super crowded • Hadoop distributions are leading the way in the enterprise • Complementary technologies include: • NOSQL • New SQL • MPP • Data Visualization
  55. 55. Thanks jesus.rodriguez@tellago.com http://www.tellagostudios.com http://jrodthoughts.com http://twitter.com/#!/jrodthoughts http://weblogs.asp.net/gsusx

×