Big data in the enterprise: When to use what?

  • 1,029 views
Uploaded on

 

More in: Technology , Business
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
1,029
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
67
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Big Data in the Enterprise. When to Use What? Jesus Rodriguez, Tellago, KidoZen, Inc
  • 2. Agenda • Big Data principles • The Hadoop ecosystem • Other big data technologies
  • 3. About Me • Co-Founder Tellago, Inc • Co-Founder KidoZen, Inc • Microsoft MVP • Architect Advisor • Investor • Speaker, Author • http://jrodthoughts.com • http://weblogs.asp.net/gsusx • http://kidozen.com
  • 4. About Tellago • Application development firm focused on big enterprise trends (launched 2008) • Enterprise mobility, cloud computing, augmented reality, modern BI & big data • Advisor to software companies such as Microsoft or Oracle • American Business Awards(2011) “Best Overall Company of the Year < 100” • American Business Awards(2012) Silver: “Best Computer Services Company of the Year < 100”, Silver: Best Computer Services Executive of the Year • Inc 500 (114) & other industry awards
  • 5. Some Housekeeping Rules • Tellago Technology Updates focused on modern enterprise software trends • Real world stories • No sales pitch • Leverage GTW to ask questions
  • 6. We Love Data!
  • 7. Where all Started?
  • 8. CAP Theorem
  • 9. Big Data Opportunity
  • 10. The Landscape
  • 11. Or a Bit More Crowded
  • 12. Or Worse
  • 13. Hadoop Led the Way
  • 14. Hadoop Design Principles • System Shall Manage and Heal Itself • Performance Shall Scale Linearly • Compute Shall Move to Data • Simple Core, Modular and Extensible
  • 15. The Solution: HDFS + Map Reduce
  • 16. Mapping
  • 17. Hadoop Ecosystem HDFS (Hadoop Distributed File System) HBase (key-value store) MapReduce (Job Scheduling/Execution System) Pig (Data Flow) Hive (SQL) BI ReportingETL Tools Avro(Serialization) Zookeepr(Coordination) Sqoop RDBMS (Streaming/Pipes APIs)
  • 18. Reducing
  • 19. HDFS
  • 20. Map-Reduce
  • 21. Relational vs. Hadoop
  • 22. The Hadoop Ecosystem
  • 23. WebHDFS
  • 24. Sqoop
  • 25. Flume
  • 26. HBase
  • 27. Pig
  • 28. HCatalog
  • 29. Hive
  • 30. Ambari
  • 31. Oozie
  • 32. Zookeeper
  • 33. Hadoop Enterprise Architecture
  • 34. Hadoop is not a silver bullet...
  • 35. Some Challenges • Hadoop doesn’t power big data applications • Not a transactional datastore. Slosh back and forth via ETL • Processing latency • Non-incremental, must re-slurp entire dataset every pass • Ad-Hoc queries • Bare metal interface, data import • Graphs • Only a handful of graph problems amenable to MR
  • 36. Beyond Hadoop • Percolator(incremental processing) http://research.google.com/pubs/pub36726.html • Dremel(ad-hoc analysis queries) http://research.google.com/pubs/pub36632.html • Pregel (Big graphs) http://dl.acm.org/citation.cfm?id=1807184
  • 37. Important Big Data Technologies in the Enterprise
  • 38. Real Time Analytics
  • 39. Real Time Analytics • Storm • Hstreaming • StreamBase • IBM Streams • Microsoft StreamInsight
  • 40. MPP: Massively Parallel Processing
  • 41. MPP Columnar Stores • Oracle Exadata • IBM Netezza • Teradata • EMC Greenplum • HP Vertica • ParAccel • Microsoft SQL Server PDW
  • 42. Oracle & Big Data
  • 43. Microsoft & Big Data
  • 44. NoSQL DBs
  • 45. NoSQL DBs
  • 46. NewSQL DBs
  • 47. New SQL / Cloud DB • VoltDB • NimbusDB • SimpleDB • NuoDB • Clustrix • Totutek
  • 48. Traditional BI Suites
  • 49. New SQL / Cloud DB • Hadoop Support In: • Microsoft SSIS • Informatica Datastage • Talend • Pentaho • Microstrategy , SaaS • Tableau, Qlikview
  • 50. Big Data & Cloud
  • 51. Big Data & Cloud • Hadoop distributions (AWS, Microsoft HDInsight, Cloud Foundry) • Data marketplaces (Factual, Infochimps) • Data visualization (WibiData) • NOSQL as a Service (MongoHQ)
  • 52. If you are interested on evaluating Big Data in your organization
  • 53. Tellago Big Data Strategy Session • 1 day strategy session • Start with a real world scenario • Explore various big data technology vendors • Present a potential technology roadmap • Free • Emails us at info@tellago.com
  • 54. Summary • The big data ecosystem is super crowded • Hadoop distributions are leading the way in the enterprise • Complementary technologies include: • NOSQL • New SQL • MPP • Data Visualization
  • 55. Thanks jesus.rodriguez@tellago.com http://www.tellagostudios.com http://jrodthoughts.com http://twitter.com/#!/jrodthoughts http://weblogs.asp.net/gsusx