Your SlideShare is downloading. ×
0
Big Data in the Enterprise.
When to Use What?
Jesus Rodriguez, Tellago, KidoZen, Inc
Agenda
• Big Data principles
• The Hadoop ecosystem
• Other big data technologies
About Me
• Co-Founder Tellago, Inc
• Co-Founder KidoZen, Inc
• Microsoft MVP
• Architect Advisor
• Investor
• Speaker, Aut...
About Tellago
• Application development firm focused on big enterprise trends (launched
2008)
• Enterprise mobility, cloud...
Some Housekeeping Rules
• Tellago Technology Updates focused on modern enterprise software
trends
• Real world stories
• N...
We Love Data!
Where all Started?
CAP Theorem
Big Data Opportunity
The Landscape
Or a Bit More Crowded
Or Worse
Hadoop Led the Way
Hadoop Design Principles
• System Shall Manage and Heal Itself
• Performance Shall Scale Linearly
• Compute Shall Move to ...
The Solution: HDFS + Map Reduce
Mapping
Hadoop Ecosystem
HDFS
(Hadoop Distributed File System)
HBase (key-value store)
MapReduce (Job Scheduling/Execution System)...
Reducing
HDFS
Map-Reduce
Relational vs. Hadoop
The Hadoop Ecosystem
WebHDFS
Sqoop
Flume
HBase
Pig
HCatalog
Hive
Ambari
Oozie
Zookeeper
Hadoop Enterprise Architecture
Hadoop is not a silver bullet...
Some Challenges
• Hadoop doesn’t power big data applications
• Not a transactional datastore. Slosh back and forth via ETL...
Beyond Hadoop
• Percolator(incremental processing)
http://research.google.com/pubs/pub36726.html
• Dremel(ad-hoc analysis ...
Important Big Data Technologies in the
Enterprise
Real Time Analytics
Real Time Analytics
• Storm
• Hstreaming
• StreamBase
• IBM Streams
• Microsoft StreamInsight
MPP: Massively Parallel Processing
MPP Columnar Stores
• Oracle Exadata
• IBM Netezza
• Teradata
• EMC Greenplum
• HP Vertica
• ParAccel
• Microsoft SQL Serv...
Oracle & Big Data
Microsoft & Big Data
NoSQL DBs
NoSQL DBs
NewSQL DBs
New SQL / Cloud DB
• VoltDB
• NimbusDB
• SimpleDB
• NuoDB
• Clustrix
• Totutek
Traditional BI Suites
New SQL / Cloud DB
• Hadoop Support In:
• Microsoft SSIS
• Informatica Datastage
• Talend
• Pentaho
• Microstrategy , SaaS...
Big Data & Cloud
Big Data & Cloud
• Hadoop distributions (AWS, Microsoft HDInsight, Cloud Foundry)
• Data marketplaces (Factual, Infochimps...
If you are interested on evaluating Big
Data in your organization
Tellago Big Data Strategy Session
• 1 day strategy session
• Start with a real world scenario
• Explore various big data t...
Summary
• The big data ecosystem is super crowded
• Hadoop distributions are leading the way in the enterprise
• Complemen...
Thanks
jesus.rodriguez@tellago.com
http://www.tellagostudios.com
http://jrodthoughts.com
http://twitter.com/#!/jrodthought...
Upcoming SlideShare
Loading in...5
×

Big data in the enterprise: When to use what?

1,124

Published on

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,124
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
71
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "Big data in the enterprise: When to use what?"

  1. 1. Big Data in the Enterprise. When to Use What? Jesus Rodriguez, Tellago, KidoZen, Inc
  2. 2. Agenda • Big Data principles • The Hadoop ecosystem • Other big data technologies
  3. 3. About Me • Co-Founder Tellago, Inc • Co-Founder KidoZen, Inc • Microsoft MVP • Architect Advisor • Investor • Speaker, Author • http://jrodthoughts.com • http://weblogs.asp.net/gsusx • http://kidozen.com
  4. 4. About Tellago • Application development firm focused on big enterprise trends (launched 2008) • Enterprise mobility, cloud computing, augmented reality, modern BI & big data • Advisor to software companies such as Microsoft or Oracle • American Business Awards(2011) “Best Overall Company of the Year < 100” • American Business Awards(2012) Silver: “Best Computer Services Company of the Year < 100”, Silver: Best Computer Services Executive of the Year • Inc 500 (114) & other industry awards
  5. 5. Some Housekeeping Rules • Tellago Technology Updates focused on modern enterprise software trends • Real world stories • No sales pitch • Leverage GTW to ask questions
  6. 6. We Love Data!
  7. 7. Where all Started?
  8. 8. CAP Theorem
  9. 9. Big Data Opportunity
  10. 10. The Landscape
  11. 11. Or a Bit More Crowded
  12. 12. Or Worse
  13. 13. Hadoop Led the Way
  14. 14. Hadoop Design Principles • System Shall Manage and Heal Itself • Performance Shall Scale Linearly • Compute Shall Move to Data • Simple Core, Modular and Extensible
  15. 15. The Solution: HDFS + Map Reduce
  16. 16. Mapping
  17. 17. Hadoop Ecosystem HDFS (Hadoop Distributed File System) HBase (key-value store) MapReduce (Job Scheduling/Execution System) Pig (Data Flow) Hive (SQL) BI ReportingETL Tools Avro(Serialization) Zookeepr(Coordination) Sqoop RDBMS (Streaming/Pipes APIs)
  18. 18. Reducing
  19. 19. HDFS
  20. 20. Map-Reduce
  21. 21. Relational vs. Hadoop
  22. 22. The Hadoop Ecosystem
  23. 23. WebHDFS
  24. 24. Sqoop
  25. 25. Flume
  26. 26. HBase
  27. 27. Pig
  28. 28. HCatalog
  29. 29. Hive
  30. 30. Ambari
  31. 31. Oozie
  32. 32. Zookeeper
  33. 33. Hadoop Enterprise Architecture
  34. 34. Hadoop is not a silver bullet...
  35. 35. Some Challenges • Hadoop doesn’t power big data applications • Not a transactional datastore. Slosh back and forth via ETL • Processing latency • Non-incremental, must re-slurp entire dataset every pass • Ad-Hoc queries • Bare metal interface, data import • Graphs • Only a handful of graph problems amenable to MR
  36. 36. Beyond Hadoop • Percolator(incremental processing) http://research.google.com/pubs/pub36726.html • Dremel(ad-hoc analysis queries) http://research.google.com/pubs/pub36632.html • Pregel (Big graphs) http://dl.acm.org/citation.cfm?id=1807184
  37. 37. Important Big Data Technologies in the Enterprise
  38. 38. Real Time Analytics
  39. 39. Real Time Analytics • Storm • Hstreaming • StreamBase • IBM Streams • Microsoft StreamInsight
  40. 40. MPP: Massively Parallel Processing
  41. 41. MPP Columnar Stores • Oracle Exadata • IBM Netezza • Teradata • EMC Greenplum • HP Vertica • ParAccel • Microsoft SQL Server PDW
  42. 42. Oracle & Big Data
  43. 43. Microsoft & Big Data
  44. 44. NoSQL DBs
  45. 45. NoSQL DBs
  46. 46. NewSQL DBs
  47. 47. New SQL / Cloud DB • VoltDB • NimbusDB • SimpleDB • NuoDB • Clustrix • Totutek
  48. 48. Traditional BI Suites
  49. 49. New SQL / Cloud DB • Hadoop Support In: • Microsoft SSIS • Informatica Datastage • Talend • Pentaho • Microstrategy , SaaS • Tableau, Qlikview
  50. 50. Big Data & Cloud
  51. 51. Big Data & Cloud • Hadoop distributions (AWS, Microsoft HDInsight, Cloud Foundry) • Data marketplaces (Factual, Infochimps) • Data visualization (WibiData) • NOSQL as a Service (MongoHQ)
  52. 52. If you are interested on evaluating Big Data in your organization
  53. 53. Tellago Big Data Strategy Session • 1 day strategy session • Start with a real world scenario • Explore various big data technology vendors • Present a potential technology roadmap • Free • Emails us at info@tellago.com
  54. 54. Summary • The big data ecosystem is super crowded • Hadoop distributions are leading the way in the enterprise • Complementary technologies include: • NOSQL • New SQL • MPP • Data Visualization
  55. 55. Thanks jesus.rodriguez@tellago.com http://www.tellagostudios.com http://jrodthoughts.com http://twitter.com/#!/jrodthoughts http://weblogs.asp.net/gsusx
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×