What is the past future tense of data?

This is a vision pitch about where big data is going and why.

Published in: Technology

  1. The Shape of Data to Come: it isn’t what we thought it was ©MapR Technologies - Confidential
  2. Do you remember the future?
  4. Some things turned out as expected
  5. Guys wearing Fedoras
  6. What about “Big Data”?
  7. Harvard University will have 200 x 10^6 volumes by 2040 (Fremont Rider, 1944)
  8. To cope … only short papers should be published. … not more than 2500 characters counting “space,” punctuation marks, etc. (Gray and Ruston, IEEE Transactions on Electronic Computers, 1964)
  9. Remember the guy in the Fedora?
  10. He’s tweeting about this right now
  11. So what is the big data monorail and what is the cool hat?
  12. Data curation, rigid schemas, engineered structure
  13. Data curation, rigid schemas, engineered structure
  14. Data as-you-find-it, flexible schemas, late binding
  15. Data as-you-find-it, flexible schemas, late binding
  20. Why is it different? How does it work?
  21. The Conventional Answer: more data is being produced more quickly; data sizes are bigger than even a very large computer can hold; cost to create and store continues to decrease
  22. Analytics Scaling Laws
      • Analytics scaling is all about the 80-20 rule
          - Big gains for little initial effort
          - Rapidly diminishing returns
      • The key to net value is how costs scale
          - Old school: exponential scaling
          - Big data: linear scaling, low constant
      • Cost/performance has changed radically
          - IF you can use many commodity boxes
  23. Which bytes first?
  25. [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  26. [Chart: Value (0 to 1) against Scale (0 to 2,000)] Net value optimum has a sharp peak well before maximum effort
  27. But scaling laws are changing both slope and shape
  28. [Chart: Value (0 to 1) against Scale (0 to 2,000)] More than just a little
  29. [Chart: Value (0 to 1) against Scale (0 to 2,000)] They are changing a LOT!
  32. [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  33. [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  34. [Chart: Value (0 to 1) against Scale (0 to 2,000)] Initially, linear cost scaling actually makes things worse. A tipping point is reached and things change radically …
  35. Evolution of Data Storage: Over decades of progress, Unix-based systems have set the standard for compatibility and functionality [Chart: axes Scalability, Functionality, Compatibility; plotted: Linux, POSIX]
  36. Evolution of Data Storage: Hadoop achieves much higher scalability by trading away essentially all of this compatibility [Chart: axes Scalability, Functionality, Compatibility; plotted: Hadoop, Linux, POSIX]
  37. Evolution of Data Storage: MapR enhances Apache Hadoop by restoring the compatibility while increasing scalability and performance [Chart: axes Scalability, Functionality, Compatibility; plotted: Hadoop, Linux, POSIX]
  38. Introducing MapR: MapR offers the technology-leading distribution for Hadoop
  39. The Industry Leaders Choose MapR in the Cloud: Google chose MapR to provide Hadoop on Google Compute Engine; Amazon EMR is the largest Hadoop provider in revenue and number of clusters
  40. MapR Supports Broad Set of Use Cases (customer labels: Leading Retailer, Leading Bank)
      • Recommendation Engine
      • Fraud detection and Prevention
      • Customer Behavior Analysis
      • Brand Monitoring
      • Customer targeting
      • Viewer Behavioral analytics
      • Intrusion detection & prevention
      • Forensic analysis
      • Recommendation Engine
      • Family tree connections
      • Patient care monitoring
      • Log analysis
      • HBase
      • Clickstream Analysis
      • Quality profiling/field failure analysis
      • Fraud Detection
      • Channel analytics
      • Customer Revenue Analytics
      • ETL Offload
      • Advertising exchange analysis and optimization
      • Customer targeting
      • Social media analysis
      • Global threat analytics
      • Virus analysis
      • Customer Sentiment
      • Network Analytics
      • Monitors and measures behavior of online shoppers
  41. MapR: The guys with the cool hats
  42. MapR’s Innovations
  43. Seamless integration with existing applications
      • 100% POSIX compliant
      • Industry standard APIs: NFS, ODBC, LDAP, REST
      • More 3rd party solutions
      • Proprietary connectors unnecessary
      • Language neutral
  44. MapR’s Innovations
  45. MapR: Lights Out Data Center Ready (Reliable Compute, Dependable Storage)
      • Automated stateful failover
      • Automated re-replication
      • Self-healing from HW and SW failures
      • Load balancing
      • Rolling upgrades
      • No lost jobs or data
      • 99999’s of uptime
      • End-to-end checksums
      • Strong consistency
      • Business continuity with snapshots and mirrors
      • Recover to a point in time with snapshots
      • Mirror across sites for disaster recovery
  46. MapR’s Innovations
  47. Why MapR Is Faster
      • Lockless Storage Service™: eliminates storage contention
      • Direct Block Device IO: provides throughput at device speed
      • Hadoop Direct Shuffle: exploits MapR-FS architecture to deliver performance using Hadoop Direct Shuffle
      • Client Side Compression: reduces network overhead using automatic compression
      • C vs Java: eliminates sporadic Java garbage collection overhead (system written in C)
  48. Security
      • MapR is pushing the envelope on Hadoop security
      • Integrates with Linux security (PAM)
          - Works with any user directory: Active Directory, LDAP, NIS, …
      • Strong wire-level authentication and encryption
          - Kerberos and non-Kerberos options
      • Fine-grained access control
          - Full POSIX permissions on files and directories
          - ACLs on tables, column families, columns, cells
          - ACLs on MapReduce jobs and queues
          - Administration ACLs on cluster and volumes
  49. Bullet-proof NoSQL with Zero Administration: Performance, Reliability, Easy Administration (Benefit: Features)
      • High Performance: Over 1 Million ops/sec with 10 Node Cluster
      • Continuous Low Latency: No I/O Storms, No Compactions
      • 24x7 Applications: Instant Recovery, Online Schema Modification, Snapshots, Mirroring
      • Zero Administration: No Processes to Manage, Automated Splits, Self-tuning
      • High Scalability: 1 Trillion Tables
      • Low TCO: Files and Tables on One Platform
  50. MapR M7 vs. CDH – Mixed Load (50-50)
  51. MapR M7 vs. CDH – Mixed Load (50-50)
  52. MapR: The guys with the cool solutions
  53. MapR: The future of the future
  54. Thank You
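
The scaling-law slides (22 to 34) argue that value shows diminishing returns with scale, that net value peaks where extra cost overtakes extra value, and that linear cost scaling moves that peak after an initial stretch where it is actually worse. A minimal Python sketch of that argument follows. Every curve shape and constant in it (`value`, `exponential_cost`, `linear_cost`, and their parameters) is an assumption chosen for illustration; none of them is taken from the slides' charts.

```python
import math

# All functional forms and constants below are illustrative assumptions,
# not figures from the deck.

def value(scale, k=400.0):
    """Diminishing-returns value curve: big gains early, then flattening."""
    return 1.0 - math.exp(-scale / k)

def exponential_cost(scale, a=0.02, b=300.0):
    """'Old school' cost: cheap at first, then growing exponentially."""
    return a * (math.exp(scale / b) - 1.0)

def linear_cost(scale, slope=0.0002):
    """'Big data' cost: proportional to scale with a small constant."""
    return slope * scale

def net_value(scale, cost):
    """Net value is value minus cost at a given scale."""
    return value(scale) - cost(scale)

def best_scale(cost, scales=range(0, 2001)):
    """Scale on an integer grid (0 to 2,000) that maximizes net value."""
    return max(scales, key=lambda s: net_value(s, cost))

if __name__ == "__main__":
    for name, cost in [("exponential", exponential_cost),
                       ("linear", linear_cost)]:
        s = best_scale(cost)
        print(f"{name:12s} optimum scale={s:5d} net value={net_value(s, cost):.3f}")
```

With these assumed constants the linear cost is higher than the exponential cost at small scale and lower at large scale, so the net-value optimum moves to a larger scale once the crossover is passed. Different constants would shift where that crossover falls.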
