What is the past future tense of data?

This is a vision pitch about where big data is going and why.

Published in: Technology

  • The different kinds of scaling laws have different shapes, and I think that shape is the key.
  • The value of analytics always increases with more data, but the rate of increase drops dramatically after an initial quick increase.
  • In classical analytics, the cost of analysis increases sharply as the scale of the data grows.
  • The result is a net value that has a sharp optimum in the area where value is increasing rapidly and cost is not yet increasing so rapidly.
  • New techniques such as Hadoop result in linear scaling of cost. This is a change in shape and it causes a qualitative change in the way that costs trade off against value to give net value. As technology improves, the slope of this cost line is also changing rapidly over time.
  • This next sequence shows how the net value changes with different slope linear cost models.
  • Notice how the best net value has jumped up significantly.
  • And as the line approaches horizontal, the highest net value occurs at dramatically larger data scale.
  • MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR’s distribution. Amazon, through its Elastic MapReduce service (EMR), hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold and supported as a service by Amazon to its customers. Google, the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop, has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features such as mirroring between sites and multi-tenancy support that further enhance cloud deployments.
  • MapR is used today across industries. Ten of the Fortune 100 are using MapR in production. Leading Web 2.0 properties, such as leading digital advertising platforms, also use MapR. These customers are using MapR in production for a variety of use cases. Examples include one of the largest credit card issuers in the world, which has standardized on MapR for fraud and consumer targeting applications. Other examples include a major health care group, national cyber security, and one of the largest retailers in the world. All of these use cases run on MapR’s complete distribution for Apache Hadoop.
  • MapR enables integration by providing industry-standard interfaces. More 3rd party solutions work with MapR than with any other distribution, and proprietary connectors are not needed. NFS: all file-based applications can read and write data (examples: Linux utilities, file browsers, Informatica UltraMessaging). ODBC 3.52: all BI applications can leverage Hive (examples: Excel, Crystal Reports, Tableau, MicroStrategy). Linux PAM: any authentication provider can be used (examples: LDAP, Kerberos, 3rd party).
  • With MapR, Hadoop is lights-out data center ready. MapR provides five 9’s (99.999%) of availability, including support for rolling upgrades, self-healing and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. There is end-to-end checksumming, so data corruption is automatically detected and then corrected by MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations. Every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
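
The scaling argument in the notes above can be made concrete with a small numerical sketch. Everything in it is an assumption chosen for illustration and is not taken from the deck: the saturating value curve `1 - exp(-scale/tau)`, the exponential "classical" cost, the three linear cost slopes, and all constants. It searches the same 0 to 2,000 scale range used on the charts for the scale that maximizes net value (value minus cost) under each cost model.

```python
import math

# All curves and constants below are illustrative assumptions, not data from the deck.

def value(scale, tau=150.0):
    """Value of analytics: rises quickly at first, then flattens out."""
    return 1.0 - math.exp(-scale / tau)

def exponential_cost(scale, c0=0.01, k=100.0):
    """Hypothetical 'classical' cost that grows exponentially with scale."""
    return c0 * (math.exp(scale / k) - 1.0)

def linear_cost(scale, slope):
    """Hypothetical linear cost; a smaller slope means cheaper scaling."""
    return slope * scale

def best_net_value(cost, max_scale=2000):
    """Return (scale, net value) at the integer scale that maximizes value - cost."""
    best_scale = max(range(max_scale + 1), key=lambda s: value(s) - cost(s))
    return best_scale, value(best_scale) - cost(best_scale)

# Optimum under the exponential cost model.
classical = best_net_value(exponential_cost)

# Optimum under linear cost models with progressively flatter slopes.
linear = {
    slope: best_net_value(lambda s, m=slope: linear_cost(s, m))
    for slope in (2e-3, 2e-4, 2e-5)
}

if __name__ == "__main__":
    print("exponential cost: best scale %d, net value %.3f" % classical)
    for slope, (scale, net) in linear.items():
        print("linear cost, slope %g: best scale %d, net value %.3f" % (slope, scale, net))
```

With these assumed constants, the steepest linear cost gives a lower best net value than the exponential model, while flatter slopes give a higher best net value at a much larger scale. That is the qualitative pattern the notes describe; the specific numbers carry no meaning beyond the chosen constants.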

    1. The Shape of Data to Come: it isn’t what we thought it was ©MapR Technologies - Confidential
    2. Do you remember the future?
    3. (no text)
    4. Some things turned out as expected
    5. Guys wearing Fedoras
    6. What about “Big Data”?
    7. Harvard University will have 200 x 10^6 volumes by 2040 (Fremont Rider, 1944)
    8. To cope … only short papers should be published. … not more than 2500 characters counting “space,” punctuation marks, etc. (Gray and Ruston in IEEE Transactions on Electronic Computers, 1964)
    9. Remember the guy in the Fedora?
    10. He’s tweeting about this right now
    11. So what is the big data monorail and what is the cool hat?
    12. Data curation; Rigid schemas; Engineered structure
    13. Data curation; Rigid schemas; Engineered structure
    14. Data as-you-find-it; Flexible schemas; Late binding
    15. Data as-you-find-it; Flexible schemas; Late binding
    16. (no text)
    17. (no text)
    18. (no text)
    19. (no text)
    20. Why is it different? How does it work?
    21. The Conventional Answer: More data is being produced more quickly; data sizes are bigger than even a very large computer can hold; cost to create and store continues to decrease
    22. Analytics Scaling Laws: Analytics scaling is all about the 80-20 rule (big gains for little initial effort; rapidly diminishing returns). The key to net value is how costs scale (old school: exponential scaling; big data: linear scaling, low constant). Cost/performance has changed radically, IF you can use many commodity boxes
    23. Which bytes first?
    24. (no text)
    25. [Chart: Value (0 to 1) vs. Scale (0 to 2,000)]
    26. [Chart: Value (0 to 1) vs. Scale (0 to 2,000)] Net value optimum has a sharp peak well before maximum effort
    27. But scaling laws are changing both slope and shape
    28. [Chart: Value (0 to 1) vs. Scale (0 to 2,000)] More than just a little
    29. [Chart: Value (0 to 1) vs. Scale (0 to 2,000)] They are changing a LOT!
    30. (no text)
    31. (no text)
    32. [Chart: Value (0 to 1) vs. Scale (0 to 2,000)]
    33. [Chart: Value (0 to 1) vs. Scale (0 to 2,000)]
    34. [Chart: Value (0 to 1) vs. Scale (0 to 2,000)] Initially, linear cost scaling actually makes things worse. A tipping point is reached and things change radically …
    35. Evolution of Data Storage: Over decades of progress, Unix-based systems have set the standard for compatibility and functionality [Chart labels: Scalability; Functionality; Compatibility; Linux POSIX]
    36. Evolution of Data Storage: Hadoop achieves much higher scalability by trading away essentially all of this compatibility [Chart labels: Scalability; Functionality; Compatibility; Linux POSIX; Hadoop]
    37. Evolution of Data Storage: MapR enhances Apache Hadoop by restoring the compatibility while increasing scalability and performance [Chart labels: Scalability; Functionality; Compatibility; Linux POSIX; Hadoop]
    38. Introducing MapR: MapR offers the technology-leading distribution for Hadoop
    39. The Industry Leaders Choose MapR in the Cloud: Google chose MapR to provide Hadoop on Google Compute Engine; Amazon EMR is the largest Hadoop provider in revenue and number of clusters
    40. MapR Supports Broad Set of Use Cases. Customers: Leading Retailer; Leading Bank. Use cases: Recommendation Engine; Fraud detection and Prevention; Customer Behavior Analysis; Brand Monitoring; Customer targeting; Viewer Behavioral analytics; Intrusion detection & prevention; Forensic analysis; Recommendation Engine; Family tree connections; Patient care monitoring; Log analysis; HBase; Clickstream Analysis; Quality profiling/field failure analysis; Fraud Detection; Channel analytics; Customer Revenue Analytics; ETL Offload; Advertising exchange analysis and optimization; Customer targeting; Social media analysis; Global threat analytics; Virus analysis; Customer Sentiment; Network Analytics; Monitors and measures behavior of online shoppers
    41. MapR: The guys with the cool hats
    42. MapR’s Innovations
    43. Seamless integration with existing applications: 100% POSIX compliant; Industry standard APIs (NFS, ODBC, LDAP, REST); More 3rd party solutions; Proprietary connectors unnecessary; Language neutral
    44. MapR’s Innovations
    45. MapR: Lights Out Data Center Ready. Reliable Compute: Automated stateful failover; Automated re-replication; Self-healing from HW and SW failures; Load balancing; Rolling upgrades; No lost jobs or data; 99999’s of uptime. Dependable Storage: End-to-end checksums; Strong consistency; Business continuity with snapshots and mirrors; Recover to a point in time with snapshots; Mirror across sites for disaster recovery
    46. MapR’s Innovations
    47. Why MapR Is Faster. Lockless Storage Service™: eliminates storage contention. Direct Block Device IO: provides throughput at device speed. Hadoop Direct Shuffle: exploits MapR-FS architecture to deliver performance using Hadoop Direct Shuffle. Client Side Compression: reduces network overhead using automatic compression. C vs Java: eliminates sporadic Java garbage collection overhead (system written in C)
    48. Security: MapR is pushing the envelope on Hadoop security. Integrates with Linux security (PAM): works with any user directory (Active Directory, LDAP, NIS, …). Strong wire-level authentication and encryption: Kerberos and non-Kerberos options. Fine-grained access control: full POSIX permissions on files and directories; ACLs on tables, column families, columns, cells; ACLs on MapReduce jobs and queues; administration ACLs on cluster and volumes
    49. Bullet-proof NoSQL with Zero Administration (Performance, Reliability, Easy Administration). Benefits and features: High Performance (Over 1 Million ops/sec with 10 Node Cluster); Continuous Low Latency (No I/O Storms, No Compactions); 24x7 Applications (Instant Recovery, Online Schema Modification, Snapshots, Mirroring); Zero Administration (No Processes to Manage, Automated Splits, Self-tuning); High Scalability (1 Trillion Tables); Low TCO (Files and Tables on One Platform)
    50. MapR M7 vs. CDH: Mixed Load (50-50)
    51. MapR M7 vs. CDH: Mixed Load (50-50)
    52. MapR: The guys with the cool solutions
    53. MapR: The future of the future
    54. Thank You
