What is the past future tense of data?

This is a vision pitch about where big data is going and why.


  • Different kinds of scaling laws have different shapes, and I think that shape is the key.
  • The value of analytics always increases with more data, but the rate of increase drops dramatically after an initial quick increase.
  • In classical analytics, the cost of doing analytics increases sharply, roughly exponentially, as the data scale grows.
  • The result is a net value that has a sharp optimum in the area where value is increasing rapidly and cost is not yet increasing so rapidly.
  • New techniques such as Hadoop result in linear scaling of cost. This is a change in shape and it causes a qualitative change in the way that costs trade off against value to give net value. As technology improves, the slope of this cost line is also changing rapidly over time.
  • This next sequence shows how the net value changes for linear cost models with different slopes.
  • Notice how the best net value has jumped up significantly.
  • And as the line approaches horizontal, the highest net value occurs at a dramatically larger data scale.
  • MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR’s distribution. Amazon, through its Elastic MapReduce service (EMR), hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold, and supported as a service by Amazon to its customers. Google, the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop, has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features, such as mirroring between sites and multi-tenancy support, that further enhance cloud deployments.
  • MapR is used today across industries. Ten of the Fortune 100 are using MapR in production. Leading Web 2.0 properties, such as major digital advertising platforms, also use MapR. These customers are using MapR in production for a variety of use cases. Examples include one of the largest credit card issuers in the world, which has standardized on MapR for fraud and consumer targeting applications. Other examples include a major health care group, national cyber security, and one of the largest retailers in the world. All of these run on MapR’s complete distribution for Apache Hadoop.
  • MapR enables integration by providing industry-standard interfaces. More 3rd party solutions work with MapR than with any other distribution, and proprietary connectors are not needed. NFS: all file-based applications can read and write data (examples: Linux utilities, file browsers, Informatica Ultra Messaging). ODBC 3.52: all BI applications can leverage Hive (examples: Excel, Crystal Reports, Tableau, MicroStrategy). Linux PAM: any authentication provider can be used (examples: LDAP, Kerberos, 3rd party).
  • With MapR, Hadoop is lights-out data center ready. MapR provides five 9's (99.999%) of availability, including support for rolling upgrades, self-healing, and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. End-to-end checksumming means data corruption is automatically detected and corrected by MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations: every two weeks, an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
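
The scaling-law argument in the notes above (value that saturates, classical cost that grows exponentially, newer cost that grows linearly) can be sketched numerically. This is only an illustration under assumed curves: the function shapes, the constant `TAU`, the cost parameters `c` and `s0`, and the three slopes are all hypothetical choices, not figures taken from the slides.

```python
import math

TAU = 100.0  # assumed: scale at which value reaches half of its maximum


def value(scale):
    """Assumed saturating value curve: quick early gains, then diminishing returns."""
    return scale / (scale + TAU)


def exponential_cost(scale, c=0.01, s0=100.0):
    """Assumed 'old school' cost model that grows exponentially with scale."""
    return c * (math.exp(scale / s0) - 1.0)


def linear_cost(scale, slope):
    """Assumed 'big data' cost model: cost proportional to scale."""
    return slope * scale


def best_net_value(cost_fn, max_scale=2000):
    """Return (scale, net value) for the grid point that maximizes value - cost."""
    candidates = ((s, value(s) - cost_fn(s)) for s in range(max_scale + 1))
    return max(candidates, key=lambda pair: pair[1])


# Net-value optimum under the exponential cost model.
classical = best_net_value(exponential_cost)

# Net-value optimum under linear cost models with progressively flatter slopes.
linear = {
    slope: best_net_value(lambda s, k=slope: linear_cost(s, k))
    for slope in (0.005, 0.001, 0.0001)
}

print("exponential cost: best scale %d, net value %.3f" % classical)
for slope, (scale, net) in linear.items():
    print("linear cost, slope %g: best scale %d, net value %.3f" % (slope, scale, net))
```

With these assumed parameters, the steepest linear cost line gives a lower best net value than the exponential model, while the flattest line gives a higher best net value at a much larger scale. That matches the qualitative story in the notes, but the specific numbers depend entirely on the assumed curves.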

What is the past future tense of data? Presentation Transcript

  • The Shape of Data to Come: it isn’t what we thought it was. ©MapR Technologies - Confidential
  • Do you remember the future?
  • Some things turned out as expected
  • Guys wearing Fedoras
  • What about “Big Data”?
  • Harvard University will have 200 x 10^6 volumes by 2040. Fremont Rider, 1944
  • To cope … only short papers should be published. … not more than 2500 characters counting “space,” punctuation marks, etc. Gray and Ruston in IEEE Transactions on Electronic Computers, 1964
  • Remember the guy in the Fedora?
  • He’s tweeting about this right now
  • So what is the big data monorail and what is the cool hat?
  • Data curation, rigid schemas, engineered structure
  • Data as-you-find-it, flexible schemas, late binding
  • Why is it different? How does it work?
  • The Conventional Answer: more data is being produced more quickly; data sizes are bigger than even a very large computer can hold; cost to create and store continues to decrease.
  • Analytics Scaling Laws. Analytics scaling is all about the 80-20 rule: big gains for little initial effort, then rapidly diminishing returns. The key to net value is how costs scale: old school means exponential scaling; big data means linear scaling with a low constant. Cost/performance has changed radically, IF you can use many commodity boxes.
  • Which bytes first?
  • [Chart: value (0 to 1) against scale (0 to 2,000)] Net value optimum has a sharp peak well before maximum effort
  • But scaling laws are changing both slope and shape
  • [Chart: value against scale] More than just a little
  • [Chart: value against scale] They are changing a LOT!
  • [Chart: value against scale] A tipping point is reached and things change radically … Initially, linear cost scaling actually makes things worse
  • Evolution of Data Storage. [Diagram: scalability against functionality and compatibility; Linux, POSIX] Over decades of progress, Unix-based systems have set the standard for compatibility and functionality.
  • Evolution of Data Storage. [Diagram: Hadoop added] Hadoop achieves much higher scalability by trading away essentially all of this compatibility.
  • Evolution of Data Storage. [Diagram: MapR added] MapR enhances Apache Hadoop by restoring the compatibility while increasing scalability and performance.
  • Introducing MapR: MapR offers the technology-leading distribution for Hadoop
  • The Industry Leaders Choose MapR in the Cloud. Google chose MapR to provide Hadoop on Google Compute Engine. Amazon EMR is the largest Hadoop provider in revenue and number of clusters.
  • MapR Supports Broad Set of Use Cases (customer labels include Leading Retailer and Leading Bank): recommendation engine; fraud detection and prevention; customer behavior analysis; brand monitoring; customer targeting; viewer behavioral analytics; intrusion detection and prevention; forensic analysis; family tree connections; patient care monitoring; log analysis; HBase; clickstream analysis; quality profiling/field failure analysis; channel analytics; customer revenue analytics; ETL offload; advertising exchange analysis and optimization; social media analysis; global threat analytics; virus analysis; customer sentiment; network analytics; monitoring and measuring the behavior of online shoppers.
  • MapR: The guys with the cool hats
  • MapR’s Innovations
  • Seamless integration with existing applications: 100% POSIX compliant; industry-standard APIs (NFS, ODBC, LDAP, REST); more 3rd party solutions; proprietary connectors unnecessary; language neutral.
  • MapR’s Innovations
  • MapR: Lights Out Data Center Ready. Reliable Compute: automated stateful failover; automated re-replication; self-healing from HW and SW failures; load balancing; rolling upgrades; no lost jobs or data; five 9's (99.999%) of uptime. Dependable Storage: end-to-end checksums; strong consistency; business continuity with snapshots and mirrors; recover to a point in time with snapshots; mirror across sites for disaster recovery.
  • MapR’s Innovations
  • Why MapR Is Faster. Lockless Storage Service™: eliminates storage contention. Direct Block Device IO: provides throughput at device speed. Hadoop Direct Shuffle: exploits MapR-FS architecture to deliver performance. Client Side Compression: reduces network overhead using automatic compression. C vs Java: eliminates sporadic Java garbage collection overhead (system written in C).
  • Security: MapR is pushing the envelope on Hadoop security. Integrates with Linux security (PAM): works with any user directory (Active Directory, LDAP, NIS, …). Strong wire-level authentication and encryption: Kerberos and non-Kerberos options. Fine-grained access control: full POSIX permissions on files and directories; ACLs on tables, column families, columns, cells; ACLs on MapReduce jobs and queues; administration ACLs on cluster and volumes.
  • Bullet-proof NoSQL with Zero Administration (performance, reliability, easy administration). Benefits and features: High Performance: over 1 million ops/sec with a 10-node cluster. Continuous Low Latency: no I/O storms, no compactions. 24x7 Applications: instant recovery, online schema modification, snapshots, mirroring. Zero Administration: no processes to manage, automated splits, self-tuning. High Scalability: 1 trillion tables. Low TCO: files and tables on one platform.
  • MapR M7 vs. CDH – Mixed Load (50-50)
  • MapR: The guys with the cool solutions
  • MapR: The future of the future
  • Thank You