© 2014 MapR Technologies
Getting Real With Hadoop 
Jim Scott, Director, Enterprise Strategy & Architecture 
@kingmesal #BigDataEverywhere #Chicago - October 1st, 2014
Can’t We All Just Get Along?
We Have All Contributed…
The Reality is: Architecture Matters
High Availability (HA) Everywhere
• No NameNode architecture
– Distributed metadata can self-heal
– No practical limit on # of files
• MapReduce/YARN HA
– Jobs are not impacted by failures
– Meet your data processing SLAs
• NFS HA
– High throughput and resilience for NFS-based data ingestion, import/export and multi-client access
• Instant recovery
– Files and tables are accessible within seconds of a node failure or cluster restart
• Rolling upgrades
– Upgrade the software with no downtime
• HA is built in
– No special configuration to enable HA
– All MapR customers operate with HA
RDBMS Hammer 
Hadoop Hammer 
Data Everywhere! 
Social Media 
Messages 
Audio 
Sensors 
Mobile Data 
Email 
Clickstream
Friends don’t let friends run NameNodes.
Too Many Files!
Volumes
100K volumes are OK; create as many as needed
Volumes dramatically simplify management:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions
[Diagram: example volume hierarchy]
/projects
  /tahoe
  /yosemite
/user
  /msmith
  /bjohnson
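A volume groups a directory subtree and carries its own management policies. As a rough mental model (a hypothetical sketch, not MapR's API or CLI), a file path can be resolved to the volume whose mount path is its longest matching prefix, and that volume's replication factor and snapshot/mirror schedules then apply. The `Volume` class, its field names and the schedule strings are illustrative assumptions; the paths reuse the hierarchy above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    # Hypothetical policy record; these field names are illustrative, not a MapR API.
    name: str
    mount_path: str
    replication: int = 3
    snapshot_schedule: Optional[str] = None
    mirror_schedule: Optional[str] = None


def _is_under(path: str, mount_path: str) -> bool:
    """True if `path` is the mount point itself or lies beneath it."""
    mount = mount_path.rstrip("/")
    return mount == "" or path == mount or path.startswith(mount + "/")


def resolve_volume(path: str, volumes: List[Volume]) -> Optional[Volume]:
    """Pick the volume with the longest mount path that contains `path`."""
    candidates = [v for v in volumes if _is_under(path, v.mount_path)]
    return max(candidates, key=lambda v: len(v.mount_path.rstrip("/")), default=None)


volumes = [
    Volume("root", "/"),
    Volume("projects.tahoe", "/projects/tahoe", snapshot_schedule="hourly", mirror_schedule="daily"),
    Volume("projects.yosemite", "/projects/yosemite", replication=2),
    Volume("user.msmith", "/user/msmith", replication=2, snapshot_schedule="daily"),
    Volume("user.bjohnson", "/user/bjohnson"),
]

for p in ["/projects/tahoe/logs/day1.log", "/projects/tahoe2/data.csv", "/user/msmith/notes.txt"]:
    vol = resolve_volume(p, volumes)
    print(p, "->", vol.name, "replication", vol.replication)
```

Matching on a path boundary (`mount + "/"`) keeps `/projects/tahoe2` from being mistaken for part of the `/projects/tahoe` volume; it falls back to the root volume instead.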
MapR M7: The Best In-Hadoop Database
MapR-DB
• NoSQL Columnar Store
• Apache HBase API
• Integrated with Hadoop
[Diagram: other distros stack HBase (JVM) on HDFS (JVM) on ext3/ext4 on disks; MapR stores tables/files directly on disks]
MapR Enterprise Database Edition (M7)
The most scalable, enterprise-grade NoSQL database that supports online applications and analytics
Tradeoffs with Other NoSQL Solutions
• Reliability
– 24x7 applications with strong data consistency
• Performance
– Continuous low latency with horizontal scaling
• Easy Administration
– Easy day-to-day management with minimal learning curve
Consistent, Low Read Latency
[Chart: read latency over time, M7 vs. other NoSQL solutions]
MapR Integrates Security into Hadoop
Hadoop Security
• Authorization to ensure the right access to files and databases
• Authentication for users and user-created job requests
• Encryption to ensure user credentials and data are always secure
• Integration with existing security infrastructure
Fine-Grained Access Control 
Full POSIX permissions on files and directories 
ACLs on tables, column families and columns 
ACLs on MapReduce jobs and queues 
Administration ACLs on cluster and volumes 
ACLs for Apache Hive, Apache Drill and Impala
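Because files and directories carry standard POSIX permission bits, a client can inspect them with ordinary `stat` calls rather than a Hadoop-specific API. A minimal sketch using Python's standard `os` and `stat` modules; it covers only the POSIX mode bits, not the table, job or volume ACLs listed above. On a cluster the path would point at a file under the NFS mount (an assumption here); the demo uses a local temporary file so it runs anywhere.

```python
import os
import stat
import tempfile


def describe_mode(path: str) -> str:
    """Return an ls-style permission string such as '-rw-r-----'."""
    return stat.filemode(os.stat(path).st_mode)


def class_can_read(path: str, who: str) -> bool:
    """Check the read bit for one POSIX class: 'user', 'group' or 'other'.

    This inspects mode bits only; os.access() answers whether the current
    process can actually read the file.
    """
    read_bits = {"user": stat.S_IRUSR, "group": stat.S_IRGRP, "other": stat.S_IROTH}
    return bool(os.stat(path).st_mode & read_bits[who])


# Demo on a local temporary file standing in for a file under the NFS mount.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.chmod(demo_path, 0o640)  # owner read/write, group read, others nothing
print(describe_mode(demo_path))  # -rw-r-----
print(class_can_read(demo_path, "group"), class_can_read(demo_path, "other"))  # True False
os.remove(demo_path)
```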
Seamless Integration with Direct Access NFS
• MapR is POSIX compliant
– Random reads/writes
– Simultaneous reading and writing to a file
– Compression is automatic and transparent
• Industry-standard NFS interface (in addition to HDFS API)
– Stream data into the cluster
– Leverage thousands of tools and applications
– Easier use of non-Java programming languages
– No need for most proprietary Hadoop connectors
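With the cluster exposed as a standard NFS mount, plain file APIs can do random reads and writes without a Hadoop client library. A minimal sketch in Python: it overwrites a few bytes in the middle of an existing file and reads them back. The mount point mentioned in the comment (`/mapr/my.cluster.com`) is an assumed example; the demo writes to a local temporary directory so it runs on any machine.

```python
import os
import tempfile

# On a cluster, files would live under the NFS mount point (assumed here to be
# something like /mapr/my.cluster.com); the demo uses a local temp directory.


def patch_bytes(path: str, offset: int, data: bytes) -> None:
    """Overwrite `data` at `offset` in place (a random write)."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` (a random read)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "events.log")
with open(demo_file, "wb") as f:
    f.write(b"AAAAAAAAAA")

patch_bytes(demo_file, 3, b"xyz")
print(read_range(demo_file, 0, 10))  # b'AAAxyzAAAA'
```

The same code works against any path; whether in-place writes succeed depends on the underlying file system supporting them, which is the POSIX-compliance point made above.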
Disaster Recovery: Mirroring
• Flexible
– Choose the volumes/directories to mirror
– You don’t need to mirror the entire cluster
– Active/active
• Fast
– No performance impact
– Block-level (8KB) deltas
– Automatic compression
• Safe
– Point-in-time consistency
– End-to-end checksums
• Easy
– Graceful handling of network issues
– No third-party software
– Takes less than two minutes to configure!
[Diagram: Production and Research clusters in Datacenter 1, Datacenter 2 and EC2, linked by mirroring over the WAN]
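"Block-level (8KB) deltas" means that after the first sync, only blocks that changed need to cross the WAN, compressed. A minimal sketch of that general idea, not MapR's mirroring implementation: split the old and new versions into 8 KB blocks, compare per-block digests, compress only the changed blocks, and rebuild the new version on the mirror side. The 8 KB size comes from the slide; SHA-256 digests and zlib compression are assumptions chosen for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 8 * 1024  # 8KB blocks, as on the slide


def block_digests(data: bytes) -> list:
    """SHA-256 digest of each fixed-size block."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old: bytes, new: bytes) -> dict:
    """Map block index -> compressed contents for blocks that differ in `new`."""
    old_d, new_d = block_digests(old), block_digests(new)
    delta = {}
    for idx, digest in enumerate(new_d):
        if idx >= len(old_d) or old_d[idx] != digest:
            start = idx * BLOCK_SIZE
            delta[idx] = zlib.compress(new[start:start + BLOCK_SIZE])
    return delta


def apply_delta(old: bytes, delta: dict, new_len: int) -> bytes:
    """Rebuild the new version on the mirror from the old copy plus the delta."""
    buf = bytearray(old[:new_len].ljust(new_len, b"\0"))  # truncate or pad first
    for idx, payload in delta.items():
        block = zlib.decompress(payload)
        start = idx * BLOCK_SIZE
        buf[start:start + len(block)] = block
    return bytes(buf)


old = bytes(range(256)) * 128  # 32 KB = 4 blocks
edited = bytearray(old)
edited[9000] ^= 0xFF  # flip one byte inside block 1
new = bytes(edited)
delta = changed_blocks(old, new)
print(sorted(delta))  # [1]
print(apply_delta(old, delta, len(new)) == new)  # True
```

Here a one-byte edit ships a single compressed 8 KB block instead of the whole 32 KB file.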
MapR Advantages
Feature | MapR-DB | Others
99.999% uptime | ✓ | X
Instant recovery from failures | ✓ | X
Continuous low latency (no compactions) | ✓ | X
Zero administration (no processes to manage, self-tuning) | ✓ | X
Online data protection (snapshots, mirroring) | ✓ | X
Scalability (number of tables supported) | Trillion | Hundreds
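For scale, an availability target translates directly into a yearly downtime budget. A small worked calculation, assuming a 365.25-day year, shows that 99.999% uptime leaves roughly 5.26 minutes of downtime per year:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, assuming a 365.25-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, a in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.2f} min/year")
# 99.9%: 525.96 min/year
# 99.99%: 52.60 min/year
# 99.999%: 5.26 min/year
```

Each extra nine cuts the budget by a factor of ten, which is why instant recovery and rolling upgrades matter at this level.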
Packages Supported by Various Distributions
Category | Package | MapR 4.0.1 (Sep 2014) | Cloudera 5.1.2 (Aug 2014) | Hortonworks 2.1.5 (Aug 2014) | Apache Versions (Sep 12th, 2014)
Core Hadoop | Hadoop Core, YARN | 2.4.1 | 2.3.0 | 2.4.0 | 2.5.1
Batch | MapReduce | MRv1 and MRv2 | MRv1 or MRv2 | MRv2 | MRv2
Batch | Hive | 0.12, 0.13 | 0.12 | 0.13 | 0.13
Batch | Tez | 0.4 (Dev Preview only) | X | 0.4 | 0.5
Batch | Pig | 0.12 | 0.12 | 0.12 | 0.12
Batch | Cascading | 2.1.6 | X | X | 2.5
Batch | Spark | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Interactive SQL | Impala | 1.2.3 | 1.4 | X | 1.4
Interactive SQL | Drill | 0.5 | X | X | 0.5
Interactive SQL | SparkSQL | 1.0.2 | X | 1.0.1 (Tech Preview only) | 1.1
NoSQL and Search | HBase/NoSQL | 0.94.2, 0.98.4, MapR-DB | 0.98 | 0.98, Accumulo 1.5.1 | HBase 0.98
NoSQL and Search | Phoenix | X | X | 4.0.0 | 4.1.0
NoSQL and Search | AsyncHBase | 1.5 | X | X | 1.5
NoSQL and Search | Search | LW (Solr) 2.6.1, 2.7 | Cloudera Search 1.5 | X | NA
Machine Learning and Graph | Mahout | 0.9 | 0.9 | 0.9 | 0.9
Machine Learning and Graph | MLLib/MLBase | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Machine Learning and Graph | GraphX | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Streaming/Messaging | Spark Streaming | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Streaming/Messaging | Storm | 0.9, 0.9.2 (Certified) | X | 0.9.1 | 0.9.2
Streaming/Messaging | Kafka | X | X | 0.8.1.1 (Tech Preview) | 0.8.1.1
Data Integration | Sqoop, Sqoop2 | 1.4.4, 1.99.3 | 1.4.4, 1.99.3 | 1.4.4 | 1.4.5
Data Integration | Flume | 1.5.0 | 1.5.0 | 1.4.0 | 1.5.0
Data Integration | Knox | X | X | 0.4 | 0.4
Coordination | Oozie | 4.0.1 | 4.0.0 | 4.0.0 | 4.0.1
Coordination | ZooKeeper | 3.4.5 | 3.4.5 | 3.4.5 | 3.4.5
GUI, Configuration, Monitoring | Management | MCS | CM | Ambari | Ambari
GUI, Configuration, Monitoring | Hue | 3.5 | 3.6 | 2.5.1 | 3.6
Sources:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_tarball.html?scroll=topic_3_unique_8
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.5-product.html
Pick the Right Tool for the Job
MapR Distribution for Apache Hadoop
[Diagram: the MapR Data Platform and the Apache Hadoop and OSS ecosystem]
Execution engines:
• MapReduce v1 & v2, YARN, Tez*
• Batch: Spark, Cascading, Pig
• SQL: Drill, SparkSQL, Impala, Hive
• Streaming: Storm*, Spark Streaming
• NoSQL & Search: Solr, HBase, Accumulo*
• ML, Graph: GraphX, MLLib, Mahout
Data governance and operations (provisioning & coordination; workflow & data governance; data integration & access; security):
• Savannah*, Juju, Whirr, ZooKeeper, Oozie, Falcon*, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*
Access: NFS, HDFS API, HBase API, JSON API
Management: MapR Control System (management and monitoring) via CLI, REST API and GUI
* Certification/support planned for 2014
1.65TB with 298 servers
1/7th the Hardware Footprint
Forrester Wave™: Big Data Hadoop Solutions, Q1 2014
“The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014,” February 2014
APACHE DRILL
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
40+ contributors
150+ years of experience building databases and distributed systems
Drill Supports Schema Discovery On-The-Fly
Schema declared in advance (SCHEMA ON WRITE, SCHEMA BEFORE READ):
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema discovered on-the-fly (SCHEMA ON THE FLY):
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
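Schema discovery on the fly means field names and types are derived from self-describing records as they are read, instead of from a schema declared up front. A minimal Python sketch of the idea (not Drill's implementation): scan newline-delimited JSON and record, for each nested field path, the set of value types seen. This surfaces both an evolving schema (fields that appear later) and type variation within a field. The sample records are hypothetical.

```python
import json
from collections import defaultdict


def discover_schema(lines):
    """Infer {field path: set of Python type names} from newline-delimited JSON."""
    schema = defaultdict(set)

    def walk(value, prefix):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{prefix}.{key}" if prefix else key)
        else:
            schema[prefix].add(type(value).__name__)

    for line in lines:
        if line.strip():  # skip blank lines
            walk(json.loads(line), "")
    return dict(schema)


records = [
    '{"user": "msmith", "clicks": 3}',
    '{"user": "bjohnson", "clicks": 5, "geo": {"city": "Chicago"}}',
    '{"user": "msmith", "clicks": "n/a"}',
]
for field, types in sorted(discover_schema(records).items()):
    print(field, sorted(types))
# clicks ['int', 'str']
# geo.city ['str']
# user ['str']
```

A query engine would then need a policy for mixed types such as `clicks`; this sketch only reports them.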
Operational Analytics
Must Be Able to Scale
Real-time and Operational Actionable Analytics
[Diagram: applications and analytics served from Hadoop (MapR M7)]
Data in Hadoop (MapR M7):
• User profiles and state
• User interactions
• Real-time location data
• Web and mobile session state
• Comments/rankings
Applications and analytics:
• Mobile application server
• Web application server
• Real-time ad targeting
• Data exploration (SQL)
• Customer 360 dashboard
• Churn analysis (predictive analytics)
• Product/service optimization and personalization
General Application Monitoring
Hard Drive Failure Rates
Recommendation Engines
Media Content Recommendation Engine
20M songs
Fraud Detection
Offer Serving, Credit Risk & Fraud
104M card members
More than $600B
Fastest Data Ingest Rates
100M data points per second
Speed and Intelligence…
Forrester Wave™: NoSQL Key-Value Databases, Q3 2014
“The Forrester Wave™: NoSQL Key-Value Databases, Q3 2014,” September 2014
MapR Editions
• Control System
• NFS Access
• Performance
• Unlimited Nodes
• Free

• All the Features of M5
• Simplified Administration for HBase
• Increased Performance
• Consistent Low Latency
• Unified Snapshots, Mirroring

• Control System
• NFS Access
• Performance
• High Availability
• Snapshots & Mirroring
• 24x7 Support
• Annual Subscription
Fastest On-Ramp: 
MapR Sandbox for Hadoop
Engage with us! 
@mapr maprtech 
jscott@mapr.com 
MapR 
maprtech 
mapr-technologies
