Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)

Jim Scott, Director of Enterprise Strategy, MapR; Cofounder, CHUG

In this talk, we will take a look back at the short history of Hadoop, along with the trials and tribulations that have come with this ground-breaking technology. We will explore why enterprises need to look deeper into their wants and needs, and further into the future, to prepare for where they are going.

Published in: Data & Analytics

  1. 1. © 2014 MapR Technologies Getting Real With Hadoop Jim Scott, Director, Enterprise Strategy & Architecture @kingmesal #BigDataEverywhere #Chicago - October 1st, 2014
  2. 2. © 2014 MapR Technologies
  3. 3. © 2014 MapR Technologies
  4. 4. © 2014 MapR Technologies
  5. 5. © 2014 MapR Technologies
  6. 6. © 2014 MapR Technologies
  7. 7. © 2014 MapR Technologies 6 Can’t We All Just Get Along?
  8. 8. © 2014 MapR Technologies 7 We Have All Contributed…
  9. 9. The Reality is Architecture Matters 8
  10. 10. © 2014 MapR Technologies 9 High Availability (HA) Everywhere
      No NameNode architecture: • Distributed metadata can self-heal • No practical limit on # of files
      MapReduce/YARN HA: • Jobs are not impacted by failures • Meet your data processing SLAs
      NFS HA: • High throughput and resilience for NFS-based data ingestion, import/export and multi-client access
      Instant recovery: • Files and tables are accessible within seconds of a node failure or cluster restart
      Rolling upgrades: • Upgrade the software with no downtime
      HA is built in: • No special configuration to enable HA • All MapR customers operate with HA
  11. 11. © 2014 MapR Technologies
  12. 12. RDBMS Hammer © 2014 MapR Technologies 11
  13. 13. © 2014 MapR Technologies 12
  14. 14. Hadoop Hammer © 2014 MapR Technologies 13
  15. 15. © 2014 MapR Technologies Data Everywhere! Social Media Messages Audio Sensors Mobile Data Email Clickstream
  16. 16. © 2014 MapR Technologies Friends don’t let friends run name nodes.
  17. 17. © 2014 MapR Technologies 16 Too Many Files!
  18. 18. © 2014 MapR Technologies Friends don’t let friends run name nodes.
  19. 19. © 2014 MapR Technologies 18 Volumes 100K volumes are OK, create as many as needed Volumes dramatically simplify management: • Replication factor • Scheduled mirroring • Scheduled snapshots • Data placement control • User access and tracking • Administrative permissions /projects /tahoe /yosemite /user /msmith /bjohnson
  20. 20. © 2014 MapR Technologies 19 MapR M7: The Best In-Hadoop Database MapR-DB • NoSQL Columnar Store • Apache HBase API • Integrated with Hadoop HBase JVM HDFS JVM ext3/ext4 Disks Other Distros Tables/Files Disks MapR Enterprise Database Edition (M7) The most scalable, enterprise-grade, NoSQL database that supports online applications and analytics
  21. 21. © 2014 MapR Technologies 20 Tradeoffs with Other NoSQL Solutions Reliability: 24x7 applications with strong data consistency Performance: Continuous low latency with horizontal scaling Easy Administration: Easy day-to-day management with minimal learning curve
  22. 22. © 2014 MapR Technologies 21 Consistent, Low Read Latency --- M7 Read Latency --- Others Read Latency
  23. 23. © 2014 MapR Technologies MapR Integrates Security into Hadoop
  24. 24. © 2014 MapR Technologies 23 Hadoop Security Authorization to ensure the right access to files and databases Authentication for users and user-created job requests Encryption to ensure user credentials and data are always secure Integration with existing security infrastructure
  25. 25. © 2014 MapR Technologies 24 Fine-Grained Access Control Full POSIX permissions on files and directories ACLs on tables, column families and columns ACLs on MapReduce jobs and queues Administration ACLs on cluster and volumes ACLs for Apache Hive, Apache Drill and Impala
  26. 26. Seamless Integration with Direct Access NFS © 2014 MapR Technologies 25 • MapR is POSIX compliant – Random reads/writes – Simultaneous reading and writing to a file – Compression is automatic and transparent
  27. 27. Seamless Integration with Direct Access NFS © 2014 MapR Technologies 26 • MapR is POSIX compliant – Random reads/writes – Simultaneous reading and writing to a file – Compression is automatic and transparent • Industry-standard NFS interface (in addition to HDFS API) – Stream data into the cluster – Leverage thousands of tools and applications – Easier to use non-Java programming languages – No need for most proprietary Hadoop connectors
  28. 28. © 2014 MapR Technologies 27 Disaster Recovery: Mirroring • Flexible – Choose the volumes/directories to mirror – You don’t need to mirror the entire cluster – Active/active • Fast – No performance impact – Block-level (8KB) deltas – Automatic compression Production Research Production WAN Datacenter 1 Datacenter 2 WAN EC2
  29. 29. © 2014 MapR Technologies 28 Disaster Recovery: Mirroring • Flexible – Choose the volumes/directories to mirror – You don’t need to mirror the entire cluster – Active/active • Fast – No performance impact – Block-level (8KB) deltas – Automatic compression • Safe – Point-in-time consistency – End-to-end checksums • Easy – Graceful handling of network issues – No third-party software – Takes less than two minutes to configure! Production Research Production WAN Datacenter 1 Datacenter 2 WAN EC2
  30. 30. MapR Advantages © 2014 MapR Technologies 29
      Capability | MapR-DB | Others
      99.999% uptime | ✓ | X
      Instant recovery from failures | ✓ | X
      Continuous low latency (no compactions) | ✓ | X
      Zero administration (no processes to manage, self-tuning) | ✓ | X
      Online data protection (snapshots, mirroring) | ✓ | X
      Scalability (number of tables supported) | Trillion | Hundreds
  31. 31. Packages Supported by various distributions © 2014 MapR Technologies 30 (color key on the original slide: red = lacking, blue = leading)
      Package | MapR 4.0.1 (Sep 2014) | Cloudera 5.1.2 (Aug 2014) | Hortonworks 2.1.5 (Aug 2014) | Apache Versions (Sep 12th, 2014)
      [Core Hadoop]
      Hadoop Core, YARN | 2.4.1 | 2.3.0 | 2.4.0 | 2.5.1
      [Batch]
      Map Reduce | MRv1 and MRv2 | MRv1 or MRv2 | MRv2 | MRv2
      Hive | 0.12, 0.13 | 0.12 | 0.13 | 0.13
      Tez | 0.4 (Dev Preview Only) | X | 0.4 | 0.5
      Pig | 0.12 | 0.12 | 0.12 | 0.12
      Cascading | 2.1.6 | X | X | 2.5
      Spark | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
      [Interactive SQL]
      Impala | 1.2.3 | 1.4 | X | 1.4
      Drill | 0.5 | X | X | 0.5
      SparkSQL | 1.0.2 | X | 1.0.1 (Tech Preview only) | 1.1
      [NoSQL and Search]
      HBase/NoSQL | 0.94.2, 0.98.4, MapR-DB | 0.98 | 0.98, Accumulo 1.5.1 | HBase 0.98
      Phoenix | X | X | 4.0.0 | 4.1.0
      AsyncHBase | 1.5 | X | X | 1.5
      Search | LW (Solr) 2.6.1, 2.7 | Cloudera Search 1.5 | X | NA
      [Machine Learning and Graph]
      Mahout | 0.9 | 0.9 | 0.9 | 0.9
      MLLib/MLBase | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
      GraphX | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
      [Streaming/Messaging]
      Spark Streaming | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
      Storm | 0.9, 0.9.2 (Certified) | X | 0.9.1 | 0.9.2
      Kafka | X | X | 0.8.1.1 (Tech Preview) | 0.8.1.1
      [Data Integration]
      Sqoop, Sqoop2 | 1.4.4, 1.99.3 | 1.4.4, 1.99.3 | 1.4.4 | 1.4.5
      Flume | 1.5.0 | 1.5.0 | 1.4.0 | 1.5.0
      Knox | X | X | 0.4 | 0.4
      [Coordination]
      Oozie | 4.0.1 | 4.0.0 | 4.0.0 | 4.0.1
      Zookeeper | 3.4.5 | 3.4.5 | 3.4.5 | 3.4.5
      [GUI, Configuration, Monitoring]
      Management | MCS | CM | Ambari | Ambari
      Hue | 3.5 | 3.6 | 2.5.1 | 3.6
      Sources: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_tarball.html?scroll=topic_3_unique_8 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.5-product.html
  32. 32. © 2014 MapR Technologies Pick the Right Tool for the Job
  33. 33. Provisioning & coordination Savannah* Workflow & Data Governance MapR Distribution for Apache Hadoop Data Integration & Access Hue HttpFS Flume Knox* Falcon* Whirr © 2014 MapR Technologies 32 APACHE HADOOP AND OSS ECOSYSTEM Security SQL Drill SparkSQL Impala YARN Batch Spark Cascading Pig Streaming Storm* Spark Streaming NoSQL & Search Solr HBase Juju ML, Graph GraphX MLLib Mahout MapReduce v1 & v2 EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Tez* Accumulo* Hive Sqoop Sentry* Oozie ZooKeeper * Certification/support planned for 2014 Management MapR Data Platform
  34. 34. Provisioning & coordination Savannah* Workflow & Data Governance Data Integration & Access Hue HttpFS Flume Knox* Falcon* Whirr NFS HDFS API HBase API JSON API © 2014 MapR Technologies 33 APACHE HADOOP AND OSS ECOSYSTEM Security SQL Drill SparkSQL Impala YARN Batch Spark Cascading Pig Streaming Storm* Spark Streaming NoSQL & Search Solr HBase Juju ML, Graph GraphX MLLib Mahout MapReduce v1 & v2 EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Tez* Accumulo* Hive Sqoop Sentry* Oozie ZooKeeper MapR Control System (Management and Monitoring) * Certification/support planned for 2014 CLI REST API GUI MapR Distribution for Apache Hadoop
  35. 35. © 2014 MapR Technologies 1.65TB WITH 298 SERVERS
  36. 36. © 2014 MapR Technologies 35 1/7th the Hardware Footprint
  37. 37. Forrester Wave™: Big Data Hadoop Solutions, Q1‘14 February 2014 “The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014” © 2014 MapR Technologies 36
  38. 38. © 2014 MapR Technologies
  39. 39. • Pioneering Data Agility for Hadoop • Apache open source project • Scale-out execution engine for low-latency queries • Unified SQL-based API for analytics & operational applications © 2014 MapR Technologies 38 APACHE DRILL 40+ contributors 150+ years of experience building databases and distributed systems
  40. 40. Drill Supports Schema Discovery On-The-Fly © 2014 MapR Technologies 39 Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ): • Fixed schema • Leverage schema in centralized repository (Hive Metastore) Schema Discovered On-The-Fly (SCHEMA ON THE FLY): • Fixed schema, evolving schema or schema-less • Leverage schema in centralized repository or self-describing data
  41. 41. © 2014 MapR Technologies 40 Operational Analytics
  42. 42. © 2014 MapR Technologies 41 Must Be Able to Scale
  43. 43. © 2014 MapR Technologies 42 Mobile application server Real-time ad targeting Data exploration (SQL) Real-time and Operational Actionable Analytics Hadoop (MapR M7) •User profiles and state •User interactions •Real-time location data •Web and mobile session state •Comments/rankings Web application server Customer 360 dashboard Churn analysis (predictive analytics) Product/service optimization and personalization
  44. 44. © 2014 MapR Technologies 43 General Application Monitoring
  45. 45. © 2014 MapR Technologies 44 Hard Drive Failure Rates
  46. 46. © 2014 MapR Technologies 45 Recommendation Engines
  47. 47. © 2014 MapR Technologies 46 20M SONGS Media Content Recommendation Engine
  48. 48. © 2014 MapR Technologies Fraud Detection
  49. 49. © 2014 MapR Technologies 48 104M CARD MEMBERS Offer Serving, Credit Risk & Fraud More than $600B+
  50. 50. © 2014 MapR Technologies 49 100M Data Points per second Fastest Data Ingest Rates PEOPLE
  51. 51. © 2014 MapR Technologies 50 Speed and Intelligence…
  52. 52. Forrester Wave™: NoSQL Key-Value Databases, Q3‘14 September 2014 “The Forrester Wave™: NoSQL Key-Value Databases, Q3 2014” © 2014 MapR Technologies 51
  53. 53. © 2014 MapR Technologies 52 MapR Editions • Control System • NFS Access • Performance • Unlimited Nodes • Free • All the Features of M5 • Simplified Administration for HBase • Increased Performance • Consistent Low Latency • Unified Snapshots, Mirroring • Control System • NFS Access • Performance • High Availability • Snapshots & Mirroring • 24 X 7 Support • Annual Subscription Fastest On-Ramp: MapR Sandbox for Hadoop
  54. 54. © 2014 MapR Technologies Engage with us! @mapr maprtech jscott@mapr.com MapR maprtech mapr-technologies
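
Slides 26 and 27 describe POSIX-style random reads and writes through a standard NFS interface, which is what makes non-Java tools usable against the cluster. The sketch below illustrates that idea with nothing but ordinary Python file I/O. It is an illustration under assumptions, not MapR code: `mount_point` is a hypothetical stand-in for an NFS mount of the cluster (a temporary local directory is used so the example runs anywhere), and the helper names `append_record`, `overwrite_at` and `read_at` are invented for this example.

```python
import os
import tempfile


def append_record(path: str, record: bytes) -> int:
    """Append one record with ordinary file I/O; return the offset it landed at."""
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(record)
    return offset


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Random write: replace bytes in place at the given offset."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Random read: fetch `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Hypothetical stand-in for an NFS mount point of the cluster. A temporary
# local directory is used here so the sketch is runnable without a cluster.
mount_point = tempfile.mkdtemp()
log_path = os.path.join(mount_point, "events.log")

first = append_record(log_path, b"event-0001\n")   # streamed in at offset 0
second = append_record(log_path, b"event-0002\n")  # streamed in at offset 11
overwrite_at(log_path, second, b"EVENT")           # in-place update, no rewrite
print(read_at(log_path, second, 10))               # b'EVENT-0002'
```

The point of the sketch is that no Hadoop-specific client library appears anywhere: if the file system honors seek, append and in-place writes over NFS, any program that can open a file can ingest or update data.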

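
Slide 40 contrasts a schema declared in advance with a schema discovered on the fly from self-describing data. As a rough illustration of the second approach, the sketch below scans newline-delimited JSON records and reports which fields exist and which JSON types each one has taken. This is a simplified stand-in written for this example, not how Apache Drill is implemented; the function names `discover_schema` and `type_name` and the sample records are hypothetical, and only top-level fields of object records are inspected.

```python
import json
from collections import defaultdict


def type_name(value) -> str:
    """Map a decoded JSON value to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def discover_schema(json_lines):
    """Return {field: sorted JSON type names seen} for newline-delimited JSON objects."""
    seen = defaultdict(set)
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for field, value in record.items():
            seen[field].add(type_name(value))
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical records whose shape evolves: a new field appears, a value goes null.
records = [
    '{"user": "msmith", "clicks": 3}',
    '{"user": "bjohnson", "clicks": 5, "device": "mobile"}',
    '{"user": "msmith", "clicks": null}',
]

schema = discover_schema(records)
print(schema)
# {'user': ['string'], 'clicks': ['null', 'number'], 'device': ['string']}
```

Nothing had to be declared before the data was read: the `device` field and the nullable `clicks` column are picked up from the records themselves, which is the behavior the slide labels "schema on the fly".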