20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 

    Presentation Transcript

    • © 2014 MapR Technologies, confidential
    • TREND 1: Hadoop is Providing Value Across Organizations
      ENTERPRISE DATA HUB
        • Multi-structured data staging & archive
        • ETL / DW optimization
        • Mainframe optimization
        • Data exploration
      MARKETING ANALYTICS
        • Recommendation engines & targeting
        • Ad optimization
        • Pricing analysis
        • Lead scoring
      RISK ANALYTICS
        • Network security monitoring
        • Security information & event management
        • Fraudulent behavioral analysis
      OPERATIONS INTELLIGENCE
        • Supply chain & logistics
        • System log analysis
        • Manufacturing quality assurance
        • Preventative maintenance
        • Sensor analysis
    • Sellers Cloud, Advertising Automation Cloud, Buyers Cloud: 90B AD AUCTIONS per day
    • TREND 2: Organizations Have Many Workload-specific Systems
      ENTERPRISE USERS
      OPERATIONAL SYSTEMS
        • Mission-critical reliability
        • Transaction guarantees
        • Deep security
        • Real-time performance
        • Backup and recovery
      ANALYTICAL SYSTEMS
        • Interactive SQL
        • Rich analytics
        • Mixed workload management
        • Data governance
        • Security
        • Backup and recovery
    • REALITY: Hadoop Can Relieve the Pressure from Enterprise Systems
        • Data staging
        • Archive
        • Data transformation
        • Data exploration
        • Streaming, interactions
      Keys for Production Success
        • Data protection and recovery
        • Inter-operability
        • Read-write performance
        • Supports operations and analytics
      Diagram labels: ENTERPRISE USERS, OPERATIONAL SYSTEMS, ANALYTICAL SYSTEMS
    • Fortune 100 Financial Services Company: 104M CARD MEMBERS
    • REALITY 2: Most Hadoop Projects are Still Science Experiments
      Chart: Number of Companies vs. Cluster Size
      Stages: Development/Testing, 1st Production Use Case, Wide-scale Production
      Chart labels: Focus: Educ/Svc; 1 – 10 Nodes; 10 – 2000 Nodes
    • Largest Biometric Database in the World: 1.2B PEOPLE
    • REALITY 3: Going Big Requires a Rock-Solid Architecture
      FOUNDATION
        • Enterprise-grade
        • Multi-tenancy
        • High Performance
        • Open Standards for Interoperability
        • Data Protection
        • Operational & Analytical
    • MapR Distribution for Hadoop
      APACHE HADOOP ECOSYSTEM: Hive/Stinger/Tez, Drill, Impala, Shark, Hue, Flume, Mahout, Cascading, Solr, Spark, Storm, Sentry, Zookeeper, Sqoop, Whirr, Pig, YARN, MapReduce, Oozie, HBase, ...
      Management
      MapR Data Platform: MAPR-FS (FILES), MAPR-DB (TABLES)
      Pillars: Enterprise-grade, Performance, Multi-tenancy, Data Protection, Inter-operability, Operational & Analytical
        • High availability
        • Data protection
        • Disaster recovery
        • Performance 2X-5X
        • Patent Pending
        • Standard file access
        • Standard database access
        • Pluggable services
        • Broad developer support
        • Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
        • Enterprise security authorization
        • Wire-level authentication
        • Data governance
        • Ability to support predictive analytics, real-time database operations, and support high arrival rate data
        • Unit of work framework to provide transactional integrity
    • Apache Hadoop NameNode High Availability (HA)
      Diagram: HDFS-based Distributions, showing NameNode and DataNode layouts for HDFS HA (Primary NameNode, Standby NameNode, NAS Appliance) and HDFS Federation
      Annotations:
        • Single point of failure
        • Limited to 50-200 million files
        • Performance bottleneck
        • Metadata must fit in memory
        • Only one active NameNode
        • Commercial NAS possibly needed
        • Double the block reports
        • Multiple single points of failure w/o HA
        • Needs 20 NameNodes for 1 Billion files
        • Commercial NAS needed
    • No NameNode Architecture
        • No special config to enable HA
        • Up to 1T files (> 5000x advantage)
        • Automatic failover & re-replication
        • Metadata is persisted to disk
        • Significantly less hardware & OpEx
        • Higher performance
      Diagram labels: blocks A to F, DataNode
    • Comparative Study of Hadoop Distributions: I/O Performance
      Read and Write Throughput Benchmarks
      Chart: DFSIO Read Throughput and DFSIO Write Throughput, in MB per Second, for IDH 2.4.1, HDP 1.3, MapR M5 2.1.3, and CDH 4.3
      Bar values in extraction order (mapping to distributions not recoverable): 262, 276, 212, 465, 475, 59, 69, 64
      Source: Flux7 Labs Study, October 2013
    • World Record Performance
      NEW MINUTESORT WORLD RECORD With a Fraction of the Hardware
        • 1.65 TB IN 1 MINUTE
        • 298 NODES
        • PREVIOUS RECORD: 1.6 TB with 2200 nodes
    • HBase Apps: High Performance with Consistent Low Latency
      Chart legend: M7 Read Latency; Others Read Latency
    • MapR M7: The Best In-Hadoop Database
        • NoSQL Columnar Store
        • Apache HBase API
        • In-Hadoop database
      Other Distros stack: HBase, JVM, HDFS, JVM, ext3/ext4, Disks
      MapR M7 stack: Tables/Files, Disks
      The most scalable, enterprise-grade, NoSQL database that supports online applications and analytics
    • MapR M7: The Best In-Hadoop Database
        • NoSQL Columnar Store
        • Apache HBase API
        • In-Hadoop database
      Diagram labels: BigData Application, HBase Interface, HDFS Interface
      Other Distros stack: JVM, JVM, ext3/ext4, Disks
      MapR M7 stack: Tables/Files, Disks
      The most scalable, enterprise-grade, NoSQL database that supports online applications and analytics
    • Opportunity to Revolutionize Enterprise Data Architecture
      From Redundant Processing Silos and Data Science Experiments…
    • The Production Enterprise BigData Platform
      … to Consolidated Operational and Analytical Workloads
    • Q&A: Engage with us!
        • @allenday, @mapr
        • linkedin.com/in/allenday
        • allenday@mapr.com
        • tsheng@mapr.com
        • mdarling@mapr.com
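
The scale figures quoted on the NameNode and MinuteSort slides can be cross-checked with simple arithmetic. The sketch below is only a sanity check on the numbers as they appear in the transcript. It assumes that the "20 NameNodes" figure uses the lower bound of 50 million files per NameNode, that the "> 5000x" figure compares 1T files against the upper bound of 200 million, and that 1 TB is taken as 1000 GB. The function names are illustrative and do not come from the deck.

```python
# Figures as quoted on the slides.
NAMENODE_FILE_LIMIT_LOW = 50_000_000      # "Limited to 50-200 million files"
NAMENODE_FILE_LIMIT_HIGH = 200_000_000
TARGET_FILES = 1_000_000_000              # "1 Billion files"
MAPR_FILE_LIMIT = 1_000_000_000_000       # "Up to 1T files"


def namenodes_needed(total_files, files_per_namenode):
    """Ceiling division: NameNodes required to hold total_files."""
    return -(-total_files // files_per_namenode)


def gb_per_node(terabytes, nodes):
    """Data sorted per node in one minute, in GB (assumes 1 TB = 1000 GB)."""
    return terabytes * 1000 / nodes


# Assumption: the slide's "20 NameNodes" uses the 50 million lower bound.
federation_nodes = namenodes_needed(TARGET_FILES, NAMENODE_FILE_LIMIT_LOW)

# Assumption: the slide's "> 5000x" uses the 200 million upper bound.
file_advantage = MAPR_FILE_LIMIT // NAMENODE_FILE_LIMIT_HIGH

new_record = gb_per_node(1.65, 298)    # new MinuteSort record
old_record = gb_per_node(1.6, 2200)    # previous record

print(f"NameNodes for 1 billion files: {federation_nodes}")
print(f"File-count ratio: {file_advantage}x")
print(f"GB sorted per node: new {new_record:.2f}, previous {old_record:.2f}")
print(f"Per-node ratio: {new_record / old_record:.1f}x")
```

Under these assumptions the quoted "20 NameNodes" and "5000x" figures are mutually consistent with the 50-200 million range, and the new MinuteSort record works out to several times more data sorted per node than the previous one.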