20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
- 2. TREND 1: Hadoop is Providing Value Across Organizations
ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration
MARKETING ANALYTICS
• Recommendation engines & targeting
• Ad optimization
• Pricing analysis
• Lead scoring
RISK ANALYTICS
• Network security monitoring
• Security information & event management
• Fraudulent behavioral analysis
OPERATIONS INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Sensor analysis
© 2014 MapR Technologies, confidential
- 4. TREND 2: Organizations Have Many Workload-specific Systems
ENTERPRISE USERS
OPERATIONAL SYSTEMS
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery
ANALYTICAL SYSTEMS
• Interactive SQL
• Rich analytics
• Mixed workload management
• Data governance
• Security
• Backup and recovery
- 5. REALITY: Hadoop Can Relieve the Pressure from Enterprise Systems
[Diagram labels: ENTERPRISE USERS, OPERATIONAL SYSTEMS, ANALYTICAL SYSTEMS]
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
• Data protection and recovery
• Inter-operability
• Read-write performance
• Supports operations and analytics
- 7. REALITY 2: Most Hadoop Projects are Still Science Experiments
[Chart: Number of Companies vs. Cluster Size]
• Development/Testing (Focus: Educ/Svc)
• 1st Production Use Case
• Wide-scale Production
Cluster size labels: 1 – 10 Nodes, 10 – 2000 Nodes
- 10. REALITY 3: Going Big Requires a Rock-Solid Architecture
FOUNDATION
• Enterprise-grade
• Multi-tenancy
• High Performance
• Open Standards for Interoperability
• Data Protection
• Operational & Analytical
- 11. MapR Distribution for Hadoop
APACHE HADOOP ECOSYSTEM: Hive/Stinger/Tez, Drill, Impala, Shark, Hue, Flume, Mahout, Cascading, Solr, Spark, Storm, Sentry, Zookeeper, Sqoop, Whirr, Pig, YARN, MapReduce, Oozie, HBase, ...
Management
MapR Data Platform: MAPR-FS (FILES), MAPR-DB (TABLES); Patent Pending
Enterprise-grade
• High availability
• Data protection
• Disaster recovery
Inter-operability
• Standard file access
• Standard database access
• Pluggable services
• Broad developer support
Performance
• Performance 2X-5X
Multi-tenancy
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
Data Protection
• Enterprise security authorization
• Wire-level authentication
• Data governance
Operational & Analytical
• Ability to support predictive analytics, real-time database operations, and support high arrival rate data
• Unit of work framework to provide transactional integrity
- 12. Apache Hadoop NameNode High Availability (HA)
HDFS-based Distributions
[Diagram: NameNode layouts labeled Primary NameNode, HDFS HA (with Standby NameNode and NAS Appliance), and HDFS Federation, above DataNodes holding blocks A through F]
Callouts:
• Single point of failure
• Only one active NameNode
• Multiple single points of failure w/o HA
• Limited to 50-200 million files
• Needs 20 NameNodes for 1 Billion files
• Performance bottleneck
• Commercial NAS needed
• Commercial NAS possibly needed
• Metadata must fit in memory
• Double the block reports
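The "20 NameNodes for 1 Billion files" figure appears to follow from the lower end of the quoted per-NameNode limit (50 million files). A minimal sketch of that arithmetic, assuming the namespace is split evenly across federated NameNodes; the function name and the even-split assumption are illustrative and not taken from the slide:

```python
def namenodes_needed(total_files: int, files_per_namenode: int) -> int:
    """Smallest number of NameNodes whose combined capacity covers total_files.

    Assumes (hypothetically) that files are spread evenly across NameNodes.
    """
    # Integer ceiling division, avoids floating-point rounding.
    return (total_files + files_per_namenode - 1) // files_per_namenode


# Figures quoted on the slide.
TOTAL_FILES = 1_000_000_000      # 1 billion files
LOW_CAPACITY = 50_000_000        # lower bound of the 50-200 million range
HIGH_CAPACITY = 200_000_000      # upper bound of the 50-200 million range

print(namenodes_needed(TOTAL_FILES, LOW_CAPACITY))   # 20
print(namenodes_needed(TOTAL_FILES, HIGH_CAPACITY))  # 5
```

Under the same assumption, the upper end of the range would need only 5 NameNodes, so the slide's count corresponds to the most conservative capacity estimate.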
- 13. No NameNode Architecture
[Diagram: NameNode and DataNode boxes with blocks A through F]
• No special config to enable HA
• Up to 1T files (> 5000x advantage)
• Automatic failover & re-replication
• Metadata is persisted to disk
• Significantly less hardware & OpEx
• Higher performance
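The "> 5000x advantage" can be checked against the 50-200 million file limit quoted for a single NameNode on the previous slide. A small sketch, assuming "1T" means 10^12 files; the names below are illustrative:

```python
MAPR_MAX_FILES = 10**12  # "1T files", assumed here to mean 10^12


def scale_advantage(namenode_limit: int) -> int:
    """How many times more files 1T is than a given single-NameNode limit."""
    return MAPR_MAX_FILES // namenode_limit


# Single-NameNode limits quoted on the previous slide.
for limit in (200_000_000, 50_000_000):
    print(f"{limit:>11,} files per NameNode -> {scale_advantage(limit):,}x")
```

Against the upper limit of 200 million files the ratio is 5,000; against the lower limit of 50 million it is 20,000, so the quoted 5000x is the smallest ratio in that range.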
- 14. Comparative Study of Hadoop Distributions: I/O Performance
Read and Write Throughput Benchmarks
[Bar charts: DFSIO Read Throughput and DFSIO Write Throughput, in MB per second, for IDH 2.4.1, HDP 1.3, CDH 4.3, and MapR M5 2.1.3. Bar values in extraction order, not matched to distributions: 262, 276, 212, 465, 475, 59, 69, 64]
Source: Flux7 Labs Study, October 2013
- 15. World Record Performance
NEW MINUTESORT WORLD RECORD With a Fraction of the Hardware
• New record: 1.65 TB in 1 minute on 298 nodes
• Previous record: 1.6 TB with 2200 nodes
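To put "a fraction of the hardware" in per-node terms, the two records can be normalized by node count. This is a rough sketch only: it assumes decimal units (1 TB = 1000 GB) and ignores any hardware differences between the two clusters, which the slide does not describe.

```python
def gb_per_node(total_tb: float, nodes: int) -> float:
    """Data sorted per node in one minute, in GB (assuming 1 TB = 1000 GB)."""
    return total_tb * 1000 / nodes


# Figures quoted on the slide.
new_record = gb_per_node(1.65, 298)
previous_record = gb_per_node(1.6, 2200)

print(f"new record:      {new_record:.2f} GB per node per minute")
print(f"previous record: {previous_record:.2f} GB per node per minute")
print(f"per-node ratio:  {new_record / previous_record:.1f}x")
```

On these figures the new record sorts roughly 7.6 times as much data per node as the previous one.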
- 16. HBase Apps: High Performance with Consistent Low Latency
[Chart legend: M7 Read Latency, Others Read Latency]
- 17. MapR M7: The Best In-Hadoop Database
The most scalable, enterprise-grade NoSQL database that supports online applications and analytics
• NoSQL Columnar Store
• Apache HBase API
• In-Hadoop database
Storage stack comparison:
• Other Distros: HBase (JVM), HDFS (JVM), ext3/ext4, Disks
• MapR M7: Tables/Files, Disks
- 18. MapR M7: The Best In-Hadoop Database
The most scalable, enterprise-grade NoSQL database that supports online applications and analytics
• NoSQL Columnar Store
• Apache HBase API
• In-Hadoop database
[Diagram labels: BigData Application, HBase Interface, HDFS Interface, JVM, ext3/ext4, Tables/Files, Disks; columns: Other Distros, MapR M7]
- 19. Opportunity to Revolutionize Enterprise Data Architecture
From Redundant Processing Silos and Data Science Experiments…
- 20. The Production Enterprise BigData Platform
… to Consolidated Operational and Analytical Workloads
- 21. Q&A
Engage with us!
@allenday, @mapr
linkedin.com/in/allenday
allenday@mapr.com
tsheng@mapr.com
mdarling@mapr.com