Huawei Best Practices for Big Data Infrastructure Needs
1. This document is offered compliments of
BSP Media Group. www.bspmediagroup.com
All rights reserved.
2. HUAWEI TECHNOLOGIES CO., LTD.
FROM BIG DATA TO BIG VALUE
INFRASTRUCTURE NEEDS
AND HUAWEI BEST PRACTICE
HU YUHAI
MARKETING DIRECTOR, BIG DATA & CLOUD STORAGE
HUAWEI ENTERPRISE IT
NOVEMBER 2013
4. DATA LANDSCAPE CONTINUES TO EVOLVE
Data Volume: captured and processed
Data Velocity: of ingest and time sensitivity for analysis
Data Variability: data format
Business-process generated: structured data
Human generated: unstructured data
Machine generated: semi-structured data
Example sources: satellite images, sensors, email, bioinformatics, documents, OLTP, web logs, social, video, audio, M2M log files
Timeline: 1990, 2000, 2008, 2013
5. BIG DATA ANALYTICS DATA FLOWS
Stages: Capture, Store, Process, Insight
‒ OLTP sources (CRM, ERP, SCM, …): terabytes; OLTP DB, SAN, ETL, MPP DW
‒ Human and machine sources (web logs, …): petabytes; NAS, MPP data store
‒ Converged compute & storage: exabytes
6. EXAMPLE FOR “EXABYTE” REQUIREMENT
"CERN is hitting the technology limits for resource-intensive simulations
and analysis. Our collaboration with Huawei shows an exciting new
approach, where their novel architecture extends the capabilities in
preparation for the Exascale data rates and volumes we expect in the
future," said Bob Jones, head of CERN openlab.
7. INFRASTRUCTURE REQUIREMENTS
EXISTING INFRASTRUCTURE DOESN’T SCALE!
Scale capacity on demand
Scale bandwidth on demand
High-throughput ingest
Process data in place, in near real time
Cost-effective, follows Moore’s Law
Scaling in every dimension is key!
8. INFRASTRUCTURE NEEDS
Scale-out distributed storage platforms
‒ Bring the computation to the data
‒ Can’t move petabytes around the network
‒ High throughput streaming workloads
‒ Batch oriented processing
Column-oriented NoSQL and MPP databases
‒ Flexible schemas, massive scale
Real time analytics requires massive flows
‒ New platforms combine real-time with batch
‒ Trigger on events and process historical data
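A rough back-of-the-envelope sketch of why petabytes cannot simply be moved around the network. The 10 Gbit/s link speed and the assumption of a fully utilized, dedicated link are illustrative choices, not figures from this deck.

```python
# Illustrative arithmetic only: the link speed is an assumed value,
# and protocol overhead and contention are ignored.

PETABYTE = 10**15          # bytes, decimal petabyte
TEN_GBIT = 10 * 10**9      # bits per second, assumed link speed


def transfer_days(num_bytes: float, link_bits_per_second: float) -> float:
    """Return the days needed to push num_bytes over a saturated link."""
    seconds = num_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # 1 PB over 10 Gbit/s takes 800,000 seconds
    print(round(transfer_days(PETABYTE, TEN_GBIT), 2))  # prints 9.26
```

Under these assumptions a single petabyte ties up the link for more than nine days, which is the motivation for running the computation where the data already sits.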
9. HUAWEI STRATEGY ON BIG DATA
Huawei strategy: “Build the Most Efficient Big Data Platform”
Openness and cooperation
‒ Intelligent application awareness
‒ Multi-protocol interface
Native multi-workload support
‒ Integrated storage, analysis and archiving functions
‒ Full data life cycle management
Infrastructure is the key to big data
‒ Scale-out x86 architecture, all IP based
‒ Fully symmetric, distributed file system
10. HUAWEI ENTERPRISE-LEVEL BIG DATA PLATFORM
Industries: M&E, Telecom, Banking, Government, Energy
Workloads and their standard exposure:
‒ High-performance store and archive: NFS/CIFS/HDFS
‒ Query and retrieval for structured data: SQL (MPP DB engine)
‒ Analysis processing for unstructured data: MR/HBase (Enterprise Hadoop engine)
‒ EB-level storage resource pool management: HTTP/S3 (object storage engine)
OceanStor Big Data Framework: the engines reach the storage layer through native interfaces (HDFS)
“High Scalability” distributed storage system: distributed RAID, load balance, quota management, storage tiering
• World-leading performance and scalability, with the storage platform as the infrastructure.
• Natively integrated Hadoop, MPP DB and object engines; efficient data loading and processing.
• End-to-end data protection and life cycle management.
12. ENTERPRISE-LEVEL HADOOP PLATFORM
Customized Hadoop
‒ Reliability improvements
‒ Redundancy, Failover, SPoF elimination
‒ Security/privacy improvements
‒ Encryption of data and metadata, Kerberos access control
‒ Management simplification
‒ GUI platform management tools, role-based admin
‒ All Hadoop tools, such as Hive and Pig
Innovative DR Solution
‒ DR site up to 1,000 km away
Special VM instances for Hadoop processing
16. UDS: EB-LEVEL MASSIVE STORAGE SYSTEM
Universal Distributed Storage
Use cases: Web Disk, Space Lease, Active Archive, Centralized Backup
• Native object storage, decentralized architecture
• ARM-based high-density hardware
• Unlimited scalability: EB-level capacity
• Extreme reliability: 99.9999% data durability
• Low TCO: energy-saving hardware and Zero-Touch design
[Figure: DHT ring. An SoD client applies Hash(key) to place each key on a ring spanning 0 to 2^128, divided into partitions P0, P1, P2, ... P10, ... Pm.]
DHT: Highly Available Key Value Store
Per rack: 2.1 PB capacity; 40 GB output bandwidth
60% / 45%: energy saving / reduced TCO
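The DHT ring on this slide can be illustrated with a minimal sketch: a key is hashed onto a 0 to 2^128 ring and the ring is cut into partitions P0 to Pm. Everything below is an assumption made for illustration, including the hash function (SHA-256 truncated to 128 bits), the equal-sized partitions, and the function names; the deck does not say how UDS actually implements its ring.

```python
import hashlib
from bisect import bisect_right

RING_BITS = 128  # the slide shows a ring running from 0 to 2^128


def ring_position(key: str) -> int:
    """Hash a key to an integer position on the ring (assumed hash choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")  # keep the first 128 bits


def partition_starts(num_partitions: int) -> list:
    """Start position of each partition, assuming equal-sized partitions."""
    size = 2**RING_BITS // num_partitions
    return [i * size for i in range(num_partitions)]


def partition_for(key: str, starts: list) -> int:
    """Index of the partition whose range contains the key's ring position."""
    return bisect_right(starts, ring_position(key)) - 1


if __name__ == "__main__":
    starts = partition_starts(11)  # P0 .. P10, as drawn on the slide
    for name in ("video-0001.mp4", "backup/2013-11-01.tar"):
        print(name, "-> P%d" % partition_for(name, starts))
```

Because the partition depends only on the key's hash, any client can locate an object without asking a central metadata server, which is one common reason object stores choose a decentralized design of this kind.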
17. HUAWEI IT BUSINESS COVERAGE
Applications
Servers
Big Data
Storage
Converged Infrastructure
Cloud Computing
Data Center Facilities
Management
Distributed Cloud Data Center
18. KEEP YOUR COMPETITIVE ADVANTAGE
Big data is here
Big data presents new challenges to infrastructure
Be careful with open source Hadoop
Implementing a robust foundation and carefully selecting
tools allows you to benefit from big data