© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hadoop  Present  –  Open  Enterprise  Hadoop
Yifeng  Jiang
Solutions  Engineer,  Hortonworks,  inc.
July  26,  2015  
About Me
蒋  逸峰  (Yifeng  Jiang)
•  Solutions  Engineer  @  Hortonworks  Japan
•  HBase  book  author
•  It has been 10 years since I came to Japan…
•  Hobby: mountain climbing
•  Twitter:  @uprush
Agenda
•  Hadoop  Core  Updates
•  Data  Access  in  Hadoop
•  Hadoop  Security
•  Hadoop  Management
Hadoop Present
Enterprise Ready Hadoop
Hadoop Community Activity
[Charts: Number of Issues Resolved; Number of Lines of Code Added]
http://ajisakaa.blogspot.jp
Open  Leadership
Code  Contributed  in  2014  by  Organization
Hortonworks
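A team of experts: made up of core members deeply involved in development
History
June 2011:
Founded by 24 architects, developers, and operators who built the original Hadoop at Yahoo!
December 2014:
Grown into a team of more than 600 Hadoop experts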
Apache Project   Committers   PMC Members
Hadoop           27           21
Pig              5            5
Hive             18           6
Tez              16           15
HBase            6            4
Phoenix          4            4
Accumulo         2            2
Storm            3            2
Slider           11           11
Falcon           5            3
Flume            1            1
Sqoop            1            1
Ambari           36           28
Oozie            3            2
Zookeeper        2            1
Knox             13           3
Ranger           11           n/a
TOTAL            164          109
Hortonworks Data Platform 2.2 Stack
Hadoop Core
HDFS + YARN: Data Operating System
HDFS
Scalable & Efficient Data Lake Storage
HDFS: More Efficient Data Lake Storage
•  HDFS  NFS  Gateway
–  Mount  HDFS  path
•  Erasure  Coding  (under  dev)
–  Reduce  storage  cost  from  3x  to  1.4x
•  Tiered  Storage
–  DataNode  becomes  collection  of  tiered  storages
–  DISK,  SSD,  RAM,  ARCHIVAL
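The 3x and 1.4x figures are storage overhead ratios: raw bytes stored per byte of user data. A minimal sketch of the arithmetic follows. The slide does not name the coding scheme, so the Reed-Solomon layout with 10 data blocks and 4 parity blocks used here is an assumption, chosen because it is one layout that yields 1.4x. The function names are illustrative only.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per byte of user data under plain replication."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return Fraction(replicas, 1)


def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> Fraction:
    """Raw bytes stored per byte of user data for a (data, parity) striped layout."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("need at least one data block and no negative parity")
    return Fraction(data_blocks + parity_blocks, data_blocks)


if __name__ == "__main__":
    # Default HDFS replication factor of 3 stores every byte three times.
    print(float(replication_overhead(3)))
    # Assumed layout: 10 data blocks + 4 parity blocks (not stated on the slide).
    print(float(erasure_coding_overhead(10, 4)))
```

Other layouts give other ratios (for example, 6 data + 3 parity gives 1.5x), so the saving depends on the scheme that is actually configured.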
Storage Growth Challenges
• The storage needs of some clusters grow very fast
– High volumes of data
– More users and new use cases coming to Hadoop
• The only way to grow storage is to add more nodes
[Figure: cluster storage and compute capacity, compared with cluster storage utilization and compute utilization]
Archival Storage Scenario
Data Usage
•  Hot: less than 7 days old, with very high usage
•  Warm: less than 1 month old, used ~20 times per month
•  Cold: less than 3 months old, used 5 times per month
•  Frozen: 3 months to 7 years old, used approximately 2 times per year
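The age thresholds in this scenario can be written as a small classifier. This is a sketch only: it assumes 1 month is 30 days and 3 months is 90 days, looks at data age alone (the usage frequencies on the slide are ignored), and does not enforce the 7-year upper bound. The function name is hypothetical.

```python
def data_temperature(age_days: int) -> str:
    """Map data age in days to the temperature tiers named on the slide."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days < 7:
        return "hot"
    if age_days < 30:   # assumption: 1 month = 30 days
        return "warm"
    if age_days < 90:   # assumption: 3 months = 90 days
        return "cold"
    return "frozen"


if __name__ == "__main__":
    for age in (3, 20, 60, 400):
        print(age, data_temperature(age))
```

A result like this could be used to decide which datasets are candidates for a colder storage policy.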
[Figure: Temperature of Data (eBay Hadoop). X axis: time (data age). Y axis: frequency of data usage (per month). Regions: hot data, warm data, cold data.]
Archival Storage for Cost Efficiency
Scale Storage independently from Compute.
Archival Storage Tier
•  Deploy storage dense hardware nodes
•  Utilize storage policies for datasets:
•  Hot, Warm, Cold
•  Achieve ~4x lower price point per GB
[Figure: cluster storage capacity vs. cluster storage utilization; cluster compute capacity vs. compute utilization]
HDFS Storage Architecture - Before
HDFS Storage Architecture - Now
Storage Policy: SSD & Hot
[Diagram: HDP cluster with SSD-backed nodes (i2.8x) and DISK-backed nodes (d2.8x)]
•  DataSet A (e.g., HBase): policy SSD, all replicas on SSD
•  DataSet B (others): policy Hot, all replicas on DISK
Storage Policy: Trying It Out
In Ambari, create HDFS Configuration Groups
•  Group for I2 nodes
•  Group for D2 nodes
In Ambari, define the DataNode storage type and paths for each group
Set dfs.datanode.data.dir as follows:
•  I2  group:  [SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2,…
•  D2  group:  [DISK]/hadoop/hdfs/data1,[DISK]/hadoop/hdfs/data2,…
Restart HDFS
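The dfs.datanode.data.dir value is a comma-separated list of paths, each prefixed with its storage type in square brackets. A small helper can build that string for each configuration group. This is a sketch: the helper name is hypothetical, and the set of accepted type names (DISK, SSD, ARCHIVE, RAM_DISK) is my recollection of the names HDFS accepts in this property, so check it against your Hadoop version.

```python
from typing import Iterable

# Assumption: storage type tags accepted in dfs.datanode.data.dir.
STORAGE_TYPES = {"DISK", "SSD", "ARCHIVE", "RAM_DISK"}


def data_dir_value(storage_type: str, paths: Iterable[str]) -> str:
    """Build a dfs.datanode.data.dir value with every path tagged by storage type."""
    if storage_type not in STORAGE_TYPES:
        raise ValueError(f"unknown storage type: {storage_type}")
    return ",".join(f"[{storage_type}]{path}" for path in paths)


if __name__ == "__main__":
    dirs = ["/hadoop/hdfs/data1", "/hadoop/hdfs/data2"]
    print(data_dir_value("SSD", dirs))   # value for the I2 group
    print(data_dir_value("DISK", dirs))  # value for the D2 group
```

The resulting strings are what would be pasted into each Ambari configuration group before restarting HDFS.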
Setting a Storage Policy
$ hdfs dfs -mkdir /hbase

$ hdfs dfsadmin -setStoragePolicy /hbase ALL_SSD
Set storage policy ALL_SSD on /hbase

$ hdfs dfsadmin -getStoragePolicy /hbase
The storage policy of /hbase:
BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
Store all HBase data on SSD (i2)
•  Set everything under /hbase to ALL_SSD
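A storage policy lists storage types, and each replica of a block is assigned one of them. The sketch below is a simplified model of that assignment, not HDFS code: it assumes the last listed type is repeated for any remaining replicas, and it ignores the creation and replication fallbacks shown in the output above. Only ALL_SSD comes from this deck; the other policy entries are recalled from the Apache HDFS archival storage documentation and should be verified for your version.

```python
from typing import Dict, List

# Assumption: storage types per policy (only ALL_SSD is confirmed by the slide).
POLICY_STORAGE_TYPES: Dict[str, List[str]] = {
    "ALL_SSD": ["SSD"],
    "ONE_SSD": ["SSD", "DISK"],
    "HOT": ["DISK"],
    "WARM": ["DISK", "ARCHIVE"],
    "COLD": ["ARCHIVE"],
}


def replica_storage_types(policy: str, replication: int) -> List[str]:
    """Return the preferred storage type for each replica under a policy."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    types = POLICY_STORAGE_TYPES[policy]
    # Use the listed types in order; repeat the last one for extra replicas.
    return [types[min(i, len(types) - 1)] for i in range(replication)]


if __name__ == "__main__":
    print(replica_storage_types("ALL_SSD", 3))
    print(replica_storage_types("ONE_SSD", 3))
```

Under this model, ALL_SSD with replication 3 places all three replicas on SSD, which matches the "all replicas on SSD" behavior described for HBase data.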
HDFS: Next Step
•  Erasure  Code  GA
•  Ozone:  an  object  store  in  HDFS
HDFS-7285 HDFS-7240
YARN
Extends Hadoop into Data OS
Recap: What’s YARN
Cluster Resource Management
•  Resource sharing
–  Capacity scheduler
–  Fair Sharing: pluggable queue policies (new)
•  Isolation
–  Memory, CPU
–  Node labels (new)
•  Workload types
–  Batch, interactive, in-memory
Exclusive Node Labels enable Isolated Partitions
[Diagram: nodes labeled S form a partition for Storm. Callouts: Configure Partitions; Exclusive Labels enforce Isolation. Apps shown: S (Storm) and B.]
HDP 2.2
Non-Exclusive Node Labels
[Diagram: nodes labeled S are used by Spark. Callouts: Configure non-exclusive labels; Schedule if free capacity. Apps shown: S (Spark) and B, with B also placed on a labeled node.]
YARN-3214
HDP 2.3
Working with Labels
Ambari YARN Guided Configuration: Enable node labels
YARN CLI: Create and assign labels
ResourceManager UI: View Node Labels in Cluster
Capacity Scheduler View: Define workload management policy with labels
$ yarn rmadmin -addToClusterNodeLabels "spark(exclusive=false)"
$ yarn cluster -list-node-labels
$ yarn rmadmin -replaceLabelsOnNode "node5=spark"
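The difference between exclusive and non-exclusive labels can be modeled in a few lines. This is a deliberately simplified sketch of the eligibility rule described on the preceding slides, not the YARN scheduler's logic; the Node class, its fields, and can_run are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    label: str = ""              # "" means the default (unlabeled) partition
    exclusive: bool = True       # only meaningful when label is set
    has_idle_capacity: bool = True


def can_run(app_label: str, node: Node) -> bool:
    """Return True if an app requesting app_label may be placed on node."""
    if app_label == node.label:
        # Matching partition (including unlabeled app on unlabeled node).
        return True
    if app_label == "" and node.label and not node.exclusive:
        # Non-exclusive label: unlabeled apps may borrow idle capacity.
        return node.has_idle_capacity
    return False


if __name__ == "__main__":
    shared = Node("node5", label="spark", exclusive=False)
    isolated = Node("node6", label="storm", exclusive=True)
    print(can_run("", shared))     # unlabeled app on a non-exclusive node
    print(can_run("", isolated))   # unlabeled app on an exclusive node
```

With the commands above, node5 carries the non-exclusive label spark, so in this model it corresponds to the shared case.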
YARN: Next Step
Disk & network isolation
•  Just isolation – enforce equal sharing of disk and network I/O across containers running on a node
•  Currently in technical preview in HDP 2.3
•  Disk resource: local disk IOPS, not HDFS reads/writes
•  Network resource: outbound bandwidth only (Mbit/s)
YARN-2619 YARN-2140
Data Access Innovation
SQL, Spark, Stream Processing, Search
Hive:  Enterprise  SQL  at  Hadoop  Scale
Native transactions
•  Delivered: Insert, Update, Delete
Performance: 100x faster
•  ORC File
•  Hive on Tez
•  Cost Based Optimizer
•  Vectorized SQL engine
Hive: Next Step
SQL Enhancement
•  Transactions: BEGIN, COMMIT, ROLLBACK
•  SQL 2011 Analytics
Performance
•  Sub-second response: LLAP, HBase as metastore, etc.
See also: Apache Hiveの今とこれから (Apache Hive: Present and Future)
Spark Features – HDP 2.3.x & Spark 1.3.1
Supported
•  Spark Core
•  MLlib
•  Spark on YARN
•  Kerberos
•  Ambari support
Tech Preview
•  SparkSQL*
•  Spark Streaming
•  DataFrame
•  Spark ML Pipeline API
Unsupported
•  GraphX
•  BlinkDB
•  Spark Standalone/Mesos
Spark and Hadoop – How Can We Do Better?
Resource Management
•  YARN for multi-tenant, diverse workloads with predictable SLAs
Tiered Memory Storage
•  HDFS in-memory tier – External BlockStore for RDD Cache
SparkSQL & Hive for SQL
•  Interop with modern Metastore/HS2, optimized ORC support, advanced analytics e.g. Geospatial
Spark & NoSQL
•  Deep integration with HBase via DataSources/Catalyst for Predicate/Aggregate Pushdown
Connect The Dots – Algorithms to Use-Cases
•  Higher-level ML Abstractions – E.g. OneVsRest
•  Validation, tuning, pipeline assembly... e.g. GeoSpatial
[Diagram: YARN: Data Operating System, with Storage, Resource Management, Governance, Security, Operations]
Spark and Hadoop – How Can We Do Better?
Ease of Use
•  Apache Zeppelin for interactive notebooks
Metadata & Governance
•  Apache Atlas for metadata & Apache Falcon support for Spark pipelines
Security & Operations
•  Apache Ranger managed authorization and deployment/management via Apache Ambari
Deployable Anywhere
•  Linux, Windows, on-premises or cloud
Self-Service Spark in the Cloud
•  Easy launch of Data Science clusters via Cloudbreak and Ambari – for Azure, AWS, GCP, OpenStack, Docker
[Diagram: YARN: Data Operating System, with Storage, Resource Management, Governance, Security, Operations]
Platform Innovation for Data Access
An integrated scalable platform
for data access powered by
HDP
•  Limitless storage
•  Deep analytics
•  Real-time access
Security
End to End Security in Hadoop
Five Security Requirements
•  Authentication: Kerberos
•  Authorization
•  Audit
•  Encryption
[Diagram labels: HDP 2.3 security support; Ranger; HDFS]
Hadoop Security Overview
Fully Secure Flow – End to End Security
Numbered steps:
1. Original request w/ user id/password
2. Knox authenticates user/pass
3. Knox gets service ticket for Hive
4. Knox calls as proxy user
5. Ranger AuthZ
6. Hive creates MapReduce job using NN ST
Unnumbered callouts:
•  Use Hive ST, submit query
•  Hive gets NameNode (NN) service ticket
•  Client gets query result
•  Ranger syncs users/groups from LDAP
[Diagram components: O/JDBC client, Apache Knox, LDAP, KDC, HiveServer 2, Ranger, HDFS (A, B, C); links labeled SSL and SASL]
Ranger:  Central Security Administration
•  Table/column access control
•  Audit logging
•  Flexible definition
Control group/user permissions
Hadoop Management
Ambari: Hadoop for Everyone, 100% Open Source
What’s Apache Ambari?
100% open source
operational platform to
provision, manage and
monitor Hadoop clusters
Apache Ambari Mission
Easy operation at scale
•  Large scale cluster install, manage and monitor
•  Efficient and scalable
Easy to extend with community
•  Innovate with community
•  Integrate with enterprise software
•  Accelerate new features and adoption
Centralized management for the whole Hadoop stack
•  Access point for all Hadoop users, not just cluster management
•  Ease of use
Ambari 2.1 HDP Stack High Availability
Component                HDP Stack   Mode             Ambari 2.0   Ambari 2.1
HDFS: NameNode           HDP 2.0+    Active/Standby
YARN: ResourceManager    HDP 2.1+    Active/Standby
HBase: HBaseMaster       HDP 2.1+    Multi-master
Hive: HiveServer2        HDP 2.1+    Multi-instance
Hive: Hive Metastore     HDP 2.1+    Multi-instance
Hive: WebHCat Server     HDP 2.1+    Multi-instance
Oozie: Oozie Server      HDP 2.1+    Multi-instance
Storm: Nimbus Server     HDP 2.3     Multi-instance
Ranger: AdminServer      HDP 2.3     Multi-instance
Install Wizard
Guided Configs for HDFS
Guided Configs for YARN & MapReduce
Enable Features in YARN
Cluster Dashboard
Service Dashboard
Service Manage - HDFS
Host Manage
Monitor & Alert
Notifications
•  Email
•  SNMP
•  Script (new)
User Views – HDFS File View
Files View
Browse HDFS file system.
User Views – YARN CS, Tez
Capacity Scheduler View
Browse + manage YARN queues
Tez View
View information related to Tez jobs
that are executing on the cluster.
User Views – Pig, Hive
Pig View
Author and execute Pig
Scripts.
Hive View
Author, execute and debug
Hive queries.
Summary
Open  Enterprise  Hadoop
Hadoop/YARN-powered data operating system
100% open source, multi-tenant data platform for
any application, any data set, anywhere.
Built on a centralized architecture of
shared enterprise services
•  Scalable  tiered  storage
•  Resource  and  workload  management
•  Trusted  data  governance  &  metadata  management
•  Consistent  operations
•  Comprehensive  security
•  Developer  APIs  and  tools
[Diagram: YARN: data operating system. Layers: storage; resource management; data access (batch, interactive, real-time); governance; security; operations. Deployment: commodity, appliance, cloud.]
Thank  you
Yifeng  Jiang,  Solutions  Engineer,  Hortonworks
@uprush

More Related Content

What's hot

Bringing Real-Time to the Enterprise with Hortonworks DataFlow
Bringing Real-Time to the Enterprise with Hortonworks DataFlowBringing Real-Time to the Enterprise with Hortonworks DataFlow
Bringing Real-Time to the Enterprise with Hortonworks DataFlowDataWorks Summit
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsDataWorks Summit
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduDataWorks Summit
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jDataWorks Summit
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDataWorks Summit
 
HPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalHPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalDataWorks Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitDataWorks Summit
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNHadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNDataWorks Summit
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Running a container cloud on YARN
Running a container cloud on YARNRunning a container cloud on YARN
Running a container cloud on YARNDataWorks Summit
 

What's hot (20)

Bringing Real-Time to the Enterprise with Hortonworks DataFlow
Bringing Real-Time to the Enterprise with Hortonworks DataFlowBringing Real-Time to the Enterprise with Hortonworks DataFlow
Bringing Real-Time to the Enterprise with Hortonworks DataFlow
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
 
HPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalHPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposal
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNHadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
 
HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Ansible + Hadoop
Ansible + HadoopAnsible + Hadoop
Ansible + Hadoop
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
Running a container cloud on YARN
Running a container cloud on YARNRunning a container cloud on YARN
Running a container cloud on YARN
 

Similar to Hadoop Present - Open Enterprise Hadoop

Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitDataWorks Summit
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudDataWorks Summit/Hadoop Summit
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014Hortonworks
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016alanfgates
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...DataWorks Summit
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...Big Data Spain
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Hortonworks
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...DataWorks Summit
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinAlex Zeltov
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - finalHortonworks
 

Similar to Hadoop Present - Open Enterprise Hadoop (20)

Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 

Hadoop Present - Open Enterprise Hadoop

  • 1. © Hortonworks Inc. 2011 – 2015. All Rights Reserved. Hadoop Present – Open Enterprise Hadoop. Yifeng Jiang, Solutions Engineer, Hortonworks, Inc. July 26, 2015
  • 2. Self-introduction: 蒋 逸峰 (Yifeng Jiang)
      • Solutions Engineer @ Hortonworks Japan
      • HBase book author
      • Came to Japan 10 years ago…
      • Hobby: mountain climbing
      • Twitter: @uprush
  • 3. Agenda
      • Hadoop Core Updates
      • Data Access in Hadoop
      • Hadoop Security
      • Hadoop Management
  • 4. Hadoop Present: Enterprise Ready Hadoop
  • 5. Hadoop Community Activity
      [Charts: Number of Issues Resolved; Number of Lines of Code Increased. Source: http://ajisakaa.blogspot.jp]
  • 6. Open Leadership
      [Chart: Code Contributed in 2014 by Organization; highlighted label: Hortonworks]
  • 7. A team of experts: made up of core members deeply involved in development
      History
      • June 2011: Founded by 24 architects, developers, and operators who worked on the original Hadoop development at Yahoo!
      • December 2014: Grown into a team of Hadoop experts with more than 600 employees

      Apache Project   Committers   PMC Members
      Hadoop           27           21
      Pig              5            5
      Hive             18           6
      Tez              16           15
      HBase            6            4
      Phoenix          4            4
      Accumulo         2            2
      Storm            3            2
      Slider           11           11
      Falcon           5            3
      Flume            1            1
      Sqoop            1            1
      Ambari           36           28
      Oozie            3            2
      Zookeeper        2            1
      Knox             13           3
      Ranger           11           n/a
      TOTAL            164          109
  • 8. Hortonworks Data Platform 2.2 Stack
  • 9. Hadoop Core. HDFS + YARN: Data Operating System
  • 10. HDFS: Scalable & Efficient Data Lake Storage
  • 11. HDFS: More Efficient Data Lake Storage
      • HDFS NFS Gateway
        – Mount HDFS path
      • Erasure Coding (under dev)
        – Reduce storage cost from 3x to 1.4x
      • Tiered Storage
        – DataNode becomes collection of tiered storages
        – DISK, SSD, RAM, ARCHIVAL
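The "3x to 1.4x" figure on slide 11 can be checked with simple arithmetic. The sketch below is illustrative only: the slide does not name the coding scheme, so the Reed-Solomon layout of 10 data units plus 4 parity units is an assumption chosen because it yields exactly 1.4x, and both function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per byte of user data under plain replication."""
    return Fraction(replicas, 1)


def erasure_coding_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per byte of user data under a (data, parity) code."""
    return Fraction(data_units + parity_units, data_units)


# 3-way replication stores every block three times.
print(float(replication_overhead(3)))

# Assumed layout (not stated on the slide): 10 data units + 4 parity units.
print(float(erasure_coding_overhead(10, 4)))
```

Under this assumed layout, each 10 units of data cost 14 units of raw storage, compared with 30 units under 3-way replication.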
  • 12. Storage Growth Challenges
      • Some clusters' storage needs grow very fast
        – High volumes of data
        – More users and new use cases coming to Hadoop
      • The only way to grow storage is to add more nodes
      [Chart: Cluster Storage and Compute Capacity; series: Cluster Storage Utilization, Compute Utilization]
  • 13. Archival Storage Scenario
      Data Usage
      • Hot: less than 7 days old, with very high usage
      • Warm: less than 1 month old, used ~20 times per month
      • Cold: less than 3 months old, used 5 times per month
      • Frozen: 3 months to 7 years old, used approximately 2 times per year
      [Chart: Temperature of Data (Hadoop, eBay); x-axis: time (data age); y-axis: frequency of data usage (per month); regions: hot, warm, and cold data]
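The age thresholds on slide 13 can be written down as a small classifier. This is a minimal sketch, not anything HDFS runs: the function name is hypothetical, a month is assumed to be 30 days and three months 90 days, the usage-frequency figures from the slide are ignored, and data older than 7 years (which the slide does not cover) is also treated as frozen.

```python
def data_temperature(age_days: float) -> str:
    """Classify a dataset by age, using the thresholds from slide 13.

    Assumptions: 1 month = 30 days, 3 months = 90 days; anything older
    than 90 days is reported as "frozen".
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 7:
        return "hot"
    if age_days < 30:
        return "warm"
    if age_days < 90:
        return "cold"
    return "frozen"


for age in (1, 10, 45, 400):
    print(age, data_temperature(age))
```

A classifier like this could feed the choice of storage policy per dataset, which is the idea the following slides develop.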
  • 14. Archival Storage for Cost Efficiency
      Scale storage independently from compute.
      Archival Storage Tier
      • Deploy storage-dense hardware nodes
      • Utilize storage policies for datasets: Hot, Warm, Cold
      • Achieve ~4x lower price point per GB
      [Chart: Cluster Storage Capacity, Cluster Compute Capacity, Cluster Storage Utilization, Compute Utilization]
  • 15. HDFS Storage Architecture - Before
  • 16. HDFS Storage Architecture - Now
  • 17. Storage Policy: SSD & Hot
      • SSD: all replicas on SSD. DataSet A (e.g., HBase)
      • Hot: all replicas on DISK. DataSet B (others)
      [Diagram: HDP cluster of three I2.8x nodes with SSD storage holding the replicas of A, and three d2.8x nodes with DISK storage holding the replicas of B]
  • 18. Storage Policy: Trying It Out
      In Ambari, create HDFS Configuration Groups
      • A group for the I2 nodes
      • A group for the D2 nodes
      In Ambari, define the DataNode storage type and paths for each group
      Set dfs.datanode.data.dir as follows
      • I2 group: [SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2,…
      • D2 group: [DISK]/hadoop/hdfs/data1,[DISK]/hadoop/hdfs/data2,…
      Restart HDFS
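The dfs.datanode.data.dir values on slide 18 follow a regular pattern: a storage-type tag in square brackets in front of each data directory, joined by commas. The helper below is a hypothetical sketch of how such a value might be generated per configuration group; the function name, the mount count, and the default base path are illustrative assumptions, and the set of accepted tags reflects the HDFS storage type names as I understand them rather than anything shown on the slide.

```python
# Storage type tags assumed to be accepted by HDFS.
KNOWN_STORAGE_TYPES = {"DISK", "SSD", "ARCHIVE", "RAM_DISK"}


def data_dir_value(storage_type: str, mount_count: int,
                   base: str = "/hadoop/hdfs/data") -> str:
    """Build a dfs.datanode.data.dir value such as
    "[SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2"."""
    if storage_type not in KNOWN_STORAGE_TYPES:
        raise ValueError(f"unknown storage type: {storage_type}")
    if mount_count < 1:
        raise ValueError("mount_count must be at least 1")
    return ",".join(
        f"[{storage_type}]{base}{i}" for i in range(1, mount_count + 1)
    )


# One value per Ambari configuration group (two mounts each, as an example).
print("I2 group:", data_dir_value("SSD", 2))
print("D2 group:", data_dir_value("DISK", 2))
```

Generating the value rather than typing it by hand mainly helps when nodes in a group have many data mounts.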
  • 19. Setting a Storage Policy
      Store all HBase data on SSD (i2)
      • Set everything under /hbase to ALL_SSD
      $ hdfs dfs -mkdir /hbase
      $ hdfs dfsadmin -setStoragePolicy /hbase ALL_SSD
      Set storage policy ALL_SSD on /hbase
      $ hdfs dfsadmin -getStoragePolicy /ssd
      The storage policy of /ssd:
      BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
  • 20. HDFS: Next Step
      • Erasure Coding GA (HDFS-7285)
      • Ozone: an object store in HDFS (HDFS-7240)
  • 21. YARN: Extends Hadoop into Data OS
  • 22. Recap: What's YARN
      Cluster Resource Management
      • Resource sharing
        – Capacity scheduler
        – Fair sharing: pluggable queue policies (new)
      • Isolation
        – Memory, CPU
        – Node labels (new)
      • Workload types
        – Batch, interactive, in-memory
  • 23. Exclusive Node Labels Enable Isolated Partitions (HDP 2.2)
      • Configure partitions
      • Exclusive labels enforce isolation
      [Diagram: nodes labeled S running Storm; apps S (Storm) and B]
  • 24. Non-Exclusive Node Labels (YARN-3214, HDP 2.3)
      • Configure non-exclusive labels
      • Schedule if free capacity
      [Diagram: nodes labeled S running Spark; apps S (Spark) and B, with B also placed on a labeled node]
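The difference between the exclusive labels of slide 23 and the non-exclusive labels of slide 24 can be modelled as a small placement rule. This is a deliberately simplified toy model, not YARN scheduler code: the Node class, the may_run function, and the single has_free_capacity flag are all hypothetical, and real scheduling also involves queues, capacities, and preemption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    label: Optional[str] = None      # None means the default partition
    exclusive: bool = True           # only meaningful when label is set
    has_free_capacity: bool = True


def may_run(app_label: Optional[str], node: Node) -> bool:
    """Toy rule: can an app requesting app_label be placed on node?"""
    if not node.has_free_capacity:
        return False
    if node.label is None:
        # Unlabeled nodes serve apps that did not ask for a label.
        return app_label is None
    if app_label == node.label:
        return True
    # A non-exclusive partition lends idle capacity to unlabeled apps only.
    return (not node.exclusive) and app_label is None


storm_node = Node("node1", label="storm", exclusive=True)
spark_node = Node("node5", label="spark", exclusive=False)

print(may_run(None, storm_node))     # batch app on an exclusive Storm node
print(may_run(None, spark_node))     # batch app on a non-exclusive Spark node
```

In this model the batch app is kept off the exclusive Storm node but may borrow idle capacity on the non-exclusive Spark node, which is the behaviour the two slides contrast.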
  • 25. Working with Labels
      • Ambari YARN Guided Configuration: enable node labels
      • YARN CLI: create and assign labels
      • ResourceManager UI: view node labels in the cluster
      • Capacity Scheduler View: define workload management policy with labels
      $ yarn rmadmin -addToClusterNodeLabels "spark(exclusive=false)"
      $ yarn cluster -list-node-labels
      $ yarn rmadmin -replaceLabelsOnNode "node5=spark"
  • 26. YARN: Next Step
      Disk & network isolation (YARN-2619, YARN-2140)
      • Just isolation: enforce equal sharing of disk and network I/O across containers running on a node
      • Currently in technical preview in HDP 2.3
      • Disk resource: local disk IOPS, not HDFS reads/writes
      • Network resource: outbound bandwidth only (mbits/sec)
  • 27. Data Access Innovation: SQL, Spark, Stream Processing, Search
  • 28. Hive: Enterprise SQL at Hadoop Scale
      Native transactions
      • Delivered: Insert, Update, Delete
      Performance: 100x faster
      • ORC File
      • Hive on Tez
      • Cost Based Optimizer
      • Vectorized SQL engine
  • 29. Hive: Next Step
      SQL Enhancement
      • Transactions: BEGIN, COMMIT, ROLLBACK
      • SQL 2011 Analytics
      Performance
      • Sub-second response: LLAP, HBase as metastore, etc.
      Reference: "Apache Hiveの今とこれから" (Apache Hive: Present and Future)
  • 30. Spark Features – HDP 2.3.x & Spark 1.3.1
      Supported
      • Spark Core
      • MLlib
      • Spark on YARN
      • Kerberos
      • Ambari support
      Tech Preview
      • SparkSQL*
      • Spark Streaming
      • DataFrame
      • Spark ML Pipeline API
      Unsupported
      • GraphX
      • BlinkDB
      • Spark Standalone/Mesos
  • 31. Spark and Hadoop – How Can We Do Better?
      • Resource Management: YARN for multi-tenant, diverse workloads with predictable SLAs
      • Tiered Memory Storage: HDFS in-memory tier – External BlockStore for RDD cache
      • SparkSQL & Hive for SQL: interop with modern Metastore/HS2, optimized ORC support, advanced analytics, e.g. geospatial
      • Spark & NoSQL: deep integration with HBase via DataSources/Catalyst for predicate/aggregate pushdown
      • Connect the Dots – Algorithms to Use-Cases: higher-level ML abstractions, e.g. OneVsRest; validation, tuning, pipeline assembly... e.g. geospatial
      [Diagram labels: Storage; YARN: Data Operating System; Governance; Security; Operations; Resource Management]
  • 32. Spark and Hadoop – How Can We Do Better?
      • Ease of Use: Apache Zeppelin for interactive notebooks
      • Metadata & Governance: Apache Atlas for metadata & Apache Falcon support for Spark pipelines
      • Security & Operations: Apache Ranger managed authorization, and deployment/management via Apache Ambari
      • Deployable Anywhere: Linux, Windows, on-premises or cloud
      • Self-Service Spark in the Cloud: easy launch of data science clusters via Cloudbreak and Ambari, for Azure, AWS, GCP, OpenStack, Docker
      [Diagram labels: Storage; YARN: Data Operating System; Governance; Security; Operations; Resource Management]
  • 33. Platform Innovation for Data Access
      An integrated scalable platform for data access powered by HDP
      • Limitless storage
      • Deep analytics
      • Real-time access
  • 34. Security: End to End Security in Hadoop
  • 35. Five Security Requirements
      HDP 2.3 security support
      [Diagram labels: Authentication (Kerberos); Authorization; Audit; Encryption; Ranger; HDFS]
      Reference: Hadoop Security Overview
  • 36. Fully Secure Flow – End to End Security
      Numbered steps in the diagram:
        1. Original request with user id/password
        2. Knox authenticates user/password (LDAP)
        3. Knox gets service ticket for Hive
        4. Knox calls as proxy user
        5. Ranger AuthZ
        6. Hive creates MapReduce job using NN ST
      Unnumbered steps in the diagram:
      • Use Hive ST, submit query
      • Hive gets NameNode (NN) service ticket
      • Client gets query result
      • Ranger syncs users/groups from LDAP
      [Diagram components: O/JDBC client, Apache Knox, HiveServer 2, KDC, LDAP, Ranger, HDFS (blocks A, B, C); connections labeled SSL and SASL]
  • 37. Ranger: Central Security Administration
      • Table/column access control
      • Audit logging
      • Flexible definition: control group/user permissions
  • 38. Hadoop Management. Ambari: Hadoop for Everyone, 100% Open Source
  • 39. What's Apache Ambari? 100% open source operational platform to provision, manage and monitor Hadoop clusters
  • 40. Apache Ambari Mission
      Easy operation at scale
      • Large-scale cluster install, manage and monitor
      • Efficient and scalable at scale
      Easy to extend with community
      • Innovate with community
      • Integrate with enterprise software
      • Accelerate new features and adoption
      Centralized management for the whole Hadoop stack
      • Access point for all Hadoop users, not just cluster management
      • Ease of use
  • 41. Ambari 2.1: HDP Stack High Availability

      Component                   HDP Stack   Mode
      HDFS: NameNode              HDP 2.0+    Active/Standby
      YARN: ResourceManager       HDP 2.1+    Active/Standby
      HBase: HBaseMaster          HDP 2.1+    Multi-master
      Hive: HiveServer2           HDP 2.1+    Multi-instance
      Hive: Hive Metastore        HDP 2.1+    Multi-instance
      Hive: WebHCat Server        HDP 2.1+    Multi-instance
      Oozie: Oozie Server         HDP 2.1+    Multi-instance
      Storm: Nimbus Server        HDP 2.3     Multi-instance
      Ranger: AdminServer         HDP 2.3     Multi-instance

      [Columns "Ambari 2.0" and "Ambari 2.1": support marks not recoverable]
  • 42. Install Wizard
  • 43. Guided Configs for HDFS
  • 44. Guided Configs for YARN & MapReduce
  • 45. Enable Features in YARN
  • 46. Cluster Dashboard
  • 47. Service Dashboard
  • 48. Service Manage - HDFS
  • 49. Host Manage
  • 50. Monitor & Alert
      Notifications: Email, SNMP, Script (new)
  • 51. User Views – HDFS File View
      • Files View: browse the HDFS file system.
  • 52. User Views – YARN CS, Tez
      • Capacity Scheduler View: browse and manage YARN queues.
      • Tez View: view information related to Tez jobs that are executing on the cluster.
  • 53. User Views – Pig, Hive
      • Pig View: author and execute Pig scripts.
      • Hive View: author, execute and debug Hive queries.
  • 54. Summary
  • 55. Open Enterprise Hadoop
      Hadoop/YARN-powered data operating system
      100% open source, multi-tenant data platform for any application, any data set, anywhere.
      Built on a centralized architecture of shared enterprise services
      • Scalable tiered storage
      • Resource and workload management
      • Trusted data governance & metadata management
      • Consistent operations
      • Comprehensive security
      • Developer APIs and tools
      [Diagram labels: YARN: data operating system; Governance; Security; Operations; Resource management; Data access: batch, interactive, real-time; Storage; Commodity, Appliance, Cloud]
  • 56. Thank you. Yifeng Jiang, Solutions Engineer, Hortonworks. @uprush