Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using Elastic Data Processing in OpenStack Cloud


Transcript of "Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using Elastic Data Processing in OpenStack Cloud"

  1. Big Data Computations Using Elastic Data Processing in OpenStack Cloud. Sergey Lukjanov (Mirantis), Alexander Ignatov (Mirantis), Trevor McKay (Red Hat)
  2. Agenda: • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  3. Agenda: • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  4. OpenStack Data Processing: Sahara. Mission: To provide a scalable data processing stack and associated management interfaces. • Provision and operate Hadoop clusters • Schedule and operate Hadoop jobs
  5. Hadoop - Big Data Platform. © http://hortonworks.com/hadoop/yarn/
  6. Trends. http://www.google.com/trends/
  7. Architecture overview (diagram labels): Data Sources Savanna Python Client REST API Cluster Configuration Manager Horizon Keystone Auth Data Access Layer Swift Savanna Pages Hadoop VM Vendors Plugins Hadoop VM Hadoop VM Hadoop VM Resources Orchestration Manager Job Sources Job Manager Heat Nova Glance Cinder Neutron Trove DB
  8. Sahara status: • Official integrated OpenStack project • Supported Hadoop distros: Vanilla Apache Hadoop, Hortonworks Data Platform, Intel Distribution, Cloudera Distribution (in blueprint) • Included in OpenStack distros: RDO (openstack.redhat.com), Mirantis OpenStack (software.mirantis.com)
  9. Contributors
  10. Agenda: • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  11. Elastic Data Processing: • EDP is an API for executing MapReduce jobs on Hadoop clusters (similar to AWS EMR) • Supported data sources: Swift, HDFS, Ceph • Supported job types: Java actions, MapReduce, MapReduce.Streaming, Pig, Hive • Oozie for Hadoop job workflow management • Supports both Hadoop 1 and 2 • Job execution on transient clusters
  12. EDP Use Cases: • Simplified task execution. You don’t need to know Hadoop! • Bursty workloads: ad-hoc queries that need significant resources for only a short time • Utilization of free IaaS capacity for Hadoop tasks
  13. EDP - Data Sources (diagram labels): Swift Sahara EDP INPUT OUTPUT Hadoop VM Hadoop VM Hadoop VM Hadoop VM swift://some_container/INPUT swift://some_container/OUTPUT
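
The data source URLs on this slide follow a `swift://<container>/<object>` shape. A minimal Python sketch of splitting such a URL into its parts is shown here; the `SwiftLocation` class and `parse_swift_url` function are hypothetical names used for illustration and are not part of Sahara.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SwiftLocation:
    """Container and object path taken from a swift:// URL (illustrative type)."""
    container: str
    object_path: str


def parse_swift_url(url: str) -> SwiftLocation:
    """Split swift://<container>/<object> into container and object path."""
    parts = urlparse(url)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url!r}")
    object_path = parts.path.lstrip("/")
    if not parts.netloc or not object_path:
        raise ValueError(f"expected swift://<container>/<object>: {url!r}")
    return SwiftLocation(container=parts.netloc, object_path=object_path)


if __name__ == "__main__":
    # The two URLs shown on the slide.
    for url in ("swift://some_container/INPUT", "swift://some_container/OUTPUT"):
        print(parse_swift_url(url))
```

Rejecting other schemes early keeps a mistyped data source from reaching the cluster; how Sahara itself validates these URLs is not shown in the slides.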
  14. EDP - Job Binaries (diagram labels): Swift Sahara DB Sahara EDP internal-db://script.pig swift://some_container/mapreduce.jar 1. Pig, Hive scripts 2. Executable Jar files 3. Pluggable binaries and libraries
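
The slide shows two places a job binary can live, distinguished by URL scheme: `internal-db://` for the Sahara DB and `swift://` for Swift. A small Python sketch of that dispatch follows; the `BINARY_STORES` table and `binary_store` function are assumptions made for this example, not Sahara code.

```python
from urllib.parse import urlparse

# Scheme-to-store mapping, taken from the two URL forms on the slide.
BINARY_STORES = {
    "internal-db": "Sahara DB",
    "swift": "Swift",
}


def binary_store(url: str) -> str:
    """Return which store a job binary URL points at, based on its scheme."""
    scheme = urlparse(url).scheme
    if scheme not in BINARY_STORES:
        raise ValueError(f"unsupported job binary URL: {url!r}")
    return BINARY_STORES[scheme]


if __name__ == "__main__":
    for url in ("internal-db://script.pig", "swift://some_container/mapreduce.jar"):
        print(url, "->", binary_store(url))
```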
  15. EDP - Job Execution. Step 1 (diagram labels): Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig
  16. EDP - Job Execution. Step 2 (diagram labels): Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig JobTracker Oozie Hadoop VM Hadoop VM Hadoop VM
  17. EDP - Job Execution. Step 3 (diagram labels): Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM JobTracker Oozie Execute a job
  18. EDP - Job Execution. Step 4 (diagram labels): Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM JobTracker Oozie
  19. EDP - Job Execution. Step 5 (diagram labels): Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM JobTracker Oozie. workflow.xml: 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials
  20. EDP - Job Execution. Step 6 (diagram labels): Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM Data Processing OUTPUT JobTracker Oozie. workflow.xml: 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials
  21. EDP - Job Execution. Step 7 (diagram labels): Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM Data Processing OUTPUT JobTracker Oozie. workflow.xml: 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials
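
The job execution steps list four things carried by workflow.xml: job-specific configurations, URLs to binaries, URLs for data sources, and credentials. The sketch below builds a simplified Oozie-style Pig workflow holding those four items using Python's standard library. It is an illustration under assumptions, not the document Sahara generates: the function name, the action and workflow names, and the `example.*` property keys are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_pig_workflow(script, input_url, output_url, configs):
    """Build a simplified Oozie-style workflow for one Pig action.

    configs holds job-specific settings and credentials as name/value pairs.
    All element contents here are illustrative placeholders.
    """
    root = ET.Element(
        "workflow-app", {"xmlns": "uri:oozie:workflow:0.2", "name": "edp-sketch"}
    )
    ET.SubElement(root, "start", {"to": "pig-job"})
    action = ET.SubElement(root, "action", {"name": "pig-job"})
    pig = ET.SubElement(action, "pig")
    ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
    ET.SubElement(pig, "name-node").text = "${nameNode}"

    # 1 and 4: job-specific configurations and credentials as properties.
    conf = ET.SubElement(pig, "configuration")
    for name, value in sorted(configs.items()):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    # 2: location of the job binary (here, a Pig script).
    ET.SubElement(pig, "script").text = script
    # 3: data source URLs passed to the script as parameters.
    ET.SubElement(pig, "param").text = f"INPUT={input_url}"
    ET.SubElement(pig, "param").text = f"OUTPUT={output_url}"

    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Pig job failed"
    ET.SubElement(root, "end", {"name": "end"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_pig_workflow(
        script="script.pig",
        input_url="swift://some_container/INPUT",
        output_url="swift://some_container/OUTPUT",
        # Hypothetical property names, for illustration only.
        configs={"example.swift.username": "demo", "example.swift.password": "secret"},
    ))
```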
  22. Agenda: • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  23. EDP BigPetStore Demo: BigPetStore is now part of Apache Bigtop • Test/demo laboratory for all things Hadoop • Actively developed with integration testing • Generates and processes data of arbitrary size • git clone git://git.apache.org/bigtop.git • Filed under bigtop/bigtop-bigpetstore
  24. EDP BigPetStore Demo: What are we going to do? • Generate 1M records of pet supply purchases • Clean the data (“dirty CSV”) • Extract cumulative counts by state • Demonstrates Sahara EDP objects: Job Binaries, Jobs (Java and Pig), Data Sources
  25. EDP BigPetStore Sample Data
      Generated Data (first job):
        $ hadoop fs -cat bigpetstore/gen/part-r-00000 | more
        BigPetStore,storeCode_AK,1 deanna,booker,Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food
        BigPetStore,storeCode_AK,10 erica,buck,Thu Dec 25 16:29:28 GMT+00:00 1969,10.5,dog-food
      Cleaned Data (second job):
        $ hadoop fs -cat bigpetstore/clean/part-m-00000 | more
        BigPetStore storeCode_AK 1 deanna booker Sun Jan 18 20:50:06 GMT+00:00 1970 7.5 cat-food
        BigPetStore storeCode_AK 10 erica buck Thu Dec 25 16:29:28 GMT+00:00 1969 10.5 dog-food
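
The cleaning step turns each generated record into one flat row of fields. The demo does this with its own jobs on Hadoop; the Python function below is only a stand-in that reproduces the transformation for a single record. It assumes the generated record is a comma-separated key, whitespace, then a comma-separated value, and that the cleaned output is tab-separated, neither of which can be confirmed from the flattened slide text. The name `clean_record` is hypothetical.

```python
def clean_record(raw_line: str) -> str:
    """Flatten one generated "dirty CSV" record into tab-separated fields.

    Assumes: <comma-separated key><whitespace><comma-separated value>.
    """
    # Split once on the first run of whitespace: key part, then value part.
    key, value = raw_line.strip().split(None, 1)
    fields = key.split(",") + value.split(",")
    return "\t".join(fields)


if __name__ == "__main__":
    raw = ("BigPetStore,storeCode_AK,1 deanna,booker,"
           "Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food")
    print(clean_record(raw))
```

Splitting only on the first whitespace run matters because the timestamp field itself contains spaces.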
  26. EDP BigPetStore Sample Data
      Summed Data For Products by State (3rd job):
        $ hadoop fs -cat bigpetstore/analyze_rel/part-r-00000 | more
        US-AK  cat-food             24837
        US-AK  dog-food             24994
        US-AK  fuzzy-collar         25145
        US-AK  antelope-caller      25024
        US-AZ  cat-food             25106
        US-AZ  dog-food             25064
        US-AZ  leather-collar       24870
        US-AZ  snake-bite ointment  24960
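
The third job's output has one row per state and product. A minimal Python sketch of that aggregation over cleaned records follows. It rests on several assumptions: cleaned records are tab-separated, the store code is the second field, the product is the last field, the number is a count of purchase records, and a store code such as `storeCode_AK` maps to the state label `US-AK`. The function name is hypothetical, and the demo itself runs this step on Hadoop rather than in plain Python.

```python
from collections import Counter


def count_by_state_and_product(cleaned_lines):
    """Count cleaned purchase records per (state, product) pair."""
    counts = Counter()
    for line in cleaned_lines:
        fields = line.rstrip("\n").split("\t")
        store_code, product = fields[1], fields[-1]
        # Assumed mapping: "storeCode_AK" -> "US-AK".
        state = "US-" + store_code.split("_", 1)[1]
        counts[(state, product)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "BigPetStore\tstoreCode_AK\t1\tdeanna\tbooker\t"
        "Sun Jan 18 20:50:06 GMT+00:00 1970\t7.5\tcat-food",
        "BigPetStore\tstoreCode_AK\t10\terica\tbuck\t"
        "Thu Dec 25 16:29:28 GMT+00:00 1969\t10.5\tdog-food",
    ]
    for (state, product), n in sorted(count_by_state_and_product(sample).items()):
        print(state, product, n)
```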
  27. What Next for EDP: Potential Areas for Development within EDP • Pluggable Job Execution Model: allows Sahara to run jobs with additional execution engines; current Oozie offerings become one of multiple options • Expand Capabilities via Oozie: support upload of user-written Oozie workflows; support for coordinated jobs • Enhanced Usability: better error reporting; user experience (UI, CLI, API). Please send us your feedback! Ideas are always welcome: • #openstack-sahara on freenode • openstack-dev@lists.openstack.org with [openstack-dev][sahara] subject
  28. Design Summit Sessions: 7 Sessions, Thursday 1:30 - Friday 10:30. http://goo.gl/lQXtUS
  29. Q&A
  30. Thank you!