Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using Elastic Data Processing in OpenStack Cloud

Transcript

  • 1. Big Data Computations Using Elastic Data Processing in OpenStack Cloud Sergey Lukjanov (Mirantis) Alexander Ignatov (Mirantis) Trevor McKay (Red Hat)
  • 2. Agenda • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  • 3. Agenda • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  • 4. OpenStack Data Processing: Sahara Mission: To provide a scalable data processing stack and associated management interfaces. • provision and operate Hadoop clusters • schedule and operate Hadoop jobs
  • 5. Hadoop - Big Data Platform © http://hortonworks.com/hadoop/yarn/
  • 6. Trends http://www.google.com/trends/
  • 7. Architecture overview (diagram labels): Data Sources, Savanna Python Client, REST API, Cluster Configuration Manager, Horizon, Keystone Auth, Data Access Layer, Swift, Savanna Pages, Hadoop VM, Vendors Plugins, Hadoop VM, Hadoop VM, Hadoop VM, Resources Orchestration Manager, Job Sources, Job Manager, Heat, Nova, Glance, Cinder, Neutron, Trove DB
  • 8. Sahara status • Official integrated OpenStack project • Supported Hadoop distros: • Vanilla Apache Hadoop • Hortonworks Data Platform • Intel Distribution • Cloudera Distribution in blueprint • Included in OpenStack distros: • RDO - openstack.redhat.com • Mirantis OpenStack - software.mirantis.com
  • 9. Contributors
  • 10. Agenda • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  • 11. Elastic Data Processing • EDP - API for executing MapReduce jobs on Hadoop clusters (similar to AWS EMR) • Supported data sources: Swift, HDFS, Ceph • Supported job types: Java actions, MapReduce, MapReduce.Streaming, Pig, Hive • Oozie for Hadoop jobs workflow management • Supports both Hadoop 1 & 2 • Job executions on transient clusters
  • 12. EDP Use Cases • Simplified task execution. You don’t need to know Hadoop! • Bursty workloads: ad-hoc queries that require significant resources for only a short period of time • Utilization of free IaaS capacity for Hadoop tasks
  • 13. EDP - Data Sources Swift Sahara EDP INPUT OUTPUT Hadoop VM Hadoop VM Hadoop VM Hadoop VM swift://some_container/INPUT swift://some_container/OUTPUT
  • 14. EDP - Job Binaries Swift Sahara DB Sahara EDP internal-db://script.pig swift://some_container/mapreduce.jar 1. Pig, Hive scripts 2. Executable Jar files 3. Pluggable binaries and libraries
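The two URL schemes shown on slides 13 and 14 (`swift://` for objects in a Swift container, `internal-db://` for binaries stored in the Sahara database) can be told apart with a small parser. The following is a minimal sketch for illustration only; the function name `parse_edp_url` and the tuple it returns are assumptions, not part of Sahara's API.

```python
from urllib.parse import urlparse


def parse_edp_url(url):
    """Split an EDP-style URL into (scheme, container, path).

    Hypothetical helper: Sahara's own handling of these URLs is not
    shown in the slides, so this only illustrates the two schemes.
    """
    parsed = urlparse(url)
    if parsed.scheme == "swift":
        # swift://<container>/<object path>
        return ("swift", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme == "internal-db":
        # internal-db://<name>; there is no container for DB-stored binaries
        return ("internal-db", None, parsed.netloc + parsed.path)
    raise ValueError("unsupported scheme in %r" % url)


if __name__ == "__main__":
    for example in ("swift://some_container/INPUT",
                    "swift://some_container/mapreduce.jar",
                    "internal-db://script.pig"):
        print(parse_edp_url(example))
```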
  • 15. EDP - Job Execution. Step 1 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig
  • 16. EDP - Job Execution. Step 2 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig JobTracker Oozie Hadoop VM Hadoop VM Hadoop VM
  • 17. EDP - Job Execution. Step 3 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM JobTracker Oozie Execute a job
  • 18. EDP - Job Execution. Step 4 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM JobTracker Oozie
  • 19. EDP - Job Execution. Step 5 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM workflow.xml 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials JobTracker Oozie
  • 20. EDP - Job Execution. Step 6 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM workflow.xml Data Processing OUTPUT 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials JobTracker Oozie
  • 21. EDP - Job Execution. Step 7 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM workflow.xml 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials Data Processing OUTPUT JobTracker Oozie
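Slides 19 to 21 list what goes into the generated workflow.xml: job-specific configurations, URLs to binaries, URLs for data sources, and credentials. The sketch below builds a simplified Oozie workflow with a single Pig action to show where each of those items might sit. It is an illustration under stated assumptions, not Sahara's actual generator: the function name, the workflow and node names, the `INPUT`/`OUTPUT` parameter names, and the credential property names used in the usage example are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_pig_workflow(script, input_url, output_url, configs):
    """Return a simplified Oozie workflow.xml (as a string) with one Pig action.

    Hypothetical helper for illustration; element layout follows the
    general shape of an Oozie workflow definition.
    """
    app = ET.Element("workflow-app",
                     {"xmlns": "uri:oozie:workflow:0.2", "name": "edp-sketch"})
    ET.SubElement(app, "start", {"to": "job-node"})

    action = ET.SubElement(app, "action", {"name": "job-node"})
    pig = ET.SubElement(action, "pig")
    ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
    ET.SubElement(pig, "name-node").text = "${nameNode}"

    # 1. Job-specific configurations and 4. credentials travel as properties.
    conf = ET.SubElement(pig, "configuration")
    for name, value in sorted(configs.items()):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    # 2. Reference to the job binary (here, a Pig script).
    ET.SubElement(pig, "script").text = script

    # 3. URLs for the data sources, passed to the script as parameters.
    ET.SubElement(pig, "param").text = "INPUT=" + input_url
    ET.SubElement(pig, "param").text = "OUTPUT=" + output_url

    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Workflow failed"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


if __name__ == "__main__":
    # Property names and values below are placeholders, not real settings.
    print(build_pig_workflow(
        "script.pig",
        "swift://some_container/INPUT",
        "swift://some_container/OUTPUT",
        {"example.swift.username": "demo", "example.swift.password": "secret"},
    ))
```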
  • 22. Agenda • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  • 23. EDP BigPetStore Demo BigPetStore is now part of Apache BigTop • Test/demo laboratory for all things Hadoop • Actively developed with integration testing • Generates and processes data of arbitrary size • git clone git://git.apache.org/bigtop.git • Filed under bigtop/bigtop-bigpetstore
  • 24. EDP BigPetStore Demo What are we going to do? • Generate 1M records of pet supply purchases • Clean the data (“dirty CSV”) • Extract cumulative counts by state • Demonstrates Sahara EDP objects • Job Binaries • Jobs (Java and Pig) • Data Sources
  • 25. EDP BigPetStore Sample Data
    Generated Data (first job)
    $ hadoop fs -cat bigpetstore/gen/part-r-00000 | more
    BigPetStore,storeCode_AK,1 deanna,booker,Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food
    BigPetStore,storeCode_AK,10 erica,buck,Thu Dec 25 16:29:28 GMT+00:00 1969,10.5,dog-food
    Cleaned Data (second job)
    $ hadoop fs -cat bigpetstore/clean/part-m-00000 | more
    BigPetStore storeCode_AK 1 deanna booker Sun Jan 18 20:50:06 GMT+00:00 1970 7.5 cat-food
    BigPetStore storeCode_AK 10 erica buck Thu Dec 25 16:29:28 GMT+00:00 1969 10.5 dog-food
  • 26. EDP BigPetStore Sample Data
    Summed Data For Products by State (3rd job)
    $ hadoop fs -cat bigpetstore/analyze_rel/part-r-00000 | more
    US-AK cat-food 24837
    US-AK dog-food 24994
    US-AK fuzzy-collar 25145
    US-AK antelope-caller 25024
    US-AZ cat-food 25106
    US-AZ dog-food 25064
    US-AZ leather-collar 24870
    US-AZ snake-bite ointment 24960
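The clean and sum steps shown on slides 25 and 26 can be illustrated in plain Python. The demo itself runs these steps as Hadoop jobs (Java and Pig), so this is only a sketch of the transformation, not the demo's code. It assumes that the generated records separate key and value with a tab (the transcript shows only a space) and that the `US-` state prefix in the output is derived from the `storeCode_XX` field; both are guesses from the sample data, and the function names are hypothetical.

```python
from collections import Counter


def clean_record(line):
    """Flatten one generated record into a list of fields.

    Assumes the layout "<key fields, comma-separated>\t<value fields,
    comma-separated>", which is inferred from the sample output.
    """
    key, _, value = line.rstrip("\n").partition("\t")
    return key.split(",") + value.split(",")


def count_by_state_and_product(lines):
    """Count purchases per (state, product) pair."""
    counts = Counter()
    for line in lines:
        fields = clean_record(line)
        store_code = fields[1]                      # e.g. "storeCode_AK"
        state = "US-" + store_code.split("_", 1)[1]  # assumed mapping to "US-AK"
        product = fields[-1]                        # e.g. "cat-food"
        counts[(state, product)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "BigPetStore,storeCode_AK,1\tdeanna,booker,"
        "Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food",
        "BigPetStore,storeCode_AK,10\terica,buck,"
        "Thu Dec 25 16:29:28 GMT+00:00 1969,10.5,dog-food",
    ]
    for (state, product), n in sorted(count_by_state_and_product(sample).items()):
        print(state, product, n)
```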
  • 27. What Next for EDP Potential Areas for Development within EDP • Pluggable Job Execution Model • Allows Sahara to run jobs with additional execution engines • Current Oozie offerings become one of multiple options • Expand Capabilities via Oozie • Support upload of user-written Oozie workflows • Support for coordinated jobs • Enhanced Usability • Better Error Reporting • User Experience (UI, CLI, API) Please, send us your feedback! Ideas are always welcome • #openstack-sahara on freenode • openstack-dev@lists.openstack.org with [openstack-dev][sahara] subject
  • 28. Design Summit Sessions 7 Sessions: Thursday 1:30 - Friday 10:30 http://goo.gl/lQXtUS
  • 29. Q&A
  • 30. Thank you!
