Sahara and Manila: Crossing the Desert to the Big Data Oasis. Presented by Intel, NetApp, and Red Hat at the OpenStack Tokyo Summit.

MANILA* AND SAHARA*: CROSSING THE DESERT TO THE BIG DATA OASIS
Ethan Gafford, Red Hat
Jeff Applewhite, NetApp
Malini Bhandaru, Intel (covering for Weiting Chen)
AGENDA
• Introduction
• Sahara Overview
• Manila Overview
• The goal for Sahara and Manila integration
• The approaches
  • Manila HDFS Driver
  • Manila NFS Share Mount
  • Manila + NetApp NFS Connector for Hadoop
• Conclusion
• Q&A
Intel NetApp RedHat
Sahara: The Problem
Hadoop* (and Spark*, Storm*…) clusters are difficult to configure.
Commodity hardware is cheap but requires frequent (costly) maintenance.
Reliable hardware is expensive, and a fixed-size cluster will cause contention.
Demand for data processing varies over time within an organization.
Bare-metal clusters go down, and can be a single point of failure.
Hadoop development is very difficult without a real cluster.
TL;DR: Data processing clusters are harder to provision and maintain than they should be, and it hurts.
Sahara: The Solution
Put it in a cloud!
Then have easy-to-use, standardized interfaces:
● To create clusters (reliably and repeatedly)
● To scale clusters
● To run data processing jobs
● On any popular data processing framework
● With sensible defaults that just work
● And sophisticated configuration management for expert users
That's OpenStack* Sahara.
Sahara: The API
Sahara: Architecture
Manila Overview
Manila Share and Access APIs

Share APIs:
Operation | CLI Command | Description
Create | manila create | Create a Manila share of specified size; optional name, availability zone, share type, share network, source snapshot
Delete | manila delete | Delete an existing Manila share; the manila force-delete command may be required if the Manila share is in an error state
Edit | manila metadata | Set or unset metadata on a Manila share
List | manila list | List all Manila shares
Show | manila show | Show details about a Manila share

Access APIs:
Operation | CLI Command | Description
Allow | manila access-allow | Allow access to the specified share for the specified access type and value (IP address, IP network address in CIDR notation, or Windows user name)
Deny | manila access-deny | Deny access to the specified share for the specified access type and value
List | manila access-list | List all Manila share access rules
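As a sketch of how these CLI calls fit together, the following Python fragment assembles the argument vectors for a typical share lifecycle. The share name, size, and client CIDR are hypothetical examples; in practice you would pass each list to something like subprocess.run.

```python
# Sketch: assemble manila CLI invocations for a share lifecycle.
# Share name, size, and CIDR below are hypothetical examples.

def manila_cmd(*args):
    """Build a manila CLI argument vector."""
    return ["manila", *args]

# Create a 1 GB NFS share with an optional name.
create = manila_cmd("create", "NFS", "1", "--name", "bigdata-share")

# Allow read/write access from a tenant network (CIDR notation).
allow = manila_cmd("access-allow", "bigdata-share", "ip", "10.0.0.0/24")

# Inspect shares and access rules.
list_shares = manila_cmd("list")
list_access = manila_cmd("access-list", "bigdata-share")
```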
Manila & Sahara
NetApp driver enabled*
The Goal for Sahara and Manila Integration
To support as many storage backends and protocols in Sahara as possible
Sahara Data Processing Model in Kilo*
PATTERN 1: Internal HDFS in the same node. Compute and data reside together in the same instance in your Hadoop cluster.
PATTERN 2: Internal HDFS in different nodes. Compute and data reside in different instances; this is an elastic way to manage Hadoop clusters.
PATTERN 3: Swift*. In order to persist data, Sahara supports Swift to stream the data directly.
Sahara Data Processing Model in Liberty* and the Future
PATTERN 4: External HDFS via Manila*. Sahara can support external HDFS by using the HDFS driver in Manila.
PATTERN 5: Local storage with diverse storage backends in Manila. Use local storage in Hadoop and remote-mount any type of file storage in Manila.
PATTERN 6: NFS. The NetApp* Hadoop NFS Connector can bring NFS capability into Hadoop. This feature will be implemented in Mitaka.
Manila HDFS Driver
Use the Manila HDFS Driver as external storage in Sahara
Use Case: Manila HDFS Driver
Use Case
● Use external HDFS either on the same node as the compute service or in a physical cluster
Rationale for Use
● Use the Manila HDFS driver to connect with HDFS
● Manila helps to create the HDFS share
The Advantages
● Use an existing HDFS cluster
● Centralized management of HDFS via Manila
Limitations
● Only supports non-secured HDFS, due to account management issues between OpenStack and Hadoop
Reference: https://blueprints.launchpad.net/manila/+spec/hdfs-driver
Enable HDFS Driver in Manila
Step 1: Set up the Manila configuration
• /etc/manila/manila.conf
• Make sure the login username and password are correct
• The Manila service uses this user to log in to HDFS and create the share folder for each individual user
Step 2: Restart the Manila service

manila.conf example:
share_driver = manila.share.drivers.hdfs.hdfs_native.HDFSNativeShareDriver
hdfs_namenode_ip = the IP address of the HDFS namenode (only a single namenode is supported now)
hdfs_namenode_port = the port of the HDFS namenode service
hdfs_ssh_port = HDFS namenode SSH port
hdfs_ssh_name = HDFS namenode SSH login name
hdfs_ssh_pw = HDFS namenode SSH login password (not necessary if hdfs_ssh_private_key is configured)
hdfs_ssh_private_key = path to the HDFS namenode private key for SSH login
…

Reference: http://docs.openstack.org/developer/manila/devref/hdfs_native_driver.html
Add External HDFS as a Data Source in Sahara
• Make sure the user account “hdfs” has been set up on the HDFS side
• Sahara will use the “hdfs” user to access external HDFS by default; you can still set up your own user account in Sahara as well
• Add the external HDFS location as a data source in Sahara
Limitation: no further user account setup is needed, since currently only non-secured HDFS is supported
NFS Share Mounting
Binary storage and input/output data from Manila-provisioned NFS shares
The Feature
• Mount Manila NFS shares to:
  • All nodes in the cluster
  • Specific node groups (NN, etc.)
• Currently NFS-only
• Extensible to other share types
• API (path and access defaults shown; only the id field is needed):

"shares": [
  {
    "id": "uuid",
    "path": "/mnt/uuid",
    "access_level": "rw"
  }
]
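A minimal sketch of how the defaults described above could be applied to a shares entry. The helper name and the example UUID are hypothetical; Sahara itself fills these defaults server-side:

```python
# Sketch: apply the documented defaults for Sahara's cluster "shares" field.
# Only "id" is required; "path" defaults to /mnt/<uuid> and
# "access_level" defaults to "rw". Helper name is hypothetical.

def with_share_defaults(share):
    filled = dict(share)
    filled.setdefault("path", "/mnt/%s" % share["id"])
    filled.setdefault("access_level", "rw")
    return filled

# Example UUID for illustration only.
shares = [with_share_defaults({"id": "6aeb53e9-0000-4a43-8a52-f2e3e0a976e6"})]
```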
Use Case: Binary Data Storage
• “Job binaries”: *.jar, *.pig, etc.
  • Comparatively small size
  • Initial location irrelevant to performance
• Previous storage options in Sahara:
  • Swift (still available)
  • Sahara DB (as blobs in a SQL table)
• Rationale for NFS storage:
  • Version control directly on the storage FS
  • Long-term storage for use by transient clusters
Use Case: Input / Output Data
Previous options in Sahara
● Cluster-internal HDFS
● External HDFS
● Swift
Rationale for use
● Standard FS access to data
● Convenient in many cases
Data copy necessary
● Similar to the built-in hadoop fs -put operation
● Irrelevant in the heavily reduced output or small input case
● In the large input case, network transfer is a consideration
(Example deployment: GlusterFS as the Manila backend)
Reference: https://blueprints.launchpad.net/sahara/+spec/manila-as-a-data-source
Workflow: NFS Binary Storage and Input Data
1. Create a Manila NFS share
2. Place the binary file on the share at /absolute/path/to/binary.jar
3. Create a Sahara job binary object with the path reference manila://share_uuid/absolute/path/to/binary.jar
4. Use the job binary in a job template (as normal)
5. Create a Sahara data source with the path reference manila://share_uuid/absolute/path/to/input_dir
6. Run a job from the template using the data source
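The path references in steps 3 and 5 of this workflow follow a simple scheme, sketched below. The helper function is hypothetical, shown only to make the URL shape explicit:

```python
# Sketch: build the manila:// path references used in the workflow above.
# Function name is hypothetical; Sahara only consumes the resulting strings.

def manila_url(share_id, absolute_path):
    """Compose a manila:// reference for a Sahara job binary or data source."""
    return "manila://%s%s" % (share_id, absolute_path)

binary_url = manila_url("share_uuid", "/absolute/path/to/binary.jar")
# → "manila://share_uuid/absolute/path/to/binary.jar"
input_url = manila_url("share_uuid", "/absolute/path/to/input_dir")
```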
Automatic Mounting
• An API field is necessary to mount shares for non-EDP users
• Sahara’s EDP API mounts needed shares to a long-standing cluster when a job references any data source or binary on that share
• Defaults are used for permissions (rw) and path (/mnt/share_uuid/)
Automatic Mounting: Under the Hood

All frameworks (universal flow, per cluster node): check to ensure required shares are mounted. If not:
1) Install nfs-common (Debian*) or nfs-utils (Red Hat) if not present
2) Get the remote path for the share UUID from Manila
3) Manila: access-allow for each required IP in the cluster (if access does not exist)
4) mount -t nfs %(access_arg)s %(remote_path)s %(local_path)s

Framework | Job Binaries | Data Sources
All (universal flow) | Translate manila://uuid/absolute/path to /local_path/absolute/path | Translate manila://uuid/absolute/path to file:///local_path/absolute/path
Hadoop (w/ Oozie) | hadoop fs -copy-from-local into the workflow directory; referenced as filesystem paths in the workflow | Use the file URL in the Oozie workflow document (as a named job parameter or positional argument)
Spark | Referenced by local filesystem path in the spark-submit call | Use the file URL in the spark-submit call (as a positional argument)
Storm | Referenced as filesystem paths in the storm jar call | Use the file URL in the storm jar call (as a positional argument)
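The universal-flow translation in this table can be sketched as two small functions. The names and the mount_root default are assumptions for illustration; Sahara performs the equivalent translation internally, using the mount path defaults described on the Automatic Mounting slide:

```python
# Sketch: translate manila:// references to node-local paths, per the
# universal flow above. Function names are hypothetical.

def to_local_path(manila_ref, mount_root="/mnt"):
    """manila://uuid/absolute/path -> /mnt/uuid/absolute/path (job binaries)."""
    rest = manila_ref[len("manila://"):]
    share_id, _, path = rest.partition("/")
    return "%s/%s/%s" % (mount_root, share_id, path)

def to_file_url(manila_ref, mount_root="/mnt"):
    """manila://uuid/absolute/path -> file:///mnt/uuid/absolute/path (data sources)."""
    return "file://" + to_local_path(manila_ref, mount_root)
```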
Screenshots
NetApp Hadoop NFS Connector
Future Proposal: Use the NetApp Hadoop NFS Connector in Sahara
NetApp NFS Connector: Architecture Overview
● NFS client written in Java
● Implements the Hadoop filesystem API
● No changes to the Hadoop framework
● No changes to user programs
● Eliminates copying data into HDFS
● Optimized performance for NFS access
Sahara + Manila + NetApp NFS Connector
Use Case
● Use the NFS protocol to access data for Hadoop
How to use
1. Use Manila to expose the NFS share
2. Use the NetApp Hadoop NFS Connector as the “interface” to shared data
The Advantages
● NFS is one of the most common storage protocols used in IT
● A direct way to communicate with and process data instead of using HDFS
Reference: https://blueprints.launchpad.net/sahara/+spec/nfs-as-a-data-source
NetApp NFS Connector
● Deployment choices:
  ○ NFS (v3)
  ○ HDFS + NFS
● Open source
● Snapshot, FlexClone, SnapMirror, and Manila Disaster Recovery (Mitaka)
NetApp Hadoop NFS Plugin
Use the NetApp NFS Connector to run Hadoop on your existing data:
• $ hadoop jar <path-to-examples>.jar terasort nfs://<nfs-server-hostname>:2049/tera/in /tera/out
• $ hadoop jar <path-to-examples>.jar terasort nfs://<nfs-server-hostname>:2049/tera/in nfs://<nfs-server-hostname>:2049/tera/out
References:
1. http://www.netapp.com/us/solutions/big-data/nfs-connector-hadoop.aspx
2. https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
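The nfs:// input and output URIs in these commands have a regular shape, sketched below. The helper name and example hostname are hypothetical; 2049 is the standard NFS port shown in the commands above:

```python
# Sketch: compose the nfs:// URIs consumed by the NetApp Hadoop NFS Connector.
# Helper name and hostname are hypothetical; 2049 is the standard NFS port.

def nfs_uri(host, path, port=2049):
    """Build an nfs://host:port/path URI for a Hadoop input or output location."""
    return "nfs://%s:%d%s" % (host, port, path)

in_uri = nfs_uri("nfs-server.example.com", "/tera/in")
out_uri = nfs_uri("nfs-server.example.com", "/tera/out")
```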
Summary
● The choices:
  a) Manila HDFS Driver
  b) Manila NFS Share Mount
     https://www.netapp.com/us/media/tr-4464.pdf
  c) NetApp NFS Connector for Hadoop
     https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
Sahara and Manila: Access the Big Data Oasis
For more information: http://netapp.github.io
Participating in the Intel Passport Program?
Are you playing? Be sure to get your Passport Stamp for attending this session! See me or my helper in the back at the end!
Not playing yet? What are you waiting for? See me or my helper in the back at the end and we can get you started!
Don’t forget to return your stamped passport to the Intel Booth #H3 to enter our raffle drawing! 3 Stamps = 1 Raffle Ticket
THANK YOU!
