2012 09-08-josug-jeff
Usage Rights

© All Rights Reserved

Presentation Transcript

  • OSC 2012 Tokyo
    OpenStack: open source software to build public and private clouds.
    Hadoop on OpenStack Swift: an experiment in using Swift as storage for Apache Hadoop
    2012.09.08, OpenStack Japan, Zheng Xu
  • Self introduction
    ● Software designer (engineer) for embedded systems and web systems (60% hobby, 40% job).
    ● Major: OpenStack, Linux, web browsers, HTML, EPUB, OSS
    ● Contact
      ● @xz911
      ● https://www.facebook.com/xuzheng2001
  • Abstract
    ● These slides introduce how to use OpenStack Swift as the storage service for Apache Hadoop instead of HDFS (the Hadoop project's own storage service).
    ● These slides are based on http://bigdatacraft.com/archives/349; many thanks to Constantine Peresypkin and David Gruzman for providing their idea and implementation.
  • Agenda
    ● OpenStack Swift
    ● Apache Hadoop and HDFS
    ● Experiment of replacing HDFS with OpenStack Swift
  • What is OpenStack and Swift
    [Figure from http://www.openstack.org/]
  • What is OpenStack and Swift
    [Diagram: the user application talks over HTTP to the proxy servers, which talk over HTTP to the account servers, container servers, and object servers.]
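
The three server types in the diagram correspond to the three levels of the path a client sends to the proxy: account, container, and object. A minimal sketch of how such a URL is composed is shown here; the function name swift_object_url and the endpoint http://swift.example:8080 are hypothetical placeholders, not values from the slides.

```shell
# Compose a Swift object URL: <endpoint>/v1/<account>/<container>/<object>.
# swift_object_url is a hypothetical helper name used only for illustration.
swift_object_url() {
  # ${1%/} drops one trailing slash from the endpoint, if present
  printf '%s/v1/%s/%s/%s\n' "${1%/}" "$2" "$3" "$4"
}

# Example with a placeholder endpoint (not the address used in the experiment):
swift_object_url http://swift.example:8080 AUTH_test test_container core-site.xml
```

The account (AUTH_test) and container (test_container) names are the ones used later in the experiment.
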
  • What is OpenStack and Swift
    ● Open source, written in Python
    ● Diversity
      ● Swift can be a part of OpenStack or an individual service by itself.
    ● Zones, devices, partitions, and replicas
    ● No SPOF (single point of failure)
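
To give a feel for what "partitions" means, the following is a simplified sketch of hash-based partition selection: hash the object path with MD5, take the first 32 bits, and keep the top part_power bits. It is an illustration only. Real Swift also mixes a per-cluster hash prefix/suffix into the path, the function name swift_partition is hypothetical, the partition power 10 is an arbitrary example, and GNU md5sum is assumed to be available.

```shell
# Simplified sketch of mapping a path to a partition number.
# Assumptions: GNU md5sum is installed; no cluster hash prefix/suffix is applied.
swift_partition() {
  path=$1
  part_power=$2
  # First 8 hex digits of the MD5 digest = first 4 bytes, big-endian
  hex=$(printf '%s' "$path" | md5sum | cut -c1-8)
  # Keep only the top part_power bits of that 32-bit value
  echo $(( 0x$hex >> (32 - $part_power) ))
}

# With part_power=10 there are 2^10 = 1024 partitions, numbered 0..1023.
swift_partition /AUTH_test/test_container/core-site.xml 10
```

Each partition is then assigned to several devices in different zones, which is where the replicas come from.
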
  • Agenda
    ● OpenStack Swift
    ● Apache Hadoop and HDFS
    ● Experiment of replacing HDFS with OpenStack Swift
  • Apache Hadoop and HDFS
    [Figure from http://hadoop.apache.org/]
  • Apache Hadoop and HDFS
    [Diagram labels: User Application, Map-Reduce, Hive, Name Node, Data Node (×3)]
  • Agenda
    ● OpenStack Swift
    ● Apache Hadoop and HDFS
    ● Experiment of replacing HDFS with OpenStack Swift
  • Experiment (Concept)
    [Diagram labels: User Application, Map-Reduce, Hive, Name Node, Data Node (×3)]
  • Experiment (Concept)
    [Diagram labels: User Application, Map-Reduce, Hive, java-cloudfiles (×3), http, Data Node (×2), Swift]
  • Experiment (Software)
    ● Swift v1.6
      ● https://github.com/openstack/swift.git
      ● r21616cf, Jul 25
    ● Java client java-cloudfiles
      ● https://github.com/rackspace/java-cloudfiles
      ● r0807fa6, Jun 4
    ● Apache Hadoop
      ● 1.0.3
    ● Swift fs for Apache Hadoop (just part of the following source code)
      ● https://github.com/Dazo-org/hadoop-common.git (branch-0.20-security-205.swift)
  • Experiment (infra)
  • Experiment (install Swift)
    ● Install Swift following http://docs.openstack.org/developer/swift/development_saio.html
    ● Do not forget to set bind_ip in proxy-server.conf
      ● in my case
    ● Suppose the username is "test:tester" with the password "testing", the account name is AUTH_test, and some containers have been created by following the steps at the URL above.
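
Before wiring in Hadoop, it can help to confirm that these credentials work. The sketch below assumes the v1.0-style auth that the SAIO guide sets up: a GET with X-Storage-User and X-Storage-Pass headers that answers with X-Auth-Token and X-Storage-Url headers. The helper names header_value and swift_auth_token are hypothetical, and the auth URL is passed in as an argument because the slides do not show it.

```shell
# Print the value of one response header (case-insensitive name) read from stdin.
# header_value is a hypothetical helper name.
header_value() {
  tr -d '\r' | awk -v name="$1" 'BEGIN { FS = ":" }
    tolower($1) == tolower(name) { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# Request a token; $1 = auth URL (supply your own), $2 = user, $3 = password.
# swift_auth_token is a hypothetical helper name; it is defined here but not run.
swift_auth_token() {
  curl -s -i -H "X-Storage-User: $2" -H "X-Storage-Pass: $3" "$1" |
    header_value X-Auth-Token
}

# Usage, with AUTH_URL set to your proxy's auth endpoint:
#   swift_auth_token "$AUTH_URL" test:tester testing
```

If the call prints a token, the same username and password should work in cloudfiles.properties and core-site.xml.
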
  • Experiment (cloudfiles)
    ● Run "ant compile"
    ● Change cloudfiles.properties to the following
        # Auth info
        auth_url=
        auth_token_name=X-Auth-Token
        #auth_user_header=X-Storage-User
        #auth_pass_header=X-Storage-Pass
        # user properties
        username=test:tester
        password=testing
        # cloudfs properties
        version=v1
        connection_timeout=15000
  • Experiment (cloudfiles)
    ● Connect cloudfiles to Swift (this step is optional)
      ● Change cloudfiles.sh as follows and run it to try the connection with Swift
        #!/bin/sh
        export CLASSPATH=lib/httpcore-4.1.4.jar:lib/commons-cli-1.1.jar:lib/httpclient-4.1.3.jar:lib/commons-lang-2.4.jar:lib/junit.jar:lib/commons-codec-1.3.jar:lib/commons-io-1.4.jar:lib/commons-logging-1.1.1.jar:lib/log4j-1.2.15.jar:dist/java-cloudfiles.jar:.
        java com.rackspacecloud.client.cloudfiles.sample.FilesCli "$@"
  • Experiment (cloudfiles)
    ● Package java-cloudfiles into a jar file for Apache Hadoop (clone java-cloudfiles to ~/java-cloudfiles)
      ● We need to put *.properties into java-cloudfiles.jar
        $ ant package
        $ cd cloudfiles/dist
        $ cp ../*.properties .
        $ rm java-cloudfiles.jar
        $ jar cvf java-cloudfiles.jar ./*
  • Experiment (hadoop)
    ● Prepare
      ● Download Hadoop to ~/hadoop-1.0.3 (the newest stable version of the original Hadoop) and git clone https://github.com/Dazo-org/hadoop-common.git to ~/hadoop-common (old Hadoop source code with the Swift fs plugin)
      ● At ~/hadoop-1.0.3 (copy java-cloudfiles and the related libraries to the Hadoop lib folder)
        – cd lib; cp ~/java-cloudfiles/cloudfiles/dist/java-cloudfiles.jar .
        – cp ~/java-cloudfiles/lib/httpc* .
  • Experiment (setting hadoop)
    ● ./hadoop-1.0.3/src/core/core-default.xml
      ● Add the following so that Hadoop maps the "swift://" scheme to the SwiftFileSystem class
        <property>
          <name>fs.swift.impl</name>
          <value>org.apache.hadoop.fs.swift.SwiftFileSystem</value>
          <description>The FileSystem for swift: uris.</description>
        </property>
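
Hadoop 1.x picks the FileSystem class for a URI by looking up the property fs.<scheme>.impl, which is why the entry above is enough to route swift:// URIs to SwiftFileSystem. The sketch below imitates that lookup from the shell so the setting can be checked before building. The helper names hadoop_conf_get and fs_impl_for_uri are hypothetical, and the awk parsing is a rough approximation that assumes each <name> and <value> element sits on a single line.

```shell
# Print the <value> that follows <name>NAME</name> in a Hadoop-style XML file.
# hadoop_conf_get is a hypothetical helper; it is not a real XML parser.
hadoop_conf_get() {
  awk -v want="$2" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$1"
}

# Resolve a URI to its configured implementation class via fs.<scheme>.impl.
fs_impl_for_uri() {
  scheme=${2%%:*}            # text before the first ":" is the URI scheme
  hadoop_conf_get "$1" "fs.$scheme.impl"
}

# Usage (paths as in the slides):
#   fs_impl_for_uri ./hadoop-1.0.3/src/core/core-default.xml swift://test_container/
```

An empty result means no class is registered for that scheme in the file being checked.
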
  • Experiment (hadoop)
    ● Copy the Swift fs implementation into Hadoop 1.0.3 and build
      ● cp -R ../hadoop-common/src/core/org/apache/hadoop/fs/swift ./src/core/org/apache/hadoop/fs
      ● ant
  • Experiment (hadoop setting)
    ● ./conf/core-site.xml (part 1)
      ● Add the following property, for example
        <property>
          <name>fs.swift.userName</name>
          <value>test:tester</value>
        </property>
  • Experiment (hadoop setting)
    ● ./conf/core-site.xml (part 2)
      ● Add the following properties, for example
        <property>
          <name>fs.swift.userPassword</name>
          <value>testing</value>
        </property>
        <property>
          <name>fs.swift.acccountname</name>
          <value>AUTH_test</value>
        </property>
  • Experiment (hadoop setting)
    ● ./conf/core-site.xml (part 3)
      ● Add the following properties, for example
        <property>
          <name>fs.swift.authUrl</name>
          <value></value>
        </property>
        <property>
          <name>fs.default.name</name>
          <value>swift://</value>
        </property>
  • Experiment (check swift fs)
    ● At this point, we should be able to list the account information with the following command
      ● ./bin/hadoop fs -ls /
      ● or ./bin/hadoop fs -put ./conf/core-site.xml /test_container/core-site.xml (test_container is a test container created after Swift was installed)
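
The two checks above can be wrapped into one small smoke test that stops at the first failure. This is only a sketch: the function name swift_fs_smoke_test and the HADOOP_BIN variable are hypothetical, and it assumes it is run from the Hadoop directory with the container already created in Swift.

```shell
# HADOOP_BIN is a hypothetical override; by default use the launcher from the slides.
HADOOP_BIN=${HADOOP_BIN:-./bin/hadoop}

# List the account, upload one file into a container, then list that container.
swift_fs_smoke_test() {
  container=$1
  file=$2
  "$HADOOP_BIN" fs -ls / || { echo "listing / failed" >&2; return 1; }
  "$HADOOP_BIN" fs -put "$file" "/$container/$(basename "$file")" ||
    { echo "put failed" >&2; return 1; }
  "$HADOOP_BIN" fs -ls "/$container"
}

# Usage:
#   swift_fs_smoke_test test_container ./conf/core-site.xml
```

If the final listing shows the uploaded file, Hadoop is reading and writing through the swift:// file system.
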
  • Finally
    ● We installed Swift as the storage service for Hadoop
    ● We built the original java-cloudfiles and created a package for Hadoop
    ● We copied the fs.swift plugin from https://github.com/Dazo-org/hadoop-common.git into the new Hadoop source tree and built Hadoop
    ● We set up Hadoop's core-site.xml to connect to Swift via java-cloudfiles
  • Thank you for listening.