ClassCloud: switch your PC Classroom into Cloud Testbed


Cloud computing has been a growing research topic in recent years. Its key concept is a resource-sharing model based on virtualization, distributed file systems, parallel algorithms and web services. But how can we provide a testbed for cloud computing training courses? In this talk we share our experience building cloud computing testbeds for virtualization, high-throughput computing and bioinformatics applications. The talk covers many open source projects, such as DRBL, Xen, Hadoop and bioinformatics-related applications.

In short, Diskless Remote Boot in Linux (DRBL) provides a diskless or systemless environment for client machines. It works on Debian, Ubuntu, Mandriva, Red Hat, Fedora, CentOS and SuSE. DRBL uses distributed hardware resources and makes it possible for clients to fully access local hardware.

Xen is an open source hypervisor for the Linux kernel. It has been used in the Amazon EC2 production environment to provide cloud service model (1): "Infrastructure as a Service (IaaS)". In this talk, we will show how DRBL helps to deploy a Xen playground quickly in a classroom.

Hadoop is becoming a well-known open source cloud computing technology developed by the Apache community. It is a very powerful tool for data mining. It has been used in the Yahoo and Facebook production environments to provide cloud service model (2): "Platform as a Service (PaaS)". It is easy to set up a single Hadoop node but difficult to manage a Hadoop cluster. In this talk, we will show how DRBL helps with fast deployment and management.

Most bioinformatics applications are open source, such as R, Bioconductor, BLAST, Clustal, PipMaker and Phylip. But they also require traditional cluster job submission. In this talk we will show how DRBL helps to build a testbed for bioinformatics research and provide cloud service model (3): "Software as a Service (SaaS)". We will cover how to:

1. Use DRBL to deploy a Xen virtual cluster (drbl-xen)
2. Use DRBL to deploy a Hadoop cluster (drbl-hadoop)
3. Use DRBL to deploy a bioinformatics cluster (drbl-biocluster)

The talk also includes a live demonstration of drbl-hadoop and drbl-biocluster.


Transcript

  • 1. ClassCloud: switch your PC classroom into Cloud Computing Testbed for Scientific Education Jazz Wang Yao-Tsung Wang jazz@nchc.org.tw
  • 2. ClassCloud: turn your PC classroom into Cloud Testbed for Education. PART 1 (50%): What is Cloud Computing? PART 2 (25%): What is DRBL? PART 3 (25%): How do we use DRBL to deploy Cloud? - IaaS: Virtualization (DRBL-Xen) - PaaS: Data Processing (DRBL-Hadoop) - SaaS: Bioinformatics (DRBL-biocluster)
  • 3. Part 1: The trend of Cloud Computing
  • 4. What is Cloud Computing? Could we have a simple definition? Is it about buying NEW hardware and software? Is it a trap leading to another bubble economy? Cloud Computing is as simple as 5..4..3..2..1...
  • 5. National Definition of Cloud Computing: 5 Characteristics (on-demand self-service, rapid elasticity, broad network access, measured service, resource pooling), 4 Deployment Models, 3 Service Models. Detailed definition: http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc
  • 6. 4 Deployment Models of Cloud Computing. Diagram labels: Public Cloud; Public Data, Non-sensitive; Target Market is S.M.B.; Hybrid Cloud; Dynamic Resource Provisioning between multiple clouds; Enterprise is key market; Sensitive Data; Community Cloud; Data for Sharing; Private Cloud; Academia
  • 7. 3 Service Models of Cloud Computing: IaaS (Infrastructure as a Service), PaaS (Platform as a Service), SaaS (Software as a Service)
  • 8. 2 R&D directions: Cloud or Device. Cloud: centralized, enterprise. Device: diversified, SMB.
  • 9. One key spirit of Cloud Computing: Everything as a Service! Accessing services via the network anytime, anywhere, with any device. Cloud Computing =~ Network Computing
  • 10. CIO 2010: Virtualization, Cloud and Web 2.0. Source: Gartner Executive Programs: “Leading in Times of Transition: The 2010 CIO Agenda”
  • 11. Is Cloud the trend of the next 10 years? Is Cloud too HOT in the Asia-Pacific area?!
  • 12. Brief History of Computing (source: http://mmdays.com/2008/02/14/cloud-computing/). Timeline: Mainframe (Super Computer); PC / Linux (Cluster, Parallel); Internet (Distributed Computing); Virtual Org. (Grid Computing); Data Explosion (Cloud Computing)
  • 13. 2007 Data Explosion. Top 1: Human Genomics, 7000 PB / Year. Top 2: Digital Photos, 1000 PB+ / Year. Top 3: E-mail (no spam), 300 PB+ / Year. Sources: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf and http://lib.stanford.edu/files/see_pasig_dic.pdf
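
To put the yearly volumes on slide 13 into perspective, they can be converted into daily and per-second rates. The following is a small arithmetic sketch, assuming decimal units (1 PB = 1,000,000 GB) and a 365-day year; the function names are made up for illustration, and the "+" lower bounds from the slide are used as plain numbers.

```python
# Convert the yearly data volumes quoted on slide 13 into daily and
# per-second rates.
# Assumptions (not from the slides): decimal units, 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60
GB_PER_PB = 1_000_000

# Figures from the slide, in PB per year.
YEARLY_PB = {
    "Human genomics": 7000,
    "Digital photos": 1000,
    "E-mail (no spam)": 300,
}


def pb_per_day(pb_per_year: float) -> float:
    """Average volume per day, in PB."""
    return pb_per_year / 365


def gb_per_second(pb_per_year: float) -> float:
    """Average sustained rate, in GB per second."""
    return pb_per_year * GB_PER_PB / SECONDS_PER_YEAR


if __name__ == "__main__":
    for name, volume in YEARLY_PB.items():
        print(f"{name}: {pb_per_day(volume):.1f} PB/day, "
              f"{gb_per_second(volume):.1f} GB/s")
```

Even the smallest of the three figures corresponds to several gigabytes every second, which is the motivation for the distributed storage and processing tools discussed later in the talk.
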
  • 14. How can we build our Private Cloud? Diagram labels: Public Cloud; Public Data, Non-sensitive; Target Market is S.M.B.; Hybrid Cloud; Enterprise is key market; Sensitive Data; Community Cloud; Data for Sharing; Private Cloud; Academia
  • 15. Reference Cloud Architecture (layers, top to bottom). Application (User Level): Social Computing, Enterprise, ISV, ... Programming (User-Level Middleware): Web 2.0, Mashups, Workflows, ... Management (Core Middleware): QoS Negotiation, Admission Control, Pricing, SLA Management, Metering, ... Virtualization: VM, VM management and deployment. Physical Hardware (System Level): Infrastructure: Computer, Storage, Network. Service-model labels alongside the layers: SaaS, PaaS, IaaS.
  • 16. Open Source for Private Cloud (reference layers with open source examples). Application (Social Computing, Enterprise, ISV, ...): eyeOS, Nutch, ICAS, X-RIME, ... Programming (Web 2.0, Mashups, Workflows, ...): Hadoop (MapReduce), Sector/Sphere, AppScale. Management (QoS Negotiation, Admission Control, Pricing, SLA Management, Metering, ...): OpenNebula, Enomaly, Eucalyptus, OpenQRM, ... Virtualization (VM, VM management and deployment): Xen, KVM, VirtualBox, QEMU, OpenVZ, ... Physical Hardware (Infrastructure: Computer, Storage, Network).
  • 17. Part 2: Introduction to DRBL
  • 18. What is DRBL? • Diskless Remote Boot in Linux • Network is cheap, and our time is expensive • In simple words, DRBL is: – Replace the IDE/SATA cable with a network cable – 40+ student PCs connected to one DRBL server. Diagram: Diskfull PC = Diskless PC + Server (source: http://www.mren.com.tw)
  • 19. At first, we have a “4 + 1” PC cluster: 4 compute nodes (the count had better be 2^n) and 1 manage node running the scheduler.
  • 20. Then we connect the 5 PCs with a Gigabit Ethernet (GiE) switch (10/100/1000 Mbps) and add 1 NIC for WAN.
  • 21. The 4 compute nodes communicate via the LAN switch. For security, only the manage node has Internet (WAN) access.
  • 22. Basic system setup for the cluster. Compute node stack: Messaging (MPICH); Account Mgnt. (SSHD, NIS, YP); GCC, GNU Libc; Bash, Perl; Kernel Module; Linux Kernel; Boot Loader
  • 23. On the manage node, we need to install a scheduler and a network file system for sharing files with the compute nodes. Manage node stack (diagram label: Extra): Job Mgnt. (OpenPBS); Messaging (MPICH); Account Mgnt. (SSHD, NIS, YP); File Sharing (NFS); GCC, GNU Libc; Bash, Perl; Kernel Module; Linux Kernel; Boot Loader
  • 24. 1st, we install the base system of GNU/Linux on the management node. You can choose: Red Hat, Fedora, CentOS, Mandriva, Ubuntu, Debian, ... Stack: GNU Libc; Kernel Module; Linux Kernel; Boot Loader
  • 25. 2nd, we install the DRBL package and configure the machine as a DRBL server. Many services are needed: SSHD, DHCPD, TFTPD, NFS Server, NIS Server, YP Server, ... Stack: Network Booting (NFS, TFTPD, DHCPD); Account Mgnt. (SSHD, NIS, YP); Perl, Bash; GNU Libc; Kernel Module; Linux Kernel; Boot Loader. DRBL Server is based on existing Open Source, and keep hacking!
  • 26. After running “drblsrv -i” and “drblpush -i”, there will be pxelinux, vmlinuz-pxe and initrd-pxe in TFTPROOT, and different configuration files (e.g. hostname) for each compute node in NFSROOT. Server stack: NFS, TFTPD, DHCPD, SSHD, NIS, YP; GNU Libc; Kernel Module; Linux Kernel; Boot Loader
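
Slide 26 names the boot files that the two setup commands should leave behind in TFTPROOT. A minimal sketch of a sanity check in Python follows. It assumes the files can be recognized by the name prefixes shown on the slide (pxelinux, vmlinuz-pxe, initrd-pxe); the function name is hypothetical, and the actual TFTPROOT location and exact file names depend on the DRBL installation, so the directory is passed in rather than hard-coded.

```python
import tempfile
from pathlib import Path
from typing import List

# Name prefixes taken from the slide; real file names may carry suffixes.
EXPECTED_PREFIXES = ("pxelinux", "vmlinuz-pxe", "initrd-pxe")


def missing_boot_files(tftproot: Path) -> List[str]:
    """Return the expected prefixes that match no regular file in tftproot."""
    names = [p.name for p in Path(tftproot).iterdir() if p.is_file()]
    return [
        prefix
        for prefix in EXPECTED_PREFIXES
        if not any(name.startswith(prefix) for name in names)
    ]


if __name__ == "__main__":
    # Demonstration on a throwaway directory instead of a real TFTPROOT.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pxelinux.0").write_bytes(b"")
        (root / "vmlinuz-pxe").write_bytes(b"")
        print("missing:", missing_boot_files(root))
```

On a real server one would pass the TFTPROOT directory of that installation; an empty list means all three boot files were found.
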
  • 27. 3rd, we enable the PXE function in the BIOS configuration of each compute node. (Diagram: 4 compute nodes with BIOS PXE; DRBL server running NFS, TFTPD, DHCPD, SSHD, NIS, YP.)
  • 28. While booting, PXE queries DHCPD for an IP address.
  • 29. While booting, PXE queries DHCPD for an IP address; each compute node receives its own (IP 1 to IP 4).
  • 30. After PXE gets its IP address, it downloads the boot files from TFTPD.
  • 31. (Diagram: each compute node, IP 1 to IP 4, now holds pxelinux, vmlinuz and initrd.)
  • 32. After downloading the boot files, scripts in initrd-pxe will configure NFSROOT for each compute node.
  • 33. (Diagram: each compute node now has its own configuration, Config. 1 to Config. 4, on top of initrd, vmlinuz and pxelinux.)
  • 34. Applications and services (such as Perl, Bash and SSHD) are also deployed to each compute node via NFS.
  • 35. With the help of NIS and YP, you can log in to each compute node from an SSH client with the same ID / password stored on the DRBL server.
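
The boot sequence walked through on slides 27 to 35 can be sketched as a toy model. This is not DRBL code: the class, the MAC and IP addresses and the host names below are hypothetical, and the DHCPD, TFTPD and NFS services are replaced by plain dictionaries so that the order of the steps is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Boot file names as listed on the slides for TFTPROOT.
BOOT_FILES = ("pxelinux", "vmlinuz-pxe", "initrd-pxe")


@dataclass
class ToyDrblServer:
    """In-memory stand-in for the DHCPD, TFTPD and NFS roles of the server."""
    ip_pool: List[str]
    tftproot: Dict[str, bytes]
    nfsroot: Dict[str, Dict[str, str]]  # per-client config, keyed by IP
    leases: Dict[str, str] = field(default_factory=dict)

    def dhcp_request(self, mac: str) -> str:
        """Step 1: hand out one IP per MAC address, reusing existing leases."""
        if mac not in self.leases:
            if len(self.leases) >= len(self.ip_pool):
                raise RuntimeError("no free IP address left in the pool")
            self.leases[mac] = self.ip_pool[len(self.leases)]
        return self.leases[mac]

    def tftp_get(self, name: str) -> bytes:
        """Step 2: serve a boot file from TFTPROOT."""
        return self.tftproot[name]

    def nfs_config(self, ip: str) -> Dict[str, str]:
        """Step 3: return the client-specific configuration from NFSROOT."""
        return self.nfsroot[ip]


def pxe_boot(server: ToyDrblServer, mac: str) -> Dict[str, object]:
    """Run the three steps for one client and report what it ended up with."""
    ip = server.dhcp_request(mac)
    downloaded = []
    for name in BOOT_FILES:
        server.tftp_get(name)  # raises KeyError if the file is missing
        downloaded.append(name)
    config = server.nfs_config(ip)
    return {"ip": ip, "boot_files": downloaded, "hostname": config["hostname"]}


def build_demo_server(clients: int = 4) -> ToyDrblServer:
    """Build a server for the 4 compute nodes of the example cluster."""
    ips = [f"192.168.100.{n}" for n in range(1, clients + 1)]  # made-up subnet
    return ToyDrblServer(
        ip_pool=ips,
        tftproot={name: b"placeholder" for name in BOOT_FILES},
        nfsroot={ip: {"hostname": f"node{n}"} for n, ip in enumerate(ips, 1)},
    )


if __name__ == "__main__":
    server = build_demo_server()
    for n in range(1, 5):
        print(pxe_boot(server, f"00:00:00:00:00:0{n}"))
```

The point of the model is the ordering: an address first, then the shared boot files, then the per-node configuration, which is why one server image can serve many diskless clients.
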
  • 36. Part 3: How do we use DRBL to deploy a Cloud Testbed?
  • 37. Building IaaS using DRBL-Xen (reference layers with open source examples). Application (Social Computing, Enterprise, ISV, ...): eyeOS, Nutch, ICAS, X-RIME, ... Programming (Web 2.0, Mashups, Workflows, ...): Hadoop (MapReduce), Sector/Sphere, AppScale. Management (QoS Negotiation, Admission Control, Pricing, SLA Management, Metering, ...): OpenNebula, Enomaly, Eucalyptus, OpenQRM, ... Virtualization (VM, VM management and deployment): Xen, KVM, VirtualBox, QEMU, OpenVZ, ... Physical Hardware (Infrastructure: Computer, Storage, Network).
  • 38. Virtualization? Emulator? Examples: QEMU, mame4iphone, Mac4Lin. (Diagram: virtual hardware / OS on top of physical hardware / OS.)
  • 39. What is Virtualization? (Source: http://en.wikipedia.org/wiki/Virtualization) Application Virtualization (ex. VMware ThinApp); Desktop Virtualization; Client Virtualization (ex. XenDesktop); Presentation Virtualization (ex. VNC, Microsoft RDP); Database Virtualization; OS-level Virtualization (ex. Xen, KVM); Data Virtualization; Network Virtualization (ex. OpenFlow); Storage Virtualization (ex. NetApp)
  • 40. Open Cloud #1: Eucalyptus • http://open.eucalyptus.com/ • It was a research project at UCSB, USA • Eucalyptus Systems now provides technical support • It is designed to help users build their own Amazon EC2 • Its features are compatible with existing EC2 clients • Ubuntu Enterprise Cloud in 9.04 is powered by Eucalyptus • You can register a trial account at http://open.eucalyptus.com/ • Cons: you might need to type commands in some cases
  • 41. Open Cloud #2: OpenNebula • http://www.opennebula.org • Sponsored by European Union FP7 • Turns a physical cluster into a virtual cluster • Manages status, scheduling and migration of virtual clusters • Ubuntu 9.04 provides an opennebula package • Cons: you need to type commands to check status or migrate
  • 42. Building IaaS using DRBL-Xen • DRBL-Xen still needs more work to be integrated into DRBL • The manual procedure can be found at – http://trac.nchc.org.tw/grid/wiki/jazz/DRBL_Xen
  • 43. Building PaaS using DRBL-Hadoop (reference layers with open source examples). Application (Social Computing, Enterprise, ISV, ...): eyeOS, Nutch, ICAS, X-RIME, ... Programming (Web 2.0, Mashups, Workflows, ...): Hadoop (MapReduce), Sector/Sphere, AppScale. Management (QoS Negotiation, Admission Control, Pricing, SLA Management, Metering, ...): OpenNebula, Enomaly, Eucalyptus, OpenQRM, ... Virtualization (VM, VM management and deployment): Xen, KVM, VirtualBox, QEMU, OpenVZ, ... Physical Hardware (Infrastructure: Computer, Storage, Network).
  • 44. Open Cloud #3: Hadoop • http://hadoop.apache.org • Hadoop is an Apache Top Level Project • Major sponsor is Yahoo! • Developed by Doug Cutting • Written in Java, it provides HDFS and a MapReduce API • Used at Yahoo since 2006 • It has been deployed to 4000+ nodes at Yahoo • Designed to process petabyte-scale datasets • Facebook, Last.fm and Joost are also powered by Hadoop
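
The MapReduce programming model mentioned on slide 44 can be illustrated with a word count. The sketch below is plain single-process Python and does not use the Hadoop API; the function names are made up for illustration. It only shows the three phases (map, shuffle, reduce) that a framework such as Hadoop distributes across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single result."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a small in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["cloud testbed", "Cloud classroom cloud"]))
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network, which is where a distributed file system such as HDFS comes in.
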
  • 45. Open Cloud #4: Sector / Sphere • http://sector.sourceforge.net/ • Developed by the National Center for Data Mining, USA • Written in C/C++, so performance is better than Hadoop • Provides a file system similar to Google File System and a MapReduce API • Based on UDT, which enhances network performance • The Open Cloud Consortium provides the Open Cloud Testbed and develops the MalStone toolkit for benchmarking
  • 46. Building PaaS using DRBL-Hadoop • Used in http://hadoop.nchc.org.tw • drbl-hadoop: mounts local disks for HDFS and MapReduce; svn co http://trac.nchc.org.tw/pub/grid/drbl-hadoop • hadoop-register: web interface with ssh applet; svn co http://trac.nchc.org.tw/pub/cloud/hadoop-register
  • 47. Demo: hadoop.nchc.org.tw for multiple users • DRBL Server x 1 (hadoop) • DRBL Client x 19 (hadoop101~hadoop119) • Based on the Cloudera Debian package, with enhanced security settings and permissions for multiple users.
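
The client naming on slide 47 (hadoop101~hadoop119) follows a regular pattern, so a host list can be generated instead of typed. A minimal sketch, assuming a one-host-per-line text format such as the conf/slaves file used by Hadoop releases of that period; whether the demo cluster was configured this way is not stated on the slides, and the function names are hypothetical.

```python
from typing import List


def client_hostnames(prefix: str = "hadoop",
                     first: int = 101,
                     last: int = 119) -> List[str]:
    """Return host names prefix<first> .. prefix<last>, inclusive."""
    return [f"{prefix}{number}" for number in range(first, last + 1)]


def render_host_list(hosts: List[str]) -> str:
    """Render one host name per line, with a trailing newline."""
    return "\n".join(hosts) + "\n"


if __name__ == "__main__":
    # Prints the 19 client names, one per line.
    print(render_host_list(client_hostnames()), end="")
```
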
  • 48. Building SaaS using DRBL-biocluster • Need more time to package related software. • drbl-biocluster: batch script for Debian to install bioinformatics-related software • svn co http://trac.nchc.org.tw/pub/grid/drbl-biocluster • Including DRBL, MPICH2, R, Rmpi, Bioconductor, Ganglia, Nagios, AutoFACT, BLAST, SIM4, Clustal, PipMaker, Phylip, Eland, Velvet, Bowtie, SOAP
  • 49. Attribution-Noncommercial-Share Alike 3.0 Taiwan http://creativecommons.org/licenses/by-nc-sa/3.0/tw/ These slides may be distributed under this Creative Commons license.
  • 50. Questions? Slides - http://trac.nchc.org.tw/cloud Jazz Wang Yao-Tsung Wang jazz@nchc.org.tw