Sector - Presentation at Cloud Computing & Its Applications 2009

    Presentation Transcript

    1. Sector: An Open Source Cloud for Data Intensive Computing
      Robert Grossman
      University of Illinois at Chicago / Open Data Group
      October 20, 2009
    2. Part 1. Sector
      http://sector.sourceforge.net
    3. Sector Overview
      Sector is the fastest open source large data cloud
        As measured by MalStone & Terasort
      Sector is easy to program
        Supports UDFs, MapReduce & Python over streams
      Sector is secure
        A HIPAA-compliant Sector cloud is being set up
      Sector is reliable
        Sector v1.24 has a backup master node server
    4. About Sector
      Yunhong Gu from the Laboratory for Advanced Computing at the University of Illinois at Chicago is the lead developer of Sector.
      Sector is open source (BSD License) and available from sector.sourceforge.net.
      The current version (as of October 2009) is 1.24a.
    5. Target Configurations
      Sector is designed to run on racks of commodity computers
      Typical rack configuration today (October 2009):
        Rack of 32 quad-core 1U computers
        Each computer has 4 x 1 TB disks
        Each computer has a 1 Gbps connection to a top-of-rack switch
      Sometimes these are called Raywulf clusters
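As a back-of-the-envelope check on the rack described above, the sketch below totals what one such rack provides. It uses only the slide's figures (32 nodes, 4 cores and 4 x 1 TB disks per node, a 1 Gbps link per node) and deliberately ignores replication, file-system overhead, and the switch's uplink capacity.

```python
# Back-of-the-envelope totals for the slide's target rack configuration.
# Figures come from the slide; replication and file-system overhead are ignored.
NODES_PER_RACK = 32
CORES_PER_NODE = 4      # quad-core
DISKS_PER_NODE = 4
TB_PER_DISK = 1
GBPS_PER_NODE = 1       # each node's link to the top-of-rack switch


def rack_totals(racks: int = 1) -> dict:
    """Return aggregate nodes, cores, raw disk (TB), and node-to-switch bandwidth (Gbps)."""
    nodes = NODES_PER_RACK * racks
    return {
        "nodes": nodes,
        "cores": nodes * CORES_PER_NODE,
        "raw_tb": nodes * DISKS_PER_NODE * TB_PER_DISK,
        "edge_gbps": nodes * GBPS_PER_NODE,
    }


if __name__ == "__main__":
    print(rack_totals())  # {'nodes': 32, 'cores': 128, 'raw_tb': 128, 'edge_gbps': 32}
```

So a single rack offers roughly 128 cores and 128 TB of raw disk before any replication.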
    6. Google’s Large Data Cloud
      Google’s Stack
        Applications
        Compute Services: Google’s MapReduce
        Data Services: Google’s BigTable
        Storage Services: Google File System (GFS)
    7. Hadoop’s Large Data Cloud
      Hadoop’s Stack
        Applications
        Compute Services: Hadoop’s MapReduce
        Data Services
        Storage Services: Hadoop Distributed File System (HDFS)
    8. Sector’s Large Data Cloud
      Sector’s Stack
        Applications
        Compute Services: Sphere’s UDFs
        Data Services
        Storage Services: Sector’s Distributed File System (SDFS)
        Routing & Transport Services: UDP-based Data Transport Protocol (UDT)
    9. Comparing Sector and Hadoop
    10. Terasort - Sector vs. Hadoop Performance
      Sector/Sphere 1.24a and Hadoop 0.20.1, with no replication, on Phase 2 of the Open Cloud Testbed with co-located racks.
    11. MalStone (OCC-Developed Benchmark)
      Sector/Sphere 1.20 and Hadoop 0.18.3, with no replication, on Phase 1 of the Open Cloud Testbed in a single rack. The data was spread over 20 nodes, with 500 million 100-byte records per node.
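To put the MalStone run above in perspective, this short calculation derives the data volume from the slide's figures. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes).

```python
# Data volume of the MalStone run described on the slide (decimal units).
NODES = 20
RECORDS_PER_NODE = 500_000_000
BYTES_PER_RECORD = 100

bytes_per_node = RECORDS_PER_NODE * BYTES_PER_RECORD   # 5 * 10**10 bytes
total_bytes = NODES * bytes_per_node                   # 10**12 bytes
total_records = NODES * RECORDS_PER_NODE               # 10**10 records

print(f"per node: {bytes_per_node / 1e9:.0f} GB")      # per node: 50 GB
print(f"total:    {total_bytes / 1e12:.0f} TB")        # total:    1 TB
print(f"records:  {total_records:,}")                  # records:  10,000,000,000
```

In other words, the benchmark processed about 1 TB in 10 billion records, 50 GB per node.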
    12. How Do You Program A Data Center?
    13. Idea 1: Support UDFs Over the Data Center
      Think of MapReduce as:
        Map acting on (text) records,
        with a fixed shuffle and sort,
        followed by Reduce acting on (text) records.
      We generalize this framework as follows:
        Support a sequence of user defined functions (UDFs) acting on segments (i.e., chunks) of files.
        MapReduce is one special case, consisting of a user defined Map, a system-defined shuffle and sort, and a user defined Reduce.
      In both cases, the framework takes care of assigning nodes to process data, restarting failed processes, etc.
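To make the generalization on this slide concrete, here is a minimal single-process sketch. It assumes records are tab-separated `key\tvalue` strings and models a segment as an in-memory list; the names `per_segment`, `shuffle_and_sort`, and `run` are hypothetical and are not Sphere's API. MapReduce (word count here) is expressed as one particular sequence of stages: a user Map, a system shuffle and sort, and a user Reduce.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Segment = List[str]                          # a segment is a list of text records
SegmentUDF = Callable[[Segment], Segment]    # user defined: acts on one segment
Stage = Callable[[List[Segment]], List[Segment]]


def per_segment(udf: SegmentUDF) -> Stage:
    """Lift a segment-level UDF to a stage (segments run sequentially in this sketch)."""
    return lambda segments: [udf(seg) for seg in segments]


def shuffle_and_sort(num_buckets: int) -> Stage:
    """System-defined stage: route 'key\\tvalue' records to buckets by key, then sort."""
    def stage(segments: List[Segment]) -> List[Segment]:
        buckets: List[Segment] = [[] for _ in range(num_buckets)]
        for seg in segments:
            for rec in seg:
                key = rec.split("\t", 1)[0]
                # hash() is consistent within one process, so equal keys share a bucket.
                buckets[hash(key) % num_buckets].append(rec)
        return [sorted(b) for b in buckets]
    return stage


def run(segments: List[Segment], stages: List[Stage]) -> List[Segment]:
    """Apply a sequence of stages to the input segments."""
    for stage in stages:
        segments = stage(segments)
    return segments


# Word count expressed as user Map -> system shuffle/sort -> user Reduce.
def word_map(seg: Segment) -> Segment:
    return [f"{word}\t1" for line in seg for word in line.split()]


def word_reduce(seg: Segment) -> Segment:
    counts: Dict[str, int] = defaultdict(int)
    for rec in seg:
        key, value = rec.split("\t", 1)
        counts[key] += int(value)
    return [f"{k}\t{v}" for k, v in sorted(counts.items())]


mapreduce = [per_segment(word_map), shuffle_and_sort(2), per_segment(word_reduce)]
result = run([["a b a"], ["b c"]], mapreduce)
print(sorted(rec for seg in result for rec in seg))  # ['a\t2', 'b\t2', 'c\t1']
```

Any other pipeline, such as a single Map-only stage or several UDFs chained with no shuffle, is just a different list passed to `run`.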
    14. Applying a UDF Using Sector/Sphere
      [Figure: the Application passes an input stream to the Sphere Client; several SPEs process it and produce an output stream.]
      1. Split data
      2. Locate & schedule Sphere Processing Engines (SPEs)
      3. Collect results
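The three steps on this slide (split, locate and schedule SPEs, collect) can be sketched as follows. This is an illustrative model only, not Sphere's scheduler: it assumes the input has already been split into segments tagged with the node that stores them, that one SPE runs per node, and it uses a thread per SPE. The functions `schedule` and `process` are hypothetical names. The one idea it shows is locality: a segment goes to the SPE on its own node when one exists, otherwise to the least-loaded SPE.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical model: (node holding the data, records in the segment).
Segment = Tuple[str, List[str]]


def schedule(segments: List[Segment], spe_nodes: List[str]) -> Dict[str, List[int]]:
    """Assign segment indices to SPEs, preferring the SPE on the node that stores the data."""
    plan: Dict[str, List[int]] = {node: [] for node in spe_nodes}
    for i, (node, _) in enumerate(segments):
        if node in plan:
            plan[node].append(i)                       # local: no data movement
        else:
            least_loaded = min(spe_nodes, key=lambda n: len(plan[n]))
            plan[least_loaded].append(i)               # remote: data must be moved
    return plan


def process(segments: List[Segment], spe_nodes: List[str],
            udf: Callable[[List[str]], List[str]]) -> List[str]:
    """Schedule segments, run the UDF on each SPE, and collect results in input order."""
    plan = schedule(segments, spe_nodes)
    results: Dict[int, List[str]] = {}

    def run_spe(node: str) -> None:
        for i in plan[node]:
            results[i] = udf(segments[i][1])

    with ThreadPoolExecutor(max_workers=len(spe_nodes)) as pool:
        list(pool.map(run_spe, spe_nodes))   # one worker thread stands in for each SPE
    # Collect: concatenate outputs in segment order to form the output stream.
    return [rec for i in range(len(segments)) for rec in results[i]]


segs = [("n1", ["a", "b"]), ("n2", ["c"]), ("n3", ["d"])]
print(schedule(segs, ["n1", "n2"]))                                  # {'n1': [0, 2], 'n2': [1]}
print(process(segs, ["n1", "n2"], lambda recs: [r.upper() for r in recs]))  # ['A', 'B', 'C', 'D']
```

Here segment 2 lives on `n3`, which has no SPE, so it falls back to the least-loaded engine; the ties go to the first node listed.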
    15. Sector Programming Model
      A Sector dataset consists of one or more physical files.
      Sphere applies user defined functions (UDFs) over streams of data consisting of data segments.
      Data segments can be data records, collections of data records, or files.
      Examples of UDFs: a Map function, a Reduce function, a Split function for CART (Classification and Regression Trees), etc.
      Outputs of UDFs can be returned to the originating node, written to the local node, or shuffled to another node.
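The three output options listed on this slide can be illustrated with a small routing function. This is a sketch under stated assumptions, not Sphere's interface: the `Dest` enum, the `route` function, and its parameters are hypothetical, output records are assumed to be `key\tvalue` strings, and nodes are numbered `0..num_nodes-1`. For the shuffle case it uses `zlib.crc32`, because unlike Python's salted `hash()` it yields the same bucket on every node.

```python
import zlib
from enum import Enum
from typing import Dict, List


class Dest(Enum):
    RETURN = "return"     # send back to the originating node
    LOCAL = "local"       # write on the node that produced the output
    SHUFFLE = "shuffle"   # send to the node chosen by the record's key


def route(records: List[str], dest: Dest, this_node: int, origin_node: int,
          num_nodes: int) -> Dict[int, List[str]]:
    """Map each output record of a UDF to the node that should receive it."""
    out: Dict[int, List[str]] = {}
    for rec in records:
        if dest is Dest.RETURN:
            target = origin_node
        elif dest is Dest.LOCAL:
            target = this_node
        else:
            key = rec.split("\t", 1)[0]
            # crc32 is deterministic across processes, so every node agrees on the bucket.
            target = zlib.crc32(key.encode()) % num_nodes
        out.setdefault(target, []).append(rec)
    return out


print(route(["a\t1", "b\t1"], Dest.LOCAL, this_node=3, origin_node=0, num_nodes=4))
# {3: ['a\t1', 'b\t1']}
```

The shuffle case is what gives MapReduce its "all values for a key meet on one node" property; the other two cases need no key at all.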
    16. How Do You Move Data in a Cloud and Between Clouds?
      Option 1: Use TCP and close your eyes.
      Option 2: ?????
    17. Idea 2: Sector is Built on Top of UDT
      • UDT is a specialized network transport protocol.
      • UDT can take advantage of wide area, high performance 10 Gbps networks.
      • Sector is a wide area distributed file system built over UDT.
      • Sector is layered over the native file system (rather than being a block-based file system).
    18. UDT Has Been Downloaded 25,000+ Times
      Sterling Commerce
      Movie2Me
      Globus
      Power Folder
      Nifty TV
      http://udt.sourceforge.net
    19. Alternatives to TCP – Decreasing Increases AIMD Protocols
      [Figure: the increase of the packet sending rate plotted as a function of the current sending rate x for AIMD (TCP NewReno), HighSpeed TCP, Scalable TCP, and UDT, with the decrease factor labeled.]
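The figure contrasts how much each protocol raises its sending rate as a function of the current rate x. In the AIMD family, each loss-free interval adds an increase α(x), and a loss multiplies the rate by (1 − β), where β is the decrease factor. TCP NewReno uses a constant α (one packet per RTT) and β = 1/2. A "decreasing increase" protocol, as the slide title puts it, makes α(x) shrink as x grows. The sketch below is a toy model: `make_decreasing_alpha` is a hypothetical stand-in, not UDT's actual formula, and losses are simulated simply as "the rate exceeded the link capacity".

```python
from typing import Callable, List

IncreaseFn = Callable[[float], float]


def aimd_step(x: float, loss: bool, alpha: IncreaseFn, beta: float) -> float:
    """One control interval of a generic AIMD rule on sending rate x.

    No loss: x <- x + alpha(x).   Loss: x <- (1 - beta) * x.
    """
    return (1.0 - beta) * x if loss else x + alpha(x)


def newreno_alpha(x: float) -> float:
    """TCP NewReno-style: constant increase (one packet per RTT if x is a window in packets)."""
    return 1.0


def make_decreasing_alpha(capacity: float, gain: float, floor: float) -> IncreaseFn:
    """Hypothetical decreasing increase: large steps far from capacity, small steps near it."""
    return lambda x: max(gain * (capacity - x), floor)


def simulate(alpha: IncreaseFn, beta: float, capacity: float, steps: int,
             x0: float = 1.0) -> List[float]:
    """Toy loop: a loss is signalled whenever the rate exceeds the link capacity."""
    x, trace = x0, []
    for _ in range(steps):
        x = aimd_step(x, loss=x > capacity, alpha=alpha, beta=beta)
        trace.append(x)
    return trace


slow = simulate(newreno_alpha, beta=0.5, capacity=1000.0, steps=100)
fast = simulate(make_decreasing_alpha(1000.0, gain=0.1, floor=1.0), beta=0.5,
                capacity=1000.0, steps=100)
print(f"constant increase peak: {max(slow):.1f}, decreasing increase peak: {max(fast):.1f}")
```

With a constant increase the rate creeps up by one unit per interval and is still near 100 after 100 intervals, while the decreasing-increase rule passes 900 within about 22 intervals and then probes the remaining capacity cautiously.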
    20. UDT Makes Wide Area Clouds Possible
      Using UDT, Sector can take advantage of wide area high performance networks (10+ Gbps).
      10 Gbps per application
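Why "10 Gbps per application" is hard to reach with standard TCP over a wide area link can be seen with a short calculation. The path parameters here (100 ms round-trip time, 1500-byte packets) are assumptions chosen for illustration, not figures from the slide. The recovery estimate follows NewReno's rule of halving the window on a loss and then regaining one packet per round trip.

```python
# Why a single standard TCP flow struggles at 10 Gbps over a long path.
# Assumed path: 10 Gbps, 100 ms round-trip time, 1500-byte packets.
RATE_BPS = 10e9
RTT_S = 0.100
PACKET_BYTES = 1500

bdp_bytes = RATE_BPS * RTT_S / 8            # data in flight needed to fill the pipe
window_pkts = bdp_bytes / PACKET_BYTES      # congestion window in packets

# After one loss, NewReno halves its window and regains one packet per RTT,
# so climbing back to full rate takes about window/2 round trips.
recovery_s = (window_pkts / 2) * RTT_S

print(f"bandwidth-delay product: {bdp_bytes / 1e6:.0f} MB")       # 125 MB
print(f"window: {window_pkts:,.0f} packets")                        # 83,333 packets
print(f"recovery after one loss: {recovery_s / 60:.0f} minutes")    # 69 minutes
```

A single loss can therefore leave the flow below full rate for over an hour, which is the gap that rate-based protocols such as UDT aim to close.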
    21. What About Security?
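    22. Idea 3: Add Security from the Start
      The security server maintains information about users and slaves.
      User access control: password and client IP address.
      File-level access control.
      Messages are encrypted over SSL; certificates are used for authentication.
      Sector is HIPAA capable.
      [Figure: the Client, Master, Security Server, and Slaves; two connections are labeled SSL, the security server provides AAA (authentication, authorization, and accounting), and data flows between the client and the slaves.]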
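The access-control rules on this slide (a password plus an allowed client IP address, then file-level permissions) can be sketched as two checks. This is a hypothetical illustration, not Sector's security server: the `User` record, the function names, the PBKDF2 password hashing, and the `"r"`/`"w"` permission strings are all assumptions, and the SSL transport and certificates are not modeled.

```python
import hashlib
import hmac
import ipaddress
import os
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass
class User:
    salt: bytes
    pw_hash: bytes
    allowed_nets: List[Network]
    # Hypothetical file-level ACL: path -> set of permissions ("r", "w").
    acl: Dict[str, Set[str]] = field(default_factory=dict)


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def make_user(password: str, nets: List[str], acl: Dict[str, Set[str]]) -> User:
    salt = os.urandom(16)
    return User(salt, hash_password(password, salt),
                [ipaddress.ip_network(n) for n in nets], acl)


def authenticate(user: User, password: str, client_ip: str) -> bool:
    """Both checks from the slide: the password and the client IP address."""
    ok_pw = hmac.compare_digest(hash_password(password, user.salt), user.pw_hash)
    ip = ipaddress.ip_address(client_ip)
    ok_ip = any(ip in net for net in user.allowed_nets)
    return ok_pw and ok_ip


def authorize(user: User, path: str, perm: str) -> bool:
    """File-level access control: is `perm` granted on `path`?"""
    return perm in user.acl.get(path, set())


alice = make_user("s3cret", ["10.0.0.0/8"], {"/data/logs": {"r"}})
print(authenticate(alice, "s3cret", "10.1.2.3"))      # True
print(authenticate(alice, "s3cret", "192.168.1.5"))   # False (IP not allowed)
print(authenticate(alice, "wrong", "10.1.2.3"))       # False (bad password)
print(authorize(alice, "/data/logs", "r"), authorize(alice, "/data/logs", "w"))  # True False
```

Keeping authentication and file-level authorization as separate checks mirrors the slide's split between user access control and file-level access control.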
    23. For More Information About Sector
      Yunhong Gu and Robert L. Grossman, "Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data," Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429–2445, 2009.
      http://arxiv.org/abs/0809.1181
      http://rsta.royalsocietypublishing.org/content/367/1897/2429
    24. For Related Information
      Related information can be found at:
      blog.rgrossman.com
      www.rgrossman.com
    25. Sector Sponsors

    © All Rights Reserved
