Sector - Presentation at Cloud Computing & Its Applications 2009
 

This is a presentation about Sector that I gave at the Cloud Computing and Its Applications (CCA 09) Workshop that took place in Chicago on October 20, 2009. Sector is an open source cloud computing framework designed for data intensive computing.

Sector - Presentation at Cloud Computing & Its Applications 2009 Presentation Transcript

  • 1. Sector: An Open Source Cloud for Data Intensive Computing
    Robert Grossman
    University of Illinois at Chicago, Open Data Group
    October 20, 2009
  • 2. Part 1. Sector
    http://sector.sourceforge.net
  • 3. Sector Overview
    Sector is the fastest open source large data cloud
      As measured by the MalStone and Terasort benchmarks
    Sector is easy to program
      Supports UDFs, MapReduce, and Python over streams
    Sector is secure
      A HIPAA-compliant Sector cloud is being set up
    Sector is reliable
      Sector v1.24 has a backup master node server
  • 4. About Sector
    Yunhong Gu from the Laboratory for Advanced Computing at the University of Illinois at Chicago is the lead developer of Sector.
    Sector is open source (BSD License) and available from sector.sourceforge.net
    The current version is 1.24a
  • 5. Target Configurations
    Sector is designed to run on racks of commodity computers
    Typical rack configuration today (October 2009):
      Rack of 32 quad-core 1U computers
      Each computer has 4 x 1 TB disks
      Each computer has a 1 Gbps connection to a top-of-rack switch
    Sometimes these are called Raywulf clusters
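The rack figures on this slide imply some per-rack totals. The following is a minimal sketch of that arithmetic; the constant and function names are illustrative, and the totals are raw figures that ignore replication and file system overhead.

```python
# Figures taken from the slide; everything derived below is plain arithmetic.
COMPUTERS_PER_RACK = 32
CORES_PER_COMPUTER = 4      # quad-core
DISKS_PER_COMPUTER = 4
DISK_CAPACITY_TB = 1
LINK_GBPS = 1               # per-computer link to the top-of-rack switch


def rack_totals():
    """Return raw per-rack totals (no replication or file system overhead)."""
    return {
        "cores": COMPUTERS_PER_RACK * CORES_PER_COMPUTER,
        "raw_storage_tb": COMPUTERS_PER_RACK * DISKS_PER_COMPUTER * DISK_CAPACITY_TB,
        "aggregate_edge_gbps": COMPUTERS_PER_RACK * LINK_GBPS,
    }


print(rack_totals())
```

Under these figures a single rack offers 128 cores, 128 TB of raw disk, and 32 Gbps of aggregate bandwidth into the rack switch.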
  • 6. Google’s Large Data Cloud
    Google’s Stack (top to bottom):
      Applications
      Compute Services: Google’s MapReduce
      Data Services: Google’s BigTable
      Storage Services: Google File System (GFS)
  • 7. Hadoop’s Large Data Cloud
    Hadoop’s Stack (top to bottom):
      Applications
      Compute Services: Hadoop’s MapReduce
      Data Services
      Storage Services: Hadoop Distributed File System (HDFS)
  • 8. Sector’s Large Data Cloud
    Sector’s Stack (top to bottom):
      Applications
      Compute Services: Sphere’s UDFs
      Data Services
      Storage Services: Sector’s Distributed File System (SDFS)
      Routing & Transport Services: UDP-based Data Transport Protocol (UDT)
  • 9. Comparing Sector and Hadoop
  • 10. Terasort - Sector vs. Hadoop Performance
    Sector/Sphere 1.24a and Hadoop 0.20.1 with no replication, on Phase 2 of the Open Cloud Testbed with co-located racks.
  • 11. MalStone (OCC-Developed Benchmark)
    Sector/Sphere 1.20 and Hadoop 0.18.3 with no replication, on Phase 1 of the Open Cloud Testbed in a single rack. The data set spanned 20 nodes, with 500 million 100-byte records per node.
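The data set size follows directly from the figures quoted for this benchmark run. A minimal sketch of the arithmetic, using decimal units (1 GB = 10^9 bytes) and illustrative variable names:

```python
# Figures quoted on the slide.
NODES = 20
RECORDS_PER_NODE = 500_000_000
RECORD_BYTES = 100

bytes_per_node = RECORDS_PER_NODE * RECORD_BYTES
total_bytes = NODES * bytes_per_node
total_records = NODES * RECORDS_PER_NODE

print(f"per node: {bytes_per_node / 10**9:.0f} GB")
print(f"total:    {total_bytes / 10**12:.0f} TB in {total_records:,} records")
```

That is 50 GB per node and 1 TB (10 billion records) overall.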
  • 12. How Do You Program A Data Center?
  • 13. Idea 1 – Support UDFs Over a Data Center
    Think of MapReduce as:
      Map acting on (text) records
      With a fixed Shuffle and Sort
      Followed by Reduce acting on (text) records
    We generalize this framework as follows:
      Support a sequence of User Defined Functions (UDFs) acting on segments (= chunks) of files.
      MapReduce is one special case, consisting of a user-defined Map, a system-defined Shuffle and Sort, and a user-defined Reduce.
    In both cases, the framework takes care of assigning nodes to process data, restarting failed processes, etc.
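To make the generalization concrete, here is a small single-process sketch of "a sequence of UDFs over segments", with MapReduce expressed as one particular sequence. It is not Sphere's API: every name below (`run_udfs`, `per_segment`, `shuffle_and_sort`, `map_reduce`) is hypothetical, and the node assignment and failure handling that the framework provides are left out.

```python
from collections import defaultdict
from typing import Any, Callable, List

Segment = List[Any]                        # one chunk of a file, as a list of records
UDF = Callable[[List[Segment]], List[Segment]]


def run_udfs(segments: List[Segment], udfs: List[UDF]) -> List[Segment]:
    """Apply a sequence of UDFs, each consuming the previous one's segments."""
    for udf in udfs:
        segments = udf(segments)
    return segments


def per_segment(fn: Callable[[Segment], Segment]) -> UDF:
    """Lift a function on one segment into a UDF over all segments."""
    return lambda segments: [fn(seg) for seg in segments]


def shuffle_and_sort(segments: List[Segment]) -> List[Segment]:
    """Fixed step: group (key, value) pairs by key, one output segment per key."""
    groups = defaultdict(list)
    for seg in segments:
        for key, value in seg:
            groups[key].append(value)
    return [[(key, groups[key])] for key in sorted(groups)]


def map_reduce(map_fn, reduce_fn) -> List[UDF]:
    """MapReduce as a special case: user Map, fixed shuffle/sort, user Reduce."""
    return [per_segment(map_fn), shuffle_and_sort, per_segment(reduce_fn)]


# Word count written as that special case.
def word_map(seg):
    return [(word, 1) for line in seg for word in line.split()]


def word_reduce(seg):
    return [(key, sum(values)) for key, values in seg]


chunks = [["a b a"], ["b c"]]
result = run_udfs(chunks, map_reduce(word_map, word_reduce))
print(result)
```

Any other list of UDFs can be passed to `run_udfs` in the same way, which is the sense in which MapReduce is only one case of the more general model.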
  • 14. Applying a UDF Using Sector/Sphere
    [Diagram: an Application and Sphere Client, an input stream, three SPEs, and an output stream]
    1. Split data
    2. Locate & schedule Sphere Processing Engine (SPE)
    3. Collect results
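The three steps on this slide (split, schedule onto SPEs, collect) can be sketched on a single machine. This is only an illustration under stated assumptions: threads from Python's standard `concurrent.futures` stand in for SPEs, the function names are hypothetical, and nothing here locates data or talks to a Sector cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split(records: Sequence[int], segment_size: int) -> List[Sequence[int]]:
    """Step 1: split the input stream into fixed-size segments."""
    return [records[i:i + segment_size] for i in range(0, len(records), segment_size)]


def apply_udf(records: Sequence[int],
              udf: Callable[[Sequence[int]], List[int]],
              segment_size: int = 4,
              num_spes: int = 3) -> List[int]:
    segments = split(records, segment_size)
    # Step 2: hand each segment to a worker; threads stand in for SPEs here.
    with ThreadPoolExecutor(max_workers=num_spes) as pool:
        partial_results = list(pool.map(udf, segments))   # map() preserves input order
    # Step 3: collect the per-segment results into one output stream.
    return [item for part in partial_results for item in part]


def square_all(segment: Sequence[int]) -> List[int]:
    return [x * x for x in segment]


output = apply_udf(list(range(10)), square_all)
print(output)
```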
  • 15. Sector Programming Model
    A Sector dataset consists of one or more physical files
    Sphere applies User Defined Functions over streams of data consisting of data segments
    Data segments can be data records, collections of data records, or files
    Examples of UDFs: Map function, Reduce function, Split function for CART, etc.
    Outputs of UDFs can be returned to the originating node, written to the local node, or shuffled to another node.
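The three output options listed on this slide can be pictured as a routing decision made for each UDF output. The sketch below is an assumption-laden illustration, not Sphere's mechanism: the `Destination` enum, the `route` function, and the choice of a CRC32 hash to pick the shuffle target are all hypothetical.

```python
import zlib
from enum import Enum


class Destination(Enum):
    ORIGIN = "origin"      # return output to the node that started the job
    LOCAL = "local"        # write output on the node that ran the UDF
    SHUFFLE = "shuffle"    # send output to another node chosen from its key


def route(key: str, dest: Destination, local_node: int,
          origin_node: int, num_nodes: int) -> int:
    """Return the index of the node that should receive this output record."""
    if dest is Destination.ORIGIN:
        return origin_node
    if dest is Destination.LOCAL:
        return local_node
    # Shuffle: a deterministic hash sends equal keys to the same node (assumption).
    return zlib.crc32(key.encode("utf-8")) % num_nodes


print(route("key-42", Destination.SHUFFLE, local_node=2, origin_node=0, num_nodes=4))
```

Using a deterministic hash means records with the same key always meet on the same node, which is what a later per-key UDF such as a Reduce needs.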
  • 16. How Do You Move Data in a Cloud & Between Clouds?
    Option 1: Use TCP and close your eyes.
    Option 2: ?????
  • 17. Idea 2: Sector is Built on Top of UDT
    UDT is a specialized network transport protocol.
    UDT can take advantage of wide area high performance 10 Gbps networks.
    Sector is a wide area distributed file system built over UDT.
    Sector is layered over the native file system (vs. being a block-based file system).
  • UDT Has Been Downloaded 25,000+ Times
    Sterling Commerce
    Movie2Me
    Globus
    Power Folder
    Nifty TV
    http://udt.sourceforge.net
  • 21. Alternatives to TCP – Decreasing Increases AIMD Protocols
    [Figure: increase of the packet sending rate plotted against the sending rate x, with curves for UDT, Scalable TCP, HighSpeed TCP, and AIMD (TCP NewReno). The figure also labels a decrease factor.]
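As a rough illustration of the "decreasing increase" idea named on this slide, the sketch below compares a constant additive increase with an increase that shrinks as the sending rate grows. It is a toy model, not UDT's actual congestion control: the `decreasing_increase` function, the decrease factor of 0.5, the capacity of 1000, and the loss rule (a loss whenever the rate exceeds capacity) are all assumptions chosen for the example.

```python
from typing import Callable, List


def simulate(increase: Callable[[float], float], decrease_factor: float,
             capacity: float, steps: int, x0: float = 1.0) -> List[float]:
    """Trace the sending rate x under increase/decrease rules."""
    x = x0
    trace = []
    for _ in range(steps):
        if x > capacity:
            x = (1.0 - decrease_factor) * x    # multiplicative decrease on simulated loss
        else:
            x = x + increase(x)                # increase by an amount that may depend on x
        trace.append(x)
    return trace


def constant_increase(x: float) -> float:
    """Classic AIMD: the same increment at every rate."""
    return 1.0


def decreasing_increase(x: float) -> float:
    """Hypothetical increase that is large at low rates and shrinks as x grows."""
    return 100.0 / (1.0 + x / 100.0)


def steps_to_reach(trace: List[float], target: float) -> int:
    """Number of steps until the rate first reaches target, or -1 if it never does."""
    for i, x in enumerate(trace):
        if x >= target:
            return i + 1
    return -1


aimd = simulate(constant_increase, 0.5, capacity=1000.0, steps=2000)
daimd = simulate(decreasing_increase, 0.5, capacity=1000.0, steps=2000)
print(steps_to_reach(aimd, 1000.0), steps_to_reach(daimd, 1000.0))
```

In this toy setting the decreasing-increase rule reaches the link capacity in far fewer steps than the constant increment, while its increments near capacity stay small.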
  • 22. UDT Makes Wide Area Clouds Possible
    Using UDT, Sector can take advantage of wide area high performance networks (10+ Gbps)
    10 Gbps per application
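One way to see why wide area 10 Gbps links are demanding for a transport protocol is the bandwidth-delay product: the amount of data a sender must keep in flight to fill the link. The sketch below computes it; the 100 ms round-trip time is an assumed value for a long wide area path and does not come from the slides.

```python
def bandwidth_delay_product_bytes(bandwidth_gbps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    bits_in_flight = bandwidth_gbps * 10**9 * rtt_ms // 1000
    return bits_in_flight // 8


# 10 Gbps is from the slide; the 100 ms RTT is an assumption for illustration.
bdp = bandwidth_delay_product_bytes(10, 100)
print(f"{bdp / 10**6:.0f} MB in flight")
```

With these numbers roughly 125 MB must be unacknowledged at any moment, so a protocol that is slow to regain its sending rate after a loss leaves much of the link idle.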
  • 23. What About Security?
  • 24. Idea 3: Add Security From the Start
    The security server maintains information about users and slaves.
    User access control: password and client IP address.
    File-level access control.
    Messages are encrypted over SSL; a certificate is used for authentication.
    Sector is HIPAA capable.
    [Diagram: Security Server, Master, Client, and Slaves, with connections labeled SSL, AAA, and data]
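The checks listed on this slide (password, client IP address, file-level permission) can be sketched with Python's standard library. This is not Sector's security server: the `User` record, the function names, the PBKDF2 parameters, and the example network and path are all hypothetical, and the SSL transport and certificate handling are omitted.

```python
import hashlib
import hmac
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class User:
    salt: bytes
    password_hash: bytes
    allowed_networks: List[str]                        # e.g. ["192.168.1.0/24"]
    file_permissions: Dict[str, Set[str]] = field(default_factory=dict)  # path -> {"r", "w"}


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash; the iteration count is an illustrative choice."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def authenticate(user: User, password: str, client_ip: str) -> bool:
    """User access control: both the password and the client address must match."""
    ok_password = hmac.compare_digest(hash_password(password, user.salt), user.password_hash)
    addr = ipaddress.ip_address(client_ip)
    ok_ip = any(addr in ipaddress.ip_network(net) for net in user.allowed_networks)
    return ok_password and ok_ip


def may_access(user: User, path: str, mode: str) -> bool:
    """File-level access control: the mode must be granted for this exact path."""
    return mode in user.file_permissions.get(path, set())


salt = b"demo-salt"
alice = User(salt, hash_password("s3cret", salt), ["192.168.1.0/24"], {"/data/a.dat": {"r"}})
print(authenticate(alice, "s3cret", "192.168.1.17"), may_access(alice, "/data/a.dat", "w"))
```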
  • 25. For More Information About Sector
    Yunhong Gu and Robert L. Grossman, Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data, Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429–2445, 2009
    http://arxiv.org/abs/0809.1181
    http://rsta.royalsocietypublishing.org/content/367/1897/2429
  • 26. For Related Information
    Related information can be found at:
    blog.rgrossman.com
    www.rgrossman.com
  • 27. Sector Sponsors