Sector - Presentation at Cloud Computing & Its Applications 2009
This is a presentation about Sector that I gave at the Cloud Computing and Its Applications (CCA 09) Workshop that took place in Chicago on October 20, 2009. Sector is an open source cloud computing framework designed for data intensive computing.

Published in: Technology, Education
Transcript

  • 1. Sector: An Open Source Cloud for Data Intensive Computing
    Robert Grossman
    University of Illinois at Chicago
    Open Data Group
    October 20, 2009
  • 2. Part 1. Sector
    http://sector.sourceforge.net
  • 3. Sector Overview
    Sector is the fastest open source large data cloud
      As measured by MalStone & Terasort
    Sector is easy to program
      Supports UDFs, MapReduce & Python over streams
    Sector is secure
      A HIPAA compliant Sector cloud is being set up
    Sector is reliable
      Sector v1.24 has a backup master node server
  • 4. About Sector
    Yunhong Gu from the Laboratory for Advanced Computing at the University of Illinois at Chicago is the Lead Developer of Sector.
    Sector is open source (BSD License) and available from sector.sourceforge.net
    The current version is 1.24a
  • 5. Target Configurations
    Sector is designed to run on racks of commodity computers
    Typical rack configuration today (Oct, 2009)
      Rack of 32 quad-core 1U computers
      Each computer has 4 x 1TB disks
      Each computer has a 1 Gbps connection to a top-of-rack switch
    Sometimes these are called Raywulf clusters
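As a rough back-of-the-envelope sketch, the rack described on slide 5 can be sized as follows. The constant and function names are illustrative, not part of Sector, and the totals are raw figures that ignore replication, file system overhead, and any limit on the switch uplink.

```python
# Figures taken from slide 5; all names here are illustrative.
NODES_PER_RACK = 32
CORES_PER_NODE = 4      # quad-core 1U computers
DISKS_PER_NODE = 4
TB_PER_DISK = 1
GBPS_PER_NODE = 1       # link from each node to the top-of-rack switch


def rack_totals():
    """Return raw per-rack totals for cores, disk capacity, and NIC bandwidth."""
    return {
        "cores": NODES_PER_RACK * CORES_PER_NODE,
        "raw_storage_tb": NODES_PER_RACK * DISKS_PER_NODE * TB_PER_DISK,
        "aggregate_nic_gbps": NODES_PER_RACK * GBPS_PER_NODE,
    }


print(rack_totals())
```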
  • 6. Google’s Large Data Cloud
    Google’s Stack
      Applications
      Compute Services: Google’s MapReduce
      Data Services: Google’s BigTable
      Storage Services: Google File System (GFS)
  • 7. Hadoop’s Large Data Cloud
    Hadoop’s Stack
      Applications
      Compute Services: Hadoop’s MapReduce
      Data Services
      Storage Services: Hadoop Distributed File System (HDFS)
  • 8. Sector’s Large Data Cloud
    Sector’s Stack
      Applications
      Compute Services: Sphere’s UDFs
      Data Services
      Storage Services: Sector’s Distributed File System (SDFS)
      Routing & Transport Services: UDP-based Data Transport Protocol (UDT)
  • 9. Comparing Sector and Hadoop
  • 10. Terasort - Sector vs Hadoop Performance
    Sector/Sphere 1.24a, Hadoop 0.20.1 with no replication on Phase 2 of Open Cloud Testbed with co-located racks.
  • 11. MalStone (OCC-Developed Benchmark)
    Sector/Sphere 1.20, Hadoop 0.18.3 with no replication on Phase 1 of Open Cloud Testbed in a single rack. Data was spread over 20 nodes, with 500 million 100-byte records per node.
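For scale, the MalStone data volume implied by the figures on slide 11 can be worked out directly. This is only arithmetic on the numbers stated there, using decimal units (1 GB = 10^9 bytes); the variable names are illustrative.

```python
# Figures from slide 11; names are illustrative.
NODES = 20
RECORDS_PER_NODE = 500_000_000
BYTES_PER_RECORD = 100

bytes_per_node = RECORDS_PER_NODE * BYTES_PER_RECORD
total_bytes = NODES * bytes_per_node

print(bytes_per_node / 1e9, "GB per node")
print(total_bytes / 1e12, "TB in total")
```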
  • 12. How Do You Program A Data Center?
  • 13. Idea 1 – Support UDFs Over Data Center
    Think of MapReduce as
      Map acting on (text) records
      With fixed Shuffle and Sort
      Followed by Reduce acting on (text) records
    We generalize this framework as follows:
      Support a sequence of User Defined Functions (UDFs) acting on segments (= chunks) of files.
      MapReduce is one special case consisting of a user defined Map, a system-defined shuffle and sort, and a user defined Reduce
    In both cases, the framework takes care of assigning nodes to process data, restarting failed processes, etc.
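The idea on slide 13 can be sketched in a few lines of single-process Python. This is not Sphere's API: `apply_udf`, `shuffle_and_sort`, `map_reduce`, and the word-count UDFs are hypothetical names chosen for illustration, and node assignment and failure handling are left out. The sketch only shows MapReduce falling out as a user-defined Map, a system-defined shuffle and sort, and a user-defined Reduce, each applied to segments.

```python
import zlib
from collections import defaultdict
from itertools import chain


def apply_udf(udf, segments):
    """Apply one user-defined function to every segment independently."""
    return [udf(segment) for segment in segments]


def shuffle_and_sort(segments, num_buckets):
    """System-defined step: route (key, value) pairs to buckets by key, then sort."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in chain.from_iterable(segments):
        index = zlib.crc32(key.encode("utf-8")) % num_buckets
        buckets[index].append((key, value))
    return [sorted(bucket) for bucket in buckets]


def map_reduce(segments, map_udf, reduce_udf, num_buckets=2):
    """MapReduce as a special case: user Map, fixed shuffle/sort, user Reduce."""
    mapped = apply_udf(map_udf, segments)
    shuffled = shuffle_and_sort(mapped, num_buckets)
    return apply_udf(reduce_udf, shuffled)


# Example UDFs (hypothetical): word count over segments of text lines.
def word_map(segment):
    return [(word, 1) for line in segment for word in line.split()]


def word_reduce(bucket):
    totals = defaultdict(int)
    for word, count in bucket:
        totals[word] += count
    return sorted(totals.items())


segments = [["a b a"], ["b c"]]
counts = dict(chain.from_iterable(map_reduce(segments, word_map, word_reduce)))
print(counts)
```

A more general pipeline would simply chain further `apply_udf` calls with other functions in place of the Map and Reduce shown here.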
  • 14. Applying UDF using Sector/Sphere
    1. Split data
    2. Locate & schedule Sphere Processing Engine (SPE)
    3. Collect results
    [Diagram labels: Application, Sphere Client, input stream, three SPEs, output stream]
  • 15. Sector Programming Model
    Sector dataset consists of one or more physical files
    Sphere applies User Defined Functions over streams of data consisting of data segments
    Data segments can be data records, collections of data records, or files
    Example of UDFs: Map function, Reduce function, Split function for CART, etc.
    Outputs of UDFs can be returned to originating node, written to local node, or shuffled to another node.
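The split, schedule, and collect steps from slide 14 can be imitated on one machine as a minimal sketch. A thread pool stands in for the Sphere Processing Engines, and the function names and parameters (`split_into_segments`, `run_over_stream`, `segment_size`, `num_engines`) are assumptions made for illustration rather than Sphere's interface. Results are returned to the caller, which corresponds to only the first of the three output options on slide 15.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(records, segment_size):
    """Step 1: split the input stream into fixed-size data segments."""
    return [records[i:i + segment_size]
            for i in range(0, len(records), segment_size)]


def run_over_stream(records, udf, segment_size=4, num_engines=3):
    """Steps 2 and 3: hand segments to worker 'engines', then collect results.

    Executor.map returns results in input order, so the output stream
    lines up with the input segments.
    """
    segments = split_into_segments(records, segment_size)
    with ThreadPoolExecutor(max_workers=num_engines) as engines:
        return list(engines.map(udf, segments))


# Usage: sum each segment of the integers 0..9.
print(run_over_stream(list(range(10)), sum, segment_size=4))
```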
  • 16. How Do You Move Data in a Cloud & Between Clouds?
    Option 1: Use TCP and close your eyes.
    Option 2: ?????
  • 17. Idea 2: Sector is Built on Top of UDT
    UDT is a specialized network transport protocol.
    UDT can take advantage of wide area high performance 10 Gbps networks
    Sector is a wide area distributed file system built over UDT.
    Sector is layered over the native file system (vs being a block-based file system).
  • 18. UDT Has Been Downloaded 25,000+ Times
    http://udt.sourceforge.net
    Sterling Commerce
    Movie2Me
    Globus
    Power Folder
    Nifty TV
  • 21. Alternatives to TCP – Decreasing Increases AIMD Protocols
    [Figure: increase of packet sending rate x, plotted against x, for UDT, Scalable TCP, HighSpeed TCP, and AIMD (TCP NewReno); the slide also labels a decrease factor]
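To make the slide's vocabulary concrete, here is a small sketch of an AIMD-style rate update in which the increase is a function of the current sending rate. The specific increase functions are assumptions for illustration only and are not UDT's actual control law: `constant_increase` mimics classic AIMD, while `make_decreasing_increase` assumes the sender has an estimate of link capacity and takes larger steps when far below it and smaller steps when close to it.

```python
def aimd_step(rate, increase, decrease_factor, loss):
    """One control interval: add increase(rate), or cut the rate on loss."""
    if loss:
        return rate * (1.0 - decrease_factor)
    return rate + increase(rate)


def constant_increase(rate):
    """Classic AIMD: the same additive step at every rate."""
    return 1.0


def make_decreasing_increase(link_capacity, floor=0.01):
    """Hypothetical increase that shrinks as the rate nears link capacity."""
    def increase(rate):
        return max((link_capacity - rate) / 10.0, floor)
    return increase


def intervals_to_reach(target, increase, start=1.0, limit=1_000_000):
    """Count loss-free intervals needed to grow from start to target."""
    rate, count = start, 0
    while rate < target and count < limit:
        rate = aimd_step(rate, increase, decrease_factor=0.5, loss=False)
        count += 1
    return count


capacity = 10_000.0
slow = intervals_to_reach(0.9 * capacity, constant_increase)
fast = intervals_to_reach(0.9 * capacity, make_decreasing_increase(capacity))
print(slow, fast)
```

With these made-up parameters, the constant-step sender needs thousands of loss-free intervals to approach the capacity, while the decreasing-increase sender needs only a few dozen. That gap is the motivation for such protocols on high bandwidth, long delay links.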
  • 22. UDT Makes Wide Area Clouds Possible
    Using UDT, Sector can take advantage of wide area high performance networks (10+ Gbps)
    10 Gbps per application
  • 23. What About Security?
  • 24. Idea 3: Add Security From the Start
    Security server maintains information about users and slaves.
    User access control: password and client IP address.
    File level access control.
    Messages are encrypted over SSL. Certificate is used for authentication.
    Sector is HIPAA capable.
    [Diagram labels: Security Server, Master, Client, Slaves, SSL, AAA, data]
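The access checks listed on slide 24 (password plus client IP address, then file-level access) can be sketched as below. This is a hypothetical stand-in, not Sector's security server: the class, method names, and data layout are invented for illustration, and SSL transport and certificate handling are out of scope here.

```python
import hashlib
import hmac
import ipaddress
import os


class SecurityServer:
    """Hypothetical in-memory stand-in for the security server on the slide."""

    def __init__(self):
        self._users = {}

    @staticmethod
    def _digest(password, salt):
        # Salted, iterated hash so plain-text passwords are never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def add_user(self, name, password, allowed_networks, readable_prefixes):
        salt = os.urandom(16)
        self._users[name] = {
            "salt": salt,
            "digest": self._digest(password, salt),
            "networks": [ipaddress.ip_network(n) for n in allowed_networks],
            "prefixes": tuple(readable_prefixes),
        }

    def authenticate(self, name, password, client_ip):
        """User access control: both the password and the client IP must match."""
        user = self._users.get(name)
        if user is None:
            return False
        candidate = self._digest(password, user["salt"])
        password_ok = hmac.compare_digest(user["digest"], candidate)
        address = ipaddress.ip_address(client_ip)
        ip_ok = any(address in network for network in user["networks"])
        return password_ok and ip_ok

    def can_read(self, name, path):
        """File-level access control by path prefix (an illustrative policy)."""
        user = self._users.get(name)
        return user is not None and path.startswith(user["prefixes"])


server = SecurityServer()
server.add_user("alice", "s3cret", ["10.0.0.0/8"], ["/data/public/"])
print(server.authenticate("alice", "s3cret", "10.1.2.3"))
print(server.can_read("alice", "/data/private/x.dat"))
```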
  • 25. For More Information About Sector
    Yunhong Gu and Robert L Grossman, Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data, Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429-2445, 2009
    http://arxiv.org/abs/0809.1181
    http://rsta.royalsocietypublishing.org/content/367/1897/2429
  • 26. For Related Information
    Related information can be found at:
    blog.rgrossman.com
    www.rgrossman.com
  • 27. Sector Sponsors
