This is a presentation about Sector that I gave at the Cloud Computing and Its Applications (CCA 09) Workshop that took place in Chicago on October 20, 2009. Sector is an open source cloud computing framework designed for data intensive computing.
3. Sector Overview: Sector is the fastest open source large data cloud, as measured by MalStone and Terasort. Sector is easy to program: it supports UDFs, MapReduce, and Python over streams. Sector is secure: a HIPAA-compliant Sector cloud is being set up. Sector is reliable: Sector v1.24 has a backup master node server.
4. About Sector: Yunhong Gu from the Laboratory for Advanced Computing at the University of Illinois at Chicago is the lead developer of Sector. Sector is open source (BSD License) and available from sector.sourceforge.net. The current version is 1.24a.
5. Target Configurations: Sector is designed to run on racks of commodity computers. A typical rack configuration today (October 2009) is a rack of 32 quad-core 1U computers, each with 4 x 1 TB disks and a 1 Gbps connection to a top-of-rack switch. These are sometimes called Raywulf clusters.
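As a rough sizing check for the rack described on this slide, the per-rack totals follow directly from the slide's numbers. The variable names are illustrative, and the storage figure is raw capacity that ignores replication and file system overhead.

```python
# Per-rack totals derived from the slide's configuration (October 2009).
computers_per_rack = 32
cores_per_computer = 4      # quad-core
disks_per_computer = 4
tb_per_disk = 1
gbps_per_computer = 1       # link to the top-of-rack switch

total_cores = computers_per_rack * cores_per_computer
raw_storage_tb = computers_per_rack * disks_per_computer * tb_per_disk
aggregate_edge_gbps = computers_per_rack * gbps_per_computer

print(total_cores, raw_storage_tb, aggregate_edge_gbps)
```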
6. Google's Large Data Cloud (Google's Stack, top to bottom): Applications; Compute Services: Google's MapReduce; Data Services: Google's BigTable; Storage Services: Google File System (GFS).
7. Hadoop's Large Data Cloud (Hadoop's Stack, top to bottom): Applications; Compute Services: Hadoop's MapReduce; Data Services; Storage Services: Hadoop Distributed File System (HDFS).
8. Sector's Large Data Cloud (Sector's Stack, top to bottom): Applications; Compute Services: Sphere's UDFs; Data Services; Storage Services: Sector's Distributed File System (SDFS); Routing & Transport Services: UDP-based Data Transport Protocol (UDT).
10. Terasort - Sector vs. Hadoop Performance: Sector/Sphere 1.24a and Hadoop 0.20.1 with no replication, on Phase 2 of the Open Cloud Testbed with co-located racks.
11. MalStone (OCC-Developed Benchmark): Sector/Sphere 1.20 and Hadoop 0.18.3 with no replication, on Phase 1 of the Open Cloud Testbed in a single rack. The data was spread over 20 nodes, with 500 million 100-byte records per node.
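To put the MalStone data set in perspective, the total size can be worked out from the node and record counts on this slide. This is plain arithmetic on the stated figures; the variable names are illustrative.

```python
# Data volume implied by the MalStone setup described on the slide.
nodes = 20
records_per_node = 500_000_000
bytes_per_record = 100

bytes_per_node = records_per_node * bytes_per_record    # 50 GB per node
total_records = nodes * records_per_node                # 10 billion records
total_bytes = nodes * bytes_per_node                    # 1 TB (decimal) overall

print(f"{total_records:,} records, {total_bytes / 1e12:.1f} TB")
```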
13. Idea 1: Support UDFs Over the Data Center. Think of MapReduce as a Map acting on (text) records, with a fixed Shuffle and Sort, followed by a Reduce acting on (text) records. We generalize this framework as follows: support a sequence of User Defined Functions (UDFs) acting on segments (= chunks) of files. MapReduce is one special case, consisting of a user-defined Map, a system-defined shuffle and sort, and a user-defined Reduce. In both cases, the framework takes care of assigning nodes to process data, restarting failed processes, etc.
15. Sector Programming Model: A Sector dataset consists of one or more physical files. Sphere applies User Defined Functions over streams of data consisting of data segments. Data segments can be data records, collections of data records, or files. Examples of UDFs: Map function, Reduce function, Split function for CART, etc. Outputs of UDFs can be returned to the originating node, written to the local node, or shuffled to another node.
16. How Do You Move Data in a Cloud & Between Clouds? Option 1: Use TCP and close your eyes. Option 2: ?????
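The generalization on this slide can be sketched in a few lines of single-process Python. This is not Sphere's API: `run_pipeline`, `per_segment`, and the word-count UDFs are hypothetical names chosen for illustration, and the node assignment and failure handling that the framework provides are left out. The point is only that MapReduce falls out as one particular sequence of stages over segments.

```python
from collections import defaultdict

def run_pipeline(segments, stages):
    """Apply a sequence of stages; each stage maps a list of segments to a new list."""
    for stage in stages:
        segments = stage(segments)
    return segments

def per_segment(udf):
    """Lift a UDF that handles one segment into a stage over all segments."""
    def stage(segments):
        return [udf(segment) for segment in segments]
    return stage

def word_map(segment):
    """User-defined Map: emit 'word<TAB>1' for every word in the segment."""
    return [f"{word}\t1" for line in segment for word in line.split()]

def shuffle_and_sort(segments, buckets=2):
    """System-defined step: group records by key into buckets, then sort each bucket."""
    out = [[] for _ in range(buckets)]
    for segment in segments:
        for record in segment:
            key = record.split("\t")[0]
            # Deterministic toy partitioner (an assumption, not Sphere's scheme).
            out[sum(key.encode()) % buckets].append(record)
    return [sorted(bucket) for bucket in out]

def count_reduce(segment):
    """User-defined Reduce: sum the counts for each key in the segment."""
    totals = defaultdict(int)
    for record in segment:
        key, value = record.split("\t")
        totals[key] += int(value)
    return [f"{key}\t{total}" for key, total in sorted(totals.items())]

def word_count(segments):
    """MapReduce expressed as one special case of a UDF sequence."""
    stages = [per_segment(word_map), shuffle_and_sort, per_segment(count_reduce)]
    return run_pipeline(segments, stages)
```

Calling `word_count([["a b a"], ["b c"]])` returns one output segment per bucket. Any other sequence of stages, with or without a shuffle, can be passed to `run_pipeline` in the same way.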
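The three output destinations listed on this slide can be illustrated with a small routing function. Everything here is a hypothetical sketch: `Destination`, `route_output`, and the node names are invented for the example and do not reflect how Sphere names or implements these options.

```python
from enum import Enum

class Destination(Enum):
    ORIGIN = "origin"     # return results to the node that issued the request
    LOCAL = "local"       # write results on the node that ran the UDF
    SHUFFLE = "shuffle"   # send each record to a node chosen by its key

def route_output(records, destination, origin, local, nodes, key_of=None):
    """Return a mapping {node name: [records]} describing where output goes."""
    if destination is Destination.ORIGIN:
        return {origin: list(records)}
    if destination is Destination.LOCAL:
        return {local: list(records)}
    placed = {}
    for record in records:
        # Deterministic toy partitioner (an assumption for this sketch).
        target = nodes[sum(key_of(record).encode()) % len(nodes)]
        placed.setdefault(target, []).append(record)
    return placed
```

For example, `route_output(["a", "b"], Destination.SHUFFLE, "client", "slave1", ["slave1", "slave2"], key_of=lambda r: r)` spreads the two records across the two slave nodes, while `Destination.ORIGIN` sends both back to `"client"`.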
18. UDT can take advantage of wide area high performance 10 Gbps networks.
19. Sector is a wide area distributed file system built over UDT.
21. Alternatives to TCP: Decreasing Increases AIMD Protocols. [Figure: increase of the packet sending rate as a function of the sending rate x, with curves for UDT, Scalable TCP, HighSpeed TCP, and AIMD (TCP NewReno); the decrease factor is also labeled.]
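The AIMD family compared on this slide can be sketched as a rate update with two parameters: an increase term that may depend on the current sending rate x, and a multiplicative decrease factor applied on loss. The sketch below is a toy model, not UDT's actual control law: the symbols `alpha` and `beta`, the loss rule (loss whenever the rate exceeds a fixed capacity), and the specific `decreasing_alpha` function are all assumptions made for illustration.

```python
def aimd_step(rate, loss, alpha, beta):
    """One AIMD update: additive increase alpha(rate), or multiplicative decrease on loss."""
    if loss:
        return rate * (1.0 - beta)
    return rate + alpha(rate)

def constant_alpha(rate):
    """Classic AIMD: the increase does not depend on the current rate."""
    return 1.0

def decreasing_alpha(rate, capacity=1000.0):
    """Hypothetical decreasing increase: big steps far from capacity, small steps near it."""
    return max(0.1, (capacity - rate) / 10.0)

def simulate(alpha, beta, capacity=1000.0, steps=200, start=1.0):
    """Toy loop: a loss is signalled whenever the rate exceeds the link capacity."""
    rate = start
    history = []
    for _ in range(steps):
        loss = rate > capacity
        rate = aimd_step(rate, loss, alpha, beta)
        history.append(rate)
    return history

classic = simulate(constant_alpha, beta=0.5)
decreasing = simulate(decreasing_alpha, beta=0.125)
```

In this toy model the constant-increase flow is still far below capacity after 200 steps, while the decreasing-increase flow gets close to capacity within a few dozen steps and then probes gently. That is the qualitative behavior that matters on high bandwidth-delay paths.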
22. UDT Makes Wide Area Clouds Possible: Using UDT, Sector can take advantage of wide area high performance networks (10+ Gbps). [Figure label: 10 Gbps per application.]
24. Idea 3: Add Security From the Start. The security server maintains information about users and slaves. User access control: password and client IP address. File-level access control. Messages are encrypted over SSL. A certificate is used for authentication. Sector is HIPAA capable. [Diagram labels: Security Server, Master, Client, Slaves; links labeled SSL, SSL, AAA, and data.]
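The two access checks named on this slide (password plus client IP address for users, and file-level access control) can be sketched with the Python standard library. This is not Sector's security server: the `User` record, the path-prefix permission model, and the function names are hypothetical, and the SSL transport and certificate handling are left out.

```python
import hashlib
import hmac
import ipaddress
from dataclasses import dataclass

@dataclass
class User:
    name: str
    salt: bytes
    password_hash: bytes
    allowed_networks: list   # ipaddress network objects the client may connect from
    readable_paths: list     # path prefixes the user may read (assumed model)

def hash_password(password, salt):
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def authenticate(user, password, client_ip):
    """User access control: the password and the client IP address must both match."""
    password_ok = hmac.compare_digest(hash_password(password, user.salt), user.password_hash)
    address = ipaddress.ip_address(client_ip)
    ip_ok = any(address in network for network in user.allowed_networks)
    return password_ok and ip_ok

def may_read(user, path):
    """File-level access control: allow paths at or under a permitted prefix."""
    return any(
        path == prefix or path.startswith(prefix.rstrip("/") + "/")
        for prefix in user.readable_paths
    )
```

A request would pass only if `authenticate` succeeds for the connecting client and `may_read` allows the requested file.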
25. For More Information About Sector: Yunhong Gu and Robert L. Grossman, Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data, Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429-2445, 2009. http://arxiv.org/abs/0809.1181 http://rsta.royalsocietypublishing.org/content/367/1897/2429
26. For Related Information: Related information can be found at blog.rgrossman.com and www.rgrossman.com.