My Other Computer Is a Data Center: The Sector Perspective on Big Data
July 25, 2010
Robert Grossman
Laboratory for Advanced Computing, University of Illinois at Chicago
Open Data Group
Institute for Genomics & Systems Biology, University of Chicago
What Is Small Data?
- 100 million movie ratings
- 480 thousand customers
- 17,000 movies
- From 1998 to 2005
- Less than 2 GB of data
It fits into memory, but very sophisticated models were required to win (this is the Netflix Prize dataset).
What Is Large Data?
One simple definition: consider a dataset large if it is too big to fit into a database. Search engines, social networks, and other Internet applications generate large datasets in this sense.
Part 1. What's Different About Data Center Computing?
Data-center-scale computing provides storage and computational resources at the scale, and with the reliability, of a data center.
A very nice recent book on this is by Barroso and Hölzle: The Datacenter as a Computer (2009).
Scale Is New
Elastic, Usage-Based Pricing Is New
120 computers in three racks for 1 hour costs the same as 1 computer in a rack for 120 hours.
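The arithmetic behind this parity, with a made-up hourly rate for illustration (any rate gives the same result, since both jobs consume the same number of computer-hours):

```python
rate = 0.10  # hypothetical price per computer-hour; the parity holds for any rate

cost_serial = 1 * 120 * rate    # 1 computer for 120 hours
cost_parallel = 120 * 1 * rate  # 120 computers for 1 hour
print(cost_serial == cost_parallel)  # True: same computer-hours, same cost
```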
Simplicity of the Parallel Programming Framework Is New
A new programmer can develop a program to process a container full of data with less than a day of training, using MapReduce.
Elastic Clouds, Large Data Clouds, and HPC
- Elastic clouds. Goal: minimize the cost of virtualized machines and provide them on demand.
- Large data clouds. Goal: maximize data (with matching compute) and control cost.
- HPC. Goal: minimize latency and control heat.
(Figure: how instruments have scaled science — experimental science: 1609, 30x and 1670, 250x; simulation science: 1976, 10x-100x; data science: 2003, 10x-100x.)
Part 2. Stacks for Big Data
The Google Data Stack
- The Google File System (2003)
- MapReduce: Simplified Data Processing… (2004)
- BigTable: A Distributed Storage System… (2006)
Map-Reduce Example
The input is a file with one document per record. The user specifies a map function over records with key = document URL and value = the terms the document contains:
("doc cdickens", "it was the best of times") --map--> ("it", 1), ("was", 1), ("the", 1), ("best", 1), …
Example (cont'd)
The MapReduce library gathers together all pairs with the same key (the shuffle/sort phase), and the user-defined reduce function combines all the values associated with the same key:
key = "it", values = 1, 1 --reduce--> ("it", 2)
key = "was", values = 1, 1 --reduce--> ("was", 2)
key = "best", values = 1 --reduce--> ("best", 1)
key = "worst", values = 1 --reduce--> ("worst", 1)
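The two phases above fit in a few lines of Python. This is a minimal in-memory sketch, with a dictionary standing in for the framework's distributed shuffle/sort:

```python
from collections import defaultdict

def map_doc(url, text):
    """Map: emit a (term, 1) pair for every term in the document."""
    for term in text.split():
        yield term, 1

def reduce_terms(term, counts):
    """Reduce: combine all counts associated with the same term."""
    return term, sum(counts)

docs = [("doc cdickens", "it was the best of times it was the worst of times")]

groups = defaultdict(list)          # shuffle/sort: group pairs by key
for url, text in docs:
    for term, count in map_doc(url, text):
        groups[term].append(count)

print([reduce_terms(t, c) for t, c in sorted(groups.items())])
```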
Applying MapReduce to the Data in a Storage Cloud
(Figure: map/shuffle stages feeding reduce stages over the data in the storage cloud.)
Google's Large Data Cloud (Google's Stack)
- Applications
- Compute Services: Google's MapReduce
- Data Services: Google's BigTable
- Storage Services: Google File System (GFS)
Hadoop's Large Data Cloud (Hadoop's Stack)
- Applications
- Compute Services: Hadoop's MapReduce
- Data Services: NoSQL databases
- Storage Services: Hadoop Distributed File System (HDFS)
Amazon-Style Data Cloud
(Figure: a load balancer in front of racks of EC2 instances, coordinated through the Simple Queue Service, with SimpleDB (SDB) and S3 storage services behind them.)
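The pattern the figure describes is queue-mediated decoupling: front ends put work on the queue, and worker instances drain it. A toy sketch, with an in-memory queue standing in for SQS:

```python
from queue import Queue

work = Queue()                       # in-memory stand-in for SQS

def front_end(request: str) -> None:
    """A load-balanced front end accepts a request and enqueues it."""
    work.put(request)

def worker() -> None:
    """A worker instance drains the queue and processes the items."""
    while not work.empty():
        print("processing", work.get())

for r in ["req-1", "req-2", "req-3"]:
    front_end(r)
worker()
```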
Evolution of NoSQL Databases
The standard architecture for simple web apps:
- front-end load-balanced web servers;
- a business logic layer in the middle;
- a backend database.
Databases do not scale well with very large numbers of users or very large amounts of data. Alternatives include sharded (partitioned) databases, master-slave databases, and memcache.
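Sharding in one line: hash the key to pick the partition. A minimal sketch, with a hypothetical `shards` list standing in for per-partition database connections:

```python
import hashlib

shards = ["db0", "db1", "db2", "db3"]   # hypothetical per-partition connections

def shard_for(key: str) -> str:
    """Route a key to a partition by hashing it to a shard index."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]

print(shard_for("user:42"))  # the same key always maps to the same shard
```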
NoSQL Systems
The name suggests no SQL support, but is also read as "Not Only SQL". One or more of the ACID properties are not supported; joins are generally not supported; schemas are usually flexible. Some well-known examples: Google's BigTable, Amazon's S3, and Facebook's Cassandra, along with several recent open source systems.
Different Types of NoSQL Systems
- Distributed key-value systems: Amazon's S3 key-value store (Dynamo), Voldemort
- Column-based systems: BigTable, HBase, Cassandra
- Document-based systems: CouchDB
Cassandra vs. MySQL Comparison

System (> 50 GB of data)   Writes average   Reads average
MySQL                      ~300 ms          ~350 ms
Cassandra                  0.12 ms          15 ms

Source: Avinash Lakshman and Prashant Malik, Cassandra: Structured Storage System over a P2P Network, static.last.fm/johan/nosql-20090611/cassandra_nosql.pdf
CAP Theorem
Proposed by Eric Brewer, 2000. Three properties of a shared-data system: consistency, availability, and partition tolerance. You can have at most two of the three. Since scaling out requires partitions, most large web-based systems choose availability over consistency.
Reference: Brewer, PODC 2000; Gilbert and Lynch, SIGACT News 2002.
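A toy illustration of the trade-off (purely illustrative, not any real system's API): during a partition, a replica that cannot reach its peer must either refuse the request (consistency) or answer with possibly stale data (availability).

```python
# Two replicas of one value; a network partition stops replication.
replica_a = {"x": 1}
replica_b = {"x": 1}
partitioned = True

def write_a(key, value):
    replica_a[key] = value
    if not partitioned:
        replica_b[key] = value        # replication only succeeds without a partition

def read_b(key, prefer="availability"):
    if partitioned and prefer == "consistency":
        raise RuntimeError("unavailable: cannot confirm the latest value")
    return replica_b[key]             # may be stale under a partition

write_a("x", 2)
print(read_b("x"))                    # availability: returns the stale value 1
# read_b("x", prefer="consistency")  # consistency: would raise instead
```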
Eventual Consistency
All updates eventually propagate through the system, and all nodes will eventually be consistent (assuming no further updates); eventually, a node is either updated or removed from service. Eventual consistency can be implemented with a gossip protocol, an approach popularized by Amazon's Dynamo. This model is sometimes called BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID.
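A toy simulation of gossip-style propagation (a sketch of the idea, not Dynamo's actual protocol): each round, every node that has seen the update passes it to one random peer, so the update eventually reaches all nodes.

```python
import random

random.seed(0)
nodes = [False] * 8    # whether each node has seen the update
nodes[0] = True        # one node receives the update first

rounds = 0
while not all(nodes):
    rounds += 1
    for i, seen in enumerate(list(nodes)):
        if seen:       # updated nodes gossip to one random peer per round
            nodes[random.randrange(len(nodes))] = True

print(f"all nodes consistent after {rounds} rounds")
```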
Part 3. Sector Architecture
Design Objectives
1. Provide Internet-scale data storage for large data: support multiple data centers connected by high-speed wide-area networks.
2. Simplify data-intensive computing for a larger class of problems than is covered by MapReduce: support applying user-defined functions to the data managed by a storage cloud, with transparent load balancing and fault tolerance.
Sector's Large Data Cloud (Sector's Stack)
- Applications
- Compute Services: Sphere's UDFs
- Data Services
- Storage Services: Sector's Distributed File System (SDFS)
- Routing & Transport Services: UDP-based Data Transport Protocol (UDT)
Apply User-Defined Functions (UDFs) to Files in a Storage Cloud
(Figure: UDFs play the roles of both the map/shuffle and the reduce stages.)
System Architecture
(Figure: a security server handles user accounts, data protection, and system security; master servers hold metadata, schedule work, and act as service providers; clients reach the system through access tools and programming interfaces over SSL; data moves between clients and the slave nodes, which provide storage and processing, over UDT, with encryption optional.)
Terasort Benchmark
Sector/Sphere 1.24a vs. Hadoop 0.20.1, with no replication, on Phase 2 of the Open Cloud Testbed with co-located racks.
MalStone
(Figure: entities visiting sites across a sequence of time windows d_{k-2}, d_{k-1}, d_k.)
MalStone
Sector/Sphere 1.20 vs. Hadoop 0.18.3, with no replication, on Phase 1 of the Open Cloud Testbed in a single rack. The data consisted of 20 nodes with 500 million 100-byte records per node.
Parallel Data Processing Using Sphere
Each input data segment is processed by a Sphere Processing Engine (SPE) on each slave in parallel. Results can either be stored on the local node or sent to multiple bucket files on different nodes. All inputs and outputs are Sector files, so Sphere processes can be combined by communicating via Sector files (see the sketch after the data-flow figure below).
(Figure: Sphere data flow — input segments are read from disk, with files not split into blocks; a UDF, guided by directory directives and in-memory objects, processes each segment; bucket writers send results to output segments on disk.)
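A conceptual sketch of a Sphere-style UDF in Python (Sphere's real API is C++; the `emit` helper and record format here are hypothetical, and a dictionary stands in for bucket files on remote nodes). Routing each result to a bucket keyed by a field plays the role of MapReduce's shuffle:

```python
from collections import defaultdict

buckets = defaultdict(list)   # stand-in for bucket files on different slave nodes

def emit(bucket: int, value: tuple) -> None:
    """Stand-in for Sphere sending a result to a bucket file."""
    buckets[bucket].append(value)

def site_udf(record: str) -> None:
    """Hypothetical UDF: route each record to a bucket keyed by site id,
    so all records for one site land in one place (a shuffle by another name)."""
    site_id, entity_id, timestamp = record.split(",")
    emit(hash(site_id) % 4, (site_id, entity_id, timestamp))

for rec in ["s1,e7,2010-07-01", "s2,e3,2010-07-02", "s1,e9,2010-07-03"]:
    site_udf(rec)

print(dict(buckets))
```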
Application-Aware File System
- Files are not split into blocks; users supply appropriately sized files to Sector.
- Directory and file family: Sector keeps related files together during upload and replication (e.g., a directory of files, or the data and index for a database table), and users can specify file location if necessary. This gives better data locality: no blocks need to move for applications that process whole files, or even multiple files, as the minimum input unit.
- In-memory objects: repeatedly accessed data is stored in memory.
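A sketch of the file-family idea (hypothetical structures, not Sector's code): placing every file of a family, here its containing directory, on the same node keeps data and index together, so a UDF can read both without network hops.

```python
import os.path

slaves = ["slave-0", "slave-1", "slave-2"]   # hypothetical slave nodes

def placement(path: str) -> str:
    """Place a whole file family (its containing directory) on one node."""
    family = os.path.dirname(path)
    return slaves[hash(family) % len(slaves)]

for f in ["table1/data.rec", "table1/index.idx", "table2/data.rec"]:
    print(f, "->", placement(f))   # table1's data and index land together
```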
Sector Summary
- Sector is the fastest open source large data cloud, as measured by MalStone and Terasort.
- Sector is easy to program: UDFs, MapReduce, and Python over streams (see the sketch below).
- Sector does not require extensive tuning.
- Sector is secure: a HIPAA-compliant Sector cloud is being launched.
- Sector is reliable: it supports multiple active master node servers.
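"Python over streams" is, in the spirit of Hadoop Streaming, a script that reads records on stdin and writes key/value lines on stdout. A minimal sketch of the style (not Sector's exact wire protocol):

```python
#!/usr/bin/env python
import sys

# Streaming-style mapper: one record per input line, tab-separated output pairs.
for line in sys.stdin:
    for term in line.split():
        print(f"{term}\t1")
```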
Part 4. Sector Applications
App 1: Bionimbus
www.cistrack.org & www.bionimbus.org
App 2: Bulk Download of the SDSS
LLPR = local performance / long-distance performance. Sector's LLPR varies between 0.61 and 0.98. The recent Sloan Digital Sky Survey (SDSS) data release is 14 TB in size.
App 3: Anomalies in Network Data

Sector Applications
- Distributing the 15 TB Sloan Digital Sky Survey to astronomers around the world (with JHU, 2005)
- Managing and analyzing high-throughput sequence data (Cistrack, University of Chicago, 2007)
- Detecting emergent behavior in distributed network data (Angle, won the SC 07 Analytics Challenge)
- Wide-area clouds (won the SC 09 Bandwidth Challenge with a 100 Gbps wide-area computation)
- New ensemble-based algorithms for trees
- Graph processing
- Image processing (OCC Project Matsu)
Credits
Sector was developed by Yunhong Gu from the University of Illinois at Chicago and verycloud.com.
For More Information
Please visit sector.sourceforge.net and rgrossman.com.