Topic 10: Taxonomy of Data and Storage
 

Cloud Computing Workshop 2013, ITU


    • 10: Taxonomy of Data and Storage
      Zubair Nabi
      zubair.nabi@itu.edu.pk
      April 20, 2013
    • Outline
      1. Datasets
      2. Storage
      3. Beyond RDBMS
      4. NoSQL Taxonomy
      5. NewSQL
    • Outline: 1. Datasets
    • Introduction
      - Data is everywhere and is the driving force behind our lives
      - The address book on your phone is data
      - So is the newspaper that you read every morning
      - Everything you see around you is a potential source of data that might be useful for a certain application
      - We use this data to share information and to make more informed decisions about different events
      - Datasets can be classified on the basis of their structure:
        1. Structured
        2. Unstructured
        3. Semi-structured
    • Structured Data
      - Formatted in a universally understandable and identifiable way
      - In most cases, structured data is formally specified by a schema
      - The address book on your phone is structured because it has a schema consisting of name, phone number, address, email address, etc.
      - Most traditional databases contain structured data laid out across columns and rows
      - Each field also has an associated type
      - This makes it possible to search for items based on their data types
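To make the address-book example concrete, here is a minimal Python sketch in which the schema is a dataclass with one typed field per attribute. The `Contact` class, its field names, and the sample entries are hypothetical. It shows the two properties listed above: every record shares the same typed layout, and a search can rely on a field's declared type.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Contact:
    """Hypothetical address-book schema: every entry has the same typed fields."""
    name: str
    phone: str
    email: str
    birthday: date


book = [
    Contact("Ayesha", "+92-300-0000001", "ayesha@example.com", date(1990, 5, 1)),
    Contact("Bilal", "+92-300-0000002", "bilal@example.com", date(1985, 11, 23)),
]

# The schema is known up front, so fields can be located by their declared type...
date_fields = [f.name for f in fields(Contact) if f.type is date]

# ...and searched with operations that only make sense for that type.
born_before_1988 = [c.name for c in book if c.birthday < date(1988, 1, 1)]

print(date_fields)       # ['birthday']
print(born_before_1988)  # ['Bilal']
```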
    • Unstructured Data
      - Data without any conceptual definition or type
      - Can vary from raw text to binary data
      - Processing unstructured data requires parsing and tagging on the fly
      - In most cases, it consists of simple log files
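"Parsing and tagging on the fly" can be sketched in Python as follows: the raw log text carries no schema, so structure exists only in the parser that reads it. The log format and the regular expression are hypothetical, chosen for illustration; lines that match no pattern are simply skipped.

```python
import re

# Raw, unstructured input: nothing in the text says what each token means.
raw_log = """\
2013-04-20 10:15:02 GET /index.html 200
2013-04-20 10:15:07 POST /login 403
garbled line that matches no pattern
"""

# The "structure" lives only in this parser (an assumed, hypothetical log format).
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})"
)


def parse_and_tag(text):
    """Parse each line at read time and tag recognised fields; skip the rest."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.fullmatch(line.strip())
        if match is None:
            continue  # no structure could be recovered from this line
        record = match.groupdict()
        record["status"] = int(record["status"])  # the application decides the type
        records.append(record)
    return records


for rec in parse_and_tag(raw_log):
    print(rec["method"], rec["path"], rec["status"])
```

The parsing cost is paid every time the data is read, which is the practical price of having no schema.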
    • Semi-structured Data
      - Occupies the space between structured and unstructured data on the data spectrum
      - For instance, while binary data has no structure, audio and video files have metadata that is structured, such as the author, time of creation, etc.
      - Can also be described as having a self-describing structure
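A small Python sketch of self-describing metadata, using JSON as an assumed encoding: each record names its own fields, yet the records do not share one fixed schema. The file names and attributes are hypothetical; the binary audio/video payload itself is not represented, since only the metadata is queryable.

```python
import json

# Each record carries its own field names (self-describing), but records need
# not have the same fields. File names and attributes are made up.
metadata_json = """
[
  {"file": "talk.mp4", "author": "Ayesha", "created": "2013-04-20", "duration_s": 3600},
  {"file": "song.mp3", "author": "Unknown", "album": "Demo", "bitrate_kbps": 320}
]
"""

records = json.loads(metadata_json)

for rec in records:
    print(rec["file"], sorted(rec.keys()))

# Fields shared by every record vs. fields that only some records have.
common = set.intersection(*(set(r) for r in records))
print(sorted(common))  # ['author', 'file']
```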
    • Outline: 2. Storage
    • Database Management Systems (DBMS)
      - Used to store and manage data
      - Support large amounts of data
      - Ensure concurrency, sharing, and locking
      - Security is also useful, as it enables fine-grained access control
      - Able to keep working in the face of failure
    • Relational Database Management Systems (RDBMS)
      - The most popular storage system in use
      - Data in different files is connected by using a key field
      - Data is laid out in different tables, with a key field that identifies each row
      - The same key field is used to connect one table to another
      - For instance, one relation might have the customer ID as the key and her details as data; another table might have the same key but different data, say her purchases; yet another table with the same key might have a breakdown of her preferences
      - Examples include Oracle Database, MS SQL Server, MySQL, IBM DB2, and Teradata
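The customer example can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table names, column names, and rows are hypothetical. The shared key `customer_id` is what lets one query pull a customer's details, purchases, and preferences back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Three tables share the same key field, customer_id (names are illustrative).
cur.executescript("""
CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchases   (customer_id INTEGER REFERENCES customers(customer_id), item TEXT);
CREATE TABLE preferences (customer_id INTEGER REFERENCES customers(customer_id), category TEXT);

INSERT INTO customers   VALUES (1, 'Ayesha'), (2, 'Bilal');
INSERT INTO purchases   VALUES (1, 'laptop'), (1, 'mouse'), (2, 'phone');
INSERT INTO preferences VALUES (1, 'electronics'), (2, 'books');
""")

# The shared key connects the tables: join details, purchases, and preferences.
rows = cur.execute("""
SELECT c.name, p.item, pr.category
FROM customers c
JOIN purchases   p  ON p.customer_id  = c.customer_id
JOIN preferences pr ON pr.customer_id = c.customer_id
WHERE c.customer_id = ?
ORDER BY p.item
""", (1,)).fetchall()

print(rows)  # [('Ayesha', 'laptop', 'electronics'), ('Ayesha', 'mouse', 'electronics')]
```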
    • Structured Query Language (SQL)
      - Non-procedural language used for data retrieval and manipulation in RDBMS
      - Adds a layer of abstraction over relational algebra, enabling set operations, selections, etc.
      - Due to its declarative nature, users specify the output they expect while the underlying system decides the actual query execution plan
      - Instructions consist of a specific SQL statement plus additional parameters and operands
      - For instance, the SELECT statement retrieves certain records, INSERT adds a record, and so on
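The declarative/procedural contrast can be sketched in Python with `sqlite3`: the same result is computed once by stating *what* is wanted in SQL, and once by spelling out *how* in a loop. The `orders` table and its rows are made up for illustration; the `?` placeholders are the statements' parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

# INSERT adds records; the ? placeholders are bound to the statement's parameters.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ayesha", 120.0), ("Bilal", 35.5), ("Ayesha", 60.0)],
)

# Declarative: state WHAT you want; the engine chooses the execution plan.
declarative = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount > ? GROUP BY customer ORDER BY customer",
    (50,),
).fetchall()

# Procedural equivalent: spell out HOW to filter, group, and sum, step by step.
totals = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    if amount > 50:
        totals[customer] = totals.get(customer, 0.0) + amount
procedural = sorted(totals.items())

print(declarative)  # [('Ayesha', 180.0)]
print(procedural)   # [('Ayesha', 180.0)]
```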
    • RDBMS and Structured Data
      - As structured data follows a predefined schema, it naturally maps onto a relational database system
      - The schema defines the type and structure of the data and its relations
      - Schema design is an arduous process and needs to be done before the database can be populated
      - Another consequence of a strict schema is that it is non-trivial to extend
      - For instance, adding a new attribute to an existing row necessitates adding a new column to the entire table
      - This is extremely suboptimal in tables with millions of rows
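A short `sqlite3` sketch of the schema-extension problem: the new attribute is only needed for one contact, but because the schema belongs to the whole table, every row gains the column (filled with NULL). The table and values are hypothetical. Note that this shows the logical effect only; the physical cost of `ALTER TABLE ... ADD COLUMN` depends on the engine and version (some rewrite the table, while SQLite updates just the stored schema).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany("INSERT INTO contacts (name, phone) VALUES (?, ?)",
                 [("Ayesha", "111"), ("Bilal", "222")])

# Only Ayesha has a Twitter handle, but the schema is per table,
# so the new attribute becomes a column on every row.
conn.execute("ALTER TABLE contacts ADD COLUMN twitter TEXT")
conn.execute("UPDATE contacts SET twitter = ? WHERE name = ?", ("@ayesha", "Ayesha"))

rows = conn.execute("SELECT name, twitter FROM contacts ORDER BY id").fetchall()
print(rows)  # [('Ayesha', '@ayesha'), ('Bilal', None)]
```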
    • RDBMS and Semi-structured and Unstructured Data
      - Unstructured data has no notion of schema, while semi-structured data only has a weak one
      - Data within such datasets also lacks an inherent type
      - In fact, types are application-centric: it might be possible to interpret a field as a float in one application and as a string in another
      - While it is possible, with human intervention, to glean structure from unstructured data, it is an extremely expensive task
      - Structureless data generated by real-time sources can change the number of attributes and their types on the fly
      - An RDBMS would require the creation of a new table each time such a change takes place
      - Therefore, unstructured and semi-structured data do not fit the relational model
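Both points can be sketched in a few lines of Python, using a hypothetical stream of sensor readings: the set of attributes keeps growing as readings arrive (each new attribute would be a schema change in an RDBMS), and the same raw value is given a different type by different consumers.

```python
# A hypothetical stream of sensor readings whose attributes change over time.
stream = [
    {"sensor": "s1", "temp": "21.5"},
    {"sensor": "s1", "temp": "21.7", "humidity": "40"},
    {"sensor": "s2", "pressure": "1013", "firmware": "v2"},
]

# A relational table needs its column set fixed up front; here it keeps growing.
columns = []
for reading in stream:
    for attr in reading:
        if attr not in columns:
            columns.append(attr)  # each new attribute would mean a schema change
print(columns)  # ['sensor', 'temp', 'humidity', 'pressure', 'firmware']

# Types are application-centric: the same raw value can be read in different ways.
raw = stream[0]["temp"]
as_float = float(raw)   # a monitoring application treats it as a number
as_label = raw + " C"   # a display application treats it as a string
print(as_float, as_label)  # 21.5 21.5 C
```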
    • Outline: 3. Beyond RDBMS
    • Motivation
      - Different semantics:
        - RDBMS provide ACID semantics:
          1. Atomic: The entire transaction either succeeds or fails
          2. Consistent: Data within the database remains consistent after each transaction
          3. Isolated: Transactions are sandboxed from each other
          4. Durable: Committed transactions persist across failures and restarts
        - This is overkill for most user-facing applications
        - Most applications are more interested in availability and are willing to sacrifice consistency, settling for eventual consistency
        - This basically available, soft state, eventually consistent (BASE) model enables applications to function even in the face of partial failure
      - High Throughput: Most NoSQL databases sacrifice consistency for availability, leading to higher throughput (in some cases by an order of magnitude)
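Atomicity, the "A" in ACID, can be demonstrated with SQLite through Python's `sqlite3` module. Using the connection as a context manager commits the block on success and rolls it back if an exception escapes. The `accounts` table and the transfer amounts are hypothetical; the second transfer credits B and then fails on A's `CHECK` constraint, so neither update survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates apply, or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


transfer(conn, "A", "B", 30)        # succeeds: A=70, B=30
try:
    transfer(conn, "A", "B", 500)   # credit to B runs, then the debit violates the CHECK
except sqlite3.IntegrityError:
    pass                            # the whole transaction was rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'A': 70, 'B': 30}
```

An eventually consistent (BASE) store would typically not offer this multi-key all-or-nothing guarantee, which is part of what it trades away for availability.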
    • Motivation (2)
      - Horizontal Scalability: To cater for more data, NoSQL stores can be scaled out simply by adding more machines; the underlying system automatically redistributes the data
      - Commodity Hardware: Many RDBMS require specialized, proprietary hardware to operate. In contrast, NoSQL databases run on commodity off-the-shelf hardware
      - Programming Language Support: Over the years, programming languages have started providing abstractions for database access (LINQ, etc.) that bypass SQL. NoSQL databases provide abstractions that map directly onto these language abstractions, leading to tighter coupling
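The slide does not say how data is redistributed when machines are added. One common technique, used for example in Amazon's Dynamo, is consistent hashing. Below is a minimal Python sketch of it; the node names, the 50 virtual nodes per machine, and MD5 as the placement hash are all assumptions for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable hash so placement is identical across runs (unlike built-in hash())."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=50):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's position.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by adding a machine
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves goes to the new machine, and the rest stay put. With naive `hash(key) % N` placement, going from 3 to 4 nodes would instead relocate most keys.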
    • Motivation (3)
      - The Rise of Cloud Computing: Cloud computing applications require horizontal scalability and low administration overhead. NoSQL stores naturally satisfy both requirements
    • Outline: 4. NoSQL Taxonomy
    • Introduction
      - NoSQL databases can be classified on the basis of:
        1. Data Model: How data is represented
        2. Scalability: How scalable the system is
        3. Query Model: What type of API it exposes
        4. Persistence: How persistent the data is
    • Classification by Data Model
      - Based on the data model, NoSQL databases can be roughly divided into three categories:
        1. Key/value Stores: A map/dictionary allowing put/get semantics per key
        2. Document Stores: Documents, i.e. complex data structures that encapsulate key/value pairs
        3. Column-Oriented Stores: Data laid out by column
    • Key/value Stores
      - Data is stored within a large hash map
      - Simple get/put API
      - Favour scalability over consistency
      - Keys are limited in size
      - Examples include Amazon’s Dynamo, LinkedIn’s Voldemort, Redis, and Memcached
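A toy, single-process Python sketch of the key/value model: a hash map behind a `get`/`put` API, with a cap on key size. The class is hypothetical, not any product's API, and the 250-byte limit is borrowed from memcached's key-length limit purely as an illustrative value; other stores use different limits.

```python
class KeyValueStore:
    """Toy in-memory key/value store: a hash map behind a get/put API."""

    MAX_KEY_BYTES = 250  # illustrative limit (memcached uses 250 bytes)

    def __init__(self):
        self._data = {}  # a Python dict is itself a hash map

    def put(self, key: str, value: bytes) -> None:
        if len(key.encode("utf-8")) > self.MAX_KEY_BYTES:
            raise ValueError("key too long")
        self._data[key] = value  # values are opaque to the store

    def get(self, key: str, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
store.put("session:42", b'{"user": "ayesha"}')
print(store.get("session:42"))  # b'{"user": "ayesha"}'
print(store.get("missing"))     # None
```

The store never looks inside the value. That opacity is what keeps the API simple and the data easy to partition, and it is also what document stores relax.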
    • Document Stores
      - Key/value semantics, but based on documents
      - A document encapsulates data in a standard format, such as JSON, XML, PDF, etc.
      - Documents themselves can be heterogeneous
      - Documents can also be retrieved based on their content
      - Examples include Apache CouchDB and MongoDB
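A minimal Python sketch of document-store semantics, assuming JSON documents held in a plain dictionary. The `collection`, `get`, and `find` names are hypothetical and are not the CouchDB or MongoDB API. `get` is ordinary key/value access, while `find` retrieves documents by their content, even though the documents do not share the same fields.

```python
import json

# Documents keyed by ID; each value is a self-contained JSON document.
# The collection and both helpers are hypothetical, not any product's API.
collection = {
    "u1": json.dumps({"name": "Ayesha", "city": "Lahore", "tags": ["admin"]}),
    "u2": json.dumps({"name": "Bilal", "city": "Karachi"}),
    "u3": json.dumps({"name": "Sana", "city": "Lahore", "age": 29}),  # different fields: fine
}


def get(doc_id):
    """Key/value-style access: fetch a whole document by its key."""
    return json.loads(collection[doc_id])


def find(**criteria):
    """Content-based access: IDs of documents whose fields match all criteria."""
    matches = []
    for doc_id, raw in collection.items():
        doc = json.loads(raw)
        if all(doc.get(field) == value for field, value in criteria.items()):
            matches.append(doc_id)
    return matches


print(get("u2")["city"])    # Karachi
print(find(city="Lahore"))  # ['u1', 'u3']
```

Real document stores usually index fields so content queries do not need to scan every document, as this sketch does.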
    • Column-Oriented Stores
      - Data is stored and processed by column
      - Useful for read-mostly, read-intensive data
      - Data within the same column is of the same type, enabling efficient compression
      - Columns are stored separately, so they can be loaded in parallel
      - Examples include Google’s BigTable (Apache HBase is its open-source clone) and Facebook’s Cassandra
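A small Python sketch of why a columnar layout suits read-intensive work. An aggregate touches only the one column it needs, and because a column holds same-typed, often repetitive values, it compresses well. The data is made up, and run-length encoding is just one simple compression scheme chosen for illustration.

```python
from itertools import groupby

# The same table, first as rows (made-up data).
rows = [
    ("2013-04-20", "PK", 120),
    ("2013-04-20", "PK", 80),
    ("2013-04-20", "US", 200),
    ("2013-04-21", "US", 50),
]

# Column-oriented layout: one contiguous list per column.
columns = {
    "date": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# A read-intensive aggregate reads only the column it needs.
total = sum(columns["amount"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(total)                                  # 450
print(run_length_encode(columns["date"]))     # [('2013-04-20', 3), ('2013-04-21', 1)]
print(run_length_encode(columns["country"]))  # [('PK', 2), ('US', 2)]
```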
    • Outline: 5. NewSQL
    • Introduction
      - A hybrid of traditional RDBMS and NoSQL
      - Combines the scalability and performance of NoSQL with the ACID guarantees of RDBMS
      - Uses SQL as the primary language
      - Able to scale out and run on commodity hardware
      - Classified into:
        1. New Databases: Designed from scratch
        2. New MySQL Storage Engines: Keep MySQL as the interface but replace the storage engine
        3. Transparent Clustering: Add pluggable features to existing databases to ensure scalability
    • New Databases
      1. Query Distribution:
         - Each node holds a subset of the data
         - Queries are split and shipped to the nodes that own the data
         - Examples include Google’s Spanner and NuoDB
      2. Pull Data:
         - A central node (possibly replicated) holds all the data
         - A set of processing nodes receives queries and pulls in the required data from the central node
         - Examples include VMware’s SQLFire
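A toy Python sketch of the query-distribution approach. Three in-memory SQLite databases stand in for hypothetical nodes, rows are placed by `order_id % 3`, and a coordinator either ships a partial aggregate to every node and merges the results, or sends a point lookup only to the node that owns the key. Systems such as Spanner are far more sophisticated (replication, distributed transactions), so this shows only the data-placement and query-splitting idea.

```python
import sqlite3

NUM_NODES = 3  # hypothetical cluster size


def make_node():
    conn = sqlite3.connect(":memory:")  # stands in for one machine
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    return conn


nodes = [make_node() for _ in range(NUM_NODES)]


def owner(order_id):
    """Each node holds a subset of the data: rows are placed by key modulo node count."""
    return nodes[order_id % NUM_NODES]


orders = [(1, "Ayesha", 120.0), (2, "Bilal", 35.5), (3, "Ayesha", 60.0),
          (4, "Sana", 99.0), (5, "Bilal", 10.0)]
for row in orders:
    owner(row[0]).execute("INSERT INTO orders VALUES (?, ?, ?)", row)


def total_per_customer():
    """Coordinator: ship a partial aggregate to every node, then merge the partials."""
    merged = {}
    for node in nodes:
        for customer, partial in node.execute(
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
        ):
            merged[customer] = merged.get(customer, 0.0) + partial
    return dict(sorted(merged.items()))


def lookup(order_id):
    """A point query goes only to the single node that owns the key."""
    return owner(order_id).execute(
        "SELECT customer, amount FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()


print(total_per_customer())  # {'Ayesha': 180.0, 'Bilal': 45.5, 'Sana': 99.0}
print(lookup(4))             # ('Sana', 99.0)
```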