SlideShare a Scribd company logo
1 of 19
Distributed Data Stores and
No SQL Databases
S. Sudarshan
IIT Bombay
Parallel Databases and Data Stores
 Relational Databases – mainstay of business
 Web-based applications caused spikes
 Especially true for public-facing e-Commerce sites
 Many application servers, one database
 Easy to parallelize application servers to 1000s of
servers, harder to parallelize databases to same scale
 First solution: memcache or other caching
mechanisms to reduce database access
Scaling Up
 What if the dataset is huge, and very high
number of transactions per second
 Use multiple servers to host database
 Parallel databases have been around for a
while
 But expensive, and designed for decision
support not OLTP
Scaling RDBMS – Master/Slave
 Master-Slave
 All writes are written to the master. All reads
performed against the replicated slave
databases
 Good for mostly read, very few update
applications
 Critical reads may be incorrect as writes may
not have been propagated down
 Large data sets can pose problems as
master needs to duplicate data to slaves
Scaling RDBMS - Partitioning
 Partitioning
 Divide the database across many machines
 E.g. hash or range partitioning
 Handled transparently by parallel databases
 but they are expensive
 “Sharding”
 Divide data amongst many cheap databases
(MySQL/PostgreSQL)
 Manage parallel access in the application
 Scales well for both reads and writes
 Not transparent, application needs to be partition-aware
What is NoSQL?
 Stands for Not Only SQL
 Class of non-relational data storage systems
 E.g. BigTable, Dynamo, PNUTS/Sherpa, ..
 Usually do not require a fixed table schema nor
do they use the concept of joins
 All NoSQL offerings relax one or more of the
ACID properties (will talk about the CAP
theorem)
 Not a backlash/rebellion against RDBMS
 SQL is a rich query language that cannot be
rivaled by the current list of NoSQL offerings
Why Now?
 Explosion of social media sites (Facebook,
Twitter) with large data needs
 Explosion of storage needs in large web
sites such as Google, Yahoo
 Much of the data is not files
 Rise of cloud-based solutions such as
Amazon S3 (simple storage solution)
 Shift to dynamically-typed data with frequent
schema changes
 Open-source community
Distributed Key-Value Data Stores
 Distributed key-value data storage systems allow
key-value pairs to be stored (and retrieved on key)
in a massively parallel system
 E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon
Dynamo, ..
 Partitioning, high availability etc completely
transparent to application
 Sharding systems and key-value stores don’t
support many relational features
 No join operations (except within partition)
 No referential integrity constraints across partitions
 etc.
Typical NoSQL API
 Basic API access:
 get(key) -- Extract the value given a key
 put(key, value) -- Create or update the value
given its key
 delete(key) -- Remove the key and its
associated value
 execute(key, operation, parameters) --
Invoke an operation to the value (given its
key) which is a special data structure (e.g.
List, Set, Map .... etc).
Flexible Data Model
ColumnFamily: Rockets
Key Value
1
2
3
Name Value
toon
inventoryQty
brakes
Rocket-Powered Roller Skates
Ready, Set, Zoom
5
false
name
Name Value
toon
inventoryQty
brakes
Little Giant Do-It-Yourself Rocket-Sled Kit
Beep Prepared
4
false
Name Value
toon
inventoryQty
wheels
Acme Jet Propelled Unicycle
Hot Rod and Reel
1
1
name
name
NoSQL Data Storage: Classification
 Uninterpreted key/value or ‘the big hash
table’.
 Amazon S3 (Dynamo)
 Flexible schema
 BigTable, Cassandra, HBase (ordered keys,
semi-structured data),
 Sherpa/PNuts (unordered keys, JSON)
 MongoDB (based on JSON)
 CouchDB (name/value in text)
PNUTS Data Storage Architecture
CAP Theorem
 Three properties of a system
 Consistency (all copies have same value)
 Availability (system can run even if parts have failed)
 Partitions (network can break into two or more parts,
each with active systems that can’t talk to other parts)
 Brewer’s CAP “Theorem”: You can have at most
two of these three properties for any system
 Very large systems will partition at some point
 Choose one of consistency or availablity
 Traditional database choose consistency
 Most Web applications choose availability
 Except for specific parts such as order processing
Availability
 Traditionally, thought of as the
server/process available five 9’s (99.999
%).
 However, for large node system, at almost
any point in time there’s a good chance that
a node is either down or there is a network
disruption among the nodes.
 Want a system that is resilient in the face of
network disruption
Eventual Consistency
 When no updates occur for a long period of time, eventually
all updates will propagate through the system and all the
nodes will be consistent
 For a given accepted update and a given node, eventually
either the update reaches the node or the node is removed
from service
 Known as BASE (Basically Available, Soft state, Eventual
consistency), as opposed to ACID
 Soft state: copies of a data item may be inconsistent
 Eventually Consistent – copies becomes consistent at
some later time if there are no more updates to that data
item
Common Advantages
 Cheap, easy to implement (open source)
 Data are replicated to multiple nodes (therefore
identical and fault-tolerant) and can be
partitioned
 When data is written, the latest version is on at least
one node and then replicated to other nodes
 Down nodes easily replaced
 No single point of failure
 Easy to distribute
 Don't require a schema
What does NoSQL Not Provide?
 Joins
 Group by
 But PNUTS provides interesting
materialized view approach to
joins/aggregation.
 ACID transactions
 SQL
 Integration with applications that are based
on SQL
Should I be using NoSQL Databases?
 NoSQL Data storage systems makes sense for
applications that need to deal with very very large
semi-structured data
 Log Analysis
 Social Networking Feeds
 Most of us work on organizational databases,
which are not that large and have low
update/query rates
 regular relational databases are THE correct
solution for such applications
Further Reading
 Lots of material on the Web
 E.g. nice presentation on NoSQL by Perry Hoekstra
(Perficient)
 Some material in this talk is from above
presentation
 Use a search engine to find information on data
storage systems such as
 BigTable (Google), Dynamo (Amazon), Cassandra
(Facebook/Apache), Pnuts/Sherpa (Yahoo),
CouchDB, MongoDB, …
 Several of above are open source

More Related Content

Similar to No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'

NOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfNOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfajajkhan16
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, HowIgor Moochnick
 
Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesAndrew Kandels
 
Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsBob Pusateri
 
Nosql availability & integrity
Nosql availability & integrityNosql availability & integrity
Nosql availability & integrityFahri Firdausillah
 
NoSQL powerpoint presentation difference with rdbms
NoSQL powerpoint presentation difference with rdbmsNoSQL powerpoint presentation difference with rdbms
NoSQL powerpoint presentation difference with rdbmsAtulKabbur
 
مقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربيمقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربيMohamed Galal
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
no sql presentation
no sql presentationno sql presentation
no sql presentationchandanm2
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLAhmed Helmy
 

Similar to No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;' (20)

NOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfNOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdf
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational Databases
 
Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAs
 
Nosql availability & integrity
Nosql availability & integrityNosql availability & integrity
Nosql availability & integrity
 
Artigo no sql x relational
Artigo no sql x relationalArtigo no sql x relational
Artigo no sql x relational
 
NoSQL powerpoint presentation difference with rdbms
NoSQL powerpoint presentation difference with rdbmsNoSQL powerpoint presentation difference with rdbms
NoSQL powerpoint presentation difference with rdbms
 
Datastores
DatastoresDatastores
Datastores
 
مقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربيمقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربي
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
Big Data NoSQL 1017
Big Data NoSQL 1017Big Data NoSQL 1017
Big Data NoSQL 1017
 
no sql presentation
no sql presentationno sql presentation
no sql presentation
 
No sql
No sqlNo sql
No sql
 
No sql
No sqlNo sql
No sql
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample material
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
NoSQL
NoSQLNoSQL
NoSQL
 

Recently uploaded

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 

Recently uploaded (20)

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 

No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'

  • 1. Distributed Data Stores and No SQL Databases S. Sudarshan IIT Bombay
  • 2. Parallel Databases and Data Stores  Relational Databases – mainstay of business  Web-based applications caused spikes  Especially true for public-facing e-Commerce sites  Many application servers, one database  Easy to parallelize application servers to 1000s of servers, harder to parallelize databases to same scale  First solution: memcache or other caching mechanisms to reduce database access
  • 3. Scaling Up  What if the dataset is huge, and very high number of transactions per second  Use multiple servers to host database  Parallel databases have been around for a while  But expensive, and designed for decision support not OLTP
  • 4. Scaling RDBMS – Master/Slave  Master-Slave  All writes are written to the master. All reads performed against the replicated slave databases  Good for mostly read, very few update applications  Critical reads may be incorrect as writes may not have been propagated down  Large data sets can pose problems as master needs to duplicate data to slaves
  • 5. Scaling RDBMS - Partitioning  Partitioning  Divide the database across many machines  E.g. hash or range partitioning  Handled transparently by parallel databases  but they are expensive  “Sharding”  Divide data amongst many cheap databases (MySQL/PostgreSQL)  Manage parallel access in the application  Scales well for both reads and writes  Not transparent, application needs to be partition-aware
  • 6. What is NoSQL?  Stands for Not Only SQL  Class of non-relational data storage systems  E.g. BigTable, Dynamo, PNUTS/Sherpa, ..  Usually do not require a fixed table schema nor do they use the concept of joins  All NoSQL offerings relax one or more of the ACID properties (will talk about the CAP theorem)  Not a backlash/rebellion against RDBMS  SQL is a rich query language that cannot be rivaled by the current list of NoSQL offerings
  • 7. Why Now?  Explosion of social media sites (Facebook, Twitter) with large data needs  Explosion of storage needs in large web sites such as Google, Yahoo  Much of the data is not files  Rise of cloud-based solutions such as Amazon S3 (simple storage solution)  Shift to dynamically-typed data with frequent schema changes  Open-source community
  • 8. Distributed Key-Value Data Stores  Distributed key-value data storage systems allow key-value pairs to be stored (and retrieved on key) in a massively parallel system  E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon Dynamo, ..  Partitioning, high availability etc completely transparent to application  Sharding systems and key-value stores don’t support many relational features  No join operations (except within partition)  No referential integrity constraints across partitions  etc.
  • 9. Typical NoSQL API  Basic API access:  get(key) -- Extract the value given a key  put(key, value) -- Create or update the value given its key  delete(key) -- Remove the key and its associated value  execute(key, operation, parameters) -- Invoke an operation to the value (given its key) which is a special data structure (e.g. List, Set, Map .... etc).
  • 10. Flexible Data Model ColumnFamily: Rockets Key Value 1 2 3 Name Value toon inventoryQty brakes Rocket-Powered Roller Skates Ready, Set, Zoom 5 false name Name Value toon inventoryQty brakes Little Giant Do-It-Yourself Rocket-Sled Kit Beep Prepared 4 false Name Value toon inventoryQty wheels Acme Jet Propelled Unicycle Hot Rod and Reel 1 1 name name
  • 11. NoSQL Data Storage: Classification  Uninterpreted key/value or ‘the big hash table’.  Amazon S3 (Dynamo)  Flexible schema  BigTable, Cassandra, HBase (ordered keys, semi-structured data),  Sherpa/PNuts (unordered keys, JSON)  MongoDB (based on JSON)  CouchDB (name/value in text)
  • 12. PNUTS Data Storage Architecture
  • 13. CAP Theorem  Three properties of a system  Consistency (all copies have same value)  Availability (system can run even if parts have failed)  Partitions (network can break into two or more parts, each with active systems that can’t talk to other parts)  Brewer’s CAP “Theorem”: You can have at most two of these three properties for any system  Very large systems will partition at some point  Choose one of consistency or availablity  Traditional database choose consistency  Most Web applications choose availability  Except for specific parts such as order processing
  • 14. Availability  Traditionally, thought of as the server/process available five 9’s (99.999 %).  However, for large node system, at almost any point in time there’s a good chance that a node is either down or there is a network disruption among the nodes.  Want a system that is resilient in the face of network disruption
  • 15. Eventual Consistency  When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent  For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service  Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID  Soft state: copies of a data item may be inconsistent  Eventually Consistent – copies becomes consistent at some later time if there are no more updates to that data item
  • 16. Common Advantages  Cheap, easy to implement (open source)  Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned  When data is written, the latest version is on at least one node and then replicated to other nodes  Down nodes easily replaced  No single point of failure  Easy to distribute  Don't require a schema
  • 17. What does NoSQL Not Provide?  Joins  Group by  But PNUTS provides interesting materialized view approach to joins/aggregation.  ACID transactions  SQL  Integration with applications that are based on SQL
  • 18. Should I be using NoSQL Databases?  NoSQL Data storage systems makes sense for applications that need to deal with very very large semi-structured data  Log Analysis  Social Networking Feeds  Most of us work on organizational databases, which are not that large and have low update/query rates  regular relational databases are THE correct solution for such applications
  • 19. Further Reading  Lots of material on the Web  E.g. nice presentation on NoSQL by Perry Hoekstra (Perficient)  Some material in this talk is from above presentation  Use a search engine to find information on data storage systems such as  BigTable (Google), Dynamo (Amazon), Cassandra (Facebook/Apache), Pnuts/Sherpa (Yahoo), CouchDB, MongoDB, …  Several of above are open source