SlideShare a Scribd company logo
1 of 45
Data Modeling for
Relationships Handling
and
Data Distribution
Dr. Fabio Fumarola
Agenda
• How to deal with relationships
– Graph Databases
– Materialized Views
• Modeling for Data Access
• Distribution Models
– Single server
– Sharding
– Master-Slave
– Peer-to-Peer
2
How to Deal with Relationships
• Aggregates are useful to put together data that is
commonly accessed jointly
• But we still have cases when we need to deal with
relationships.
• Also in the previous examples we modeled some
relation between orders and customers.
• In this cases we want to separate order and
customer aggregates but with some kind of
relationship.
3
How to Deal with Relationships
• The simplest way to provide such link is to embed
the ID of the customer within the order’s aggregate
data.
• In this way if we need data from a customer record,
– We read the customer ID and
– We make another call to the database to read the
customer data
• This can be applied several times but the database
does not know about such relationship.
4
How to Deal with Relationships
• However, provide a way to make a relationship
visible to the database can be useful.
• For example,
– Document store make the content of an aggregate
available to the database to form indexes.
– Some key-value store allows us to put link information in
metadata, supporting partial lookup (Riak).
5
Relationship, Aggregates and
Updates
• In the end, aggregate-oriented databases treat
aggregate as the unit of data-management.
• An atomicity is supported only at aggregate level.
• Thus, we need to model relation and relationships
basing data access patterns
– In the next slides we will explore how to deal with this.
• However, this problem make a good point to
introduce another category of NoSql databases:
Graph Databases
6
Graph Databases
• While other NoSql database are designed to run on a
cluster, Graph databases are not.
• Graph databases are motivated by a different
annoyance with relational databases.
• That is no SQL
• They have an opposite model, based on
– Small records with complex interconnections
7
Example
8
Graph Databases
• In the previous figure we have an example of graph
database known as The Graph of the Gods.
• Its nodes are very small, in number of attributes
• But, there is a rich structure of interconnections.
• Within this structure we can ask questions like:
1. saturn.in('father').in('father').name
2. hercules.out('father','mother').name
9
Graph Databases: features
• Their data models is very simple: nodes connected
by edges (also called arcs)
– FlockDB model only nodes and edges
– Neo4J allows us to attach java object to nodes and edges
– OrientDB stores Java Objects that are sub-classes of nodes
and edges.
• They are ideal for modeling large scale data with
complex relationship such as social networks,
product preferences, etc.
10
Graph Databases: features
• It allows us to query the modeled network as nodes
or edges based queries.
• Graph and relational databases are different for
Relationships and Joins.
• In relational databases:
– Relations are implemented using foreign keys.
– The joins required to navigate around can get quite
expensive (especially for strongly connected databases).
11
Graph Databases: features
• Graph databases:
– Make the relationship using edges (direct and undirected)
– Joins are done making traversal along the relationship
(very cheap)
• The fast traversal is because most of the work from
query time to insert time.
• This paying off in situations where querying
performance is more important than insert speed.
12
Graph Databases: Query
• Another key difference is how query are done.
• In relational database you make a query by selecting
some tuples using a where condition
• On a graph database you query starting from a node
(saturn, hercules,…).
• However nodes can be indexed by an attributes such
as ID.
• Thus, they expect most of the query work to be
navigating realtionships
13
Graph Databases: Rationale
• The emphasis on relationships makes graph
databases very different from aggregated-oriented
• However, these databases are:
– More likely to run on a single server,
– ACID transactions need to cover multiple nodes and edges
• The only think they have in common with aggregate-
oriented databases is the rejection of the relational
model.
14
Materialized views
• Wi aggregate-models we stressed their advantage of
aggregation
– If you want to access orders, it is useful to have orders
stored in a single unit
• We know that aggregated orientation has the
disadvantage over analytics queries.
• A first solution to reduce this problem is by building
an index but we are still working against aggregates
15
Materialized Views: RDBMS
• Relational databases offer another mechanism to
deal with analytics queries, that is views.
• A view is like A relational table but defined by
computation over the base tables
• When a views is accessed the databases execute the
computation required.
• Or with materialized views are computed in advance
and cached on disk
16
Materialized Views: NoSQL
• NoSQL database do not have views, but they have
pre-computed and cached queries.
• This is a central aspect for aggregate-oriented
databases since some queries may not fit with the
aggregate structure.
• Often, materialized views are created using map-
reduce computation.
17
Materialized Views: Approaches
• There are two approaches: Eager and Lazy.
• Eager
– the materialized view is updated when the base tables are
updated.
– This approach is good when we have more frequent reads
than writes
• Lazy:
– The updates are run via batch jobs at regular interval
– It is good when the data updates are not business critical
18
Materialized Views: Approaches
• Moreover, we can create views outside the
database.
• We can read the data, computing the view, and
saving it back to the database (MapReduce).
• Often the databases support building materialized
views themselves (Incremental MapReduce).
– We provide the need computation and
– The databases execute computation when needed
19
Materialized Views: Approaches
• They can be used within the same aggregate.
– An order document might include an order summary
element of the same order which can be used as view
• In column-oriented databases different column-
families are used for materialized views.
• An advantage of doing this is that it allows you to
update views with the same atomic operation.
20
Modeling for Data Access
• Now we can consider in detail how to model data
with aggregate-orientation
• We need to consider:
1. How the data is going to be read
2. What are the side effects on data related to those
aggregates
21
Modeling: key-value store
• In this scenario, the
application can read the
customer’s information and
all the related data by using
the key.
• If we need to read the
products sold in each order
we need to read and parse
all the object
22
Modeling: document
• When references are needed,
we can switch to documents
and query inside the documents
• With references we can find
orders independently from the
customer.
• With orderId reference we can
find all Orders
• But we have to push Order
reference into Customer
23
Modeling: column-family stores
• We have the benefit of the columns being ordered,
• It allows us to name consistently the columns that
are accessed frequently so that they are fetched
first.
• When using column-families to model data,
remember to do it per query requirements and not
for writing purpose.
• The general rule is to make it easy to query and
denormalize data during write.
24
Modeling: column-family stores
• There are multiple ways to model
data.
• One way is to store the Customer
and Order in different column
families.
• Note that the reference to all the
orders placed by customer are in
the Customer column-family.
• Similarly other denormalizations
are generally done so that query
(read) performance is improved.
25
Modeling: Graph Databases
• We model all objects as nodes and
relationship as edges.
• Relationship have types and directional
significance.
• Relationship have names, that let we
traverse the graph.
• Graph databases allows us to make
query such as: list all the customers
that purchased “akka in action”
26
Distribution Models
27
Distribution Models
• We already discussed the advantages of scale up vs.
scale out.
• Scale out is more appealing since we can run
databases on a cluster of servers.
• Depending on the distribution model the data store
can give us the ability:
1. To handle large quantity of data,
2. To process a greater read or write traffic
3. To have more availability in the case of network
slowdowns of breakages
28
Distribution Models
• These are important benefit, but they come at a cost.
• Running over a cluster introduces complexity.
• There are two path for distribution:
– Replication and
– Sharding
29
Distribution Model: Single Server
• It is the first and simplest distribution option.
• Also if NoSQL database are designed to run on a
cluster they can be used in a single server application.
• This make sense if a NoSQL database is more suited
for the application data model.
• Graph database are the more obvious
• If data usage is most about processing aggregates,
than a key or a document store may be useful.
30
Sharding
• Often, a data store is busy because different people
are accessing different part of the dataset.
• In this cases we can support horizontal scalability by
putting different part of the data onto different
servers (Sharding)
• The concept of sharding is not new as a part of
application logic.
• It consists in put all the customer with surname A-D
on one shard and E-G to another
31
Sharding
• This complicates the programming model as the
application code needs to distributed the load across
the shards
• In the ideal setting we have each user to talk one
server and the load is balanced.
• Of course the ideal case is rare.
32
Sharding: Approaches
• In order to get the ideal case we have to guarantee
that data accessed together are stored in the same
node.
– This is very simple using aggregates.
• When considering data distribution across nodes.
– If access is based on physical location, we can place data
close to where are accessed.
33
Sharding: Approaches
• Another factor is trying to keep data balanced.
• We should arrange aggregates so they are evenly
distributed in order that each node receive the same
amount of the load.
• Another approach is to put aggregate together if we
think they may be read in sequence (BigTable).
• In BigTable as examples data on web addresses are
stored in reverse domain names.
34
Sharding and NoSQL
• In general, many NoSQL databases offers auto-
sharding.
• This can make much easier to use sharding in an
application.
• Sharding is especially valuable for performance
because it improves read and write performances.
• It scales read and writes on the different nodes of
the same cluster.
35
Sharding and Resilience
• Sharding does little to improve resilience when used
alone.
• Since data is on different nodes, a node failure
makes shard’s data unavailable.
• So in practice, sharding alone is likely to decrease
resilience.
36
Sharding: right time
• Some databases are intended to be sharded at the
beginning
• Some other let us start with a single node and then
distribute and shard.
• However, sharding very late may create trouble
– especially if done in production where the database
became essentially unavailable during the moving of the
data to the new shards.
37
Master-Slave Replication
• In this setting one node is designated as the master,
or primary ad the other as slaves.
• The master is the authoritative source for the date
and designed to process updates and send them to
slaves.
• The slaves are used for read operations.
• This allows us to scale in data intensive dataset.
38
Master-Slave Replication
• We can scale horizontally by adding more slaves
• But, we are limited by the ability of the master in
processing incoming data.
• An advantage is read resilience.
– Also if the master fails the slaves can still handle read
requests.
– Anyway writes are not allowed until the master is not
restored.
39
Master-Slave Replication
• Another characteristic is that a slave can be
appointed as master.
• Masters can be appointed manually or automatically.
• In order to achieve resilience we need that read and
write paths are different.
• This is normally done using separate database
connections.
40
Master-Slave Replication
• Replication in master-slave have the analyzed
advantages but it come with the problem of
inconsistency.
• The readers reading from the slaves can read data
not updated.
41
Peer-to-Peer Replication
• Master-Slave replication helps with read scalability
but has problems on scalability of writes.
• Moreover, it provides resilience on read but not on
writes.
• The master is still a single point of failure.
• Peer-to-Peer attacks these problems by not having a
master.
42
Peer-to-Peer Replication
• All the replica are equal (accept writes and reads)
• With a Peer-to-Peer we can have node failures
without lose write capability and losing data.
43
Peer-to-Peer Replication
• Furthermore we can easily add nodes for
performances.
• The bigger compliance here is consistency.
• When we can write on different nodes, we increase
the probability to have inconsistency on writes.
• However there is a way to deal with this problem.
44
Combining Sharing with Replication
• Master-slave and sharding: we have multiple
masters, but each data has a single master.
– Depending on the configuration we can decide the master
for each group of data.
• Peer-to-Peer and sharding is a common strategy for
column-family databases.
– This is commonly composed using replication of the shards
45

More Related Content

What's hot

NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Databasenehabsairam
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.Navdeep Charan
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sqlRam kumar
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management SystemHardik Patil
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouseKrish_ver2
 
Distributed database management system
Distributed database management systemDistributed database management system
Distributed database management systemVinay D. Patel
 
Dbms classification according to data models
Dbms classification according to data modelsDbms classification according to data models
Dbms classification according to data modelsABDUL KHALIQ
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouseJ M
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data modeljagdish_93
 
Query processing and optimization (updated)
Query processing and optimization (updated)Query processing and optimization (updated)
Query processing and optimization (updated)Ravinder Kamboj
 
Complete dbms notes
Complete dbms notesComplete dbms notes
Complete dbms notesTanya Makkar
 

What's hot (20)

Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Rdbms
RdbmsRdbms
Rdbms
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouse
 
Distributed database management system
Distributed database management systemDistributed database management system
Distributed database management system
 
Dbms classification according to data models
Dbms classification according to data modelsDbms classification according to data models
Dbms classification according to data models
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data model
 
Query processing and optimization (updated)
Query processing and optimization (updated)Query processing and optimization (updated)
Query processing and optimization (updated)
 
Polyglot Persistence
Polyglot Persistence Polyglot Persistence
Polyglot Persistence
 
Complete dbms notes
Complete dbms notesComplete dbms notes
Complete dbms notes
 
Data models
Data modelsData models
Data models
 

Similar to 6 Data Modeling for NoSQL 2/2

Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Introduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databasesIntroduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databasesShilpaKrishna6
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture PatternsMaynooth University
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageBethmi Gunasekara
 
SQL vs NoSQL Data Modeling.pptx
SQL vs NoSQL Data Modeling.pptxSQL vs NoSQL Data Modeling.pptx
SQL vs NoSQL Data Modeling.pptxGarimaHasija1
 
RELATIONAL MODEL OF DATABASES AND OTHER CONCEPTS OF DATABASES​
RELATIONAL MODEL OF DATABASES AND OTHER CONCEPTS OF DATABASES​RELATIONAL MODEL OF DATABASES AND OTHER CONCEPTS OF DATABASES​
RELATIONAL MODEL OF DATABASES AND OTHER CONCEPTS OF DATABASES​EdwinJacob5
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
 
Use a data parallel approach to proAcess
Use a data parallel approach to proAcessUse a data parallel approach to proAcess
Use a data parallel approach to proAcess23mz02
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsnehabsairam
 
Asper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsAsper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsTerry Bunio
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesMaynooth University
 
The final frontier
The final frontierThe final frontier
The final frontierTerry Bunio
 
Types of data bases
Types of data basesTypes of data bases
Types of data basesJanu Jahnavi
 
Nosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptxNosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptxRadhika R
 
Nosql part1 8th December
Nosql part1 8th December Nosql part1 8th December
Nosql part1 8th December Ruru Chowdhury
 

Similar to 6 Data Modeling for NoSQL 2/2 (20)

Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
NoSql Brownbag
NoSql BrownbagNoSql Brownbag
NoSql Brownbag
 
Introduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databasesIntroduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databases
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture Patterns
 
10. Graph Databases
10. Graph Databases10. Graph Databases
10. Graph Databases
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
SQL vs NoSQL Data Modeling.pptx
SQL vs NoSQL Data Modeling.pptxSQL vs NoSQL Data Modeling.pptx
SQL vs NoSQL Data Modeling.pptx
 
RELATIONAL MODEL OF DATABASES AND OTHER CONCEPTS OF DATABASES​
RELATIONAL MODEL OF DATABASES AND OTHER CONCEPTS OF DATABASES​RELATIONAL MODEL OF DATABASES AND OTHER CONCEPTS OF DATABASES​
RELATIONAL MODEL OF DATABASES AND OTHER CONCEPTS OF DATABASES​
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
Use a data parallel approach to proAcess
Use a data parallel approach to proAcessUse a data parallel approach to proAcess
Use a data parallel approach to proAcess
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortals
 
dbms introduction.pptx
dbms introduction.pptxdbms introduction.pptx
dbms introduction.pptx
 
Database Technologies
Database TechnologiesDatabase Technologies
Database Technologies
 
Asper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsAsper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling Topics
 
No SQL
No SQLNo SQL
No SQL
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choices
 
The final frontier
The final frontierThe final frontier
The final frontier
 
Types of data bases
Types of data basesTypes of data bases
Types of data bases
 
Nosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptxNosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptx
 
Nosql part1 8th December
Nosql part1 8th December Nosql part1 8th December
Nosql part1 8th December
 

More from Fabio Fumarola

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2Fabio Fumarola
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2Fabio Fumarola
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases LabFabio Fumarola
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases labFabio Fumarola
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented DatabasesFabio Fumarola
 
8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab8b. Column Oriented Databases Lab
8b. Column Oriented Databases LabFabio Fumarola
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with DockerFabio Fumarola
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databasesFabio Fumarola
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory Fabio Fumarola
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and DockerFabio Fumarola
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...Fabio Fumarola
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbtFabio Fumarola
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and dockerFabio Fumarola
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce Fabio Fumarola
 

More from Fabio Fumarola (20)

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases Lab
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
3 Git
3 Git3 Git
3 Git
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and Docker
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
Hbase an introduction
Hbase an introductionHbase an introduction
Hbase an introduction
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbt
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and docker
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
08 datasets
08 datasets08 datasets
08 datasets
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
 

Recently uploaded

如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样jk0tkvfv
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsBrainSell Technologies
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...yulianti213969
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...ThinkInnovation
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunksgmuir1066
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives23050636
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证zifhagzkk
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...BabaJohn3
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444saurabvyas476
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfgreat91
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证ppy8zfkfm
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证dq9vz1isj
 

Recently uploaded (20)

如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 

6 Data Modeling for NoSQL 2/2

  • 1. Data Modeling for Relationships Handling and Data Distribution Dr. Fabio Fumarola
  • 2. Agenda • How to deal with relationships – Graph Databases – Materialized Views • Modeling for Data Access • Distribution Models – Single server – Sharding – Master-Slave – Peer-to-Peer 2
  • 3. How to Deal with Relationships • Aggregates are useful to put together data that is commonly accessed jointly • But we still have cases when we need to deal with relationships. • Also in the previous examples we modeled some relation between orders and customers. • In this cases we want to separate order and customer aggregates but with some kind of relationship. 3
  • 4. How to Deal with Relationships • The simplest way to provide such link is to embed the ID of the customer within the order’s aggregate data. • In this way if we need data from a customer record, – We read the customer ID and – We make another call to the database to read the customer data • This can be applied several times but the database does not know about such relationship. 4
  • 5. How to Deal with Relationships • However, provide a way to make a relationship visible to the database can be useful. • For example, – Document store make the content of an aggregate available to the database to form indexes. – Some key-value store allows us to put link information in metadata, supporting partial lookup (Riak). 5
  • 6. Relationship, Aggregates and Updates • In the end, aggregate-oriented databases treat aggregate as the unit of data-management. • An atomicity is supported only at aggregate level. • Thus, we need to model relation and relationships basing data access patterns – In the next slides we will explore how to deal with this. • However, this problem make a good point to introduce another category of NoSql databases: Graph Databases 6
  • 7. Graph Databases • While other NoSql database are designed to run on a cluster, Graph databases are not. • Graph databases are motivated by a different annoyance with relational databases. • That is no SQL • They have an opposite model, based on – Small records with complex interconnections 7
  • 9. Graph Databases • In the previous figure we have an example of graph database known as The Graph of the Gods. • Its nodes are very small, in number of attributes • But, there is a rich structure of interconnections. • Within this structure we can ask questions like: 1. saturn.in('father').in('father').name 2. hercules.out('father','mother').name 9
  • 10. Graph Databases: features • Their data models is very simple: nodes connected by edges (also called arcs) – FlockDB model only nodes and edges – Neo4J allows us to attach java object to nodes and edges – OrientDB stores Java Objects that are sub-classes of nodes and edges. • They are ideal for modeling large scale data with complex relationship such as social networks, product preferences, etc. 10
  • 11. Graph Databases: features • It allows us to query the modeled network as nodes or edges based queries. • Graph and relational databases are different for Relationships and Joins. • In relational databases: – Relations are implemented using foreign keys. – The joins required to navigate around can get quite expensive (especially for strongly connected databases). 11
  • 12. Graph Databases: features • Graph databases: – Make the relationship using edges (direct and undirected) – Joins are done making traversal along the relationship (very cheap) • The fast traversal is because most of the work from query time to insert time. • This paying off in situations where querying performance is more important than insert speed. 12
  • 13. Graph Databases: Query • Another key difference is how query are done. • In relational database you make a query by selecting some tuples using a where condition • On a graph database you query starting from a node (saturn, hercules,…). • However nodes can be indexed by an attributes such as ID. • Thus, they expect most of the query work to be navigating realtionships 13
  • 14. Graph Databases: Rationale • The emphasis on relationships makes graph databases very different from aggregated-oriented • However, these databases are: – More likely to run on a single server, – ACID transactions need to cover multiple nodes and edges • The only think they have in common with aggregate- oriented databases is the rejection of the relational model. 14
  • 15. Materialized views • Wi aggregate-models we stressed their advantage of aggregation – If you want to access orders, it is useful to have orders stored in a single unit • We know that aggregated orientation has the disadvantage over analytics queries. • A first solution to reduce this problem is by building an index but we are still working against aggregates 15
  • 16. Materialized Views: RDBMS • Relational databases offer another mechanism to deal with analytics queries, that is views. • A view is like A relational table but defined by computation over the base tables • When a views is accessed the databases execute the computation required. • Or with materialized views are computed in advance and cached on disk 16
  • 17. Materialized Views: NoSQL • NoSQL database do not have views, but they have pre-computed and cached queries. • This is a central aspect for aggregate-oriented databases since some queries may not fit with the aggregate structure. • Often, materialized views are created using map- reduce computation. 17
  • 18. Materialized Views: Approaches • There are two approaches: Eager and Lazy. • Eager – the materialized view is updated when the base tables are updated. – This approach is good when we have more frequent reads than writes • Lazy: – The updates are run via batch jobs at regular interval – It is good when the data updates are not business critical 18
  • 19. Materialized Views: Approaches • Moreover, we can create views outside the database. • We can read the data, computing the view, and saving it back to the database (MapReduce). • Often the databases support building materialized views themselves (Incremental MapReduce). – We provide the need computation and – The databases execute computation when needed 19
  • 20. Materialized Views: Approaches • They can be used within the same aggregate. – An order document might include an order summary element of the same order which can be used as view • In column-oriented databases different column- families are used for materialized views. • An advantage of doing this is that it allows you to update views with the same atomic operation. 20
  • 21. Modeling for Data Access • Now we can consider in detail how to model data with aggregate-orientation • We need to consider: 1. How the data is going to be read 2. What are the side effects on data related to those aggregates 21
  • 22. Modeling: key-value store • In this scenario, the application can read the customer’s information and all the related data by using the key. • If we need to read the products sold in each order we need to read and parse all the object 22
  • 23. Modeling: document • When references are needed, we can switch to documents and query inside the documents • With references we can find orders independently from the customer. • With orderId reference we can find all Orders • But we have to push Order reference into Customer 23
  • 24. Modeling: column-family stores • We have the benefit of the columns being ordered, • It allows us to name consistently the columns that are accessed frequently so that they are fetched first. • When using column-families to model data, remember to do it per query requirements and not for writing purpose. • The general rule is to make it easy to query and denormalize data during write. 24
  • 25. Modeling: column-family stores • There are multiple ways to model data. • One way is to store the Customer and Order in different column families. • Note that the reference to all the orders placed by customer are in the Customer column-family. • Similarly other denormalizations are generally done so that query (read) performance is improved. 25
  • 26. Modeling: Graph Databases • We model all objects as nodes and relationship as edges. • Relationship have types and directional significance. • Relationship have names, that let we traverse the graph. • Graph databases allows us to make query such as: list all the customers that purchased “akka in action” 26
  • 28. Distribution Models • We already discussed the advantages of scale up vs. scale out. • Scale out is more appealing since we can run databases on a cluster of servers. • Depending on the distribution model the data store can give us the ability: 1. To handle large quantity of data, 2. To process a greater read or write traffic 3. To have more availability in the case of network slowdowns of breakages 28
  • 29. Distribution Models • These are important benefit, but they come at a cost. • Running over a cluster introduces complexity. • There are two path for distribution: – Replication and – Sharding 29
  • 30. Distribution Model: Single Server • It is the first and simplest distribution option. • Also if NoSQL database are designed to run on a cluster they can be used in a single server application. • This make sense if a NoSQL database is more suited for the application data model. • Graph database are the more obvious • If data usage is most about processing aggregates, than a key or a document store may be useful. 30
  • 31. Sharding • Often, a data store is busy because different people are accessing different part of the dataset. • In this cases we can support horizontal scalability by putting different part of the data onto different servers (Sharding) • The concept of sharding is not new as a part of application logic. • It consists in put all the customer with surname A-D on one shard and E-G to another 31
  • 32. Sharding • This complicates the programming model as the application code needs to distributed the load across the shards • In the ideal setting we have each user to talk one server and the load is balanced. • Of course the ideal case is rare. 32
  • 33. Sharding: Approaches • In order to get the ideal case we have to guarantee that data accessed together are stored in the same node. – This is very simple using aggregates. • When considering data distribution across nodes. – If access is based on physical location, we can place data close to where are accessed. 33
  • 34. Sharding: Approaches • Another factor is trying to keep data balanced. • We should arrange aggregates so they are evenly distributed in order that each node receive the same amount of the load. • Another approach is to put aggregate together if we think they may be read in sequence (BigTable). • In BigTable as examples data on web addresses are stored in reverse domain names. 34
  • 35. Sharding and NoSQL • In general, many NoSQL databases offers auto- sharding. • This can make much easier to use sharding in an application. • Sharding is especially valuable for performance because it improves read and write performances. • It scales read and writes on the different nodes of the same cluster. 35
  • 36. Sharding and Resilience • Sharding does little to improve resilience when used alone. • Since data is on different nodes, a node failure makes shard’s data unavailable. • So in practice, sharding alone is likely to decrease resilience. 36
  • 37. Sharding: right time • Some databases are intended to be sharded at the beginning • Some other let us start with a single node and then distribute and shard. • However, sharding very late may create trouble – especially if done in production where the database became essentially unavailable during the moving of the data to the new shards. 37
  • 38. Master-Slave Replication • In this setting one node is designated as the master, or primary ad the other as slaves. • The master is the authoritative source for the date and designed to process updates and send them to slaves. • The slaves are used for read operations. • This allows us to scale in data intensive dataset. 38
  • 39. Master-Slave Replication • We can scale horizontally by adding more slaves • But, we are limited by the ability of the master in processing incoming data. • An advantage is read resilience. – Also if the master fails the slaves can still handle read requests. – Anyway writes are not allowed until the master is not restored. 39
  • 40. Master-Slave Replication • Another characteristic is that a slave can be appointed as master. • Masters can be appointed manually or automatically. • In order to achieve resilience we need that read and write paths are different. • This is normally done using separate database connections. 40
  • 41. Master-Slave Replication • Replication in master-slave have the analyzed advantages but it come with the problem of inconsistency. • The readers reading from the slaves can read data not updated. 41
  • 42. Peer-to-Peer Replication • Master-Slave replication helps with read scalability but has problems on scalability of writes. • Moreover, it provides resilience on read but not on writes. • The master is still a single point of failure. • Peer-to-Peer attacks these problems by not having a master. 42
  • 43. Peer-to-Peer Replication • All the replica are equal (accept writes and reads) • With a Peer-to-Peer we can have node failures without lose write capability and losing data. 43
  • 44. Peer-to-Peer Replication • Furthermore we can easily add nodes for performances. • The bigger compliance here is consistency. • When we can write on different nodes, we increase the probability to have inconsistency on writes. • However there is a way to deal with this problem. 44
  • 45. Combining Sharing with Replication • Master-slave and sharding: we have multiple masters, but each data has a single master. – Depending on the configuration we can decide the master for each group of data. • Peer-to-Peer and sharding is a common strategy for column-family databases. – This is commonly composed using replication of the shards 45

Editor's Notes

  1. Hercules 2. Jupiter, Alcmene
  2. A relational table but defined by computation over the base tables
  3. (or split key-value into customer and orders).