SlideShare a Scribd company logo
1 of 54
Aggregate Data Models
Dr. Fabio Fumarola
Agenda
• Data Model Evolution
• Relational Model vs Aggregate Model
• Consequences of Aggregate Models
• Aggregates and Transactions
• Aggregates Models on NoSQL
– Key-value and Document
– Column-Family Stores
• Summarizing Aggregate-Oriented databases
2
Data Model
• A data model is a representation that we use to
perceive and manipulate our data.
• It allows us to:
– Represent the data elements under analysis, and
– How these are related to each others
• This representation depends on our perception.
3
Data Model: Database View
• In the database field, it describes how we interact
with the data in the database.
• This is distinct from the storage model:
– It describes how the database stores and manipulate the
data internally.
• In an ideal worlds:
– We should be ignorant of the storage model, but
– In practice we need at least some insight to achieve a
decent performance
4
Data Models: Example
• A Data model is the model of the specific data in an
application
• A developer might point to an entity-relationship
diagram and refer it as the data model containing
– customers,
– orders and
– products
5
Data Model: Definition
In this course we will refer “data model” as the model
by which the database organize data.
It can be more formally defined as meta-model
6
Last Decades Data Model
• The dominant data model of the last decades what
the relational data model.
1. It can be represented as a set of tables.
2. Each table has rows, with each row representing
some entity of interest.
3. We describe entities through columns (???)
4. A column may refer to another row in the same or
different table (relationship).
7
NoSQL Data Model
• It moves away from the relational data model
• Each NoSQL database has a different model
– Key-value,
– Document,
– Column-family,
– Graph, and
– Sparse (Index based)
• Of these, the first three share a common
characteristic (Aggregate Orientation).
8
Relational Model
vs
Aggregate Model
9
Relational Model
• The relational model takes the information that we
want to store and divides it into tuples (rows).
• However, a tuple is a limited data structure.
• It captures a set of values.
• So, we can’t nest one tuple within another to get
nested records.
• Nor we can put a list of values or tuple within
another.
10
Relational Model
• This simplicity characterize the relational model
• It allows us to think on data manipulation as
operation that have:
– As input tuples, and
– Return tuples
• Aggregate orientation takes a different approach.
11
Aggregate Model
• It recognizes that, you want to operate on data unit
having a more complex structure than a set of tuples.
• We can think on term of complex record that allows:
– List,
– Map,
– And other data structures to be nested inside it
• Key-Value, document, and column-family databases
uses this complex structure.
12
Aggregate Model
• Aggregate is a term coming from Domain-Driven
Design [Evans03]
– An aggregate is a collection of related objects that we wish
to treat as a unit for data manipulation, management a
consistency.
• We like to update aggregates with atomic operation
• We like to communicate with our data storage in
terms of aggregates
13http://pbdmng.datatoknowledge.it/readingMaterial/Evans03.pdf
Aggregate Models
• This definition matches really with how key-value,
document, and column-family databases works.
• With aggregates we can easier work on a cluster,
since they are unit for replication and sharding.
• Aggregates are also easier for application
programmer to work since solve the impedance
mismatch problem of relational databases.
14
Example of Relational Model
• Assume we are building
an e-commerce
website;
• We have to store
information about:
users, products, orders,
shipping addresses,
billing addresses, and
payment data.
15
Example of Relational Model
• As we are good
relational soldier:
– Everything is
normalized
– No data is repeated
in multiple tables.
– We have referential
integrity
16
Example of Aggregate Model
• We have two aggregates:
– Customers and
– Orders
• We use the black-
diamond composition to
show how data fits into
the aggregate structure
17
Example of Aggregate Model
• The customer contains a
list of billing addresses;
• The order contains a list
of:
– order items,
– a shipping address,
– and payments
• The payment itself
contains a billing address
for that payment
18
Example of Aggregate Model
• A single address appears 3
times, but instead of using an
id it is copied each time
• This fits a domain where we
don’t want shipping,
payment and billing address
to change
• What is the difference w.r.t a
relational representation?
19
Example of Aggregate Model
• The link between customer and the order is a relationship
between aggregates
20
//Customer
{
"id": 1,
"name": "Fabio",
"billingAddress": [
{
"city": "Bari"
}
]
}
//Orders
{
"id": 99,
"customerId": 1,
"orderItems": [
{
"productId": 27,
"price": 34,
"productName": "Scala in Action”
} ],
"shippingAddress": [ {"city": "Bari”} ],
"orderPayment": [
{ "ccinfo": "100-432423-545-134",
"txnId": "afdfsdfsd",
"billingAddress": [ {"city": "Bari” }]
} ]
}
Orders Details
•There is the customer id
•The product name is a part of the
ordered Items
•The product id is part of the
ordered items
•The address is stored several
times
21
//Orders
{
"id": 99,
"customerId": 1,
"orderItems": [
{
"productId": 27,
"price": 34,
"productName": "Scala in Action”
} ],
"shippingAddress": [ {"city": "Bari”} ],
"orderPayment": [
{ "ccinfo": "100-432423-545-134",
"txnId": "afdfsdfsd",
"billingAddress": [ {"city": "Bari” }]
} ]
}
Example: Rationale
• We aggregate to minimize the number of aggregates
we access during data interaction
• The important think to notice is that,
1. We have to think about accessing that data
2. We make this part of our thinking when developing the
application data model
• We could draw our aggregate differently, but it really
depends on the “data accessing models”
22
Example: Rationale
• Like most thinks in modeling, there is no an universal
answer on how we draw aggregate boundaries.
• It depends on how we have to manipulate our data
1. If we tend to access a customer with all its orders at
once, then we should prefer a single aggregate.
2. If we tend to access a single order at time, then we
should prefer having separate aggregates for each order.
23
Consequences of Aggregate Models
24
No Distributable Storage
• Relational mapping can captures data elements and
their relationship well.
• It does not need any notion of aggregate entity,
because it uses foreign key relationship.
• But we cannot distinguish for a relationship that
represent aggregations from those that don’t.
• As result we cannot take advantage of that
knowledge to store and distribute our data.
25
Marking Aggregate Tools
• Many data modeling techniques provides way to
mark aggregate structures in relational models
• However, they do not provide semantic that helps in
distinguish relationships
• When working with aggregate-oriented databases,
we have a clear view of the semantic of the data.
• We can focus on the unit of interaction with the data
storage.
26
Aggregate Ignorant
• Relational database are aggregate-ignorant, since
they don’t have concept of aggregate
• Also graph database are aggregate-ignorant.
• This is not always bad.
• In domains where it is difficult to draw aggregate
boundaries aggregate-ignorant databases are useful.
27
Aggregate and Operations
• An order is a good aggregate when:
– A customer is making and reviewing an order, and
– When the retailer is processing orders
• However, when the retailer want to analyze its
product sales over the last months, then aggregate
are trouble.
• We need to analyze each aggregate to extract sales
history.
28
Aggregate and Operations
• Aggregate may help in some operation and not in
others.
• In cases where there is not a clear view aggregate-
ignorant database are the best option.
• But, remember the point that drove us to aggregate
models (cluster distribution).
• Running databases on a cluster is need when dealing
with huge quantities of data.
29
Running on a Cluster
• It gives several advantages on computation power
and data distribution
• However, it requires to minimize the number of
nodes to query when gathering data
• By explicitly including aggregates, we give the
database an important view of which information
should be stored together
• But, still we have the problem on querying historical
data, do we have any solution?
30
Aggregates and TRansactions
31
ACID transactions
• Relational database allow us to manipulate any
combination of rows from any table in a single
transaction.
• ACID transactions:
– Atomic,
– Consistent,
– Isolated, and
– Durable
have the main point in Atomicity.
32
Atomicity & RDBMS
• Many rows spanning many tables are updated into
an Atomic operation
• It may succeeded or failed entirely
• Concurrently operations are isolated and we cannot
see partial updates
• However relational database still fail.
33
Atomicity & NoSQL
• NoSQL don’t support Atomicity that spans multiple
aggregates.
• This means that if we need to update multiple
aggregates we have to manage that in the
application code.
• Thus the Atomicity is one of the consideration for
deciding how to divide up our data into aggregates
34
Aggregates Models on NoSQL
Key-value and Document
35
Key-Value and Document
• Key-value and Document databases are strongly
aggregate-oriented.
• Both of these types of databases consists of lot of
aggregates with a key used to get the data.
• The two type of databases differ in that:
– In a key-value stores the aggregate is opaque (Blob)
– In a document database we can see a structure in the
aggregate.
36
Key-Value and Document
• The advantage of key-value is that we can store any
type of object
• The database may impose some size limit, but we
have freedom
• A document store imposes limits on what we can
place in it, defining a structure on the data.
– In return we a language to query documents.
37
Key-Value and Document
• With a key-value we can only access by its key
• With document:
– We can submit queries based on fields,
– We can retrieve part of the aggregate, and
– The database can create index based on the fields of the
aggregate.
• But in practice they are used differently
38
Key-Value and Document
• People use document as key-value
• Riak (key-value) allows you to add metadata to
aggregates for indexing
• Redis allows you to break aggregates into lists, sets
or maps.
• You can support queries by integrating search tools
like Solr. (Riak include solr for searching data stored
as XML or JSON).
39
Key-Value and Document
Despite this the general distinction still holds.
•With key-value databases we expect aggregates using
a key
•With document databases, we mostly expect to
submit some form of query on the internal structure of
the documents.
40
Aggregates Models on NoSQL
Column-Family Stores
41
Column-Family Stores
• One of the most influential NoSQL databases was
Google’s BigTable [Chang et al.]
• Its name derives from its structure composed by
sparse columns and no schema.
• We don’t have to think to this structure as a table,
but to a two-level map.
• BigTable models influenced the design the open
source HBase and Cassandra.
42
Column-Family Stores
• These BigTable-style data model are referred to as
column stores.
• Pre-NoSQL column stores like C-Store used SQL and
the relational model.
• What make NoSQL columns store different is how
physically they store data.
• Most databases has rows as unit of storage, which
helps in writing performances
43
Column-Family Stores
• However, there are many scenarios where:
– Write are rares, but
– You need to read a few columns of many rows at once
• In this situations, it’s better to store groups of
columns for all rows as the basic storage unit.
• These kind of databases are called column stores or
column-family databases
44
Column-Family Stores
• Column-family databases have a two-level aggregate
structure.
• Similarly to key-value the first key is the row
identifier.
• The difference is that retrieving a key return a Map
of more detailed values.
• These second-level values are defined to as columns.
• Fixing a row we can access to all the column-families
or to a particular element.
45
Example of Column Model
46
Column-Family Stores
• They organize their columns into families.
• Each column is a part of a family, and column
family acts as unit of access.
• Then the data for a particular column family
are accessed together.
47
Column-Family Stores:
How to structure data
• In row-oriented:
– each row is an aggregate (For example the customer with
id 456),
– with column families representing useful chunks of data
(profile, order history) within that aggregate
• In column-oriented:
– each column family defines a record type (e.g. customer
profiles) with rows for each of the records.
– You can this of a row as the join of records in all column-
families
48
Column Family: Storage Insights
• Since the database knows about column grouping, it
uses this information for storage and access
behavior.
• If we consider Document store, also if a document as
an array of elements the document is a single unit of
storage.
• Column families give a two-level dimension of stored
data.
49
Modeling Strategies on Column DB
• We can model list based elements as a
column-family.
• Thus we can have column-families with
many elements and other with few
element.
• This may let to high fragmentation on
data stored.
• In Cassandra this problem is faced by
defining: Skinny and Wide rows.
50
Modeling Strategies on Column DB
• Another characteristic is that the
element in column-families are sorted
by their keys.
• For the orders, this would be useful if
we made a key out of a concatenation
of date and id.
51
Summarizing Aggregate-Oriented
databases
52
Key Points
• All share the notion of an aggregated indexed by a
key.
• The key can be used for lookup
• The aggregate is central to running on a cluster.
• The aggregates acts as the atomic unit for updates
providing transaction on aggregates.
• Key-value treats the values as Blob
53
Key Points
• The document model makes the aggregate
transparent for queries, but the document is treated
as single unit of storage.
• Column-family models divide aggregates into column
families, allowing the database to treat them as units
of data in the aggregates.
• This imposes some structure but allows the database
to take advantage of the structure to improve its
accessibility.
54

More Related Content

What's hot

Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisVincenzo Gulisano
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Object Relational Database Management System(ORDBMS)
Object Relational Database Management System(ORDBMS)Object Relational Database Management System(ORDBMS)
Object Relational Database Management System(ORDBMS)Rabin BK
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemRutvik Bapat
 
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Data Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessingData Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessing
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessingSalah Amean
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Distributed database management system
Distributed database management  systemDistributed database management  system
Distributed database management systemPooja Dixit
 

What's hot (20)

DDBMS Paper with Solution
DDBMS Paper with SolutionDDBMS Paper with Solution
DDBMS Paper with Solution
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
 
Ddbms1
Ddbms1Ddbms1
Ddbms1
 
Consistency in NoSQL
Consistency in NoSQLConsistency in NoSQL
Consistency in NoSQL
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Object Relational Database Management System(ORDBMS)
Object Relational Database Management System(ORDBMS)Object Relational Database Management System(ORDBMS)
Object Relational Database Management System(ORDBMS)
 
Active database
Active databaseActive database
Active database
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Data models in NoSQL
Data models in NoSQLData models in NoSQL
Data models in NoSQL
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Distributed DBMS - Unit 5 - Semantic Data Control
Distributed DBMS - Unit 5 - Semantic Data ControlDistributed DBMS - Unit 5 - Semantic Data Control
Distributed DBMS - Unit 5 - Semantic Data Control
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Replication in Distributed Systems
Replication in Distributed SystemsReplication in Distributed Systems
Replication in Distributed Systems
 
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Data Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessingData Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessing
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
 
data-mining-tutorial.ppt
data-mining-tutorial.pptdata-mining-tutorial.ppt
data-mining-tutorial.ppt
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Distributed database management system
Distributed database management  systemDistributed database management  system
Distributed database management system
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 

Similar to 5 Data Modeling for NoSQL 1/2

Chapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsChapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsnehabsairam
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Asper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsAsper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsTerry Bunio
 
The final frontier v3
The final frontier v3The final frontier v3
The final frontier v3Terry Bunio
 
Data modelling it's process and examples
Data modelling it's process and examplesData modelling it's process and examples
Data modelling it's process and examplesJayeshGadhave1
 
Database Models, Client-Server Architecture, Distributed Database and Classif...
Database Models, Client-Server Architecture, Distributed Database and Classif...Database Models, Client-Server Architecture, Distributed Database and Classif...
Database Models, Client-Server Architecture, Distributed Database and Classif...Rubal Sagwal
 
Unit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptxUnit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptxMaryJoseph79
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousingEr. Nawaraj Bhandari
 
DBMS data modeling.pptx
DBMS data modeling.pptxDBMS data modeling.pptx
DBMS data modeling.pptxMrwafaAbbas
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture PatternsMaynooth University
 
Hackolade Tutorial - part 1 - What is a data model
Hackolade Tutorial - part 1 - What is a data modelHackolade Tutorial - part 1 - What is a data model
Hackolade Tutorial - part 1 - What is a data modelPascalDesmarets1
 

Similar to 5 Data Modeling for NoSQL 1/2 (20)

10. Graph Databases
10. Graph Databases10. Graph Databases
10. Graph Databases
 
Chapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsChapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortals
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Asper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsAsper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling Topics
 
mod 2.pdf
mod 2.pdfmod 2.pdf
mod 2.pdf
 
The final frontier v3
The final frontier v3The final frontier v3
The final frontier v3
 
Data modelling it's process and examples
Data modelling it's process and examplesData modelling it's process and examples
Data modelling it's process and examples
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Database Models, Client-Server Architecture, Distributed Database and Classif...
Database Models, Client-Server Architecture, Distributed Database and Classif...Database Models, Client-Server Architecture, Distributed Database and Classif...
Database Models, Client-Server Architecture, Distributed Database and Classif...
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Unit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptxUnit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptx
 
Big Data Modeling
Big Data ModelingBig Data Modeling
Big Data Modeling
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
DBMS data modeling.pptx
DBMS data modeling.pptxDBMS data modeling.pptx
DBMS data modeling.pptx
 
NoSql
NoSqlNoSql
NoSql
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture Patterns
 
(Dbms) class 1 & 2 (Presentation)
(Dbms) class 1 & 2 (Presentation)(Dbms) class 1 & 2 (Presentation)
(Dbms) class 1 & 2 (Presentation)
 
Hackolade Tutorial - part 1 - What is a data model
Hackolade Tutorial - part 1 - What is a data modelHackolade Tutorial - part 1 - What is a data model
Hackolade Tutorial - part 1 - What is a data model
 
data mining and data warehousing
data mining and data warehousingdata mining and data warehousing
data mining and data warehousing
 
NoSql Brownbag
NoSql BrownbagNoSql Brownbag
NoSql Brownbag
 

More from Fabio Fumarola

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2Fabio Fumarola
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2Fabio Fumarola
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases LabFabio Fumarola
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases labFabio Fumarola
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented DatabasesFabio Fumarola
 
8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab8b. Column Oriented Databases Lab
8b. Column Oriented Databases LabFabio Fumarola
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with DockerFabio Fumarola
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databasesFabio Fumarola
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory Fabio Fumarola
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and DockerFabio Fumarola
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...Fabio Fumarola
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbtFabio Fumarola
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and dockerFabio Fumarola
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce Fabio Fumarola
 

More from Fabio Fumarola (20)

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases Lab
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
3 Git
3 Git3 Git
3 Git
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and Docker
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
Hbase an introduction
Hbase an introductionHbase an introduction
Hbase an introduction
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbt
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and docker
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
08 datasets
08 datasets08 datasets
08 datasets
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
 

Recently uploaded

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 

Recently uploaded (20)

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 

5 Data Modeling for NoSQL 1/2

  • 1. Aggregate Data Models Dr. Fabio Fumarola
  • 2. Agenda • Data Model Evolution • Relational Model vs Aggregate Model • Consequences of Aggregate Models • Aggregates and Transactions • Aggregates Models on NoSQL – Key-value and Document – Column-Family Stores • Summarizing Aggregate-Oriented databases 2
  • 3. Data Model • A data model is a representation that we use to perceive and manipulate our data. • It allows us to: – Represent the data elements under analysis, and – How these are related to each others • This representation depends on our perception. 3
  • 4. Data Model: Database View • In the database field, it describes how we interact with the data in the database. • This is distinct from the storage model: – It describes how the database stores and manipulate the data internally. • In an ideal worlds: – We should be ignorant of the storage model, but – In practice we need at least some insight to achieve a decent performance 4
  • 5. Data Models: Example • A Data model is the model of the specific data in an application • A developer might point to an entity-relationship diagram and refer it as the data model containing – customers, – orders and – products 5
  • 6. Data Model: Definition In this course we will refer “data model” as the model by which the database organize data. It can be more formally defined as meta-model 6
  • 7. Last Decades Data Model • The dominant data model of the last decades what the relational data model. 1. It can be represented as a set of tables. 2. Each table has rows, with each row representing some entity of interest. 3. We describe entities through columns (???) 4. A column may refer to another row in the same or different table (relationship). 7
  • 8. NoSQL Data Model • It moves away from the relational data model • Each NoSQL database has a different model – Key-value, – Document, – Column-family, – Graph, and – Sparse (Index based) • Of these, the first three share a common characteristic (Aggregate Orientation). 8
  • 10. Relational Model • The relational model takes the information that we want to store and divides it into tuples (rows). • However, a tuple is a limited data structure. • It captures a set of values. • So, we can’t nest one tuple within another to get nested records. • Nor we can put a list of values or tuple within another. 10
  • 11. Relational Model • This simplicity characterize the relational model • It allows us to think on data manipulation as operation that have: – As input tuples, and – Return tuples • Aggregate orientation takes a different approach. 11
  • 12. Aggregate Model • It recognizes that, you want to operate on data unit having a more complex structure than a set of tuples. • We can think on term of complex record that allows: – List, – Map, – And other data structures to be nested inside it • Key-Value, document, and column-family databases uses this complex structure. 12
  • 13. Aggregate Model • Aggregate is a term coming from Domain-Driven Design [Evans03] – An aggregate is a collection of related objects that we wish to treat as a unit for data manipulation, management a consistency. • We like to update aggregates with atomic operation • We like to communicate with our data storage in terms of aggregates 13http://pbdmng.datatoknowledge.it/readingMaterial/Evans03.pdf
  • 14. Aggregate Models • This definition matches really with how key-value, document, and column-family databases works. • With aggregates we can easier work on a cluster, since they are unit for replication and sharding. • Aggregates are also easier for application programmer to work since solve the impedance mismatch problem of relational databases. 14
  • 15. Example of Relational Model • Assume we are building an e-commerce website; • We have to store information about: users, products, orders, shipping addresses, billing addresses, and payment data. 15
  • 16. Example of Relational Model • As we are good relational soldier: – Everything is normalized – No data is repeated in multiple tables. – We have referential integrity 16
  • 17. Example of Aggregate Model • We have two aggregates: – Customers and – Orders • We use the black- diamond composition to show how data fits into the aggregate structure 17
  • 18. Example of Aggregate Model • The customer contains a list of billing addresses; • The order contains a list of: – order items, – a shipping address, – and payments • The payment itself contains a billing address for that payment 18
  • 19. Example of Aggregate Model • A single address appears 3 times, but instead of using an id it is copied each time • This fits a domain where we don’t want shipping, payment and billing address to change • What is the difference w.r.t a relational representation? 19
  • 20. Example of Aggregate Model • The link between customer and the order is a relationship between aggregates 20 //Customer { "id": 1, "name": "Fabio", "billingAddress": [ { "city": "Bari" } ] } //Orders { "id": 99, "customerId": 1, "orderItems": [ { "productId": 27, "price": 34, "productName": "Scala in Action” } ], "shippingAddress": [ {"city": "Bari”} ], "orderPayment": [ { "ccinfo": "100-432423-545-134", "txnId": "afdfsdfsd", "billingAddress": [ {"city": "Bari” }] } ] }
  • 21. Orders Details •There is the customer id •The product name is a part of the ordered Items •The product id is part of the ordered items •The address is stored several times 21 //Orders { "id": 99, "customerId": 1, "orderItems": [ { "productId": 27, "price": 34, "productName": "Scala in Action” } ], "shippingAddress": [ {"city": "Bari”} ], "orderPayment": [ { "ccinfo": "100-432423-545-134", "txnId": "afdfsdfsd", "billingAddress": [ {"city": "Bari” }] } ] }
  • 22. Example: Rationale • We aggregate to minimize the number of aggregates we access during data interaction • The important think to notice is that, 1. We have to think about accessing that data 2. We make this part of our thinking when developing the application data model • We could draw our aggregate differently, but it really depends on the “data accessing models” 22
  • 23. Example: Rationale • Like most thinks in modeling, there is no an universal answer on how we draw aggregate boundaries. • It depends on how we have to manipulate our data 1. If we tend to access a customer with all its orders at once, then we should prefer a single aggregate. 2. If we tend to access a single order at time, then we should prefer having separate aggregates for each order. 23
  • 25. No Distributable Storage • Relational mapping can captures data elements and their relationship well. • It does not need any notion of aggregate entity, because it uses foreign key relationship. • But we cannot distinguish for a relationship that represent aggregations from those that don’t. • As result we cannot take advantage of that knowledge to store and distribute our data. 25
  • 26. Marking Aggregate Tools • Many data modeling techniques provides way to mark aggregate structures in relational models • However, they do not provide semantic that helps in distinguish relationships • When working with aggregate-oriented databases, we have a clear view of the semantic of the data. • We can focus on the unit of interaction with the data storage. 26
  • 27. Aggregate Ignorant • Relational database are aggregate-ignorant, since they don’t have concept of aggregate • Also graph database are aggregate-ignorant. • This is not always bad. • In domains where it is difficult to draw aggregate boundaries aggregate-ignorant databases are useful. 27
  • 28. Aggregate and Operations • An order is a good aggregate when: – A customer is making and reviewing an order, and – When the retailer is processing orders • However, when the retailer want to analyze its product sales over the last months, then aggregate are trouble. • We need to analyze each aggregate to extract sales history. 28
  • 29. Aggregate and Operations • Aggregate may help in some operation and not in others. • In cases where there is not a clear view aggregate- ignorant database are the best option. • But, remember the point that drove us to aggregate models (cluster distribution). • Running databases on a cluster is need when dealing with huge quantities of data. 29
  • 30. Running on a Cluster • It gives several advantages on computation power and data distribution • However, it requires to minimize the number of nodes to query when gathering data • By explicitly including aggregates, we give the database an important view of which information should be stored together • But, still we have the problem on querying historical data, do we have any solution? 30
  • 32. ACID transactions • Relational database allow us to manipulate any combination of rows from any table in a single transaction. • ACID transactions: – Atomic, – Consistent, – Isolated, and – Durable have the main point in Atomicity. 32
  • 33. Atomicity & RDBMS • Many rows spanning many tables are updated into an Atomic operation • It may succeeded or failed entirely • Concurrently operations are isolated and we cannot see partial updates • However relational database still fail. 33
  • 34. Atomicity & NoSQL • NoSQL don’t support Atomicity that spans multiple aggregates. • This means that if we need to update multiple aggregates we have to manage that in the application code. • Thus the Atomicity is one of the consideration for deciding how to divide up our data into aggregates 34
  • 35. Aggregates Models on NoSQL Key-value and Document 35
  • 36. Key-Value and Document • Key-value and Document databases are strongly aggregate-oriented. • Both of these types of databases consists of lot of aggregates with a key used to get the data. • The two type of databases differ in that: – In a key-value stores the aggregate is opaque (Blob) – In a document database we can see a structure in the aggregate. 36
  • 37. Key-Value and Document • The advantage of key-value is that we can store any type of object • The database may impose some size limit, but we have freedom • A document store imposes limits on what we can place in it, defining a structure on the data. – In return we a language to query documents. 37
  • 38. Key-Value and Document • With a key-value we can only access by its key • With document: – We can submit queries based on fields, – We can retrieve part of the aggregate, and – The database can create index based on the fields of the aggregate. • But in practice they are used differently 38
  • 39. Key-Value and Document • People use document as key-value • Riak (key-value) allows you to add metadata to aggregates for indexing • Redis allows you to break aggregates into lists, sets or maps. • You can support queries by integrating search tools like Solr. (Riak include solr for searching data stored as XML or JSON). 39
  • 40. Key-Value and Document Despite this the general distinction still holds. •With key-value databases we expect aggregates using a key •With document databases, we mostly expect to submit some form of query on the internal structure of the documents. 40
  • 41. Aggregates Models on NoSQL Column-Family Stores 41
  • 42. Column-Family Stores • One of the most influential NoSQL databases was Google’s BigTable [Chang et al.] • Its name derives from its structure composed by sparse columns and no schema. • We don’t have to think to this structure as a table, but to a two-level map. • BigTable models influenced the design the open source HBase and Cassandra. 42
  • 43. Column-Family Stores • These BigTable-style data model are referred to as column stores. • Pre-NoSQL column stores like C-Store used SQL and the relational model. • What make NoSQL columns store different is how physically they store data. • Most databases has rows as unit of storage, which helps in writing performances 43
  • 44. Column-Family Stores • However, there are many scenarios where: – Write are rares, but – You need to read a few columns of many rows at once • In this situations, it’s better to store groups of columns for all rows as the basic storage unit. • These kind of databases are called column stores or column-family databases 44
  • 45. Column-Family Stores • Column-family databases have a two-level aggregate structure. • Similarly to key-value the first key is the row identifier. • The difference is that retrieving a key return a Map of more detailed values. • These second-level values are defined to as columns. • Fixing a row we can access to all the column-families or to a particular element. 45
  • 46. Example of Column Model 46
  • 47. Column-Family Stores • They organize their columns into families. • Each column is a part of a family, and column family acts as unit of access. • Then the data for a particular column family are accessed together. 47
  • 48. Column-Family Stores: How to structure data • In row-oriented: – each row is an aggregate (For example the customer with id 456), – with column families representing useful chunks of data (profile, order history) within that aggregate • In column-oriented: – each column family defines a record type (e.g. customer profiles) with rows for each of the records. – You can this of a row as the join of records in all column- families 48
  • 49. Column Family: Storage Insights • Since the database knows about column grouping, it uses this information for storage and access behavior. • If we consider Document store, also if a document as an array of elements the document is a single unit of storage. • Column families give a two-level dimension of stored data. 49
  • 50. Modeling Strategies on Column DB • We can model list based elements as a column-family. • Thus we can have column-families with many elements and other with few element. • This may let to high fragmentation on data stored. • In Cassandra this problem is faced by defining: Skinny and Wide rows. 50
  • 51. Modeling Strategies on Column DB • Another characteristic is that the element in column-families are sorted by their keys. • For the orders, this would be useful if we made a key out of a concatenation of date and id. 51
  • 53. Key Points • All share the notion of an aggregated indexed by a key. • The key can be used for lookup • The aggregate is central to running on a cluster. • The aggregates acts as the atomic unit for updates providing transaction on aggregates. • Key-value treats the values as Blob 53
  • 54. Key Points • The document model makes the aggregate transparent for queries, but the document is treated as single unit of storage. • Column-family models divide aggregates into column families, allowing the database to treat them as units of data in the aggregates. • This imposes some structure but allows the database to take advantage of the structure to improve its accessibility. 54

Editor's Notes

  1. Relations and Tuples