CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

A
Asst.Prof. M.GokilavaniProfessor at KL university
CCS334 BIG DATAANALYTICS
(R-21 III (I Sem))
Department of Artificial Intelligence and Data Science )
Session 3
by
Asst.Prof.M.Gokilavani
NIET
9/19/2023 Department of AI & DS 1
TEXT BOOKS
• Michael Minelli, Michelle Chambers, and AmbigaDhiraj, "Big Data,
Big Analytics: Emerging Business Intelligence and Analytic Trends for
Today's Businesses", Wiley, 2013.
• Eric Sammer, "Hadoop Operations", O'Reilley, 2012.
• Sadalage, Pramod J. “NoSQL distilled”, 2013.
REFERENCES
• E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive",
O'Reilley, 2012.
• Lars George, "HBase: The Definitive Guide", O'Reilley, 2011.
• Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilley, 2010.
9/19/2023 Department of AI & DS 2
Topics covered in Unit 2 session
9/19/2023 Department of AI & DS 3
UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL – aggregate data models – key-value and
document data models – relationships – graph databases – schema
less databases – materialized views – distribution models – master-
slave replication – consistency - Cassandra – Cassandra data model
– Cassandra examples – Cassandra clients.
Distribution Models
• Depending on the distribution model the data store can give us the ability:
• To handle large Quantity of data
• To process a grater read or write traffic
• To have more availability in the case of network slowdowns of
breakages.
• There are two path for distribution:
• Sharding: Sharding distributes different data across multiple
servers, so each server acts as the single source for a subset of
data.
• Replication: Replication copies data across multiple servers,
so each bit of data can be found in multiple places.
9/19/2023 Department of AI & DS 4
Distributed Models
• Single server
• Sharding
• Master Slave Replication
• Peer-to-Peer Replication
• Combining Sharding & Replication
9/19/2023 Department of AI & DS 5
Single Server
• It is the first and simplest distribution option.
• Also if NoSQL database are designed to run on a cluster they can be
used in a single server application.
• This make sense if a NoSQL database is more suited for the
application data model.
• Graph database are the more obvious.
• If data usage is most about processing aggregates, than a key or a
document store may be useful.
9/19/2023 Department of AI & DS 6
Sharding
• Often, a data store is busy because different people are accessing
different part of the dataset.
• In this cases we can support horizontal scalability by putting different
part of the data onto different servers (Sharding)
• The concept of sharding is not new as a part of application logic.
• It consists in put all the customer with surname A-D on one shard and
E-G to another
• This complicates the programming model as the application code
needs to distributed the load across the shards.
• In the ideal setting we have each user to talk one server and the load is
balanced.
• Of course the ideal case is rare.
9/19/2023 Department of AI & DS 7
9/19/2023 Department of AI & DS 8
Sharding and NoSQL
• In general, many NoSQL databases offers auto- sharding.
• This can make much easier to use sharding in an application.
• Sharding is especially valuable for performance because it improves
read and write performances.
• It scales read and writes on the different nodes of the same cluster.
9/19/2023 Department of AI & DS 9
Master – Slave Replication
• In this setting one node is designated as the master, or primary ad the
other as slaves.
• The master is the authoritative source for the date and designed to
process updates and send them to slaves.
• The slaves are used for read operations.
• This allows us to scale in data intensive dataset.
9/19/2023 Department of AI & DS 10
9/19/2023 Department of AI & DS 11
Master – Slave Replication
• We can scale horizontally by adding more slaves.
• But, we are limited by the ability of the master in processing incoming
data.
An advantage is read resilience.
Disadvantage:
• Also if the master fails the slaves can still handle read requests.
• Anyway writes are not allowed until the master is not restored.
9/19/2023 Department of AI & DS 12
Master – Slave Replication
• Another characteristic is that a slave can be appointed as master.
• Masters can be appointed manually or automatically.
• In order to achieve resilience we need that read and write paths are
different.
• This is normally done using separate database connections.
• Disadvantage: Replication in master-slave have the analyzed
advantages but it come with the problem of inconsistency.
• The readers reading from the slaves can read data not updated.
9/19/2023 Department of AI & DS 13
Master – Slave Replication in MongoDB
9/19/2023 Department of AI & DS 14
Peer to Peer Replication
• Master-Slave replication helps with read scalability but has problems
on scalability of writes.
• Moreover, it provides resilience on read but not on writes.
• The master is still a single point of failure.
• Peer-to-Peer attacks these problems by not having a master.
• All the replica are equal (accept writes and reads).
• With a Peer-to-Peer we can have node failures without lose write
capability and losing data.
9/19/2023 Department of AI & DS 15
9/19/2023 Department of AI & DS 16
Combine Sharing
• We have multiple masters, but each data has a single master.
• Depending on the configuration we can decide the master for each
group of data.
• Peer-to-Peer and sharding is a common strategy for column-family
databases.
• This is commonly composed using replication of the shards.
9/19/2023 Department of AI & DS 17
Topics to be covered in next session 4
• Cassandra data model
9/19/2023 Department of AI & DS 18
Thank you!!!
1 of 18

Recommended

Lecture6 introduction to data streams by
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
3.7K views19 slides
Data Streaming For Big Data by
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big DataSeval Çapraz
747 views32 slides
Big Data & Hadoop Tutorial by
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
90.2K views54 slides
Hadoop File system (HDFS) by
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
8.4K views54 slides
Introduction to NoSQL Databases by
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
47.7K views36 slides
Hadoop Security Architecture by
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
30.2K views30 slides

More Related Content

What's hot

MIT Deep Learning Basics: Introduction and Overview by Lex Fridman by
MIT Deep Learning Basics: Introduction and Overview by Lex FridmanMIT Deep Learning Basics: Introduction and Overview by Lex Fridman
MIT Deep Learning Basics: Introduction and Overview by Lex FridmanPeerasak C.
2.1K views69 slides
Data Engineering Basics by
Data Engineering BasicsData Engineering Basics
Data Engineering BasicsCatherine Kimani
5.7K views25 slides
Big data ppt by
Big data pptBig data ppt
Big data pptThirunavukkarasu Ps
75K views33 slides
Big data and Hadoop by
Big data and HadoopBig data and Hadoop
Big data and HadoopRahul Agarwal
82.3K views18 slides
Hadoop introduction , Why and What is Hadoop ? by
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
71.4K views34 slides
Introduction to NoSQL by
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLPolarSeven Pty Ltd
7K views15 slides

What's hot(20)

MIT Deep Learning Basics: Introduction and Overview by Lex Fridman by Peerasak C.
MIT Deep Learning Basics: Introduction and Overview by Lex FridmanMIT Deep Learning Basics: Introduction and Overview by Lex Fridman
MIT Deep Learning Basics: Introduction and Overview by Lex Fridman
Peerasak C.2.1K views
Big data and Hadoop by Rahul Agarwal
Big data and HadoopBig data and Hadoop
Big data and Hadoop
Rahul Agarwal82.3K views
Hadoop introduction , Why and What is Hadoop ? by sudhakara st
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
sudhakara st71.4K views
Introduction to Map Reduce by Apache Apex
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
Apache Apex7.7K views
Big data lecture notes by Mohit Saini
Big data lecture notesBig data lecture notes
Big data lecture notes
Mohit Saini32.1K views
Hadoop Overview & Architecture by EMC
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC64.6K views
5.1 mining data streams by Krish_ver2
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
Krish_ver214.4K views
Introduction to Hadoop by Apache Apex
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Apache Apex14.4K views
Introduction to Hadoop and Hadoop component by rebeccatho
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho2.7K views
Nosql-Module 1 PPT.pptx by Radhika R
Nosql-Module 1 PPT.pptxNosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptx
Radhika R352 views
Market oriented Cloud Computing by Jithin Parakka
Market oriented Cloud ComputingMarket oriented Cloud Computing
Market oriented Cloud Computing
Jithin Parakka19.5K views
Introducing Technologies for Handling Big Data by Jaseela by Student
Introducing Technologies for Handling Big Data by JaseelaIntroducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by Jaseela
Student1.5K views
Machine Learning Project by Abhishek Singh
Machine Learning ProjectMachine Learning Project
Machine Learning Project
Abhishek Singh17.5K views
5 Data Modeling for NoSQL 1/2 by Fabio Fumarola
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
Fabio Fumarola31K views
Free Training: How to Build a Lakehouse by Databricks
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks3.3K views

Similar to CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

Session 1 Introduction to NoSQL.pptx by
 Session 1 Introduction to NoSQL.pptx Session 1 Introduction to NoSQL.pptx
Session 1 Introduction to NoSQL.pptxAsst.Prof. M.Gokilavani
13 views18 slides
Report 2.0.docx by
Report 2.0.docxReport 2.0.docx
Report 2.0.docxpinstechwork
7 views6 slides
Introduction to NoSQL and MongoDB by
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
158 views26 slides
NoSQLDatabases by
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
395 views39 slides
No sql database by
No sql databaseNo sql database
No sql databasevishal gupta
364 views25 slides
6 Data Modeling for NoSQL 2/2 by
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2Fabio Fumarola
12.1K views45 slides

Similar to CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx(20)

Introduction to NoSQL and MongoDB by Ahmed Farag
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
Ahmed Farag158 views
NoSQLDatabases by Adi Challa
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
Adi Challa395 views
6 Data Modeling for NoSQL 2/2 by Fabio Fumarola
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2
Fabio Fumarola12.1K views
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt... by PascalDesmarets1
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
PascalDesmarets159 views
A survey on data mining and analysis in hadoop and mongo db by Alexander Decker
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
Alexander Decker307 views
A survey on data mining and analysis in hadoop and mongo db by Alexander Decker
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
Alexander Decker542 views
Cloud Computing: The Hard Problems Never Go Away by ZendCon
Cloud Computing: The Hard Problems Never Go AwayCloud Computing: The Hard Problems Never Go Away
Cloud Computing: The Hard Problems Never Go Away
ZendCon2.4K views
Relational and non relational database 7 by abdulrahmanhelan
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
abdulrahmanhelan230 views
Objectivity/DB: A Multipurpose NoSQL Database by InfiniteGraph
Objectivity/DB: A Multipurpose NoSQL DatabaseObjectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL Database
InfiniteGraph5.1K views
NoSQL Seminer by Partha Das
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
Partha Das760 views

More from Asst.Prof. M.Gokilavani

AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdf by
AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdfAI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdf
AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdfAsst.Prof. M.Gokilavani
8 views16 slides
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdf by
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdfAI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdf
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdfAsst.Prof. M.Gokilavani
13 views34 slides
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdf by
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdfAI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdf
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdfAsst.Prof. M.Gokilavani
12 views33 slides
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ... by
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...Asst.Prof. M.Gokilavani
8 views17 slides
AI3391 Session 12 Local search in continious space.pptx by
AI3391 Session 12 Local search in continious space.pptxAI3391 Session 12 Local search in continious space.pptx
AI3391 Session 12 Local search in continious space.pptxAsst.Prof. M.Gokilavani
5 views16 slides
AI3391 Session 11 Hill climbing algorithm.pptx by
AI3391 Session 11 Hill climbing algorithm.pptxAI3391 Session 11 Hill climbing algorithm.pptx
AI3391 Session 11 Hill climbing algorithm.pptxAsst.Prof. M.Gokilavani
3 views22 slides

More from Asst.Prof. M.Gokilavani(20)

AI3391 Session 13 searching with Non-Deterministic Actions and partial observ... by Asst.Prof. M.Gokilavani
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect... by Asst.Prof. M.Gokilavani
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...
AI3391 ARTIFICIAL INTELLIGENCE Session 7 Uniformed search strategies.pptx by Asst.Prof. M.Gokilavani
AI3391 ARTIFICIAL INTELLIGENCE Session 7 Uniformed search strategies.pptxAI3391 ARTIFICIAL INTELLIGENCE Session 7 Uniformed search strategies.pptx
AI3391 ARTIFICIAL INTELLIGENCE Session 7 Uniformed search strategies.pptx
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f... by Asst.Prof. M.Gokilavani
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...

Recently uploaded

_MAKRIADI-FOTEINI_diploma thesis.pptx by
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptxfotinimakriadi
8 views32 slides
SNMPx by
SNMPxSNMPx
SNMPxAmatullahbutt
17 views12 slides
802.11 Computer Networks by
802.11 Computer Networks802.11 Computer Networks
802.11 Computer NetworksTusharChoudhary72015
10 views33 slides
GDSC Mikroskil Members Onboarding 2023.pdf by
GDSC Mikroskil Members Onboarding 2023.pdfGDSC Mikroskil Members Onboarding 2023.pdf
GDSC Mikroskil Members Onboarding 2023.pdfgdscmikroskil
51 views62 slides
fakenews_DBDA_Mar23.pptx by
fakenews_DBDA_Mar23.pptxfakenews_DBDA_Mar23.pptx
fakenews_DBDA_Mar23.pptxdeepmitra8
14 views34 slides
Design of machine elements-UNIT 3.pptx by
Design of machine elements-UNIT 3.pptxDesign of machine elements-UNIT 3.pptx
Design of machine elements-UNIT 3.pptxgopinathcreddy
32 views31 slides

Recently uploaded(20)

_MAKRIADI-FOTEINI_diploma thesis.pptx by fotinimakriadi
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptx
fotinimakriadi8 views
GDSC Mikroskil Members Onboarding 2023.pdf by gdscmikroskil
GDSC Mikroskil Members Onboarding 2023.pdfGDSC Mikroskil Members Onboarding 2023.pdf
GDSC Mikroskil Members Onboarding 2023.pdf
gdscmikroskil51 views
fakenews_DBDA_Mar23.pptx by deepmitra8
fakenews_DBDA_Mar23.pptxfakenews_DBDA_Mar23.pptx
fakenews_DBDA_Mar23.pptx
deepmitra814 views
Design of machine elements-UNIT 3.pptx by gopinathcreddy
Design of machine elements-UNIT 3.pptxDesign of machine elements-UNIT 3.pptx
Design of machine elements-UNIT 3.pptx
gopinathcreddy32 views
Update 42 models(Diode/General ) in SPICE PARK(DEC2023) by Tsuyoshi Horigome
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
What is Whirling Hygrometer.pdf by IIT KHARAGPUR
What is Whirling Hygrometer.pdfWhat is Whirling Hygrometer.pdf
What is Whirling Hygrometer.pdf
IIT KHARAGPUR 12 views
Machine Element II Course outline.pdf by odatadese1
Machine Element II Course outline.pdfMachine Element II Course outline.pdf
Machine Element II Course outline.pdf
odatadese19 views
MSA Website Slideshow (16).pdf by msaucla
MSA Website Slideshow (16).pdfMSA Website Slideshow (16).pdf
MSA Website Slideshow (16).pdf
msaucla68 views
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx by lwang78
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
lwang7853 views
SUMIT SQL PROJECT SUPERSTORE 1.pptx by Sumit Jadhav
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptx
Sumit Jadhav 13 views
Effect of deep chemical mixing columns on properties of surrounding soft clay... by AltinKaradagli
Effect of deep chemical mixing columns on properties of surrounding soft clay...Effect of deep chemical mixing columns on properties of surrounding soft clay...
Effect of deep chemical mixing columns on properties of surrounding soft clay...
AltinKaradagli6 views

CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

  • 1. CCS334 BIG DATAANALYTICS (R-21 III (I Sem)) Department of Artificial Intelligence and Data Science ) Session 3 by Asst.Prof.M.Gokilavani NIET 9/19/2023 Department of AI & DS 1
  • 2. TEXT BOOKS • Michael Minelli, Michelle Chambers, and AmbigaDhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013. • Eric Sammer, "Hadoop Operations", O'Reilley, 2012. • Sadalage, Pramod J. “NoSQL distilled”, 2013. REFERENCES • E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilley, 2012. • Lars George, "HBase: The Definitive Guide", O'Reilley, 2011. • Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilley, 2010. 9/19/2023 Department of AI & DS 2
  • 3. Topics covered in Unit 2 session 9/19/2023 Department of AI & DS 3 UNIT II NOSQL DATA MANAGEMENT Introduction to NoSQL – aggregate data models – key-value and document data models – relationships – graph databases – schema less databases – materialized views – distribution models – master- slave replication – consistency - Cassandra – Cassandra data model – Cassandra examples – Cassandra clients.
  • 4. Distribution Models • Depending on the distribution model the data store can give us the ability: • To handle large Quantity of data • To process a grater read or write traffic • To have more availability in the case of network slowdowns of breakages. • There are two path for distribution: • Sharding: Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data. • Replication: Replication copies data across multiple servers, so each bit of data can be found in multiple places. 9/19/2023 Department of AI & DS 4
  • 5. Distributed Models • Single server • Sharding • Master Slave Replication • Peer-to-Peer Replication • Combining Sharding & Replication 9/19/2023 Department of AI & DS 5
  • 6. Single Server • It is the first and simplest distribution option. • Also if NoSQL database are designed to run on a cluster they can be used in a single server application. • This make sense if a NoSQL database is more suited for the application data model. • Graph database are the more obvious. • If data usage is most about processing aggregates, than a key or a document store may be useful. 9/19/2023 Department of AI & DS 6
  • 7. Sharding • Often, a data store is busy because different people are accessing different part of the dataset. • In this cases we can support horizontal scalability by putting different part of the data onto different servers (Sharding) • The concept of sharding is not new as a part of application logic. • It consists in put all the customer with surname A-D on one shard and E-G to another • This complicates the programming model as the application code needs to distributed the load across the shards. • In the ideal setting we have each user to talk one server and the load is balanced. • Of course the ideal case is rare. 9/19/2023 Department of AI & DS 7
  • 9. Sharding and NoSQL • In general, many NoSQL databases offers auto- sharding. • This can make much easier to use sharding in an application. • Sharding is especially valuable for performance because it improves read and write performances. • It scales read and writes on the different nodes of the same cluster. 9/19/2023 Department of AI & DS 9
  • 10. Master – Slave Replication • In this setting one node is designated as the master, or primary ad the other as slaves. • The master is the authoritative source for the date and designed to process updates and send them to slaves. • The slaves are used for read operations. • This allows us to scale in data intensive dataset. 9/19/2023 Department of AI & DS 10
  • 12. Master – Slave Replication • We can scale horizontally by adding more slaves. • But, we are limited by the ability of the master in processing incoming data. An advantage is read resilience. Disadvantage: • Also if the master fails the slaves can still handle read requests. • Anyway writes are not allowed until the master is not restored. 9/19/2023 Department of AI & DS 12
  • 13. Master – Slave Replication • Another characteristic is that a slave can be appointed as master. • Masters can be appointed manually or automatically. • In order to achieve resilience we need that read and write paths are different. • This is normally done using separate database connections. • Disadvantage: Replication in master-slave have the analyzed advantages but it come with the problem of inconsistency. • The readers reading from the slaves can read data not updated. 9/19/2023 Department of AI & DS 13
  • 14. Master – Slave Replication in MongoDB 9/19/2023 Department of AI & DS 14
  • 15. Peer to Peer Replication • Master-Slave replication helps with read scalability but has problems on scalability of writes. • Moreover, it provides resilience on read but not on writes. • The master is still a single point of failure. • Peer-to-Peer attacks these problems by not having a master. • All the replica are equal (accept writes and reads). • With a Peer-to-Peer we can have node failures without lose write capability and losing data. 9/19/2023 Department of AI & DS 15
  • 17. Combine Sharing • We have multiple masters, but each data has a single master. • Depending on the configuration we can decide the master for each group of data. • Peer-to-Peer and sharding is a common strategy for column-family databases. • This is commonly composed using replication of the shards. 9/19/2023 Department of AI & DS 17
  • 18. Topics to be covered in next session 4 • Cassandra data model 9/19/2023 Department of AI & DS 18 Thank you!!!