Big Data Storage Concepts
Big Data concepts Technology and Architecture
Raghad Joukhadar
2023-2024
Plan
• Introduction
• Cluster computing
• Types of cluster
• Cluster Structure
• Distribution Models
• Sharding
• Data Replication
• Sharding and Replication
• Distributed File System
• Relational and Non-Relational Databases
• RDBMS Databases
• NoSQL Databases
• NewSQL Databases
• Scaling Up and Scaling Out Storage
Introduction
• The big data revolution provides significant improvements to the data storage architecture.
• There is a need for a framework for storing data on clusters of commodity hardware.
– Example: Hadoop
• open-source
• allows organizations to effectively store and analyze large volumes of data.
Cluster Computing
• A group of loosely coupled computers that work together closely, so that they can be viewed as a “single larger and more powerful virtual computer”.
• The cluster components are
connected together through
local area networks (LANs).
Overview of Cluster computing
• The login node acts as the
gateway into the cluster.
• When users access the cluster from a public network, they have to log in to the login node.
• This prevents unauthorized access to the cluster.
Cluster Benefits
• Scalability
– Nodes can be added or removed according to demand without hindering the system.
• Availability
– Nodes within the cluster provide backup to each other in the event of a failure.
• Performance
– Connecting multiple computing resources together in a cluster increases performance.
TYPES OF CLUSTER (purpose)
• High Availability Clusters
– Nodes in a highly available cluster must have access to shared storage.
– If a node becomes inoperative, continuous service is provided by failing over the service from the inoperative node to another node, without administrative intervention.
TYPES OF CLUSTER (purpose) cont.
• Load Balancing Cluster
– Distributes incoming requests among multiple nodes running the
same programs or having the same content
– If a node in a load-balancing cluster goes down, the load from that
node is switched over to another node
– Optimize the use of resources, minimize response time
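As a rough illustration of request distribution, the sketch below uses round-robin dispatch, one simple policy among many. The class `RoundRobinBalancer`, the method `mark_down`, and the node names are assumptions made for this example; production load balancers typically also take node load and health checks into account.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin dispatcher over nodes running the same program."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        # A failed node stops receiving requests; its share moves to the others.
        self.healthy.discard(node)

    def route(self, request):
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        # Walk the ring, skipping nodes that are down.
        while True:
            node = next(self._ring)
            if node in self.healthy:
                return node


balancer = RoundRobinBalancer(["node-a", "node-b", "node-c"])
first_round = [balancer.route(f"req-{i}") for i in range(3)]
balancer.mark_down("node-b")
after_failure = [balancer.route(f"req-{i}") for i in range(4)]
```

After `node-b` is marked down, the remaining two nodes alternate and absorb its share of the requests.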
TYPES OF CLUSTER (Structure)
• Symmetric
– Each node functions as an
individual computer capable
of running applications.
– Additional machines can be
added as needed.
Cluster Structure
• Asymmetric
– A cluster structure in which one machine acts as the head node.
– The head node serves as the gateway between the user and the remaining nodes.
Distribution Models
• There are several distribution models:
– Replication: placing the same set of data on multiple nodes.
– Sharding: placing different sets of data on different nodes.
– Sharding & Replication: the two techniques can be used alone or together.
Replication
• Replication is the process of creating copies of the same set
of data across multiple servers.
• A copy of a block is called a replica.
• Replication overcomes issues such as:
– when a node crashes, the data stored on that node is lost
– when a node is down for maintenance, its data is not available until the maintenance process is over.
[Figure: Data Replication Example]
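A minimal Python sketch of the replication idea, assuming every block is copied to every live node (real systems usually copy to a configurable number of nodes). The class `ReplicatedStore` and the node names are hypothetical, and each node is modelled as an in-memory dictionary.

```python
class ReplicatedStore:
    """Hypothetical store that keeps a replica of each block on every node."""

    def __init__(self, node_names):
        # Each node is modelled as a dict mapping block_id -> data.
        self.nodes = {name: {} for name in node_names}
        self.down = set()

    def write(self, block_id, data):
        # Copy the block to every node that is currently up.
        for name, blocks in self.nodes.items():
            if name not in self.down:
                blocks[block_id] = data

    def fail(self, name):
        self.down.add(name)

    def read(self, block_id):
        # Any live replica can answer the read.
        for name, blocks in self.nodes.items():
            if name not in self.down and block_id in blocks:
                return blocks[block_id]
        raise KeyError(f"no live replica holds {block_id}")


store = ReplicatedStore(["node-a", "node-b", "node-c"])
store.write("block-1", b"sensor readings")
store.fail("node-a")
recovered = store.read("block-1")
```

The read still succeeds after `node-a` fails because the other nodes hold replicas of the same block.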
Replication Advantages
• Replication makes the system fault tolerant since the data is
not lost when an individual node fails as the data is
redundant across the nodes.
• Replication increases the data availability as the same copy
of data is available across multiple nodes.
Replication Models
Master-slave
• Master controls one or more
devices known as slaves
• The flow of control is only
from master to the slaves
• Incoming data are written on
the master node
• Read requests are handled by
slave nodes
• This architecture supports
intensive read requests
• The cluster still suffers from a single point of failure: the master
• Writes are limited to the maximum capacity that the master can handle
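The read and write paths of the master-slave model can be sketched as follows. This is an illustration only: the class `MasterSlaveCluster` is hypothetical, nodes are in-memory dictionaries, and the master is assumed to push each write to the slaves synchronously, which the slides do not specify.

```python
class MasterSlaveCluster:
    """Hypothetical master-slave replication: one writer, many readers."""

    def __init__(self, slave_count):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self._next_slave = 0

    def write(self, key, value):
        # All writes go through the master, which then updates the slaves.
        self.master[key] = value
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads rotate across the slaves, so read capacity grows with their number.
        index = self._next_slave
        self._next_slave = (index + 1) % len(self.slaves)
        return index, self.slaves[index][key]


cluster = MasterSlaveCluster(slave_count=3)
cluster.write("order-42", "shipped")
reads = [cluster.read("order-42") for _ in range(4)]
served_by = [index for index, _ in reads]
```

Adding slaves raises read throughput, while every write still passes through the single master, which is why write capacity stays bounded.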
Replication Models
Peer-to-peer
• All the nodes have the same responsibility and are at the same level
• Any of the devices involved in the process can initiate communication
• The nodes consume as well as donate resources
• Reliability is improved through replication
Sharding
• Partitioning very large data sets into smaller and easily
manageable chunks called shards.
• The shards are stored by distributing them across multiple
machines called nodes.
• No two shards of the same file are stored in the same node
• Shards spread across multiple nodes collectively constitute the
data set.
[Figure: Sharding Examples]
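One common way to decide which shard a record belongs to is to hash its key. The sketch below assumes that strategy, which the slides do not name; the functions `shard_for` and `shard_records` and the sample keys are hypothetical. A stable hash from `hashlib` is used so that the same key maps to the same shard on every run.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_records(records, shard_count):
    """Split a dict of records into shard_count smaller dicts."""
    shards = [{} for _ in range(shard_count)]
    for key, value in records.items():
        shards[shard_for(key, shard_count)][key] = value
    return shards


records = {f"user-{i}": {"id": i} for i in range(100)}
shards = shard_records(records, shard_count=4)

# Taken together, the shards make up the original data set.
merged = {}
for shard in shards:
    merged.update(shard)
```

Each record lives in exactly one shard, and merging the shards reproduces the full data set.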
Sharding Advantages
• Scalability where new shards can be added at runtime
without shutting down the application for maintenance
• Improves the fault tolerance of the system as the failure of a
node affects only the block of the data stored in that
particular node.
Sharding & Replication
• In sharding when a node goes down, the data stored in the
node will be lost.
• So it provides only a limited fault tolerance to the system.
• Sharding and replication can be combined to make the system
fault tolerant and highly available.
[Figure: Sharding & Replication Example]
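To illustrate how the two techniques combine, the sketch below places each shard on several distinct nodes and then checks which shards survive a node failure. The placement rule (shard i goes to node i and its successors in a ring) and the names `place_shards` and `surviving_shards` are assumptions made for this example.

```python
def place_shards(shard_count, node_names, replicas=2):
    """Assign each shard to `replicas` distinct nodes, walking the nodes as a ring."""
    if replicas > len(node_names):
        raise ValueError("replicas cannot exceed the number of nodes")
    placement = {}
    for shard in range(shard_count):
        placement[shard] = [
            node_names[(shard + offset) % len(node_names)]
            for offset in range(replicas)
        ]
    return placement


def surviving_shards(placement, failed_node):
    """Return the shards that still have at least one replica on a live node."""
    return {
        shard
        for shard, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


nodes = ["node-a", "node-b", "node-c"]
replicated = place_shards(3, nodes, replicas=2)
sharded_only = place_shards(3, nodes, replicas=1)

still_available = surviving_shards(replicated, "node-a")
available_without_replicas = surviving_shards(sharded_only, "node-a")
```

With two replicas per shard, losing `node-a` leaves every shard reachable; with sharding alone, the shard stored on `node-a` is lost.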
Distributed File System (DFS)
• A file system is a way of storing and organizing data on storage devices (HD, DVDs, ...) and of keeping track of the files stored on them.
• The file is the smallest unit of storage defined by the file system to hold data.
• File systems store and retrieve data so that applications run effectively and efficiently on the operating system.
• A distributed file system stores the files across cluster nodes and allows the clients to access the files from the cluster.
• Files are distributed across the nodes, but logically they appear as if they reside on the client's local machine.
• Since a DFS provides access to more than one client simultaneously, the server organizes updates so that clients access the current version of the file and no version conflicts arise.
• Big data widely adopts a distributed file system known as the Hadoop Distributed File System (HDFS)
DFS Key concepts
• Data replication where the copies of data are distributed on
multiple cluster nodes so that there is no single point of failure,
which increases the reliability.
• The client can communicate with any of the closest available
nodes to reduce latency and network traffic
• Fault tolerance is achieved through data replication as the data
will not be lost in case of node failure due to the redundancy in
the data across nodes.
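A toy model can make these concepts concrete: a file is cut into blocks, each block is stored on more than one node, and an index records where the blocks live so the client sees a single file. Everything here is a simplification for illustration. The class `TinyDFS`, the four-byte block size, and the ring placement are hypothetical and do not describe how HDFS is implemented.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class TinyDFS:
    """Hypothetical in-memory distributed file system with replicated blocks."""

    def __init__(self, node_names, replicas=2):
        self.node_names = list(node_names)
        self.replicas = replicas
        self.storage = {name: {} for name in self.node_names}
        self.index = {}  # filename -> list of (block_id, nodes holding it)

    def put(self, filename, data, block_size=4):
        entries = []
        for number, block in enumerate(split_into_blocks(data, block_size)):
            block_id = f"{filename}#{number}"
            holders = [
                self.node_names[(number + r) % len(self.node_names)]
                for r in range(self.replicas)
            ]
            for node in holders:
                self.storage[node][block_id] = block
            entries.append((block_id, holders))
        self.index[filename] = entries

    def get(self, filename, unavailable=()):
        # Reassemble the file from the first reachable replica of each block.
        parts = []
        for block_id, holders in self.index[filename]:
            live = [node for node in holders if node not in unavailable]
            if not live:
                raise IOError(f"all replicas of {block_id} are unavailable")
            parts.append(self.storage[live[0]][block_id])
        return b"".join(parts)


dfs = TinyDFS(["node-a", "node-b", "node-c"], replicas=2)
dfs.put("log.txt", b"big data storage")
whole = dfs.get("log.txt")
despite_failure = dfs.get("log.txt", unavailable={"node-a"})
```

The client asks for `log.txt` by name and gets the full content back, even when one node is unavailable, because every block has a second replica elsewhere.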
Relational and Non-Relational Databases
Relational Databases
• Organize data into tables of rows (records) and columns (attributes or fields)
• Unsuitable when organizations collect vast amounts of customer data, transactions, and other data that may not be structured to fit into relational databases.
Non-Relational
• This has led to the evolution of non-relational databases, which are schema-less.
• NoSQL databases are non-relational
Properties of RDBMS Databases
• Are vertically scalable (by increasing server hardware power)
• Exhibit ACID (atomicity, consistency, isolation, durability) properties
• Support data that adheres to a specific schema
• Can no longer keep pace with the volume, velocity, and variety of data being generated and consumed
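Atomicity and consistency can be seen in a small example using SQLite through Python's standard `sqlite3` module. SQLite is chosen here only because it ships with Python; the `accounts` table, the `transfer` function, and the amounts are hypothetical. The `CHECK` constraint enforces the schema rule, and using the connection as a context manager commits the transaction on success and rolls it back on error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer fails on the debit, so the credit that had already been applied to `bob` inside the same transaction is rolled back and the balances reflect only the first transfer.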
Properties of NoSQL Databases
• Include all non-relational databases
• Exhibit the BASE (basically available, soft state, eventually consistent) model
• Are not appropriate for implementing large transactions
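The "eventually consistent" part of BASE can be sketched as a store that acknowledges a write after updating one replica and brings the others up to date later. The class `EventuallyConsistentStore` and its explicit `sync` step are assumptions made for illustration; real NoSQL systems propagate updates in the background using their own protocols.

```python
from collections import deque


class EventuallyConsistentStore:
    """Hypothetical store whose replicas converge only after sync() runs."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to every replica

    def write(self, key, value):
        # Acknowledge after updating one replica; the others catch up later.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica):
        # A read may return stale data (here: None) until the replicas converge.
        return self.replicas[replica].get(key)

    def sync(self):
        # Apply queued updates in order so that every replica ends up identical.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = EventuallyConsistentStore(replica_count=3)
store.write("cart:7", ["book"])
stale = store.read("cart:7", replica=2)
store.sync()
fresh = store.read("cart:7", replica=2)
```

Between the write and the sync, different replicas can return different answers for the same key, which is the trade-off BASE accepts in exchange for availability.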
Properties of NewSQL Databases
• Aim to combine the scalability and performance benefits of NoSQL
databases with the familiar relational data model and ACID transaction
guarantees of traditional SQL databases
• Horizontally scalable
• Fault tolerant
• Support the relational data model with three layers: the administrative, transactional, and storage layers.
• Target applications: those that execute the same queries repeatedly with different inputs and have a large number of transactions
NewSQL Databases comparison
          high performance  fault tolerant  distributed  in-memory  scale-out
Clustrix  yes               yes             yes          -          -
NuoDB     -                 yes             yes          -          yes
VoltDB    yes               yes             yes          yes        yes
MemSQL    yes               yes             yes          yes        -
Scaling up vs. Scaling out
• Scaling up (Vertical): increasing the hardware power of a single server
• Scaling out (Horizontal): adding more nodes to the cluster
THANK YOU
ANY
QUESTIONS?
