Project Voldemort
A distributed database.
Presented To:
Sir Tariq Mehmood
Presented By:
Fasiha Ikram
Aniqa Naeem
Voldemort
Voldemort is a distributed data store that is
designed as a key-value store used by
LinkedIn for high-scalability storage.
It is named after the fictional Harry Potter
villain Lord Voldemort.
LinkedIn Big Data Problem
LinkedIn's data covers job titles, job openings for people, groups,
and companies that offer jobs.
Big Data: volume, velocity, variety
The data needs frequent reads and writes.
Voldemort Scales Both
• The amount of data we can store (write)
• The number of requests for that data (read)
Why Not Use Hadoop
Naturally the only way to do this is to spread both
the load and the data across many servers.
1. Need to find a way to split the data so that all
servers have different data
2. Need to find a way to handle server failures
without interrupting service
3. HBase would still be write-heavy (due to horizontal
partitioning and the use of SSTables, which are
write-optimized)
Why Voldemort
• Data is automatically replicated over multiple
servers.
• Data is automatically partitioned so each server
contains only a subset of the total data
• Provides tunable consistency (strict quorum or
eventual consistency)
• Server failure is handled transparently
• Pluggable Storage Engines -- BDB-JE, MySQL,
Read-Only
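The "tunable consistency" bullet above can be illustrated with the usual quorum arithmetic. This is a minimal sketch, not Voldemort's own code: the function name and the parameters n (replicas per key), r (replicas that must answer a read), and w (replicas that must acknowledge a write) are assumptions chosen for illustration. When r + w > n, every read set overlaps every write set, which is the strict quorum case; otherwise the store is only eventually consistent.

```python
def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """Return True when any r replicas read must overlap any w replicas written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Hypothetical settings for a store replicated on 3 servers.
strict = is_strict_quorum(n=3, r=2, w=2)    # read and write sets always share a replica
eventual = is_strict_quorum(n=3, r=1, w=1)  # a read may miss the latest write
```

Lower r and w trade consistency for latency and availability, which is the tuning knob the bullet refers to.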
Why Voldemort
• Pluggable serialization -- Protocol Buffers, Thrift,
Avro and Java Serialization
• Data items are versioned to maximize data
integrity in failure scenarios without
compromising availability of the system
• Each node is independent of other nodes with no
central point of failure or coordination
Why Voldemort
• Good single-node performance: you can expect
10-20k operations per second depending on the
machines, the network, the disk system, and the
data replication factor
• Support for pluggable data placement strategies to
support things like distribution across data
centers that are geographically far apart.
Voldemort Storage Engines
• Trivial to integrate new persistence mechanisms
with Voldemort
• 2 Classes:
Config(data) & Storage Engine(servers info)
• 3 Operations:
put(k, v), get(k), delete(k)
• Complication:
k is Versioned<Key>
Architecture of Queue
Key-Value Storage
• To enable high performance and availability,
Voldemort allows only very simple key-value data access.
• Both keys and values can be complex compound
objects including lists or maps, but nonetheless
the only supported queries are effectively the
following:
value = store.get(key)
store.put(key, value)
store.delete(key)
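The three calls above can be tried out against a throw-away in-memory stand-in. This is a minimal sketch, assuming a hypothetical InMemoryStore class backed by a Python dictionary; it is not the Voldemort client API and it ignores versioning, replication, and persistence.

```python
from typing import Any, Dict, Hashable, Optional


class InMemoryStore:
    """Dictionary-backed stand-in for a key-value store (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        # Retrieval is by key only; there are no filters or joins.
        return self._data.get(key)

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value

    def delete(self, key: Hashable) -> bool:
        # Report whether the key was present before removal.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = InMemoryStore()
# Keys and values may be compound objects: a tuple key and a dict value here.
store.put(("member", 42), {"name": "Alice", "skills": ["java", "hadoop"]})
value = store.get(("member", 42))
removed = store.delete(("member", 42))
```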
Query execution
• Voldemort supports hashtable semantics, so a
single value can be modified at a time and
retrieval is by primary key.
• This makes distribution across machines
particularly easy since everything can be split by
the primary key.
Consistent Hashing Mechanism
• To scale effectively, the data in Voldemort is
split up in such a way that each item is stored on
multiple servers.
• To retrieve data, a client must first figure out which
server to use. This partitioning is done via a
consistent hashing mechanism that lets any server
calculate the location of data without doing any
expensive lookups.
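The idea of computing a key's location without a lookup table can be sketched with a generic consistent-hashing ring. This is an illustrative sketch under stated assumptions, not Voldemort's actual partition layout: the HashRing class, the points_per_server parameter, the server names, and the choice of SHA-256 are all hypothetical.

```python
import bisect
import hashlib
from typing import List


def _hash(text: str) -> int:
    """Stable hash so every process computes the same ring position."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers: List[str], points_per_server: int = 8) -> None:
        # Each server owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._server_count = len(set(servers))

    def servers_for(self, key: str, replicas: int) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct servers."""
        wanted = min(replicas, self._server_count)
        if wanted <= 0:
            return []
        start = bisect.bisect_right(self._positions, _hash(key))
        found: List[str] = []
        for offset in range(len(self._ring)):
            server = self._ring[(start + offset) % len(self._ring)][1]
            if server not in found:
                found.append(server)
                if len(found) == wanted:
                    break
        return found


ring = HashRing(["server-a", "server-b", "server-c", "server-d"])
owners = ring.servers_for("member:42", replicas=2)
```

Because the ring is derived only from the server list and a deterministic hash, any client or server that knows the list computes the same owners for a key.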
Detecting Failure
• Voldemort sets an SLA (service level agreement)
for requests and bans servers that cannot meet it
(for example, because they are down or because
requests are timing out).
• Servers that violate this SLA are banned for a short
period of time, after which Voldemort attempts to
restore them.
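The ban-and-restore behaviour described above can be sketched as a small bookkeeping class. This is a simplified illustration, not Voldemort's failure detector: the class name, the sla_seconds and ban_seconds parameters, and the injected clock are assumptions made so the example is self-contained.

```python
import time
from typing import Callable, Dict


class BanningFailureDetector:
    def __init__(self, sla_seconds: float, ban_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sla = sla_seconds
        self._ban = ban_seconds
        self._clock = clock
        self._banned_until: Dict[str, float] = {}

    def record(self, server: str, latency_seconds: float, ok: bool = True) -> None:
        """Ban the server if the request failed or took longer than the SLA."""
        if not ok or latency_seconds > self._sla:
            self._banned_until[server] = self._clock() + self._ban

    def is_available(self, server: str) -> bool:
        """A server becomes eligible again once its ban has expired."""
        until = self._banned_until.get(server)
        if until is None:
            return True
        if self._clock() >= until:
            del self._banned_until[server]
            return True
        return False


# A fake clock keeps the example deterministic.
now = [0.0]
detector = BanningFailureDetector(sla_seconds=0.05, ban_seconds=10.0,
                                  clock=lambda: now[0])
detector.record("server-a", latency_seconds=0.2)        # too slow: banned
available_during_ban = detector.is_available("server-a")
now[0] = 11.0                                           # ban period has passed
available_after_ban = detector.is_available("server-a")
```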
Dealing With Failure
• Since each value is stored in multiple places, it is
possible that one of these servers will not get
updated (say, because it had crashed when the
update occurred).
• To solve this problem Voldemort uses a data
versioning mechanism called Vector Clocks.
• This data versioning allows the servers to detect
stale data when it is read and repair it.
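How a vector clock lets a reader tell a stale copy from a conflicting one can be sketched in a few lines. This is a minimal illustration of the general technique, assuming a clock is a plain mapping from server name to update counter; the function names and server names are hypothetical and this is not Voldemort's implementation.

```python
from typing import Dict

Clock = Dict[str, int]  # server name -> number of updates seen from that server


def increment(clock: Clock, server: str) -> Clock:
    """Return a copy of the clock with the given server's counter advanced."""
    updated = dict(clock)
    updated[server] = updated.get(server, 0) + 1
    return updated


def compare(a: Clock, b: Clock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for a relative to b."""
    servers = set(a) | set(b)
    a_less = any(a.get(s, 0) < b.get(s, 0) for s in servers)
    b_less = any(b.get(s, 0) < a.get(s, 0) for s in servers)
    if a_less and b_less:
        return "concurrent"  # neither version supersedes the other
    if a_less:
        return "before"      # a is stale and can be repaired from b
    if b_less:
        return "after"
    return "equal"


v1 = increment({}, "server-a")   # first write, coordinated by server-a
v2 = increment(v1, "server-b")   # later write that has seen v1
v3 = increment(v1, "server-c")   # another write that has seen v1 but not v2
```

Here compare(v1, v2) reports that v1 is stale, while v2 and v3 are concurrent and would need to be reconciled.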
Comparison to HBase
System | Query language | Architecture | Database model | Replication | Issues
Voldemort | API calls | Big unordered map | Key-value NoSQL, distributed data structure | Topology-aware routing strategies | Does not satisfy ACID properties
HBase | API calls, REST, XML, Thrift | Big multi-dimensional sorted map | HDFS | Master-slave / master-master replication | Master-slave design, which is not highly available
ACID Properties
Property | Voldemort | HBase
Atomicity | Yes | Yes
Consistency | Yes | Yes
Isolation | No | Yes
Durability | Yes | Yes
Locking model | Optimistic locking | MVCC
Free for commercial use | Yes | Yes
Industry Implementation
• LinkedIn: creator of Voldemort
• Gilt Groupe: its shopping cart is powered by
Voldemort.
Use Case
• High-performance key-value store (Amazon
Dynamo clone)
• Treats the key-value store as an API and adds an
in-memory caching layer, which means that you
can plug into the back end that makes the most
sense for your particular needs.
Pros
• Only efficient queries are possible, so performance
is very predictable.
• Easy to distribute across a cluster.
• Clean separation of storage and logic.
• The storage layer is completely mockable so
development and unit testing can be done against a
throw-away in-memory storage system without
needing a real cluster.
Cons
• No complex query filters
• No foreign key constraints
• No triggers
• No built-in support for “multiple data center”-
aware routing (there must be 1 copy of each key in
at least one data center)
Conclusion
• It is basically just a big, distributed, persistent,
fault-tolerant hash table.
• The redundancy of storage makes the system
more resilient to server failure. Since each value is
stored N times, you can tolerate as many as N - 1
machine failures without data loss.
References
1. http://blog.linkedin.com/2009/04/01/project-voldemort-part-ii-how-it-works/
2. http://blog.linkedin.com/2009/03/20/project-voldemort-scaling-simple-storage-at-linkedin/
3. http://en.wikipedia.org/wiki/Voldemort_(distributed_data_store)
4. http://highscalability.com/product-project-voldemort-distributed-database
5. http://www.dummies.com/how-to/content/using-pluggable-storage-with-nosql.html
6. http://vschart.com/compare/project-voldemort/vs/hbase
7. http://www.project-voldemort.com/voldemort/design.html

Monitoring Java Application Security with JDK Tools and JFR Events
 

Voldemort

  • 1. Project Voldemort: A distributed database. Presented to: Sir Tariq Mehmood. Presented by: Fasiha Ikram, Aniqa Naeem.
  • 2. Voldemort: Voldemort is a distributed key-value data store used by LinkedIn for highly scalable storage. It is named after the fictional Harry Potter villain Lord Voldemort.
  • 3. LinkedIn Big Data Problem: LinkedIn holds job titles, job openings for people, groups, and companies that offer jobs. This is Big Data (variety, velocity, volume) that needs frequent reads and writes.
  • 4. Voldemort Scales Both
      • The amount of data we can store (write)
      • The number of requests for that data (read)
  • 5. Why Not Use Hadoop: Naturally, the only way to do this is to spread both the load and the data across many servers.
      1. Need to find a way to split the data so that all servers have different data.
      2. Need to find a way to handle server failures without interrupting service.
      3. HBase is still write-heavy (because of horizontal partitioning and its use of SSTables, which are write-optimized).
  • 6. Why Voldemort
      • Data is automatically replicated over multiple servers.
      • Data is automatically partitioned so each server contains only a subset of the total data.
      • Provides tunable consistency (strict quorum or eventual consistency).
      • Server failure is handled transparently.
      • Pluggable storage engines: BDB-JE, MySQL, Read-Only.
  • 7. Why Voldemort
      • Pluggable serialization: Protocol Buffers, Thrift, Avro, and Java Serialization.
      • Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system.
      • Each node is independent of other nodes, with no central point of failure or coordination.
  • 8. Why Voldemort
      • Good single-node performance: you can expect 10-20k operations per second, depending on the machines, the network, the disk system, and the data replication factor.
      • Support for pluggable data placement strategies, for things like distribution across data centers that are geographically far apart.
  • 9. Voldemort Storage Engines
      • Trivial to integrate new persistence mechanisms with Voldemort.
      • 2 classes: Config (data) and Storage Engine (servers info).
      • 3 operations: put(k, v), get(k), delete(k).
      • Complication: stored values are versioned (Versioned<V>).
  • 13. Key-Value Storage
      • To enable high performance and availability, Voldemort allows only very simple key-value data access.
      • Both keys and values can be complex compound objects, including lists or maps, but nonetheless the only supported queries are effectively the following:
          value = store.get(key)
          store.put(key, value)
          store.delete(key)
  • 14. Query Execution
      • Voldemort supports hashtable semantics, so only a single value can be modified at a time and retrieval is by primary key.
      • This makes distribution across machines particularly easy, since everything can be split by the primary key.
  • 15. Consistent Hashing Mechanism
      • To scale effectively, the data in Voldemort is split up in such a way that each item is stored on multiple servers.
      • To retrieve data, a client must first figure out which server is the correct one to use. This partitioning is done via a consistent hashing mechanism that lets any server calculate the location of data without doing any expensive lookups.
  • 16. Detecting Failure
      • Voldemort sets an SLA (service level agreement) for requests and bans servers that cannot meet it (this could be because they are down or because requests are timing out).
      • Servers that violate this SLA are banned for a short period of time, after which Voldemort attempts to restore them.
  • 17. Dealing With Failure
      • Since each value is stored in multiple places, it is possible that one of these servers will not get updated (say, because it had crashed when the update occurred).
      • To solve this problem, Voldemort uses a data versioning mechanism called vector clocks.
      • This data versioning allows the servers to detect stale data when it is read and repair it.
  • 18. Comparison to HBase
      Voldemort
        • Query language: API calls
        • Architecture: big unordered map
        • Database model: key-value NoSQL
        • Replication: distributed data structure, topology-aware routing strategies
        • Issues: does not satisfy ACID properties
      HBase
        • Query language: API calls, REST, XML, Thrift
        • Architecture: big multi-dimensional sorted map
        • Database model: HDFS
        • Replication: master-slave / master-master replication
        • Issues: master-slave, which is not highly available
  • 19. ACID Properties (Voldemort / HBase)
      • Atomicity: Yes / Yes
      • Consistency: Yes / Yes
      • Isolation: No / Yes
      • Durability: Yes / Yes
      • Locking model: optimistic locking / MVCC
      • Free for commercial use: Yes / Yes
  • 20. Industry Implementation
      • LinkedIn, the creator of Voldemort.
      • Gilt Groupe, whose shopping cart is powered by Voldemort.
  • 21. Use Case
      • High-performance key-value store (Amazon Dynamo clone).
      • Treats the key-value store as an API and adds an in-memory caching layer, which means that you can plug in the back end that makes the most sense for your particular needs.
  • 22. Pros
      • Only efficient queries are possible, so performance is very predictable.
      • Easy to distribute across a cluster.
      • Clean separation of storage and logic.
      • The storage layer is completely mockable, so development and unit testing can be done against a throw-away in-memory storage system without needing a real cluster.
  • 23. Cons
      • No complex query filters.
      • No foreign key constraints.
      • No triggers.
      • No built-in support for "multiple data center"-aware routing (there must be 1 copy of each key in at least one data center).
  • 24. Conclusion
      • It is basically just a big, distributed, persistent, fault-tolerant hash table.
      • The redundancy of storage makes the system more resilient to server failure. Since each value is stored N times, you can tolerate as many as N - 1 machine failures without data loss.
  • 25. References
      1. http://blog.linkedin.com/2009/04/01/project-voldemort-part-ii-how-it-works/
      2. http://blog.linkedin.com/2009/03/20/project-voldemort-scaling-simple-storage-at-linkedin/
      3. http://en.wikipedia.org/wiki/Voldemort_(distributed_data_store)
      4. http://highscalability.com/product-project-voldemort-distributed-database
      5. http://www.dummies.com/how-to/content/using-pluggable-storage-with-nosql.html
      6. http://vschart.com/compare/project-voldemort/vs/hbase
      7. http://www.project-voldemort.com/voldemort/design.html
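
The three operations on slide 13 and the "mockable storage layer" point on slide 22 can be illustrated with a throw-away in-memory store. This is a minimal Python sketch, not Voldemort's actual Java API: the class name `InMemoryStore` and its method signatures are assumptions made for illustration only.

```python
class InMemoryStore:
    """Hypothetical dict-backed stand-in for a key-value store (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Retrieval is by primary key only; returns None when the key is absent.
        return self._data.get(key)

    def put(self, key, value):
        # A single value is written at a time; values may be compound objects.
        self._data[key] = value

    def delete(self, key):
        # Returns True if a value was removed, False if the key was absent.
        return self._data.pop(key, None) is not None
```

Because the interface is this small, a unit test could exercise application logic against such a store without a real cluster.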
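
The consistent hashing idea on slide 15 can be sketched as a hash ring. The following Python example is an assumption-laden illustration rather than Voldemort's implementation: the `HashRing` class, the MD5-based hash, and the `points_per_server` and `replicas` parameters are all hypothetical choices.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring (illustration only)."""

    def __init__(self, servers, points_per_server=64):
        # Each server owns several points on the ring to smooth the distribution.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._positions = [position for position, _ in self._ring]
        self._server_count = len(set(servers))

    def servers_for(self, key, replicas=2):
        """Walk clockwise from the key's position, collecting distinct servers."""
        wanted = min(replicas, self._server_count)
        start = bisect.bisect_right(self._positions, _hash(key))
        found = []
        for offset in range(len(self._ring)):
            server = self._ring[(start + offset) % len(self._ring)][1]
            if server not in found:
                found.append(server)
                if len(found) == wanted:
                    break
        return found
```

Any client holding the same server list computes the same answer locally, which is why no lookup service is needed to find where a key lives.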
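
The banning behaviour on slide 16 can be sketched as a small tracker that marks a server unavailable for a fixed period after a slow or failed request. This is a hedged Python illustration: the class name `SlowServerTracker`, the threshold values, and the explicit `now` argument are assumptions, not Voldemort's actual failure detector.

```python
class SlowServerTracker:
    """Hypothetical tracker that bans servers exceeding a response-time budget."""

    def __init__(self, sla_seconds=0.5, ban_seconds=10.0):
        self.sla_seconds = sla_seconds
        self.ban_seconds = ban_seconds
        self._banned_until = {}

    def record(self, server, elapsed_seconds, now):
        # A request slower than the budget (or a timeout) bans the server for a while.
        if elapsed_seconds > self.sla_seconds:
            self._banned_until[server] = now + self.ban_seconds

    def is_available(self, server, now):
        # Once the ban period has passed, the server is eligible for requests again.
        return now >= self._banned_until.get(server, 0.0)
```

Passing `now` explicitly keeps the sketch deterministic; a real client would more likely read a monotonic clock.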
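
The vector clocks mentioned on slide 17 can be illustrated with two small functions. This Python sketch represents a clock as a plain dict from node name to counter; the function names `increment` and `compare` and the string results are assumptions for illustration, not Voldemort's own classes.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clock a relative to b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # a is stale relative to b
    if b_le_a:
        return "after"
    return "concurrent"  # neither version descends from the other
```

A replica whose clock compares as "before" holds stale data and can be repaired on read, while "concurrent" versions signal a conflict that has to be resolved.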