Submit Search
Upload
Fast Online Access to Massive Offline Data - SECR 2016
•
Download as PPTX, PDF
•
0 likes
•
317 views
Felix GV
Follow
My talk on Voldemort at the 2016 Software Engineering Conference of Russia.
Read less
Read more
Software
Slideshow view
Report
Share
Slideshow view
Report
Share
1 of 26
Download now
Recommended
Introducing Venice - Strata NYC 2017
Introducing Venice - Strata NYC 2017
Felix GV
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBase
Cloudera, Inc.
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Data Con LA
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
Chen Zhang
Spark streaming with apache kafka
Spark streaming with apache kafka
punesparkmeetup
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Michael Rainey
Rails on HBase
Rails on HBase
EffectiveUI
Change Data Capture using Kafka
Change Data Capture using Kafka
Akash Vacher
Recommended
Introducing Venice - Strata NYC 2017
Introducing Venice - Strata NYC 2017
Felix GV
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBase
Cloudera, Inc.
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Data Con LA
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
Chen Zhang
Spark streaming with apache kafka
Spark streaming with apache kafka
punesparkmeetup
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Michael Rainey
Rails on HBase
Rails on HBase
EffectiveUI
Change Data Capture using Kafka
Change Data Capture using Kafka
Akash Vacher
Data streaming-systems
Data streaming-systems
imcpune
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Fwdays
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
CEE-SEC(R)
Building Distributed Data Streaming System
Building Distributed Data Streaming System
Ashish Tadose
Spark meetup - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
Zoomdata
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
botsplash.com
HBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project Status
Michael Stack
What database
What database
Regunath B
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a Service
WSO2
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Michael Rainey
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
MariaDB plc
ClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale out
MariaDB plc
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?
FlyData Inc.
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
DataStax
Aceleracion de aplicacione 2
Aceleracion de aplicacione 2
jfth
Software defined storage real or bs-2014
Software defined storage real or bs-2014
Howard Marks
Column and hadoop
Column and hadoop
Alex Jiang
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
MariaDB Corporation
Achieve new levels of performance for Magento e-commerce sites.
Achieve new levels of performance for Magento e-commerce sites.
Clustrix
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
DataStax
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
DataStax Academy
More Related Content
What's hot
Data streaming-systems
Data streaming-systems
imcpune
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Fwdays
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
CEE-SEC(R)
Building Distributed Data Streaming System
Building Distributed Data Streaming System
Ashish Tadose
Spark meetup - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
Zoomdata
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
botsplash.com
HBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project Status
Michael Stack
What database
What database
Regunath B
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a Service
WSO2
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Michael Rainey
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
MariaDB plc
ClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale out
MariaDB plc
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?
FlyData Inc.
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
DataStax
Aceleracion de aplicacione 2
Aceleracion de aplicacione 2
jfth
Software defined storage real or bs-2014
Software defined storage real or bs-2014
Howard Marks
Column and hadoop
Column and hadoop
Alex Jiang
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
MariaDB Corporation
Achieve new levels of performance for Magento e-commerce sites.
Achieve new levels of performance for Magento e-commerce sites.
Clustrix
What's hot
(20)
Data streaming-systems
Data streaming-systems
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
Building Distributed Data Streaming System
Building Distributed Data Streaming System
Spark meetup - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
HBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project Status
What database
What database
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a Service
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
How Alibaba Cloud scaled ApsaraDB with MariaDB MaxScale
ClustrixDB: how distributed databases scale out
ClustrixDB: how distributed databases scale out
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Aceleracion de aplicacione 2
Aceleracion de aplicacione 2
Software defined storage real or bs-2014
Software defined storage real or bs-2014
Column and hadoop
Column and hadoop
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Achieve new levels of performance for Magento e-commerce sites.
Achieve new levels of performance for Magento e-commerce sites.
Similar to Fast Online Access to Massive Offline Data - SECR 2016
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
DataStax
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
DataStax Academy
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
Big Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
Big Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Sam Palani
Updates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDS
ShapeBlue
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
p6academy
Kafka at Peak Performance
Kafka at Peak Performance
Todd Palino
Datastax - Why Your RDBMS fails at scale
Datastax - Why Your RDBMS fails at scale
Ruth Mills
More Datacenters, More Problems
More Datacenters, More Problems
Todd Palino
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDB
MongoDB
Pass 2013 dantoni azure a gs
Pass 2013 dantoni azure a gs
Joseph D'Antoni
Marketing Automation at Scale: How Marketo Solved Key Data Management Challen...
Marketing Automation at Scale: How Marketo Solved Key Data Management Challen...
Continuent
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
MongoDB
TechEvent 2019: Create a Private Database Cloud in the Public Cloud using the...
TechEvent 2019: Create a Private Database Cloud in the Public Cloud using the...
Trivadis
L20 Scalability
L20 Scalability
Ólafur Andri Ragnarsson
Data core overview - haluk-final
Data core overview - haluk-final
Haluk Ulubay
How to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data Project
Peak Hosting
Similar to Fast Online Access to Massive Offline Data - SECR 2016
(20)
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Big Data on Cloud Native Platform
Big Data on Cloud Native Platform
Big Data on Cloud Native Platform
Big Data on Cloud Native Platform
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Updates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDS
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
Kafka at Peak Performance
Kafka at Peak Performance
Datastax - Why Your RDBMS fails at scale
Datastax - Why Your RDBMS fails at scale
More Datacenters, More Problems
More Datacenters, More Problems
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDB
Pass 2013 dantoni azure a gs
Pass 2013 dantoni azure a gs
Marketing Automation at Scale: How Marketo Solved Key Data Management Challen...
Marketing Automation at Scale: How Marketo Solved Key Data Management Challen...
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
TechEvent 2019: Create a Private Database Cloud in the Public Cloud using the...
TechEvent 2019: Create a Private Database Cloud in the Public Cloud using the...
L20 Scalability
L20 Scalability
Data core overview - haluk-final
Data core overview - haluk-final
How to Choose a Host for a Big Data Project
How to Choose a Host for a Big Data Project
Recently uploaded
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
Fatema Valibhai
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
Tier1 app
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Christina Lin
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
OPEN KNOWLEDGE GmbH
EY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
Neo4j
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
Frank van der Linden
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
MyIntelliSource, Inc.
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
soniya singh
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
VICTOR MAESTRE RAMIREZ
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
aditisharan08
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
soniya singh
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽❤️🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽❤️🧑🏻 89...
gurkirankumar98700
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
Christina Lin
What is Binary Language? Computer Number Systems
What is Binary Language? Computer Number Systems
JheuzeDellosa
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
ICS
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
joe51371421
Asset Management Software - Infographic
Asset Management Software - Infographic
Hr365.us smith
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
Power Karaoke
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
stazi3110
Professional Resume Template for Software Developers
Professional Resume Template for Software Developers
Vinodh Ram
Recently uploaded
(20)
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
EY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽❤️🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽❤️🧑🏻 89...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
What is Binary Language? Computer Number Systems
What is Binary Language? Computer Number Systems
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
Asset Management Software - Infographic
Asset Management Software - Infographic
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Professional Resume Template for Software Developers
Professional Resume Template for Software Developers
Fast Online Access to Massive Offline Data - SECR 2016
1.
ESPRESSO/Voldemort©2013 LinkedIn Corporation.
All Rights Reserved. Fast Online Access to Massive Offline Data Software Engineering Conference in Russia 28 October 2016 By Felix GV
2.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Agenda ▪ Introduction – High Level Web Architecture –What is Primary Data and Derived Data? –What is Voldemort? ▪ Recent Improvements to Voldemort RO – Cross-DC Bandwidth – Multi-tenancy – Performance ▪ How to Get Started? 2
3.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems High Level Web Architecture Web Front Ends Mid Tiers Primary Databases Derived Databases 3
4.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems High Level Web Architecture Web Front Ends Mid Tiers Queue Logs Logs Change Capture Hadoop Job Scheduler Launch Jobs Database Snapshots Trigger Derived Data Refresh Data Crunching ETL Bulk Load Derived Databases Primary Databases 4
5.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems What is Primary Data and Derived Data? Primary Data is “Source of Truth” data: ▪ Users’ profiles, private messages, etc. ▪ Typically requires strong consistency & read-your-write semantics Derived Data comes from crunching primary data: ▪ “People You May Know”, “Jobs You May Be Interested In”, etc. ▪ Aggregation, Joins, Machine learning ▪ Typically generated offline, in Hadoop How can we serve derived data back to online apps? 5
6.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems What is Voldemort? Voldemort is a Distributed Key Value Store with two modes: ▪ Read-Write – Random access writes, tunable consistency ▪ Read-Only – Bulk-loaded from Hadoop Can be single DC, or globally distributed. Pluggable: ▪ Storage Engine (BDB-JE, RocksDB, Read-Only format, etc.) ▪ Serialization (Avro, JSON, Protobuf, Thrift, raw bytes, etc.) 6
7.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems What is Voldemort Read-Only? Voldemort RO Servers: ▪ Fetch entire datasets from HDFS ▪ Serve key-value read requests Build and Push job: ▪ Validates store & schema ▪ Triggers MR job – MR job partitions data ▪ Triggers server fetches ▪ Swaps new dataset 7
8.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems For Voldemort to die, all pieces must die. 8
9.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Recent Improvements to Voldemort RO ▪ Cross-DC Bandwidth – Block-level Compression – Throttling ▪ Multi-tenancy – Nuage Integration – Storage Space Quotas ▪ Performance – Build and Push Performance – Client Latency 9
10.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Voldemort Cross-DC Bandwidth Voldemort RO among largest users of cross-DC bandwidth at LinkedIn ▪ >100 TB / day ingested across the WAN, every day We added block-level compression: ▪ Output of BnP’s MR job is compressed (GZIP) ▪ Voldemort servers decompress on the fly – CPU cost of decompression deemed negligible ▪ Completely transparent to store owners ▪ ~18% reduction in dataset size 10
11.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Voldemort Cross-DC Bandwidth DC1 DC2 DC3 DC1 DC2 DC3 Inbound bandwidth used to look like this: ▪ Very spiky ▪ Each DC was fetching sequentially ▪ Love/hate relationship with Net Ops Now, with parallel fetching and throttling: ▪ No more spikes ▪ DCs are fetching in parallel ▪ Best friends with Net Ops 11
12.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Recent Improvements to Voldemort RO ▪ Cross-DC Bandwidth – Block-level Compression – Throttling ▪ Multi-tenancy – Nuage Integration – Storage Space Quotas ▪ Performance – Build and Push Performance – Client Latency 12
13.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Voldemort Multi-tenancy Nuage is LinkedIn’s internal (AWS-like) self-service provisioning infra All new stores at LinkedIn are now created via Nuage ▪ Prevents accidental pushes to the wrong cluster ▪ Store creation in production is now self-service ▪ ~3 stores / week created in production! 13
14.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Voldemort Multi-tenancy Storage Space Quotas ▪ Precise storage requirement measured during build phase ▪ Voldemort server validates quota before starting to fetch ▪ The goal is to prevent unexpected growth from killing clusters LOTS of improvements on admin commands ▪ Stability ▪ Performance 14
15.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Recent Improvements to Voldemort RO ▪ Cross-DC Bandwidth – Block-level Compression – Throttling ▪ Multi-tenancy – Nuage Integration – Storage Space Quotas ▪ Performance – Build and Push Performance – Client Latency 15
16.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Build and Push Performance New datacenter with denser clusters required major BnP refactoring As a side effect of that rewrite, BnP is also a lot more efficient ▪ Each partition replica is now built only once ▪ The following metrics are reduced (roughly by half) : – Number of reduce tasks – Amount of shuffle bandwidth – Amount of data written to HDFS Leveraged new datacenter buildout to get rid of duplicate BnP jobs 16
17.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Build and Push Performance Delta: 63.2% reduction Before: 103 hours / job After: 38 hours / job 17
18.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Build and Push Performance Delta: 31.4% reduction Before: 381 jobs / day After: 261 jobs / day 18
19.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Build and Push Performance Delta: 74.4% reduction Before: 4.48 years / day After: 1.15 years / day 19
20.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Client Latency Major client/server communication rewrite: ▪ Client and server’s memory footprint reduced by half ▪ Client garbage collection reduced by 80% ▪ Failure detection is more accurate ▪ Solved the stability issues of our highest throughput clients Major effort to upgrade all clients to the latest version. 20
21.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Client Latency Set up: ▪ 20 client instances ▪ 40 nodes multi-tenant Voldemort cluster – >90 stores – 56K QPS at peak Metric: ▪ Worst p95/p99 latency from all clients ▪ Units are in milliseconds (“m” mean microseconds) ▪ Old code in pink ▪ New code in green 21
22.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Client Latency 95th percentile 22
23.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Client Latency 99th percentile 23
24.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems 24
25.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems How to Get Started? # Get code git clone https://github.com/voldemort/voldemort cd voldemort # Launch Voldemort Servers ./gradlew jar ./bin/voldemort-shell.sh config/readonly_two_nodes_cluster/node0/config & ./bin/voldemort-shell.sh config/readonly_two_nodes_cluster/node1/config & # Launch Build and Push job ./gradlew bnpJar ./bin/run-bnp.sh <config_file> 25
26.
©2016 LinkedIn Corporation.
All Rights Reserved. Distributed Data Systems Questions? (We’re hiring!)
Download now