basho
Core Concepts
Introduction to Riak
AKQA
24th July 2013
Friday, 26 July 13
WHO AM I?
Joel Jacobson
Technical Evangelist
Basho Technologies
@joeljacobson
Distributed computing is
HARD.
PROBLEMS?
• Concurrency and latency at scale
• Data consistency
• Uptime/failover
• Multi-tenancy
• SLAs
WHAT IS RIAK?
• Key-Value store + extras
• Distributed and horizontally scalable
• Fault-tolerant
• Highly available
• Built for the web
INSPIRED BY AMAZON DYNAMO
• Amazon published a white paper describing a database system built for its shopping cart
• Masterless, peer-coordinated replication
• Dynamo-inspired data stores: Riak, Cassandra, Voldemort, etc.
• Consistent hashing - no sharding :-)
• Eventually consistent
RIAK KEY-VALUE STORE
• Simple operations - GET, PUT, DELETE
• Value is opaque, with metadata
• Extras, e.g.
• Secondary Indexes (2i)
• MapReduce
• Full text search
HORIZONTALLY SCALABLE
• Near linear scalability
• Query load and data are spread evenly
• Add more nodes and get more:
• ops/second
• storage capacity
• compute power (for Map/Reduce)
FAULT TOLERANT
• All nodes participate equally - no single point of failure (SPOF)
• All data is replicated
• Clusters self heal - Handoff, Active Anti-Entropy
• Cluster transparently survives...
• node failure
• network partitions
• Built on Erlang/OTP (designed for FT)
HIGHLY AVAILABLE
• Any node can serve client requests
• Fallbacks are used when nodes are down
• Always accepts read and write requests
• Per-request quorums
QUORUMS - N/R/W
• Tunable down to bucket level
• n_val = 3 by default
• r / w = 2 (quorum) by default
• w = 1 - Faster response, but reads could be inconsistent in the short term
• w = all - Slower response, increased data consistency
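The trade-off between these settings can be checked with simple arithmetic: in an idealized cluster with no failures or fallbacks, a read is certain to touch at least one replica that took part in the latest write only when r + w > n_val. A minimal Python sketch; the function names are illustrative and not part of any Riak client.

```python
def quorum(n_val: int) -> int:
    """Smallest majority of n_val replicas."""
    return n_val // 2 + 1


def read_sees_latest_write(n_val: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set."""
    if not (1 <= r <= n_val and 1 <= w <= n_val):
        raise ValueError("r and w must be between 1 and n_val")
    return r + w > n_val


n_val = 3
print(quorum(n_val))                                 # 2
print(read_sees_latest_write(n_val, r=2, w=2))       # True  (the defaults)
print(read_sees_latest_write(n_val, r=1, w=1))       # False (fast, may be stale)
print(read_sees_latest_write(n_val, r=1, w=n_val))   # True  (w = all)
```

With fallbacks in play the overlap is no longer strict, which is why the cluster is described as eventually consistent.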
CAP THEOREM
• C = Consistency
• A = Availability
• P = Partition Tolerance
• The CAP theorem states that a distributed shared-data system can support at most 2 of these 3 properties
[Figure: two clients and three DB nodes split by a network/data partition]
THE RING
REPLICATION
• Replicated to 3 nodes by default (n_val = 3, which is configurable)
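How a key lands on the ring and picks up its replicas can be sketched in a few lines of Python. This is a simplified model, not Riak's actual code: the partition count of 64, the "bucket/key" string encoding, and the round-robin assignment of partitions to nodes are all assumptions made for illustration.

```python
import hashlib

RING_SIZE = 2 ** 160      # size of the SHA-1 hash space
NUM_PARTITIONS = 64       # assumed number of equal-sized partitions
N_VAL = 3                 # replicas per object


def partition_for(bucket: str, key: str) -> int:
    """Hash bucket/key onto the ring and return the partition it falls in."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")
    return position * NUM_PARTITIONS // RING_SIZE


def preference_list(bucket: str, key: str, n_val: int = N_VAL) -> list:
    """The n_val consecutive partitions that hold replicas of the object."""
    first = partition_for(bucket, key)
    return [(first + i) % NUM_PARTITIONS for i in range(n_val)]


def owner(partition: int, nodes: list) -> str:
    """Assumed round-robin claim: neighbouring partitions go to different nodes."""
    return nodes[partition % len(nodes)]


nodes = ["node1", "node2", "node3", "node4"]
partitions = preference_list("users", "user_id")
print(partitions)
print([owner(p, nodes) for p in partitions])
```

Because consecutive partitions belong to different nodes in this model, the three replicas end up on three distinct machines, and adding a node only moves the partitions it claims.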
DISASTER SCENARIO
• Node fails
• Request goes to fallback
• Node comes back
• Handoff - data returned to the recovered node
• Normal operations resume automatically
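The sequence above can be modelled as a toy simulation. The sketch below is only an illustration of the idea: the `Node` class, `write`, and `handoff` are hypothetical names, and a real cluster chooses fallbacks from the ring rather than from an explicit list.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}    # key -> value for data this node owns
        self.hints = {}   # owner name -> {key: value} held on the owner's behalf


def write(key, value, primaries, fallbacks):
    """Write to every primary; a fallback stands in for each primary that is down."""
    spare = iter(node for node in fallbacks if node.up)
    for node in primaries:
        if node.up:
            node.data[key] = value
            continue
        stand_in = next(spare, None)
        if stand_in is None:
            raise RuntimeError("no fallback node available")
        stand_in.hints.setdefault(node.name, {})[key] = value


def handoff(recovered, others):
    """Return data held for the recovered node and drop the hints."""
    for node in others:
        recovered.data.update(node.hints.pop(recovered.name, {}))


a, b, c, d = (Node(name) for name in "abcd")
b.up = False                                            # node fails
write("user_id", "v1", primaries=[a, b, c], fallbacks=[d])
print(d.hints)                                          # {'b': {'user_id': 'v1'}}
b.up = True                                             # node comes back
handoff(b, [a, c, d])
print(b.data)                                           # {'user_id': 'v1'}
```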
ACTIVE ANTI-ENTROPY
• Automatically repair inconsistencies in data
• Active Anti-Entropy was new in 1.3.0 and uses Merkle trees to
compare data in partitions and periodically ensure consistency
• Active Anti-Entropy runs as a background process
• Can also be configured as a manual process
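The Merkle-tree comparison can be illustrated with a small Python sketch. It is a simplified stand-in for what Active Anti-Entropy does: the segment count of 8, the use of SHA-256, and the function names are assumptions, and a real implementation walks down the tree instead of comparing every leaf.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(store: dict, segments: int = 8) -> list:
    """Group key/value hashes into segments and hash each segment (the leaves)."""
    groups = [[] for _ in range(segments)]
    for key in sorted(store):
        groups[h(key.encode())[0] % segments].append(h(f"{key}={store[key]}".encode()))
    return [h(b"".join(group)) for group in groups]


def root(leaves: list) -> bytes:
    """Fold pairs of hashes upward until one root hash remains (power-of-two leaves)."""
    level = leaves
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def differing_segments(a: dict, b: dict, segments: int = 8) -> list:
    """Equal roots mean the replicas match; otherwise report the segments to repair."""
    leaves_a, leaves_b = leaf_hashes(a, segments), leaf_hashes(b, segments)
    if root(leaves_a) == root(leaves_b):
        return []
    return [i for i, (x, y) in enumerate(zip(leaves_a, leaves_b)) if x != y]


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}
print(differing_segments(replica_a, replica_a))   # []
print(differing_segments(replica_a, replica_b))   # the one segment holding k2
```

Only the segments that differ need their keys exchanged, so two replicas that already agree cost a single root-hash comparison.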
CONFLICT RESOLUTION
• Network partitions and concurrent actors modifying the same data cause data divergence
• Riak provides two ways to manage this, which can be set at the bucket level:
• Last Write Wins - keeps only the most recent write by timestamp; suitable for some use cases
• Vector Clocks - retain “sibling” copies of data for merging
VECTOR CLOCKS
• Every node has an ID
• Send last-seen vector clock in every “put” request
• Can be viewed as a ‘commit history’, e.g. as in Git
• Lets you decide conflicts
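The comparison behind these bullets can be sketched in Python. Clocks are modelled here as plain dicts from actor ID to counter; the function names are illustrative and do not come from a Riak client library.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    a_covers_b, b_covers_a = descends(a, b), descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a descends b"     # a is the newer version, b can be discarded
    if b_covers_a:
        return "b descends a"
    return "concurrent"           # neither saw the other: keep both as siblings


def increment(clock: dict, actor: str) -> dict:
    """Record one more update by the given actor."""
    return {**clock, actor: clock.get(actor, 0) + 1}


def merge(a: dict, b: dict) -> dict:
    """Clock for a value that resolves both inputs."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


print(compare({"a": 3}, {"a": 2}))             # a descends b
print(compare({"a": 3}, {"a": 2, "b": 1}))     # concurrent
print(merge({"a": 3}, {"a": 2, "b": 1}) == {"a": 3, "b": 1})   # True
```

The "concurrent" case is the one where the application has to decide: both values are returned, and the resolved value is written back with the merged clock.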
SIBLING CREATION
[Figure: ring diagram in two steps, 1) and 2), showing copies of Object v1 with vector clocks [{a,3}] and [{a,2},{b,1}]]
• Siblings can be created by:
• Simultaneous writes (based on same object version)
• Network partitions
• Writes to existing key without submitting vector clock
STORAGE BACKENDS
• Bitcask
• LevelDB
• Memory
• Multi
BITCASK
• A fast, append-only key-value store
• In-memory key lookup table (key_dir); data on disk
• Closed files are immutable
• Merging cleans up old data
• Developed by Basho Technologies
• Suitable for bounded data, e.g. reference data
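The append-only design can be illustrated with a toy store in Python. This is a sketch of the idea only: the `TinyCask` class, its single data file, and its record layout are assumptions, whereas the real Bitcask uses multiple files, checksums, and timestamps.

```python
import os
import struct
import tempfile

HEADER = struct.Struct(">II")   # key length, value length


class TinyCask:
    def __init__(self, path: str):
        self.path = path
        self.keydir = {}                  # key -> (value offset, value size), in memory
        open(path, "ab").close()          # create the data file if it is missing

    def put(self, key: str, value: bytes) -> None:
        """Append a record; older records for the key stay on disk until merge."""
        raw_key = key.encode()
        start = os.path.getsize(self.path)
        with open(self.path, "ab") as f:
            f.write(HEADER.pack(len(raw_key), len(value)) + raw_key + value)
        self.keydir[key] = (start + HEADER.size + len(raw_key), len(value))

    def get(self, key: str):
        """One in-memory lookup, then one positioned read from disk."""
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)

    def merge(self) -> None:
        """Rewrite only the live values, dropping superseded records."""
        live = {key: self.get(key) for key in self.keydir}
        os.remove(self.path)
        self.keydir = {}
        open(self.path, "ab").close()
        for key, value in live.items():
            self.put(key, value)


with tempfile.TemporaryDirectory() as directory:
    cask = TinyCask(os.path.join(directory, "data.cask"))
    cask.put("user", b"v1")
    cask.put("user", b"v2")               # appended, not overwritten in place
    print(cask.get("user"))               # b'v2'
    cask.merge()
    print(cask.get("user"))               # b'v2'
```

Every key lives in the in-memory table, which is why this backend suits bounded data sets.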
LEVELDB
• Key-Value storage developed by Google
• Append-only for very large data sets
• Multiple levels of SSTable-like data structures
• Allows for more advanced querying (2i)
• It includes compression (Snappy algorithm)
• Suitable for unbounded data or advanced querying
MEMORY
• Data is never persisted to disk
• Typically used for “test” databases (unit tests, etc.)
• Definable memory limits per vnode
• Configurable object expiry
• Useful for highly transient data
MULTI
• Configure multiple storage engines for different types of data
• Configure the “default” storage engine
• Choose storage engine on per bucket basis
• No reason not to use it
CLIENT APIS
• Riak supports two main client types:
• REST based HTTP Interface
• Easy to use from command line and simple scripts
• Useful if using an intermediate caching layer, e.g. Varnish
• Protocol Buffers
• Optimized binary encoding standard developed by Google
• More performant than HTTP interface
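As a rough illustration of the HTTP interface, the Python sketch below builds (but does not send) requests against the /buckets/<bucket>/keys/<key> resource, passing quorum values as query parameters. The helper name `riak_request`, the local address on port 8098, and the JSON content type are assumptions for the example; check them against your own cluster before use.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def riak_request(method, bucket, key, body=None, vclock=None,
                 base_url="http://127.0.0.1:8098", **quorum):
    """Build a GET, PUT or DELETE request for one object (hypothetical helper)."""
    url = f"{base_url}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    if quorum:                                  # per-request r / w values
        url += "?" + urlencode(quorum)
    headers = {}
    if body is not None:
        headers["Content-Type"] = "application/json"
    if vclock:                                  # last-seen vector clock, sent on update
        headers["X-Riak-Vclock"] = vclock
    return Request(url, data=body, headers=headers, method=method)


put = riak_request("PUT", "users", "joel", body=b'{"name": "Joel"}', w=1)
get = riak_request("GET", "users", "joel", r=2)
print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it, which requires a running node at the assumed address.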
CLIENT LIBRARIES
• Client libraries supported by Basho:
• Community supported languages and frameworks:
• C/C++, Clojure, Common Lisp, Dart, Django, Go, Grails, Griffon, Groovy, Erlang, Haskell, Java, .NET, Node.js, OCaml, Perl, PHP, Play, Python, Racket, Ruby, Scala, Smalltalk
USE CASE: ANGRY BIRDS
• Riak is the datastore for all back-end systems supporting Angry Birds
• Game-state storage, ID/login, payments, push notifications, analytics, advertisements
• 9 clusters in use with over 100 nodes
• 263 million active monthly users
USE CASE: SPINE2
• Spine2 project - storing patient data (80 million+)
• 500 complex messages per second
• 20,000 integrated end points
• 0 data loss
• 99.9% availability SLA
USE CASE: PUSH TO TALK
• Push-to-talk application
• Billions of requests daily
• > 50 dedicated servers
• Everything stored in Riak
• https://github.com/mranney/node_riak
MULTI DATACENTER REPLICATION (MDC)
• Replicates data between clusters in different data centers and tolerates the higher latencies between them
• Two synchronization modes that can be used together: real-time and full sync
• Set up as uni-directional or bi-directional replication
• Can be used for global load balancing, business continuity and backups
RIAK-CS
• Built on top of Riak and supports MDC
• S3 compatible object storage
• Supports multi-tenancy
• Per-tenant usage data and statistics on network I/O
• Supports objects of arbitrary content type, up to 5 TB
• Often used to build private cloud storage
PLAY AROUND WITH RIAK?
• https://github.com/joeljacobson/riak-dev-cluster
• https://github.com/joeljacobson/vagrant-riak-cluster
THANK YOU
joel@basho.com
basho
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Introduction to Riak - Joel Jacobson

  • 1. basho Core Concepts Introduction to Riak AKQA 24th July 2013 Friday, 26 July 13
  • 2. WHO AM I? Joel Jacobson Technical Evangelist Basho Technologies @joeljacobson
  • 4. PROBLEMS? • Concurrency and latency at scale • Data consistency • Uptime/failover • Multi-tenancy • SLAs
  • 5. WHAT IS RIAK? • Key-Value store + extras • Distributed and horizontally scalable • Fault-tolerant • Highly available • Built for the web
  • 6. INSPIRED BY AMAZON DYNAMO • White paper released by Amazon describing a database system used for its shopping cart • Masterless, peer-coordinated replication • Dynamo-inspired data stores: Riak, Cassandra, Voldemort etc. • Consistent hashing - no sharding :-) • Eventually consistent
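The consistent hashing bullet can be illustrated with a minimal sketch. It assumes a SHA-1 ring split into 64 equal partitions and a simple round-robin assignment of partitions to nodes. The function names, the `bucket/key` string encoding and the node names are hypothetical, chosen for illustration, and are not Riak's actual implementation.

```python
import hashlib

RING_SIZE = 2 ** 160      # size of the SHA-1 output space
NUM_PARTITIONS = 64       # assumed partition count for this sketch


def partition_for(bucket: str, key: str) -> int:
    """Hash bucket/key onto the ring and return the partition index it falls in."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def preference_list(bucket: str, key: str, nodes: list, n_val: int = 3) -> list:
    """Return n_val consecutive (partition, node) pairs starting at the key's partition.

    Partitions are assigned to nodes round-robin (an assumption of this sketch),
    so consecutive partitions land on different nodes when len(nodes) >= n_val.
    """
    first = partition_for(bucket, key)
    partitions = [(first + i) % NUM_PARTITIONS for i in range(n_val)]
    return [(p, nodes[p % len(nodes)]) for p in partitions]
```

Because the key's position depends only on its hash, any node can compute the same preference list without asking a master, e.g. `preference_list("users", "joel", ["node1", "node2", "node3", "node4"])`.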
  • 7. RIAK KEY-VALUE STORE • Simple operations - GET, PUT, DELETE • Value is opaque, with metadata • Extras, e.g. • Secondary Indexes (2i) • MapReduce • Full text search
  • 8. HORIZONTALLY SCALABLE • Near-linear scalability • Query load and data are spread evenly • Add more nodes and get more: • ops/second • storage capacity • compute power (for Map/Reduce)
  • 9. FAULT TOLERANT • All nodes participate equally - no single point of failure (SPOF) • All data is replicated • Clusters self-heal - Handoff, Active Anti-Entropy • Cluster transparently survives... • node failure • network partitions • Built on Erlang/OTP (designed for FT)
  • 10. HIGHLY AVAILABLE • Any node can serve client requests • Fallbacks are used when nodes are down • Always accepts read and write requests • Per-request quorums
  • 11. QUORUMS - N/R/W • Tunable down to bucket level • n_val = 3 by default • w / r = 2 by default • w = 1 - Quicker response time, reads could be inconsistent in the short term • w = all - Slower response, increased data consistency
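The trade-off between `w = 1` and `w = all` can be made concrete with a small sketch of quorum arithmetic. It assumes a strict quorum, where a read is guaranteed to overlap the latest successful write only if r + w > n_val. The helper names are hypothetical, and fallback nodes can weaken this guarantee in practice.

```python
def resolve_quorum(value, n_val: int) -> int:
    """Turn a symbolic or numeric quorum setting into a replica count."""
    if value == "all":
        return n_val
    if value == "quorum":
        return n_val // 2 + 1      # majority of replicas
    if value == "one":
        return 1
    if isinstance(value, int) and 1 <= value <= n_val:
        return value
    raise ValueError(f"invalid quorum value: {value!r}")


def reads_overlap_writes(n_val: int, r, w) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return resolve_quorum(r, n_val) + resolve_quorum(w, n_val) > n_val
```

With the defaults from the slide (n_val = 3, r = w = 2) the read and write sets always overlap. With w = 1 and r = 2 they may not, which is the short-term inconsistency the slide mentions.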
  • 12. CAP THEOREM • C = Consistency • A = Availability • P = Partition Tolerance • The CAP theorem states that a distributed shared-data system can support at most 2 of these 3 properties [Diagram: clients and DB nodes separated by a network/data partition]
  • 14. REPLICATION • Replicated to 3 nodes by default (n_val = 3, which is configurable)
  • 15. DISASTER SCENARIO • Node fails • Request goes to fallback • Node comes back • Handoff - data returned to recovered node • Normal operations resume automatically
  • 16. DISASTER SCENARIO • Node fails • Request goes to fallback • Node comes back • Handoff - data returned to recovered node • Normal operations resume automatically hash(“user_id”)
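The disaster scenario above can be walked through with a toy model. This is only a sketch: the `Node` class, `put` and `handoff` are hypothetical names, and a single primary with a single fallback stands in for a full preference list.

```python
class Node:
    """A toy node with its own data and data it holds on behalf of failed nodes."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}     # key -> value owned by this node
        self.hints = {}    # failed node name -> {key: value} held for it


def put(key, value, primary: Node, fallback: Node) -> str:
    """Write to the primary if it is up, otherwise park the write on the fallback."""
    if primary.up:
        primary.data[key] = value
        return primary.name
    fallback.hints.setdefault(primary.name, {})[key] = value
    return fallback.name


def handoff(fallback: Node, recovered: Node) -> int:
    """Return parked writes to the recovered node; report how many were handed off."""
    pending = fallback.hints.pop(recovered.name, {})
    recovered.data.update(pending)
    return len(pending)
```

The client never sees the failure: the write succeeds against the fallback, and once the primary is back, `handoff` moves the data home and the fallback drops its copy.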
  • 17. ACTIVE ANTI-ENTROPY • Automatically repair inconsistencies in data • Active Anti-Entropy was new in 1.3.0 and uses Merkle trees to compare data in partitions and periodically ensure consistency • Active Anti-Entropy runs as a background process • Can also be configured as a manual process
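To show why Merkle trees make this comparison cheap, here is a minimal two-level hash tree sketch. It assumes string values, 8 segments and SHA-256, and all function names are hypothetical. Riak's own hash trees are deeper and persisted on disk, so treat this only as an illustration of the idea.

```python
import hashlib

SEGMENTS = 8  # assumed number of leaf segments for this sketch


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def segment_of(key: str) -> int:
    """Assign a key to a leaf segment by hashing it."""
    return int(_h(key.encode("utf-8")), 16) % SEGMENTS


def build_tree(store: dict) -> dict:
    """Hash each segment's key/value pairs into a leaf, then hash the leaves into a root."""
    segments = [[] for _ in range(SEGMENTS)]
    for key in sorted(store):
        segments[segment_of(key)].append(f"{key}={_h(store[key].encode('utf-8'))}")
    leaves = [_h("\n".join(items).encode("utf-8")) for items in segments]
    return {"root": _h("".join(leaves).encode("utf-8")), "leaves": leaves}


def differing_keys(a: dict, b: dict) -> set:
    """Compare two replicas: equal roots mean no work, otherwise inspect only bad segments."""
    tree_a, tree_b = build_tree(a), build_tree(b)
    if tree_a["root"] == tree_b["root"]:
        return set()
    bad = {i for i in range(SEGMENTS) if tree_a["leaves"][i] != tree_b["leaves"][i]}
    candidates = {k for k in set(a) | set(b) if segment_of(k) in bad}
    return {k for k in candidates if a.get(k) != b.get(k)}
```

When two replicas agree, only the root hashes are compared. When they differ, only keys in mismatched segments need a closer look, and those are the keys that need repair.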
  • 18. CONFLICT RESOLUTION • Network partitions and concurrent actors modifying the same data cause data divergence • Riak provides two ways to manage this, which can be set at bucket level: • Last Write Wins - an approach used for some use cases • Vector Clocks - Retain “sibling” copies of data for merging
  • 19. VECTOR CLOCKS • Every node has an ID • Send last-seen vector clock in every “put” request • Can be viewed as a ‘commit history’, e.g. Git • Lets you resolve conflicts
  • 20. SIBLING CREATION [Diagram: object versions on the ring with vector clocks [{a,3}] and [{a,2},{b,1}]] • Siblings can be created by: • Simultaneous writes (based on same object version) • Network partitions • Writes to existing key without submitting vector clock
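The two clocks in the diagram, [{a,3}] and [{a,2},{b,1}], can be compared with a short sketch. It models a vector clock as a plain dict of actor to counter. The function names are hypothetical, and Riak's real clocks also carry timestamps and are pruned, which is left out here.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify two clocks as equal, ordered, or concurrent (siblings)."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "siblings"


def increment(clock: dict, actor: str) -> dict:
    """Return a new clock with the actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated
```

Neither clock from the diagram descends from the other (one has a higher count for `a`, the other has seen `b`), so `compare({"a": 3}, {"a": 2, "b": 1})` classifies them as siblings and both values are kept for the application to merge.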
  • 21. STORAGE BACKENDS • Bitcask • LevelDB • Memory • Multi
  • 22. BITCASK • A fast, append-only key-value store • In-memory key lookup table (key_dir), data on disk • Closed files are immutable • Merging cleans up old data • Developed by Basho Technologies • Suitable for bounded data, e.g. reference data
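The append-only design and the `key_dir` lookup table can be sketched in a few lines. This toy version uses an in-memory `io.BytesIO` as a stand-in for the on-disk data file and stores only raw values. The class name `TinyBitcask` is hypothetical, and real Bitcask records also carry a checksum, timestamp and tombstones.

```python
import io


class TinyBitcask:
    """Toy append-only store: values go to a log, key_dir remembers where they are."""

    def __init__(self):
        self.log = io.BytesIO()   # stand-in for the on-disk data file
        self.key_dir = {}         # key -> (offset, length) of the latest value

    def put(self, key: str, value: bytes) -> None:
        """Append the value and point the key at its new location; old bytes stay put."""
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        self.log.write(value)
        self.key_dir[key] = (offset, len(value))

    def get(self, key: str):
        """One in-memory lookup, then one seek and read."""
        entry = self.key_dir.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.log.seek(offset)
        return self.log.read(length)

    def delete(self, key: str) -> None:
        self.key_dir.pop(key, None)

    def merge(self) -> "TinyBitcask":
        """Copy only live values into a fresh store, dropping stale data."""
        compacted = TinyBitcask()
        for key in list(self.key_dir):
            compacted.put(key, self.get(key))
        return compacted
```

Overwriting a key leaves the old value in the log until `merge` runs, which is why merging is what reclaims space. Every key must fit in the in-memory table, which is consistent with the slide's advice to use Bitcask for bounded data.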
  • 23. LEVELDB • Key-Value storage developed by Google • Append-only for very large data sets • Multiple levels of SSTable-like data structures • Allows for more advanced querying (2i) • It includes compression (Snappy algorithm) • Suitable for unbounded data or advanced querying
  • 24. MEMORY • Data is never persisted to disk • Typically used for “test” databases (unit tests, etc.) • Definable memory limits per vnode • Configurable object expiry • Useful for highly transient data
  • 25. MULTI • Configure multiple storage engines for different types of data • Configure the “default” storage engine • Choose storage engine on a per-bucket basis • No reason not to use it
  • 26. CLIENT APIS • Riak supports two main client types: • REST-based HTTP interface • Easy to use from command line and simple scripts • Useful if using an intermediate caching layer, e.g. Varnish • Protocol Buffers • Optimized binary encoding standard developed by Google • More performant than HTTP interface
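As an illustration of the HTTP interface, the sketch below builds (but does not send) GET and PUT requests for a key using Python's standard library. The `/buckets/<bucket>/keys/<key>` path, the `r`/`w` query parameters and the `X-Riak-Vclock` header follow Riak's HTTP API. The helper names and the host `127.0.0.1:8098` in the usage note are assumptions for a local node.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def object_url(host: str, bucket: str, key: str, **params) -> str:
    """Build the URL for a key, with optional per-request quorum parameters."""
    url = f"http://{host}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


def build_get(host: str, bucket: str, key: str, **params) -> Request:
    """Prepare a GET request, e.g. with r=2 as a per-request read quorum."""
    return Request(object_url(host, bucket, key, **params), method="GET")


def build_put(host: str, bucket: str, key: str, body: bytes,
              content_type: str = "application/json", vclock=None, **params) -> Request:
    """Prepare a PUT request, sending back the last-seen vector clock if there is one."""
    headers = {"Content-Type": content_type}
    if vclock:
        headers["X-Riak-Vclock"] = vclock
    return Request(object_url(host, bucket, key, **params),
                   data=body, headers=headers, method="PUT")
```

Passing a prepared request to `urllib.request.urlopen` would send it to a running node, for example `urlopen(build_get("127.0.0.1:8098", "users", "joel", r=2))`.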
  • 27. CLIENT LIBRARIES • Client libraries supported by Basho: • Community-supported languages and frameworks: • C/C++, Clojure, Common Lisp, Dart, Django, Go, Grails, Griffon, Groovy, Erlang, Haskell, Java, .NET, Node.js, OCaml, Perl, PHP, Play, Python, Racket, Ruby, Scala, Smalltalk
  • 28. • Using Riak as datastore for all back-end systems supporting Angry Birds • Game-state storage, ID/Login, Payments, Push notifications, analytics, advertisements • 9 clusters in use with over 100 nodes • 263 million monthly active users
  • 29. • Spine2 project - storing patient data (80 million+) • 500 complex messages per second • 20,000 integrated end points • 0 data loss • 99.9% availability SLA
  • 30. • Push to talk application • Billions of requests daily • > 50 dedicated servers • Everything stored in Riak • https://github.com/mranney/node_riak
  • 31. MULTI DATACENTER REPLICATION (MDC) • Allows data to be replicated between clusters in different data centers. Can handle larger latencies. • Two synchronization modes that can be used together: real-time and full-sync • Set up as uni-directional or bi-directional replication • Can be used for global load-balancing, business continuity and back-ups
  • 32. RIAK-CS • Built on top of Riak and supports MDC • S3-compatible object storage • Supports multi-tenancy • Per-tenant usage data and statistics on network I/O • Supports objects of arbitrary content type up to 5TB • Often used to build private cloud storage
  • 33. PLAY AROUND WITH RIAK? • https://github.com/joeljacobson/riak-dev-cluster • https://github.com/joeljacobson/vagrant-riak-cluster