Distributed Systems @ OK.RU
Oleg Anastasyev
@m0nstermind
oa@ok.ru
OK.ru has come to:
1. Absolutely reliable network
2. with negligible Latency
3. and practically unlimited Bandwidth
4. It is homogeneous (same hardware and hop count to every server)
5. Nobody can break into our LAN
6. Topology changes are unnoticeable
7. All managed by a single genius admin
8. So data transport cost is zero now
3
Fallacies of distributed computing
https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
[Peter Deutsch, 1994; James Gosling 1997]
4
OK.RU has come to:
4 datacenters
150 distinct microservices
8000 bare-metal servers
5
hardware engineers
network engineers
operations
developers
6
My friends page
1. Retrieve friends ids
2. Filter by friendship type
3. Apply black list
4. Resolve ids to profiles
5. Sort profiles
6. Retrieve stickers
7. Calculate summaries
7
The Simple Way™
SELECT * FROM friendlist f, users u
WHERE userId=? AND f.kind=? AND u.name LIKE ?
AND NOT EXISTS( SELECT * FROM blacklist …)
…
• Friendships
• 12 billion edges, 300 GB
• 500 000 requests per sec
8
Simple ways don't work
• User profiles
• > 350 million
• 3 500 000 requests/sec, 50 Gbit/sec
9
How stuff works
web frontend API frontend
app server
one-graph user-cache black-list
microservices
10
Micro-service dissected
Remote interface
Business logic, caches
[ Local storage ]
1 JVM
11
Micro-service dissected
Remote interface
https://github.com/odnoklassniki/one-nio
interface GraphService extends RemoteService {
    @RemoteMethod
    long[] getFriendsByFilter(@Partition long vertexId, long relationMask);
}

interface UserCache {
    @RemoteMethod
    User getUserById(long id);
}
12
App Server code
https://github.com/odnoklassniki/one-nio
long[] friendsIds = graphService.getFriendsByFilter(userId, mask);
List<User> users = new ArrayList<>(friendsIds.length);
for (long id : friendsIds) {
    if (blackList.isAllowed(userId, id)) {
        users.add(userCache.getUserById(id));
    }
}
…
return users;
• Partition by this parameter value
• Using partitioning strategy
• long id -> int partitionId(id) -> node1,node2,…
• Strategies can be different
• Cassandra ring, Voldemort partitions
• or …
13
interface GraphService extends RemoteService {
    @RemoteMethod
    long[] getFriendsByFilter(@Partition long vertexId, long relationMask);
}
14
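Weighted quadrant
[Diagram: p = id % 16 gives partitions p = 0, p = 1, ... p = 15; each partition is served by a SET of nodes (N01 N02 N03 . . . 019 020; N11); nodes carry weights from W=1 to W=100; node = wrr(p)]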
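The routing in the diagram can be sketched roughly as follows. This is a minimal illustration, not one-nio code: it assumes `wrr` stands for weighted round-robin, and the class `WeightedRouter`, the `Node` type, and the `ticket` counter are hypothetical names introduced here.

```java
import java.util.List;

/** Hypothetical sketch: map an id to a partition, then pick a node from its set by weight. */
final class WeightedRouter {
    static final int PARTITIONS = 16; // from the slide: p = id % 16

    /** A node in a partition's set; name and weight are illustrative. */
    static final class Node {
        final String name;
        final int weight;

        Node(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /** floorMod keeps the result in 0..15 even for negative ids. */
    static int partitionOf(long id) {
        return (int) Math.floorMod(id, (long) PARTITIONS);
    }

    /**
     * Simple weighted round-robin: over any run of totalWeight consecutive
     * tickets, each node is returned exactly weight times.
     */
    static Node pick(List<Node> set, long ticket) {
        int total = 0;
        for (Node n : set) {
            total += n.weight;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("node set needs a positive total weight");
        }
        long slot = Math.floorMod(ticket, (long) total);
        for (Node n : set) {
            if (slot < n.weight) {
                return n;
            }
            slot -= n.weight;
        }
        throw new IllegalStateException("unreachable: slot is always below total weight");
    }
}
```

A caller would keep a per-partition counter and pass it as `ticket`; a node with W=1 next to a node with W=100 then receives about 1% of the requests, which is one way to warm up or drain a server gradually.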
15
A coding issue
https://github.com/odnoklassniki/one-nio
long[] friendsIds = graphService.getFriendsByFilter(userId, mask);
List<User> users = new ArrayList<>(friendsIds.length);
for (long id : friendsIds) {
    if (blackList.isAllowed(userId, id)) {        // one remote round trip per friend
        users.add(userCache.getUserById(id));     // and another one here
    }
}
…
return users;
16
A roundtrip price
0.1-0.3 ms
0.7-1.0 ms remote datacenter
* this price is tightly coupled with the specific infrastructure and frameworks
latency = 1.0 ms * 2 reqs * 200 friends = 400 ms
10k friends latency = 20 seconds
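The arithmetic behind these two numbers, written out as a small sketch. The class and method names are illustrative only; the inputs (1.0 ms per round trip, 2 requests per friend) are the ones from the slide.

```java
/** Back-of-envelope cost of a loop that makes sequential remote calls. */
final class RoundTripCost {
    /** Total latency in ms: every item costs callsPerItem sequential round trips. */
    static double sequentialMs(double roundTripMs, int callsPerItem, int items) {
        return roundTripMs * callsPerItem * items;
    }

    public static void main(String[] args) {
        System.out.println(sequentialMs(1.0, 2, 200));    // 200 friends
        System.out.println(sequentialMs(1.0, 2, 10_000)); // 10k friends
    }
}
```

The cost grows linearly with the number of friends because each call waits for the previous one, which is the motivation for batching the requests.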
17
Batch requests to the rescue
public interface UserCache {
    @RemoteMethod( split = true )
    Collection<User> getUsersByIds(long[] keys);
}

long[] friendsIds = graphService.getFriendsByFilter(userId, mask);
friendsIds = blackList.filterAllowed(userId, friendsIds);
Collection<User> users = userCache.getUsersByIds(friendsIds);
…
return users;
18
split & merge
split (ids by p) -> ids0, ids1
ids0 -> p = 0 (N01 N02 N03 . . .)
ids1 -> p = 1 (N11)
users = merge (users0, users1)
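A minimal sketch of the split and merge steps, assuming ids are partitioned by `id mod partitions`. It is not the one-nio implementation: `SplitMerge`, `split`, and `callAndMerge` are hypothetical names, and the batches are sent one after another here for simplicity, whereas a real client would presumably send them in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

/** Hypothetical sketch of splitting a batch request by partition and merging the answers. */
final class SplitMerge {
    /** Groups ids by partition, preserving the original order inside each group. */
    static Map<Integer, List<Long>> split(long[] ids, int partitions) {
        Map<Integer, List<Long>> byPartition = new TreeMap<>();
        for (long id : ids) {
            int p = (int) Math.floorMod(id, (long) partitions);
            byPartition.computeIfAbsent(p, k -> new ArrayList<>()).add(id);
        }
        return byPartition;
    }

    /** Sends one batch per partition via remoteCall and concatenates the partial results. */
    static <R> List<R> callAndMerge(Map<Integer, List<Long>> batches,
                                    BiFunction<Integer, List<Long>, List<R>> remoteCall) {
        List<R> merged = new ArrayList<>();
        for (Map.Entry<Integer, List<Long>> batch : batches.entrySet()) {
            merged.addAll(remoteCall.apply(batch.getKey(), batch.getValue()));
        }
        return merged;
    }
}
```

With this shape the number of round trips depends on the number of partitions touched, not on the number of ids requested.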
19
1. Client crash
2. Server crash
3. Request omission
4. Response omission
5. Server timeout
6. Invalid value response
7. Arbitrary failure
What could possibly fail ?
Failures
• We cannot prevent failures, only mask them
• If a failure can occur, it will occur
• Redundancy is a must to mask failures
• Information ( error correction codes )
• Hardware (replicas, substitute hardware)
• Time (transactions, retries)
21
What to do with failures ?
22
What happened to the transaction?
[Diagram: Add Friend request with unknown outcome (? ?)]
Don't give up! Must retry!
Must give up! Don't retry!
• The client does not really know
• What can the client do?
• Don't make any guarantees.
• Never retry. At Most Once.
• Always retry. At Least Once.
23
Did the friendship succeed?
1. Transaction in ACID database
• single master, success is atomic (either yes or no)
• atomic rollback is possible
2. Cache cluster refresh
• many replicas, no master
• no rollback, partial failures are possible
24
Making new friendship
• Operation can be reapplied multiple times with same result
• e.g.: read, Set.add(), Math.max(x,y)
• Atomic change with order and dup control

25
Idempotence
“Always retry” policy can be applied only to idempotent operations
https://en.wikipedia.org/wiki/Idempotence
26
Idempotence in ACID database
Make friends
wait; timeout
Make friends (retry)
Friendship, peace and bubble gum !
Already friends ?
No, let’s make it !
Already friends ?
Yes, NOP !
27
Sequencing
MakeFriends (OpId)
Made friends!
Is Dup (OpId) ?
No, making changes
OpId := Generate()
Generate() examples:
• OpId+=1
• OpId=currentTimeMillis()
• OpId=TimeUUID
http://johannburkard.de/software/uuid/
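The duplicate check by OpId can be sketched as below. This is an in-memory illustration under stated assumptions, not the production code: `FriendshipStore` and its methods are hypothetical, and in an ACID database the "is dup" check and the change itself would run inside one transaction rather than one `synchronized` method.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: a retried MakeFriends with the same OpId becomes a no-op. */
final class FriendshipStore {
    private final Set<Long> seenOpIds = new HashSet<>();
    private final Set<String> edges = new HashSet<>();
    private int changesApplied = 0;

    /** Returns true if this call applied the change, false if opId was already seen. */
    synchronized boolean makeFriends(long opId, long userA, long userB) {
        if (!seenOpIds.add(opId)) {
            return false; // Is Dup (OpId)? Yes: do nothing
        }
        edges.add(edgeKey(userA, userB)); // No: making changes
        changesApplied++;
        return true;
    }

    synchronized boolean areFriends(long userA, long userB) {
        return edges.contains(edgeKey(userA, userB));
    }

    synchronized int changesApplied() {
        return changesApplied;
    }

    /** Friendship is symmetric, so the key does not depend on argument order. */
    private static String edgeKey(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }
}
```

The client generates the OpId once and reuses it on every retry, so "always retry" is safe even when the first attempt actually succeeded and only the response was lost.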
28
29
Cache cluster refresh
[Diagram: add(Friend) -> p = 0: N01 N02 N03 . . .]
Retries are meaningless
But replica state will diverge otherwise
• Background data sync process
• Reads updated records from ACID store
  SELECT * FROM users WHERE modified > ?
• Applies them to its in-memory state
• Loads updates on node startup
• Retry can be omitted then
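One sync pass could look roughly like this. It is a sketch under assumptions: `CacheSyncer`, `Row`, and the `modifiedSince` function are hypothetical, and the function stands in for the `SELECT * FROM users WHERE modified > ?` query so the example runs without a database.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

/** Hypothetical sketch of a cache node that pulls changes from the authoritative store. */
final class CacheSyncer {
    /** One row of the users table; the fields are illustrative. */
    static final class Row {
        final long id;
        final String name;
        final long modified;

        Row(long id, String name, long modified) {
            this.id = id;
            this.name = name;
            this.modified = modified;
        }
    }

    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final LongFunction<List<Row>> modifiedSince; // stands in for the SQL query
    private long lastModified = Long.MIN_VALUE;

    CacheSyncer(LongFunction<List<Row>> modifiedSince) {
        this.modifiedSince = modifiedSince;
    }

    /** Run once on node startup, then periodically in the background. */
    synchronized void syncOnce() {
        for (Row row : modifiedSince.apply(lastModified)) {
            cache.put(row.id, row.name);
            lastModified = Math.max(lastModified, row.modified);
        }
    }

    String get(long id) {
        return cache.get(id);
    }
}
```

Because every replica eventually reads the same rows from the store, a lost `add(Friend)` update is repaired by the next pass. A real implementation would also have to handle rows that share a timestamp with the last one seen, which a strict `>` comparison can miss.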

30
Syncing cache from DB
31
Death by timeout
GC
Make Friends
wait; timeout
thread pool exhausted
1. Clients stop sending requests to the server
   after X consecutive failures within the last second
2. Clients monitor server availability
   in the background, once a minute
3. And turn it back on
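The three steps above resemble a circuit breaker kept on the client side. A minimal sketch, with hypothetical names (`ServerCutOff`, `onProbe`) and one simplification: it counts consecutive failures only and ignores the one-second window mentioned on the slide.

```java
/** Hypothetical client-side cut-off for one server. */
final class ServerCutOff {
    private final int maxConsecutiveFailures; // the "X" from the slide
    private int consecutiveFailures = 0;
    private boolean enabled = true;

    ServerCutOff(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Clients check this before sending a request to the server. */
    synchronized boolean isEnabled() {
        return enabled;
    }

    synchronized void onSuccess() {
        consecutiveFailures = 0;
    }

    /** Step 1: stop sending requests after X failures in a row. */
    synchronized void onFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= maxConsecutiveFailures) {
            enabled = false;
        }
    }

    /** Steps 2 and 3: a background probe reports availability and re-enables the server. */
    synchronized void onProbe(boolean serverAlive) {
        if (serverAlive) {
            enabled = true;
            consecutiveFailures = 0;
        }
    }
}
```

Cutting the server off quickly keeps client threads from piling up on timeouts, which is the thread pool exhaustion scenario described in "Death by timeout".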
32
Server cut-off
33
Death by slowing down
Avg = 1.5ms
Max = 1.5s
24 cpu cores
Cap = 24,000 ops
Choose 2.4ms timeout ?
Cut it off from client if latency avg > 2.4ms ?
Avg = 24ms
Max = 1.5s
24 cpu cores
Cap = 1,000 ops
10,000 ops
34
Speculative retry
Idempotent Op
wait; timeout
Retry
Result Response
• Makes requests to replicas before timeout
• Better 99th percentile, and even average, latencies
• More stable system
• Not always applicable:
• Idempotent ops, additional load, traffic (to consider)
• Can be tuned: retry always, after the average latency, or after the 99th percentile
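A speculative retry can be sketched with `CompletableFuture` (Java 9 or later for `delayedExecutor`). This is an illustration only, not the one-nio mechanism: `SpeculativeRetry.call` and its parameters are hypothetical, and both suppliers are assumed to be idempotent calls to different replicas.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/** Hypothetical sketch: ask a second replica if the first one is slow; first success wins. */
final class SpeculativeRetry {
    static <T> CompletableFuture<T> call(Supplier<T> primary, Supplier<T> backup, long hedgeDelayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();

        BiConsumer<T, Throwable> onDone = (value, error) -> {
            if (error == null) {
                result.complete(value);              // the first success wins
            } else if (failures.incrementAndGet() == 2) {
                result.completeExceptionally(error); // both attempts failed
            }
        };

        // The first attempt starts immediately.
        CompletableFuture.supplyAsync(primary).whenComplete(onDone);

        // The second attempt starts after the delay, unless a result has already arrived.
        Executor delayed = CompletableFuture.delayedExecutor(hedgeDelayMs, TimeUnit.MILLISECONDS);
        delayed.execute(() -> {
            if (result.isDone()) {
                return;
            }
            try {
                onDone.accept(backup.get(), null);
            } catch (RuntimeException e) {
                onDone.accept(null, e);
            }
        });
        return result;
    }
}
```

The delay is the tuning knob from the last bullet: zero means "always", while a delay near the average or the 99th percentile latency limits the extra load to slow requests.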
35
Speculative retry
More failures !
• Excessive load
• Excessive paranoia
• Bugs
• Human error
• Massive outages
37
All replicas failure
38
Use of non-authoritative datasources, degrade consistency
Use of incomplete data in UI, partial feature degradation
Single feature full degradation
Degrade (gracefully) !
39
The code
interface UserCache {
    @RemoteMethod
    Distributed<Collection<User>> getUsersByIds(long[] keys);
}

interface Distributed<D> {
    boolean isInconsistency();
    D getData();
}

class UserCacheStub implements UserCache {
    @Override
    public Distributed<Collection<User>> getUsersByIds(long[] keys) {
        // No real data available: report the result as inconsistent
        return Distributed.inconsistent();
    }
}
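The stub calls `Distributed.inconsistent()`, whose definition is not shown on the slide. One possible shape, purely as an assumption: the factory methods `of`, `inconsistent`, and `create`, and the `FriendsPage.render` caller below are hypothetical and use plain strings instead of `User`.

```java
import java.util.Collection;
import java.util.Collections;

/** The wrapper from the slide, plus assumed factory methods. */
interface Distributed<D> {
    boolean isInconsistency();
    D getData();

    /** Assumed factory: a complete, consistent result. */
    static <D> Distributed<D> of(D data) {
        return create(data, false);
    }

    /** Assumed factory: no data at all, flagged as inconsistent. */
    static <D> Distributed<D> inconsistent() {
        return create(null, true);
    }

    static <D> Distributed<D> create(D data, boolean inconsistent) {
        return new Distributed<D>() {
            @Override public boolean isInconsistency() { return inconsistent; }
            @Override public D getData() { return data; }
        };
    }
}

/** Hypothetical caller: renders what it has and tells the user when data is incomplete. */
final class FriendsPage {
    static String render(Distributed<Collection<String>> result) {
        Collection<String> names = result.getData() != null
                ? result.getData()
                : Collections.<String>emptyList();
        String body = "Friends: " + String.join(", ", names);
        return result.isInconsistency()
                ? body + " (some data is temporarily unavailable)"
                : body;
    }
}
```

Making the inconsistency flag part of the return type forces every caller to decide how to degrade, instead of treating a failed cache as an exception that breaks the whole page.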
Resilience testing
41
The product you make
Operations in production env
What to test for failure ?
“Standard” products - with special care !
• What it does:
• Detects network connections between servers
• Disables them (iptables drop)
• Runs auto tests
• What we check
• No crashes, nice UI messages are rendered
• Server does start and can serve requests
42
The product we make : “Guerrilla”
Production diagnostics
• To know an accident exists. Fast.
• To track down to the source of accident. Fast.
• To prevent accidents before they happen.
44
Why
• Zabbix
• Cacti
• Operational metrics
• Names of operations, e.g. “Graph.getFriendsByFilter”
• Call count, their success or failure
• Latency of calls
45
Is there (or will there be) an accident?
• Current metrics and trends
• Aggregated call and failure counts
• Aggregated latencies
• Average, Max
• Percentiles 50,75,98,99,99.9
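A short sketch of how such percentiles can be computed from raw latency samples, using the nearest-rank definition. The class `Percentiles` is illustrative; the slides do not say which method the OK.RU statistics system uses.

```java
import java.util.Arrays;

/** Nearest-rank percentile over a set of latency samples. */
final class Percentiles {
    /** Returns the smallest sample such that at least p percent of samples are <= it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    static double average(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }
}
```

High percentiles expose the slow tail that an average hides: a few requests stuck for seconds barely move the mean of thousands of fast calls, but they show up immediately at 99 and 99.9.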
46
What charts show us
47
More charts
48
Anomaly detection
• The possibilities for failure in distributed systems are endless
• Don't “prevent”, but mask failures through redundancy
• Degrade gracefully on failures that cannot be masked
• Test failures
• Production diagnostics are key to failure detection and prevention
49
Short summary
50
Try these links for more
slideshare.net/m0nstermind
https://v.ok.ru/publishing.html
Notes on Theory of Distributed Systems, CS 465/565, Spring 2014, James Aspnes
http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 

Recently uploaded (20)

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 

Distributed systems at ok.ru #rigadevday

  • 1. Distributed Systems @ OK.RU Oleg Anastasyev @m0nstermind oa@ok.ru
  • 2. OK.ru has come to: 1. Absolutely reliable network 2. with negligible latency 3. and practically unlimited bandwidth 4. It is homogeneous 5. Nobody can break into our LAN 6. Topology changes are unnoticeable 7. All managed by a single genius admin 8. So data transport cost is zero now
  • 3. Fallacies of distributed computing: 1. Absolutely reliable network 2. with negligible latency 3. and practically unlimited bandwidth 4. It is homogeneous (same HW and hop count to every server) 5. Nobody can break into our LAN 6. Topology changes are unnoticeable 7. All managed by a single genius admin 8. So data transport cost is zero now https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing [Peter Deutsch, 1994; James Gosling, 1997]
  • 6. My friends page 1. Retrieve friend ids 2. Filter by friendship type 3. Apply black list 4. Resolve ids to profiles 5. Sort profiles 6. Retrieve stickers 7. Calculate summaries
  • 7. The Simple Way™: SELECT * FROM friendlist f, users u WHERE userId=? AND f.kind=? AND u.name LIKE ? AND NOT EXISTS( SELECT * FROM blacklist …) …
  • 8. Simple ways don't work • Friendships: 12 billion edges, 300 GB, 500 000 requests/sec • User profiles: > 350 million, 3 500 000 requests/sec, 50 Gbit/sec
  • 9. How stuff works: web frontend and API frontend; app server; microservices (one-graph, user-cache, black-list)
  • 10. Micro-service dissected (1 JVM): Remote interface; Business logic, caches; [ Local storage ]
  • 11. Micro-service dissected: Remote interface (https://github.com/odnoklassniki/one-nio) interface GraphService extends RemoteService { @RemoteMethod long[] getFriendsByFilter(@Partition long vertexId, long relationMask); } interface UserCache { @RemoteMethod User getUserById(long id); }
  • 12. App Server code (https://github.com/odnoklassniki/one-nio) long[] friendsIds = graphService.getFriendsByFilter(userId, mask); List<User> users = new ArrayList<>(friendsIds.length); for (long id : friendsIds) { if (blackList.isAllowed(userId, id)) { users.add(userCache.getUserById(id)); } } … return users;
  • 13. interface GraphService extends RemoteService { @RemoteMethod long[] getFriendsByFilter(@Partition long vertexId, long relationMask); } • @Partition: partition by this parameter value • Using a partitioning strategy • long id -> int partitionId(id) -> node1,node2,… • Strategies can be different • Cassandra ring, Voldemort partitions • or …
  • 14. Weighted quadrant: p = id % 16 (p = 0, p = 1, …, p = 15); each partition is served by a SET of nodes (N01, N02, N03, . . . 019, 020; N11) with weights such as W=1 and W=100; node = wrr(p)
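The mapping on slides 13 and 14 (id -> partition -> weighted choice of a node) can be sketched roughly as follows. This is only an illustration under assumptions: the class `WeightedPartitionRouter`, the `Node` type, and the simple counter-based weighted round robin are hypothetical and are not taken from one-nio or from the talk, which only states `p = id % 16` and `node = wrr(p)`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch; not the one-nio implementation.
final class WeightedPartitionRouter {
    /** A replica and its relative weight (names and weights are illustrative). */
    static final class Node {
        final String name;
        final int weight;

        Node(String name, int weight) {
            if (weight <= 0) throw new IllegalArgumentException("weight must be positive");
            this.name = name;
            this.weight = weight;
        }
    }

    static final int PARTITIONS = 16;

    private final List<List<Node>> replicaSets; // index = partition id
    private final AtomicLongArray counters = new AtomicLongArray(PARTITIONS);

    WeightedPartitionRouter(List<List<Node>> replicaSets) {
        if (replicaSets.size() != PARTITIONS) {
            throw new IllegalArgumentException("need one replica set per partition");
        }
        this.replicaSets = replicaSets;
    }

    /** p = id % 16, kept non-negative for negative ids. */
    static int partitionOf(long id) {
        return (int) Math.floorMod(id, (long) PARTITIONS);
    }

    /** Weighted round robin: over sum(weights) calls, each node is picked weight times. */
    Node pick(long id) {
        int p = partitionOf(id);
        List<Node> set = replicaSets.get(p);
        long total = 0;
        for (Node n : set) total += n.weight;
        long slot = counters.getAndIncrement(p) % total;
        for (Node n : set) {
            if (slot < n.weight) return n;
            slot -= n.weight;
        }
        throw new IllegalStateException("unreachable: weights are positive");
    }
}
```

With weights 1 and 100 in one set, the heavier node would receive 100 of every 101 calls for that partition, which is one way to read the W=1 / W=100 labels on the slide.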
  • 15. A coding issue (https://github.com/odnoklassniki/one-nio) long[] friendsIds = graphService.getFriendsByFilter(userId, mask); List<User> users = new ArrayList<>(friendsIds.length); for (long id : friendsIds) { if (blackList.isAllowed(userId, id)) { users.add(userCache.getUserById(id)); } } … return users;
  • 16. A roundtrip price: 0.1-0.3 ms; 0.7-1.0 ms to a remote datacenter*. latency = 1.0 ms * 2 reqs * 200 friends = 400 ms. With 10k friends, latency = 20 seconds. * This price is tightly coupled with the specific infrastructure and frameworks
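The arithmetic on slide 16 can be reproduced with a few lines. The class and method names below are made up for illustration; the only inputs are the slide's own numbers (1.0 ms per round trip, 2 requests per friend).

```java
// Hypothetical helper reproducing the slide's estimate for strictly sequential calls.
final class RoundTripCost {
    /** Total latency in milliseconds when every remote call waits for the previous one. */
    static double sequentialLatencyMs(double roundTripMs, int requestsPerFriend, int friends) {
        return roundTripMs * requestsPerFriend * friends;
    }

    public static void main(String[] args) {
        System.out.println(sequentialLatencyMs(1.0, 2, 200));    // 400.0 ms
        System.out.println(sequentialLatencyMs(1.0, 2, 10_000)); // 20000.0 ms = 20 seconds
    }
}
```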
  • 17. Batch requests to the rescue: public interface UserCache { @RemoteMethod( split = true ) Collection<User> getUsersByIds(long[] keys); } long[] friendsIds = graphService.getFriendsByFilter(userId, mask); friendsIds = blackList.filterAllowed(userId, friendsIds); Collection<User> users = userCache.getUsersByIds(friendsIds); … return users;
  • 18. split & merge: split ( ids by p ) -> ids0, ids1; ids0 goes to the nodes of p = 0 and ids1 to the nodes of p = 1 (N01, N02, N03, . . . N11); users = merge (users0, users1)
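A minimal sketch of the split and merge step, assuming two partitions as drawn on the slide. `SplitMerge`, `split` and `callAndMerge` are hypothetical names, and the per-partition calls run one after another here for brevity; a real client would presumably issue them in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical sketch of what a "split = true" remote method might do on the client side.
final class SplitMerge {
    static final int PARTITIONS = 2; // the slide shows p = 0 and p = 1

    /** Groups ids by partition, keeping the original order inside each group. */
    static Map<Integer, List<Long>> split(long[] ids) {
        Map<Integer, List<Long>> byPartition = new TreeMap<>();
        for (long id : ids) {
            int p = (int) Math.floorMod(id, (long) PARTITIONS);
            byPartition.computeIfAbsent(p, k -> new ArrayList<>()).add(id);
        }
        return byPartition;
    }

    /** Sends one batch per partition and concatenates the partial results. */
    static <R> List<R> callAndMerge(long[] ids,
                                    BiFunction<Integer, List<Long>, List<R>> remoteCall) {
        List<R> merged = new ArrayList<>();
        for (Map.Entry<Integer, List<Long>> e : split(ids).entrySet()) {
            merged.addAll(remoteCall.apply(e.getKey(), e.getValue()));
        }
        return merged;
    }
}
```

The number of round trips now depends on the number of partitions touched rather than on the number of friends.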
  • 19. What could possibly fail? 1. Client crash 2. Server crash 3. Request omission 4. Response omission 5. Server timeout 6. Invalid value response 7. Arbitrary failure
  • 21. What to do with failures? • We cannot prevent failures, only mask them • If a failure can occur, it will occur • Redundancy is a must to mask failures • Information (error correction codes) • Hardware (replicas, substitute hardware) • Time (transactions, retries)
  • 22. What happened to the transaction? Add Friend: ? ? Either "Don't give up! Must retry!" or "Must give up! Don't retry!"
  • 23. Did the friendship succeed? • The client does not really know • What can the client do? • Don't make any guarantees. • Never retry. At Most Once. • Always retry. At Least Once.
  • 24. Making new friendship 1. Transaction in ACID database • single master, success is atomic (either yes or no) • atomic rollback is possible 2. Cache cluster refresh • many replicas, no master • no rollback, partial failures are possible
  • 25. Idempotence (https://en.wikipedia.org/wiki/Idempotence) • Operation can be reapplied multiple times with the same result • e.g.: read, Set.add(), Math.max(x,y) • Atomic change with order and dup control • The “Always retry” policy can be applied only to idempotent operations
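To make the distinction concrete, here is a small sketch built on the slide's own examples, `Set.add()` and `Math.max(x,y)`, next to a counter increment that is not idempotent. The class `IdempotenceDemo` and its fields are illustrative only.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: applying each operation twice simulates a blind retry.
final class IdempotenceDemo {
    final Set<Long> friends = new HashSet<>();
    long maxSeen = 0;
    int friendCounter = 0;

    void addFriend(long id) { friends.add(id); }              // idempotent: Set.add()
    void observe(long v) { maxSeen = Math.max(maxSeen, v); }  // idempotent: Math.max(x, y)
    void incrementCounter() { friendCounter++; }              // NOT idempotent

    public static void main(String[] args) {
        IdempotenceDemo d = new IdempotenceDemo();
        d.addFriend(42L);
        d.addFriend(42L);      // retry: state unchanged
        d.incrementCounter();
        d.incrementCounter();  // retry: state changed again
        System.out.println(d.friends.size() + " friend, counter = " + d.friendCounter);
    }
}
```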
  • 26. Idempotence in ACID database: Make friends -> Already friends? No, let's make it! -> wait; timeout -> Make friends (retry) -> Already friends? Yes, NOP! -> Friendship, peace and bubble gum!
  • 27. Sequencing: OpId := Generate() -> MakeFriends (OpId) -> Is Dup (OpId)? No, making changes -> Made friends! Generate() examples: • OpId+=1 • OpId=currentTimeMillis() • OpId=TimeUUID http://johannburkard.de/software/uuid/
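The duplicate check by operation id might look roughly like this. It is a sketch under assumptions: `FriendshipServer` is a hypothetical in-memory stand-in for the server side, and `java.util.UUID.randomUUID()` (a random UUID) is swapped in for the time-based UUID the slide mentions. In a real store, recording the OpId and applying the change would have to happen atomically.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory server that remembers which operation ids it has applied.
final class FriendshipServer {
    private final Set<UUID> appliedOps = new HashSet<>();
    private final Set<String> friendships = new HashSet<>();

    /** Returns true if the change was applied, false if opId was a duplicate (NOP). */
    synchronized boolean makeFriends(UUID opId, long a, long b) {
        if (!appliedOps.add(opId)) {
            return false; // Is Dup (OpId)? Yes: the retry is ignored
        }
        friendships.add(Math.min(a, b) + ":" + Math.max(a, b));
        return true;
    }

    synchronized int friendshipCount() {
        return friendships.size();
    }

    public static void main(String[] args) {
        FriendshipServer server = new FriendshipServer();
        UUID opId = UUID.randomUUID();                       // OpId := Generate()
        System.out.println(server.makeFriends(opId, 1, 2));  // first attempt
        System.out.println(server.makeFriends(opId, 1, 2));  // retry with the same OpId
    }
}
```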
  • 28. Making new friendship 1. Transaction in ACID database • single master, success is atomic (either yes or no) • atomic rollback is possible 2. Cache cluster refresh • many replicas, no master • no rollback, partial failures are possible
  • 29. Cache cluster refresh: add(Friend) goes to the replicas of p = 0 (N01, N02, N03, . . .). Retries are meaningless, but replica state will diverge otherwise
  • 30. Syncing cache from DB • Background data sync process • Reads updated records from ACID store: SELECT * FROM users WHERE modified > ? • Applies them into its memory • Loads updates on node startup • Retry can be omitted then
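One pass of such a background sync could be sketched as below. Everything here is an assumption made for illustration: `CacheSync`, `Row` and the `UserStore` interface are hypothetical, and `UserStore.modifiedAfter` merely stands in for the `SELECT * FROM users WHERE modified > ?` query on the slide.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache node that pulls changes from the authoritative store.
final class CacheSync {
    static final class Row {
        final long id;
        final String name;
        final long modified; // modification timestamp assigned by the store

        Row(long id, String name, long modified) {
            this.id = id;
            this.name = name;
            this.modified = modified;
        }
    }

    /** Stand-in for: SELECT * FROM users WHERE modified > ? */
    interface UserStore {
        List<Row> modifiedAfter(long since);
    }

    private final Map<Long, Row> cache = new ConcurrentHashMap<>();
    private final UserStore store;
    private long lastSeen = Long.MIN_VALUE; // on startup, everything counts as an update

    CacheSync(UserStore store) {
        this.store = store;
    }

    /** One sync pass; a scheduler would call this periodically. Returns rows applied. */
    synchronized int syncOnce() {
        List<Row> updates = store.modifiedAfter(lastSeen);
        for (Row r : updates) {
            cache.put(r.id, r);
            lastSeen = Math.max(lastSeen, r.modified);
        }
        return updates.size();
    }

    Row get(long id) {
        return cache.get(id);
    }
}
```

Because every replica converges by reading the store, a lost cache update is repaired by the next pass, which is why the slide says the retry can be omitted.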
  • 31. Death by timeout: GC; Make Friends; wait; timeout; thread pool exhausted
  • 32. Server cut-off 1. Clients stop sending requests to a server after X continuous failures for the last second 2. Clients monitor server availability in the background, once a minute 3. And turn it back on
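A client-side cut-off along these lines might be sketched as follows. `ServerCutOff` and its methods are hypothetical names; time is passed in explicitly to keep the sketch easy to test, and the one-second failure window from the slide is deliberately left out, so only the count of consecutive failures is tracked.

```java
import java.util.function.BooleanSupplier;

// Hypothetical per-server state kept by each client.
final class ServerCutOff {
    private final int maxConsecutiveFailures; // "X" on the slide
    private final long probeIntervalMillis;   // "once a minute" on the slide
    private int consecutiveFailures;
    private boolean cutOff;
    private long lastCheckMillis;

    ServerCutOff(int maxConsecutiveFailures, long probeIntervalMillis) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.probeIntervalMillis = probeIntervalMillis;
    }

    /** Step 1: requests are only sent while the server is not cut off. */
    synchronized boolean allowRequest() {
        return !cutOff;
    }

    synchronized void onSuccess() {
        consecutiveFailures = 0;
    }

    synchronized void onFailure(long nowMillis) {
        if (++consecutiveFailures >= maxConsecutiveFailures && !cutOff) {
            cutOff = true;
            lastCheckMillis = nowMillis;
        }
    }

    /** Steps 2 and 3: background availability check that turns the server back on. */
    synchronized void probe(long nowMillis, BooleanSupplier ping) {
        if (!cutOff || nowMillis - lastCheckMillis < probeIntervalMillis) {
            return;
        }
        lastCheckMillis = nowMillis;
        if (ping.getAsBoolean()) {
            cutOff = false;
            consecutiveFailures = 0;
        }
    }
}
```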
  • 33. Death by slowing down: Avg = 1.5 ms, Max = 1.5 s, 24 CPU cores, Cap = 24,000 ops. Then: Avg = 24 ms, Max = 1.5 s, 24 CPU cores, Cap = 1,000 ops; 10,000 ops. Choose 2.4 ms timeout? Cut it off from client if latency avg > 2.4 ms?
  • 34. Speculative retry: Idempotent Op; wait; timeout; Retry; Result; Response
  • 35. Speculative retry • Makes requests to replicas before timeout • Better 99%, even average latencies • More stable system • Not always applicable: • Idempotent ops, additional load, traffic (to consider) • Can be balanced: always, >avg, >99p
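One way to express a speculative retry with the standard `CompletableFuture` API is sketched below. `SpeculativeRetry.call` is a hypothetical helper, not something from the talk or from one-nio; it assumes the operation is idempotent, since both the primary and the replica may end up executing it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical helper: ask the primary, and if it is still silent after
// speculateAfterMillis, ask a replica too; the first successful answer wins.
final class SpeculativeRetry {
    static <T> CompletableFuture<T> call(Supplier<CompletableFuture<T>> primary,
                                         Supplier<CompletableFuture<T>> replica,
                                         long speculateAfterMillis,
                                         ScheduledExecutorService timer) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();
        BiConsumer<T, Throwable> onDone = (value, error) -> {
            if (error == null) {
                result.complete(value);               // first success wins
            } else if (failures.incrementAndGet() == 2) {
                result.completeExceptionally(error);  // both attempts failed
            }
        };
        primary.get().whenComplete(onDone);
        timer.schedule(() -> {
            if (!result.isDone()) {
                replica.get().whenComplete(onDone);   // speculative second request
            }
        }, speculateAfterMillis, TimeUnit.MILLISECONDS);
        return result;
    }
}
```

Setting `speculateAfterMillis` to 0, to the average latency, or to the 99th percentile corresponds to the "always, >avg, >99p" balance on the slide, trading extra load for lower tail latency.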
  • 36. More failures! Distributed systems @ OK.RU
  • 37. All replicas failure • Excessive load • Excessive paranoia • Bugs • Human error • Massive outages
  • 38. Degrade (gracefully)! • Use of non-authoritative datasources, degrade consistency • Use of incomplete data in UI, partial feature degradation • Single feature full degradation
  • 39. The code: interface UserCache { @RemoteMethod Distributed<Collection<User>> getUsersByIds(long[] keys); } interface Distributed<D> { boolean isInconsistency(); D getData(); } class UserCacheStub implements UserCache { public Distributed<Collection<User>> getUsersByIds(long[] keys) { return Distributed.inconsistent(); } }
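How a caller might react to the inconsistency flag can be sketched as follows. The concrete `Distributed` class, its `of` factory and `renderFriendsBlock` are assumptions for illustration; the slide only shows the `Distributed<D>` interface with `isInconsistency()`, `getData()` and a call to `Distributed.inconsistent()`.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical caller-side handling of a degraded response.
final class Degradation {
    /** Minimal concrete version of the wrapper shown on the slide. */
    static final class Distributed<D> {
        private final D data;
        private final boolean inconsistent;

        private Distributed(D data, boolean inconsistent) {
            this.data = data;
            this.inconsistent = inconsistent;
        }

        static <D> Distributed<D> of(D data) {
            return new Distributed<>(data, false);
        }

        /** What a stub could return when all replicas are unavailable. */
        static <D> Distributed<D> inconsistent() {
            return new Distributed<>(null, true);
        }

        boolean isInconsistency() { return inconsistent; }
        D getData() { return data; }
    }

    /** Degrades a single feature instead of failing the whole page. */
    static String renderFriendsBlock(Distributed<Collection<String>> friends) {
        if (friends.isInconsistency()) {
            return "Friends list is temporarily unavailable";
        }
        return "Friends: " + String.join(", ", friends.getData());
    }

    public static void main(String[] args) {
        Collection<String> names = List.of("Ann", "Bob");
        System.out.println(renderFriendsBlock(Distributed.of(names)));
        System.out.println(renderFriendsBlock(Distributed.inconsistent()));
    }
}
```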
  • 41. What to test for failure? • The product you make • Operations in production env • “Standard” products - with special care!
  • 42. The product we make: “Guerrilla” • What it does: • Detects network connections between servers • Disables them (iptables drop) • Runs auto tests • What we check: • No crashes, nice UI messages are rendered • Server does start and can serve requests
  • 44. Why • To know an accident exists. Fast. • To track down the source of an accident. Fast. • To prevent accidents before they happen.
  • 45. Is there (or will there be) an accident? • Zabbix • Cacti • Operational metrics • Names of operations, e.g. “Graph.getFriendsByFilter” • Call count, their success or failure • Latency of calls
  • 46. What charts show us • Current metrics and trends • Aggregated call and failure counts • Aggregated latencies • Average, Max • Percentiles 50, 75, 98, 99, 99.9
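For reference, a latency percentile such as the 99th can be computed from raw samples with the nearest-rank method. This is just one common definition, chosen here as an assumption; the talk does not say how its charts aggregate latencies, and `LatencyStats` is a hypothetical name.

```java
import java.util.Arrays;

// Nearest-rank percentile over raw latency samples (one of several common definitions).
final class LatencyStats {
    /** Returns the smallest sample such that at least p percent of samples are <= it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

An average of 1.5 ms can coexist with a 99.9th percentile of over a second, which is why the slide lists percentiles next to Average and Max.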
  • 49. Short summary • The possibilities for failure in distributed systems are endless • Don't “prevent”, but mask failures through redundancy • Degrade gracefully on unmaskable failure • Test failures • Production diagnostics are key to failure detection and prevention
  • 50. Distributed Systems at OK.RU: slideshare.net/m0nstermind https://v.ok.ru/publishing.html Try these links for more: Notes on Theory of Distributed Systems, CS 465/565, Spring 2014, James Aspnes, http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf