This document discusses strategies for scaling applications and services across multiple data centers and cloud regions. It provides examples of patterns for queue-based load leveling and publishing/subscribing to channels. It also outlines the structure of a Redis cache with different data types to store user profiles and activity. Scaling considerations for DocumentDB are reviewed based on request units.
19. Request units
Level   RUs     Price
S0        250   kr 177.00
S1      1,000   kr 353.00
S2      2,500   kr 706.00
Seconds between each ping: 120
S1 RUs: 1,000
Count: 10
Cost: kr 3,530.00
API clients   Requests/min   Requests/s   ms per message   RUs needed   RUs total   RU balance
    100,000         50,000       833.33             1.20    10,791.67      10,000      -791.67
     50,000         25,000       416.67             2.40     5,395.83      10,000     4,604.17
     20,000         10,000       166.67             6.00     2,158.33      10,000     7,841.67
     10,000          5,000        83.33            12.00     1,079.17      10,000     8,920.83
Request charge (save recently watched)
Document lookup     2.28
Replace existing   10.67
Total              12.95
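The arithmetic behind the capacity table can be reproduced with a short script. This is a minimal sketch using only the figures on this slide (120 seconds between pings, 12.95 RUs per save, 10 collections of 1,000 RUs each); the function and field names are illustrative and not part of any DocumentDB API.

```python
from dataclasses import dataclass

# Figures taken from the slide; all names below are illustrative assumptions.
RU_PER_SAVE = 2.28 + 10.67   # document lookup + replace existing = 12.95 RUs
S1_RUS = 1000                # RUs provided by one S1 collection
S1_COUNT = 10                # number of S1 collections


@dataclass
class RuBudget:
    requests_per_minute: float
    requests_per_second: float
    ms_per_message: float
    rus_needed: float
    rus_total: float
    ru_balance: float


def ru_budget(api_clients: int, seconds_between_pings: float = 120.0) -> RuBudget:
    """Estimate RU consumption when every client pings once per interval."""
    per_second = api_clients / seconds_between_pings
    needed = per_second * RU_PER_SAVE
    total = S1_RUS * S1_COUNT
    return RuBudget(
        requests_per_minute=per_second * 60,
        requests_per_second=per_second,
        ms_per_message=1000 / per_second,
        rus_needed=needed,
        rus_total=total,
        ru_balance=total - needed,
    )


if __name__ == "__main__":
    for clients in (100_000, 50_000, 20_000, 10_000):
        b = ru_budget(clients)
        print(f"{clients:>7} clients: {b.rus_needed:9.2f} RUs needed, "
              f"balance {b.ru_balance:9.2f}")
```

A negative balance, as in the 100,000-client row, means the provisioned RUs would not cover the write load at that ping interval.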
21. Redis cache structure
Name                                 Type        Key / Score                 Data
Plo                                  Hashset     ProgramId                   Program list item
sp_{seriesId}                        Set                                     ProgramIds
uf_{userId}                          Hashset     ProgramId                   Serialized data
usf_{userId}                         SortedSet   Unix epoch, date added      ProgramId
uh_{userId}                          Hashset     ProgramId                   Serialized data
ush_{userId}                         SortedSet   Unix epoch, last watched    ProgramId
ul_{userId} (user loaded)            Bool
uw_{userId} (user last write time)   Date time
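The pairing of a hash with a sorted set can be illustrated by listing the Redis commands a "save recently watched" operation might issue. This is a sketch under assumptions: only the key patterns `uh_{userId}` and `ush_{userId}` come from the table; the function names, the JSON payload, and the reading that `uh_`/`ush_` hold watch history are guesses. The commands are returned as tuples instead of being sent to a server, so the example runs without Redis.

```python
import json
import time
from typing import Any, Dict, List, Optional, Tuple

Command = Tuple[Any, ...]


def save_recently_watched_commands(
    user_id: str,
    program_id: str,
    data: Dict[str, Any],
    watched_at: Optional[float] = None,
) -> List[Command]:
    """Build the commands that store one watched program for a user.

    HSET keeps the serialized data per ProgramId; ZADD keeps the ProgramIds
    ordered by the Unix epoch of the last watch (the sorted-set score).
    """
    epoch = int(time.time() if watched_at is None else watched_at)
    payload = json.dumps(data, sort_keys=True)  # assumed serialization format
    return [
        ("HSET", f"uh_{user_id}", program_id, payload),
        ("ZADD", f"ush_{user_id}", epoch, program_id),
    ]


def latest_watched_command(user_id: str, count: int) -> Command:
    """Command returning the `count` most recently watched ProgramIds."""
    return ("ZREVRANGE", f"ush_{user_id}", 0, count - 1)


def load_programs_command(user_id: str, program_ids: List[str]) -> Command:
    """Command fetching the serialized data for the given ProgramIds."""
    return ("HMGET", f"uh_{user_id}", *program_ids)


if __name__ == "__main__":
    for cmd in save_recently_watched_commands("42", "PRG1", {"position": 120}):
        print(cmd)
    print(latest_watched_command("42", 5))
```

Reading the latest items then takes two round trips: the sorted set yields the ordered ProgramIds, and the hash yields their data.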
What personalization means for NRK.
Favorites, my programs. Analytics, linked to the content poster (innholdsplakat).
20-40K API requests / minute
Median 13-15 ms
95th percentile 150-200 ms
99th percentile 600-800 ms
TV, radio and clips.
Read only
No "control" over when data changes.
Web, app
100K concurrent on-demand viewers
Update frequency
Play start, delayed
Every 2 minutes
Pause
onbeforeunload
Program end
Approx. 55K requests / minute
Separate API calls / resources
Enrichment of existing calls
The amount of data grows over time; who is the master?
Presentation of personalized data
Synchronization between two data centers
Caching
Scaling
8 fallacies of distributed computing
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
Command Query Responsibility Segregation
A separate write path is a natural fit, and it makes it possible to scale the read and write sides independently.
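A minimal in-memory sketch of the split is shown below. It is not the system described in these slides: the command type, class names, and the `project` step are hypothetical, chosen only to show that writes go through one path and reads are served from a separate view.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict


@dataclass(frozen=True)
class SaveProgress:
    """Command: expresses the intent to change state (hypothetical example)."""
    user_id: str
    program_id: str
    position_seconds: int


class WriteSide:
    """Accepts commands and buffers them; can be scaled on write volume."""

    def __init__(self) -> None:
        self.pending: Deque[SaveProgress] = deque()

    def handle(self, command: SaveProgress) -> None:
        self.pending.append(command)


class ReadSide:
    """Serves queries from a denormalized view; can be scaled on read volume."""

    def __init__(self) -> None:
        self._view: Dict[str, Dict[str, int]] = {}

    def apply(self, command: SaveProgress) -> None:
        programs = self._view.setdefault(command.user_id, {})
        programs[command.program_id] = command.position_seconds

    def progress_for(self, user_id: str) -> Dict[str, int]:
        return dict(self._view.get(user_id, {}))


def project(write_side: WriteSide, read_side: ReadSide) -> int:
    """Drain buffered commands into the read view; returns the number applied."""
    applied = 0
    while write_side.pending:
        read_side.apply(write_side.pending.popleft())
        applied += 1
    return applied


if __name__ == "__main__":
    writes, reads = WriteSide(), ReadSide()
    writes.handle(SaveProgress("u1", "p1", 30))
    writes.handle(SaveProgress("u1", "p1", 90))
    project(writes, reads)
    print(reads.progress_for("u1"))
```

Until `project` runs, queries still see the old view, which is the eventual consistency that this separation usually implies.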
Queue-based load leveling
Uneven performance
Competing consumers: makes it possible to increase the throughput of the write layer when needed. Reliability: if one consumer fails, the rest continue. A slow message will not disturb the rest.
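The two patterns can be sketched together with the standard library: a queue absorbs a burst of messages, and several workers compete to drain it. The function name and the way failures are collected are assumptions for illustration; a production system would use a message broker, not an in-process queue.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple


def run_competing_consumers(
    messages: Iterable[Any],
    handler: Callable[[Any], None],
    worker_count: int = 4,
) -> Tuple[List[Any], List[Any]]:
    """Process messages with several workers pulling from one shared queue.

    Returns (processed, failed). A message whose handler raises is recorded
    as failed; the worker then moves on, so the other messages still complete.
    """
    work: "queue.Queue[Any]" = queue.Queue()
    for message in messages:
        work.put(message)          # the queue buffers the whole burst

    processed: List[Any] = []
    failed: List[Any] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                message = work.get_nowait()
            except queue.Empty:
                return             # nothing left to do
            try:
                handler(message)
            except Exception:
                with lock:
                    failed.append(message)
            else:
                with lock:
                    processed.append(message)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed, failed


if __name__ == "__main__":
    def handle(message: int) -> None:
        if message == 3:
            raise ValueError("bad message")

    done, errors = run_competing_consumers(range(10), handle, worker_count=3)
    print(sorted(done), errors)
```

Raising `worker_count` is the knob that corresponds to scaling up the write layer on demand.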
Publish/subscribe: derived from the observer pattern. Here each message is duplicated and processed individually.
Content-based router: filtering
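Both ideas fit in one small in-memory sketch: a topic hands every subscriber its own copy of a message, and an optional predicate per subscription filters on message content. The `Topic` class and the `medium`/`programId` fields are hypothetical and do not correspond to any particular broker API.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Handler = Callable[[Message], None]
Predicate = Callable[[Message], bool]


def accept_all(message: Message) -> bool:
    """Default filter: the subscriber receives every message."""
    return True


class Topic:
    """In-memory topic with per-subscription content filters."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Predicate, Handler]] = []

    def subscribe(self, handler: Handler, predicate: Predicate = accept_all) -> None:
        self._subscriptions.append((predicate, handler))

    def publish(self, message: Message) -> int:
        """Deliver a copy to each matching subscriber; returns the delivery count."""
        delivered = 0
        for predicate, handler in self._subscriptions:
            if predicate(message):
                handler(dict(message))   # each subscriber gets its own copy
                delivered += 1
        return delivered


if __name__ == "__main__":
    topic = Topic()
    everything: List[Message] = []
    tv_only: List[Message] = []
    topic.subscribe(everything.append)
    topic.subscribe(tv_only.append, lambda m: m["medium"] == "tv")
    topic.publish({"medium": "tv", "programId": "A"})
    topic.publish({"medium": "radio", "programId": "B"})
    print(len(everything), len(tv_only))
```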
Partitioning component of Amazon's storage system Dynamo[4][5]
Data partitioning in Apache Cassandra[6]
Data Partitioning in Voldemort[7]
Akka's consistent hashing router[8]
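The systems listed above all partition data with consistent hashing, which the Akka entry names explicitly. The sketch below shows the general technique with virtual nodes; it is not the implementation used by any of those systems, and the class name, replica count, and use of MD5 (for key distribution only, not security) are assumptions.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (replicas) per physical node."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._ring: Dict[int, str] = {}
        self._points: List[int] = []      # sorted ring positions
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            if point not in self._ring:
                bisect.insort(self._points, point)
            self._ring[point] = node

    def remove(self, node: str) -> None:
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            if self._ring.get(point) == node:
                del self._ring[point]
                self._points.remove(point)

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point clockwise from the key."""
        if not self._points:
            raise ValueError("ring is empty")
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[self._points[index]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("uf_12345"))
```

The useful property is that removing a node only moves the keys that node owned; keys on the remaining nodes keep their placement.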
Improvements:
A separate queue for when something must be resent to a single worker; a topic always sends everything to everyone.
Cassandra: data-center-aware storage.
Event Hub -> Stream Analytics: multiple data centers