S Anand, Chief Data Scientist, Gramener
Scaling Real-time Visualisations for Elections 2014
@sanand0
https://gramener.com/election/story.ddp

How Gramener scaled up the Indian Elections 2014 live website

Published in: Education

Transcript of "Scaling real-time visualisations for Elections 2014"

  1. S Anand, Chief Data Scientist, Gramener. Scaling Real-time Visualisations for Elections 2014. @sanand0
  2. https://gramener.com/election/story.ddp
  3. What’s the largest number of people that stood in an election?
  4. “We’ll cross 5 million visitors tomorrow”
  5. [Architecture diagram; labels: Nielsen’s server, ETL, Candidate, Votes, SQL Server, CNN WinXP laptop (Noida, India), CNN Windows server (Noida, India), rsync, every 10 seconds, Azure Ubuntu server (Singapore), Gramener Visualisation server, nginx, Visualisation template, Real time; steps numbered 1 to 4.] Let’s optimize backwards
  6. WHY NGINX? http://wiki.dreamhost.com/Web_Server_Performance_Comparison
  7. Split load. Cache it.
  8. Serve static files directly
  9. Compress content
  10. [Dot graphic of file size] 1,518 KB gzipped to 379 KB
  11. wiki.nginx.org
  12. h5bp.github.io
  13. Only 1 image … but a 3MB SVG
  14. Kraken.io
  15. Inkscape
  16. SVG compression: 95KB at 2 decimal places, 145KB at 3 decimal places, 613KB at 4 decimal places
  17. [Architecture diagram repeated from slide 5] Now, optimize the rendering
  18. We need these filters to work instantly. We cannot afford a server request for every filter change. We need client-side content generation, driven by data.
  19. How content is written. Declarative: HTML, XML, Prolog. Procedural: JavaScript, Python, Java.
  20. How data is used to write it: map attributes to functions, templates, binding, create HTML strings
  21. Examples of representative libraries. Declarative templates: Underscore. Declarative binding: knockout. Procedural templates: jQuery. Procedural binding: d3. Let’s make a bar chart with each of these. https://github.com/sanand0/fifthel-2
  22. underscore: declare a template
  23. jQuery: procedurally create the HTML
  24. knockout: declaratively bind data to HTML
  25. d3: procedurally bind data to elements and attributes
  26. [Architecture diagram repeated from slide 5] Finally, optimize data
  27. 1.5 MB of data every second, but some of it is static, some is redundant, and some misspelt or wrong
  28. Correct mis-spellings. Load just what you need (query time reduced by 70%).
  29. Normalise static data
  30. Refresh only the changed data. JSON? When gzipped, JSON is no larger than CSV. JSON is natively parsed and more flexible. Redundancy 27KB
  31. “We’ll cross 5 million visitors tomorrow”
  32. [Chart: y-axis 0 to 1,400 (thousands), x-axis 1 to 23] Half a million just in the first hour
  33. [Same chart] Over 1.3 million in the next!
  34. [Same chart] 10 million visits on election day
  35. Does age make a difference? Do old candidates win less often?
  36. [Chart: Lok Sabha (2004 onwards). Win %: the number of winning candidates as a % of candidates in the age group (axis 0% to 40%). Candidates: the number of candidates in each age group (axis 0 to 2,500). Age groups: 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75, 75-80, 80-85, 85-90. Win % labels: 1%, 2%, 4%, 6%, 9%, 11%, 14%, 11%, 16%, 18%, 22%, 22%, 33%.]
  37. Name length
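
Slides 9 and 10 ("Compress content", 1,518 KB gzipped to 379 KB) rely on the fact that repetitive text compresses well. The Python sketch below only illustrates that effect; the record fields and values are invented for the example and are not the talk's data, and in the talk the compression is presumably done by the web server rather than by application code.

```python
import gzip
import json

def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size."""
    return len(gzip.compress(payload)) / len(payload)

# Hypothetical records: repeated keys and party names make the text redundant.
records = [
    {"constituency": f"Seat {i}", "party": ["A", "B", "C"][i % 3], "votes": i * 37 % 100000}
    for i in range(5000)
]
payload = json.dumps(records).encode("utf-8")
print(f"{len(payload)} bytes raw, compressed to {gzip_ratio(payload):.0%} of that")
```

The exact ratio depends on the data; the more keys and values repeat, the smaller the compressed output.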
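
Slide 16 shows that the SVG shrinks as coordinates keep fewer decimal places. The slides name Kraken.io and Inkscape as tools; the regex below is a stand-in written for illustration, not the tooling used in the talk. It assumes plain decimal numbers (no exponent notation) and rounds every decimal in the text, including ones outside path data.

```python
import re

# Matches plain decimals such as 12.345 or -3.14 (exponent notation is not handled).
_NUMBER = re.compile(r"-?\d+\.\d+")

def round_coordinates(svg_text: str, places: int) -> str:
    """Round every decimal number in the SVG text to `places` decimal places."""
    def shorten(match: re.Match) -> str:
        rounded = f"{float(match.group()):.{places}f}"
        if "." in rounded:
            # Drop trailing zeros and a dangling decimal point: "7.00" becomes "7".
            rounded = rounded.rstrip("0").rstrip(".")
        return rounded
    return _NUMBER.sub(shorten, svg_text)

path = '<path d="M12.345678,7.000001 L-3.141592,99.999999"/>'
print(round_coordinates(path, 2))
```

Fewer decimal places mean a smaller file but coarser shapes, so the precision to keep is a visual judgment call.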
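
Slides 19 to 25 contrast declarative and procedural ways of turning data into HTML. The talk's own examples are JavaScript and live in the repository linked on slide 21; the sketch below only restates the contrast in Python, with made-up data and hypothetical function names.

```python
from html import escape
from string import Template

# Hypothetical data: (label, value) pairs for a tiny bar chart.
rows = [("Party A", 120), ("Party B", 80)]

# Declarative: state what one bar looks like, then let the template fill it in.
BAR = Template('<div class="bar" style="width:${width}px">${label}</div>')

def render_with_template(rows):
    return "\n".join(
        BAR.substitute(label=escape(label), width=value) for label, value in rows
    )

# Procedural: assemble the same HTML string step by step.
def render_procedurally(rows):
    lines = []
    for label, value in rows:
        line = '<div class="bar" style="width:' + str(value) + 'px">'
        line += escape(label)
        line += "</div>"
        lines.append(line)
    return "\n".join(lines)

print(render_with_template(rows))
```

Both functions produce the same markup; the difference is whether the structure is described once or built up instruction by instruction.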
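
Slide 29 ("Normalise static data") can be read as: send fields that do not change during the count, such as names and parties, once, and send only the changing vote counts on each refresh. That reading is an assumption, and the field names below are hypothetical. A minimal sketch:

```python
def normalise(rows):
    """Split flat rows into a static candidate lookup and a compact list of vote counts."""
    candidates = {}  # sent once: candidate_id -> static details
    votes = []       # sent on every refresh: [candidate_id, votes]
    for row in rows:
        cid = row["candidate_id"]
        candidates[cid] = {"name": row["name"], "party": row["party"]}
        votes.append([cid, row["votes"]])
    return candidates, votes

# Hypothetical flat rows, as a query might return them.
flat_rows = [
    {"candidate_id": 1, "name": "Candidate One", "party": "Party A", "votes": 1200},
    {"candidate_id": 2, "name": "Candidate Two", "party": "Party B", "votes": 950},
]
candidates, votes = normalise(flat_rows)
print(votes)
```

The client joins the two again by `candidate_id`, so the repeated text never travels with each update.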
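
Slide 30 ("Refresh only the changed data") suggests sending a delta instead of the full dataset on each update. The sketch below assumes the data is a mapping from a candidate key to a vote count; the keys and numbers are invented, and deletions are not handled.

```python
_MISSING = object()  # sentinel so that a stored value of None still compares correctly

def changed_entries(previous: dict, current: dict) -> dict:
    """Return only the entries of `current` that are new or differ from `previous`."""
    return {
        key: value
        for key, value in current.items()
        if previous.get(key, _MISSING) != value
    }

previous = {"c1": 100, "c2": 250, "c3": 75}
current = {"c1": 100, "c2": 260, "c3": 75, "c4": 10}

delta = changed_entries(previous, current)
print(delta)

# The receiving side applies the delta to its own copy.
client_state = dict(previous)
client_state.update(delta)
```

When few counts change between refreshes, the delta is much smaller than the full payload.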