Scaling real-time visualisations for Elections 2014

How Gramener scaled up the Indian Elections 2014 live website

Published in: Education

  1. 1. Scaling Real-time Visualisations for Elections 2014. S Anand, Chief Data Scientist, Gramener (@sanand0)
  2. 2. https://gramener.com/election/story.ddp
  3. 3. What’s the largest number of people that stood in an election?
  4. 4. “We’ll cross 5 million visitors tomorrow”
  5. 5. Architecture diagram (labels as extracted): Nielsen's server; CNN WinXP laptop, Noida, India (ETL, candidate votes); CNN Windows server, Noida, India (SQL Server, ETL, candidate votes); rsync every 10 seconds; Azure Ubuntu server, Singapore (Gramener visualisation server, visualisation template, nginx, real time); stages numbered 1 to 4. Let's optimize backwards
  6. 6. WHY NGINX? http://wiki.dreamhost.com/Web_Server_Performance_Comparison
  7. 7. Split load. Cache it.
  8. 8. Serve static files directly
  9. 9. Compress content
  10. 10. 1,518 KB gzipped to 379 KB
  11. 11. wiki.nginx.org
  12. 12. h5bp.github.io
  13. 13. Only 1 image … but a 3MB SVG
  14. 14. Kraken.io
  15. 15. Inkscape
  16. 16. SVG compression: 95KB at 2 decimal places, 145KB at 3 decimal places, 613KB at 4 decimal places
  17. 17. (Same architecture diagram as slide 5.) Now, optimize the rendering
  18. 18. We need these filters to work instantly. We cannot afford a server request for every filter change. We need client-side content generation, driven by data.
  19. 19. How content is written. Declarative: HTML, XML, Prolog. Procedural: JavaScript, Python, Java.
  20. 20. How data is used to write it. Templates: create HTML strings. Binding: map attributes to functions.
  21. 21. Examples of representative libraries: Underscore (declarative templates), jQuery (procedural templates), knockout (declarative binding), d3 (procedural binding). Let's make a bar chart with each of these: https://github.com/sanand0/fifthel-2
  22. 22. underscore: declare a template
  23. 23. jQuery: procedurally create the HTML
  24. 24. knockout: declaratively bind data to HTML
  25. 25. d3: procedurally bind data to elements and attributes
  26. 26. (Same architecture diagram as slide 5.) Finally, optimize data
  27. 27. 1.5 MB of data every second, but some of it is static, some is redundant, and some is misspelt or wrong
  28. 28. Correct misspellings. Load just what you need (query time reduced by 70%).
  29. 29. Normalise static data
  30. 30. Refresh only the changed data. JSON? When gzipped, JSON is no larger than CSV. JSON is natively parsed and more flexible. Redundancy: 27KB
  31. 31. “We’ll cross 5 million visitors tomorrow”
  32. 32. Half a million just in the first hour (chart: hours 1 to 23 on one axis, thousands from 0 to 1,400 on the other)
  33. 33. Over 1.3 million in the next! (same chart axes)
  34. 34. 10 million visits on election day (same chart axes)
  35. 35. Does age make a difference? Do old candidates win less often?
  36. 36. Lok Sabha (2004 onwards). Win % (the number of winning candidates as a % of candidates in the age group) for the age groups 25-30 to 85-90 in five-year steps, in the order extracted: 1%, 2%, 4%, 6%, 9%, 11%, 14%, 11%, 16%, 18%, 22%, 22%, 33%. Candidates (the number of candidates in each age group) on a second axis, 0 to 2,500.
  37. 37. Name length
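
Slide 16 reports that the SVG shrank as coordinates were kept to fewer decimal places. The slides do not say how the file was actually processed (slides 14 and 15 only name Kraken.io and Inkscape), so the following is a minimal Python sketch of the general idea rather than the method used. The function name `round_svg_numbers` and the sample path are hypothetical, and the sketch assumes numbers are separated by spaces, commas or letters (not the compact `1.5.5` path notation).

```python
import re

# Unsigned decimals with a fractional part, e.g. 12.345678 or .5.
# Integers and minus signs are left untouched.
_DECIMAL = re.compile(r"\d*\.\d+")


def round_svg_numbers(svg_text: str, places: int = 2) -> str:
    """Round every decimal number in svg_text to at most `places` decimals."""

    def shorten(match: re.Match) -> str:
        rounded = f"{float(match.group()):.{places}f}"
        # Drop trailing zeros and a dangling point: "1.50" -> "1.5", "7.00" -> "7"
        return rounded.rstrip("0").rstrip(".")

    return _DECIMAL.sub(shorten, svg_text)


# Hypothetical sample; not taken from the election map.
sample = '<path d="M12.345678,7.000001 L-0.123456,45.678901"/>'
print(round_svg_numbers(sample, 2))
```

Fewer decimal places mean fewer bytes per coordinate, at the cost of positional accuracy, so the right precision depends on the size at which the SVG is drawn.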

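
Slides 9, 10 and 30 are about compressing content, and slide 30 states that gzipped JSON is no larger than CSV. A hedged way to check that on your own data is to serialise the same records both ways and compare raw and gzipped sizes. The rows below are made up for illustration and the helper names are hypothetical; the sizes printed say nothing about the figures in the talk.

```python
import csv
import gzip
import io
import json

# Made-up rows for illustration only; not data from the talk.
rows = [
    {
        "constituency": f"Constituency {i}",
        "candidate": f"Candidate {i % 7}",
        "votes": 1000 + 37 * i,
    }
    for i in range(500)
]


def as_json(records) -> bytes:
    """Serialise records as compact JSON (no spaces after separators)."""
    return json.dumps(records, separators=(",", ":")).encode("utf-8")


def as_csv(records) -> bytes:
    """Serialise records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue().encode("utf-8")


def sizes(payload: bytes):
    """Return (raw size, gzipped size) in bytes."""
    return len(payload), len(gzip.compress(payload))


json_raw, json_gz = sizes(as_json(rows))
csv_raw, csv_gz = sizes(as_csv(rows))
print(f"JSON: {json_raw} bytes raw, {json_gz} bytes gzipped")
print(f"CSV:  {csv_raw} bytes raw, {csv_gz} bytes gzipped")
```

Raw JSON repeats every key on every row, which is exactly the kind of redundancy gzip removes, so the gap between the two formats usually narrows a great deal after compression.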
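
Slides 29 and 30 recommend normalising static data and refreshing only the changed data. The sketch below shows one way that could look: fields that never change (name, party) are split out so they can be sent once, and each refresh carries only the vote counts that differ from the previous snapshot. The record layout and the names `split_static` and `changed_only` are assumptions for illustration, not the structure used on the live site.

```python
def split_static(records, key="id", dynamic=("votes",)):
    """Separate fields that never change from the ones that do."""
    static = {
        r[key]: {k: v for k, v in r.items() if k != key and k not in dynamic}
        for r in records
    }
    live = {r[key]: {k: r[k] for k in dynamic} for r in records}
    return static, live


def changed_only(previous, current):
    """Return only the entries of `current` that differ from `previous`."""
    return {k: v for k, v in current.items() if previous.get(k) != v}


# Two hypothetical snapshots, a refresh apart.
first = [
    {"id": 1, "name": "Candidate A", "party": "Party X", "votes": 100},
    {"id": 2, "name": "Candidate B", "party": "Party Y", "votes": 90},
]
second = [
    {"id": 1, "name": "Candidate A", "party": "Party X", "votes": 100},
    {"id": 2, "name": "Candidate B", "party": "Party Y", "votes": 140},
]

static, live_before = split_static(first)
_, live_after = split_static(second)
delta = changed_only(live_before, live_after)
print(delta)  # {2: {'votes': 140}}
```

The client would load `static` once, then merge each small `delta` into its copy of the live data on every refresh.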