How Elixir helped us scale our Video User Profile Service for the Olympics

A case study presenting the use of Elixir to scale our Video User Profile Service for the Rio 2016 Olympics

Published in: Technology

  1. 1. Using Elixir to scale Video User Profile Service Emerson Macedo @emerleite
  2. 2. CONTEXT User Profile API was developed in 2012. It’s a well-written Ruby on Rails application, responsible for tracking logged-in user actions
  4. 4. CONTEXT Our Video Player POSTs the Watched Percentage to our endpoint every 10 seconds, so we can provide the Keep Watching Percentage to our logged-in users
  6. 6. CONTEXT Globo is running Binge Watching experiments combined with Online First for new TV shows
  7. 7. CONTEXT Prepare all Video Applications for the Olympics
  8. 8. Online First Binge Watching
  11. 11. THE PROBLEM Between 2013 and 2016 the throughput grew to ~600% of its 2013 level (2013: 10k; 2016: 60k)
  12. 12. THE PROBLEM It was hard to predict the new application throughput (2013: 10k; 2016: 60k; 2016/2: ?)
  13. 13. INFRASTRUCTURE 2 bare metals, each one with 24 CPUs and 64GB of RAM
  15. 15. THE PROBLEM The average response time was good, but the percentiles were hurting the application (Avg: 70ms; 95th percentile: 1s; 99th percentile: 2.5s)
  18. 18. FIRST CHANGE We increased the computational resources from 2 to 4 bare metals, each one with 24 CPUs and 64GB of RAM
  21. 21. FIRST CHANGE This change improved metrics by ~32% (Avg: 70ms to 47ms; 95th percentile: 1s to 0.7s; 99th percentile: 2.5s to 1.7s)
  23. 23. SECOND CHANGE We decided to try a deployment on Tsuru containers with auto scaling, each one with 1-4 vCPUs and 2GB of RAM (93 containers)
  26. 26. SECOND CHANGE Migrating to containers improved metrics by ~12% (Avg: 47ms to 41ms; 95th percentile: 0.7s to 0.5s; 99th percentile: 1.7s to 1.5s)
  27. 27. CONTEXT Our Video Player POSTs the Watched Percentage to our endpoint every 10 seconds, so we can provide the Keep Watching Percentage to our logged-in users
  29. 29. Online First Binge Watching
  30. 30. ARCHITECTURAL OVERVIEW User Profile API is a classical Ruby on Rails application, which also uses Resque for background jobs (diagram annotations: BLOCK, BLOCK, BLOCK)
  34. 34. CONTAINER DISTRIBUTION Our 93 containers were distributed among the Rails App, Resque Workers and Resque Scheduler (Resque: 30 containers; Rails: 63 containers)
  37. 37. THIRD CHANGE We saw that just ONE endpoint was responsible for 80% of application throughput
  40. 40. CONTEXT Our Video Player POSTs the Watched Percentage to our endpoint every 10 seconds, so we can provide the Keep Watching Percentage to our logged-in users
  44. 44. THIRD CHANGE We saw that just ONE endpoint was responsible for 80% of application throughput. Pareto: 20% of effort will solve 80% of the problem
  45. 45. THIRD CHANGE We rewrote the Ruby POST endpoint from scratch in Elixir
  48. 48. WHY ELIXIR Elixir is the cutting edge for the Ruby community
  55. 55. WHY ELIXIR Elixir is the language Java programmers were looking for when they chose Ruby
  57. 57. WHY ELIXIR Elixir compiles to Erlang bytecode and runs on the BEAM VM, which has 30 years of development
  63. 63. THIRD CHANGE We rewrote the Ruby POST endpoint from scratch in Elixir (diagram labels: SAME APPLICATION; ERL PROCESS; ERL PROCESS)
  65. 65. THIRD CHANGE The result was a starting point to change the app to a CQRS architecture (COMMAND / QUERY)
  68. 68. THIRD CHANGE Migrating to Elixir improved metrics by ~95% (Avg: 41ms to 4ms; 95th percentile: 0.5s to 15ms; 99th percentile: 1.5s to 30ms)
  71. 71. THIRD CHANGE Migrating to Elixir reduced the container count to ~35% of the original: from 93 (Resque: 30; Rails: 63) to 33 (Elixir: 3; Ruby: 30)
  73. 73. TOOLS We chose the Phoenix framework to create our Elixir API and we’re using many other community libs: Ecto, HTTPoison, Exrm, CacheX, GenRetry, FakeServer, Corsica
  75. 75. PROBLEMS The MongoDB driver did not have Replica Set support. We had to implement it: http://github.com/emerleite/mongox and http://github.com/emerleite/mongox_ecto
  77. 77. PROBLEMS Elixir did not have New Relic support. We needed to create an ad-hoc implementation using Exometer: https://github.com/Feuerlabs/exometer_core
  79. 79. FINAL RESULT After the Olympics and with Binge Watching, the throughput increased ~50% (2013: 10k; 2016: 60k; 2016/2: 90k)
  82. 82. FINAL RESULT Migrating to Elixir improved metrics by ~95% (Avg: 70ms to 4ms; 95th percentile: 1s to 15ms; 99th percentile: 2.5s to 30ms)
  83. 83. Questions? Emerson Macedo @emerleite https://blog.emerleite.com
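
The slides on response time contrast a good average with much worse 95th and 99th percentiles. A minimal sketch of how that can happen, using an invented latency sample (not the talk's data) and the nearest-rank percentile definition; the `percentile` helper is ours:

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical sample of 100 requests (milliseconds): most are fast, a few are very slow.
latencies_ms = [30] * 94 + [1000] * 4 + [2500] * 2

avg = mean(latencies_ms)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(f"avg={avg:.1f}ms p95={p95}ms p99={p99}ms")
```

In this sample six slow requests out of a hundred barely move the average, yet they define what the slowest users experience, which is why the talk tracks percentiles alongside the mean.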
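
The "~32%", "~12%" and "~95%" figures quoted for the three changes read as relative reductions in response time. A small sketch that recomputes them from the numbers on the slides, assuming that interpretation; the function name and the step labels are ours:

```python
def improvement_pct(before, after):
    """Relative reduction, in percent, going from `before` to `after`."""
    return (before - after) / before * 100


# (avg, 95th percentile, 99th percentile) in milliseconds, as read from the slides.
steps = {
    "2 to 4 bare metals": ((70, 1000, 2500), (47, 700, 1700)),
    "containers": ((47, 700, 1700), (41, 500, 1500)),
    "Elixir": ((41, 500, 1500), (4, 15, 30)),
}

results = {}
for name, (before, after) in steps.items():
    results[name] = [improvement_pct(b, a) for b, a in zip(before, after)]
    print(name, [f"{p:.1f}%" for p in results[name]])
```

The per-metric values spread around each headline figure, so the quoted percentages are best read as rounded summaries rather than exact values for every metric.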
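
The deck quotes three ratios: throughput going from 10k to 60k ("~600%"), throughput going from 60k to 90k ("~50%"), and containers going from 93 to 33 ("~35%"). Whether a figure means "changed by" or "is now this fraction of the original" makes a large difference, so here is a sketch computing both readings from the slide numbers; the helper names are ours:

```python
def ratio_pct(before, after):
    """The new value expressed as a percentage of the old one."""
    return after / before * 100


def change_pct(before, after):
    """The signed change relative to the old value, in percent."""
    return (after - before) / before * 100


# (label, before, after), taken from the slides.
figures = [
    ("throughput 2013 to 2016 (k)", 10, 60),
    ("throughput 2016 to 2016/2 (k)", 60, 90),
    ("containers before and after Elixir", 93, 33),
]

for label, before, after in figures:
    print(f"{label}: {ratio_pct(before, after):.1f}% of original, "
          f"{change_pct(before, after):+.1f}% change")
```

Under these definitions "~600%" and "~35%" match the ratio reading, while "~50%" matches the change reading.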
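
The observation that ONE endpoint carried 80% of the throughput is the kind of result a per-endpoint request count gives. A sketch of that measurement, with invented endpoint paths and counts (the talk does not show the real routes or the tooling used):

```python
from collections import Counter

# Hypothetical request log: one (method, path) tuple per request. All names are invented.
requests = (
    [("POST", "/watched")] * 80
    + [("GET", "/keep-watching")] * 12
    + [("GET", "/profile")] * 8
)

counts = Counter(requests)
total = sum(counts.values())

# Share of total traffic per endpoint, busiest first.
shares = [(endpoint, n / total) for endpoint, n in counts.most_common()]
for (method, path), share in shares:
    print(f"{method} {path}: {share:.0%}")
```

Once a single endpoint dominates like this, rewriting only that endpoint is the Pareto argument made on the slides: a small part of the code base accounts for most of the load.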
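
The CQRS slide separates a COMMAND side from a QUERY side. A minimal in-memory sketch of that split, where writes go through a command handler and reads through a separate query service; every class, method and field name here is hypothetical and says nothing about how the actual service is built:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordWatchedPercentage:
    """Command: a request to change state."""
    user_id: str
    video_id: str
    percentage: int


class CommandHandler:
    """Write side: validates commands and applies them to the store."""

    def __init__(self, store: dict):
        self._store = store

    def handle(self, command: RecordWatchedPercentage) -> None:
        if not 0 <= command.percentage <= 100:
            raise ValueError("percentage must be between 0 and 100")
        self._store[(command.user_id, command.video_id)] = command.percentage


class QueryService:
    """Read side: answers questions and never mutates the store."""

    def __init__(self, store: dict):
        self._store = store

    def keep_watching(self, user_id: str) -> dict:
        return {
            video_id: pct
            for (uid, video_id), pct in self._store.items()
            if uid == user_id
        }


# A shared dict stands in for the database; real systems may use separate stores.
store = {}
commands = CommandHandler(store)
queries = QueryService(store)

commands.handle(RecordWatchedPercentage("u1", "v1", 40))
commands.handle(RecordWatchedPercentage("u1", "v1", 55))
commands.handle(RecordWatchedPercentage("u2", "v9", 10))
print(queries.keep_watching("u1"))
```

Keeping the two sides apart is what lets the write path be scaled or reimplemented on its own, as the talk describes doing for the POST endpoint.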
