Amdahl’s Law Redux
Alexander Gounares
Concurix Corporation
June 26, 2013
Using Concurix’s Node.js Analytics Tools

Seattle Scalability Meetup 6-26-13

Amdahl's Law Redux - Using Concurix's Node.js Analytics Tools
Presentation by Alexander Gounares to Seattle Scalability Meetup Group on June 26, 2013

Published in: Technology

Transcript of "Seattle Scalability Meetup 6-26-13"

  1. Amdahl’s Law Redux: Using Concurix’s Node.js Analytics Tools. Alexander Gounares, Concurix Corporation, June 26, 2013
  2. The manycore era is here.
  4. The manycore era is here now
     - AMD Opteron family 15h
     - 64 cores: 16 per chip × 4 sockets
     - Streaming SIMD Extensions (SSE4)
     - 128 GB RAM (512 GB max)
     - 8 NUMA domains with HyperTransport
     - ~$4.7K ~$4.2K
  6. Traditional software does not scale on manycore
     - http://pdos.csail.mit.edu/papers/linux:osdi10.pdf
     - Locks in traditional O-O languages limit scalability per Amdahl’s Law
  7. Erlang and Node.js to the rescue!
  8. Erlang / Node.js
     Erlang:
     - Single assignment
     - No shared state
     - Lightweight processes
     - Message passing
     - Recursion and pattern matching
     - Concurrency built in
     - Supervisor-based reliability model
     Node.js:
     - Server-side JavaScript with the Google V8 engine
     - Single-threaded asynchronous callback model
     - Multi-core via OS processes (1 or more per core)
     - Existing libraries wrapped in closures: functional programming for the uninitiated
     - Message passing via [domain] sockets
  10. Node.js example
  11. Now something more complex…
  12. Ruby on Rails, Erlang style!
  13. Real-world Chicago Boss application (concurix.com), December 2012: no performance difference between 1 and 64 cores!
  14. [Picture of a dark day]
  15. Back to Amdahl's Law: it’s all about the locks
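Amdahl's Law bounds the speedup on n cores by 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. The sketch below evaluates that formula for a 64-core machine like the one in this deck. The function name and the sample values of p are illustrative assumptions, not measurements from the talk.

```javascript
// Amdahl's Law: upper bound on speedup with n cores when a fraction p
// (0..1) of the work is parallelizable and the rest is serialized,
// for example behind a lock.
function amdahlSpeedup(p, n) {
  if (p < 0 || p > 1) throw new RangeError('p must be in [0, 1]');
  if (n < 1) throw new RangeError('n must be >= 1');
  return 1 / ((1 - p) + p / n);
}

// Hypothetical parallel fractions, chosen only for illustration.
for (const p of [0.5, 0.95, 0.99]) {
  console.log(`p=${p}: at most ${amdahlSpeedup(p, 64).toFixed(1)}x on 64 cores`);
}
```

Even with 95% of the work parallel, the bound on 64 cores is only about 15x, which is why shrinking the serialized (locked) portion matters more than adding cores.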
  16. Where we are now… 45x+!
      * A different load tester was used for the December 2012 numbers; this is a rough approximation of the December numbers in the new tool.
  17. We are not alone…
      - Concurix has over 1.5 million (and growing!) profile data sets.
      - 100% of these profiles show at least 1 bottleneck candidate
  18. How we did it…
  19. Profiling with Real Time Visualizations and Big Data
  20. Gprof! Source: http://linuxforengineers.blogspot.com/2012/07/its-time-to-speed-up-your-code-with.html
  21. Can we do better…?
  22. Yes! [Diagram; axis: depth of instrumentation]
      - Too much data: 2 GHz+ × 64 cores
      - Not enough insight (“it’s slow”)
  23. Automatically detect and instrument a semantic “middle” [Diagram; axis: depth of instrumentation]
      - Too much data: 2 GHz+ × 64 cores
      - Not enough insight (“it’s slow”)
      - Semantic level (e.g. code modules)
      - Big Data Analytics
  24. Measuring similarity
      - If we line up the data in an ordered vector and treat the vector as a point in N-dimensional space, then the similarity between data sets can be measured by the distance between the points
      - One implementation is cosine similarity
      Application example: identify transitions
      - Data = the number of messages sent between each sender-receiver pair during a time window
      - Vector = the data lined up along all possible pairs, like "mochiweb-to-poolboy", in a fixed order
      - Data for each time window is a vector of length N×N (N = number of processes)
      - A big change in similarity between the data vectors means a big shift in message-passing activity
      Benchmark repeatability:
      1.000  0.999  0.923  0.999  benchrun-399
      -----  1.000  0.917  0.999  benchrun-403
      -----  -----  1.000  0.910  benchrun-404
      -----  -----  -----  1.000  benchrun-406
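A minimal sketch of the cosine-similarity comparison described on this slide. The pair name "mochiweb-to-poolboy" comes from the slide; the other pair names, the message counts, and the helper names are hypothetical, and this is not Concurix's implementation.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention assumed here: an all-zero window is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Line up per-pair message counts in a fixed order; missing pairs count as 0.
function toVector(counts, pairs) {
  return pairs.map((pair) => counts[pair] || 0);
}

// Hypothetical pair order and message counts for two time windows.
const pairs = ['mochiweb-to-poolboy', 'poolboy-to-worker', 'worker-to-mochiweb'];
const windowA = toVector({ 'mochiweb-to-poolboy': 120, 'poolboy-to-worker': 118 }, pairs);
const windowB = toVector({ 'mochiweb-to-poolboy': 5, 'worker-to-mochiweb': 300 }, pairs);
console.log(cosineSimilarity(windowA, windowB).toFixed(3));
```

A value near 1 means the two windows have the same message-passing shape; a sharp drop between consecutive windows marks a transition.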
  25. Clustering
      - Grouping data points by similarity on some metric
      - Algorithm: repeat until convergence
        - Assign points to clusters
        - Update cluster centers
      - Applications: often used for discovery tasks when there is little prior knowledge about the data
      - Example: bucket the many processes into a few types according to their activity over time
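The assign/update loop on this slide matches the shape of k-means (Lloyd's algorithm), so the sketch below assumes k-means with squared Euclidean distance. The slide does not name the algorithm or the metric, and seeding the centers with the first k points is a simplification made here for determinism.

```javascript
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// k-means sketch: points is an array of equal-length numeric arrays.
function kMeans(points, k, maxIterations = 100) {
  if (k < 1 || k > points.length) throw new RangeError('need 1 <= k <= points.length');
  // Assumption: seed centers with the first k points (simple, not robust).
  let centers = points.slice(0, k).map((p) => p.slice());
  let labels = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: each point joins its nearest center.
    const newLabels = points.map((p) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (squaredDistance(p, centers[c]) < squaredDistance(p, centers[best])) best = c;
      }
      return best;
    });
    const changed = newLabels.some((label, i) => label !== labels[i]);
    labels = newLabels;
    if (!changed) break; // converged: no point switched cluster
    // Update step: move each center to the mean of its members.
    centers = centers.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // keep an empty cluster's center
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { centers, labels };
}
```

For the slide's example, each point could be one process's activity vector over time, so that processes with similar behavior land in the same bucket.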
  26. Time series analysis
      - Use “dynamic time warping” (DTW) distance to measure how well two time series match
      - Diff vs. DTW operations:
        - INSERT (diff) corresponds to shifting time out (DTW)
        - DELETE (diff) corresponds to shifting time in (DTW)
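A minimal sketch of the classic dynamic-programming form of DTW distance for two numeric series. The absolute difference as the local cost and the unconstrained warping window are assumptions; the slide does not say which variant Concurix uses.

```javascript
// Dynamic time warping distance between numeric series a and b.
// cost[i][j] holds the cheapest alignment of a[0..i-1] with b[0..j-1].
function dtwDistance(a, b) {
  const n = a.length;
  const m = b.length;
  const cost = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(Infinity));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = Math.abs(a[i - 1] - b[j - 1]); // local cost (assumed metric)
      cost[i][j] = d + Math.min(
        cost[i - 1][j],     // a advances while b holds
        cost[i][j - 1],     // b advances while a holds
        cost[i - 1][j - 1]  // both advance
      );
    }
  }
  return cost[n][m];
}

// A series and a time-stretched copy of it align perfectly.
console.log(dtwDistance([1, 2, 3], [1, 2, 2, 3]));
```

Unlike a point-by-point comparison, the two series may have different lengths, which is what lets DTW absorb the time shifts listed on the slide.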
  27. Network analysis: centrality
      - Centrality: relative importance of a vertex in the network
      - Computation
        - Degree centrality
        - Closeness centrality
        - Eigenvector centrality
        - Betweenness centrality
      - Application
        - Detecting the most important process in a message-passing network
        - Bottleneck detection
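Of the four measures listed, degree centrality is the simplest to sketch. The version below weights each edge by its message count (sometimes called node strength), which is an assumption about how one might apply it to message-passing data. The edge format, process names, and counts are hypothetical.

```javascript
// Weighted degree centrality: total messages sent plus received per process.
// edges: array of { from, to, count } records (hypothetical format).
function degreeCentrality(edges) {
  const degree = new Map();
  const bump = (node, amount) => degree.set(node, (degree.get(node) || 0) + amount);
  for (const { from, to, count } of edges) {
    bump(from, count);
    bump(to, count);
  }
  return degree;
}

// The process with the highest score is the first candidate to inspect.
function mostCentral(edges) {
  let best = null;
  let bestScore = -Infinity;
  for (const [node, score] of degreeCentrality(edges)) {
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical message counts between processes.
const edges = [
  { from: 'acceptor', to: 'router', count: 500 },
  { from: 'router', to: 'worker_1', count: 250 },
  { from: 'router', to: 'worker_2', count: 250 },
];
console.log(mostCentral(edges));
```

A process that nearly all traffic flows through scores highest, which makes it a natural bottleneck suspect; closeness, eigenvector, and betweenness centrality refine this by looking at paths rather than direct edges.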
  28. Predictive bottleneck detection
      Build a statistical model of how an application performs, and try to predict when bottlenecks can occur *before* they occur. We accomplish this by watching the relationship between events and time over time… or: fit a line time = a + b * task to the points {(task_1, time_1), (task_2, time_2), …, (task_N, time_N)}:

      \[ s_{task} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( task_i - \overline{task} \right)^2} \]
      \[ s_{time} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( time_i - \overline{time} \right)^2} \]
      \[ r_{task,time} = \frac{\sum_{i=1}^{N} \left( task_i - \overline{task} \right) \left( time_i - \overline{time} \right)}{(N-1) \, s_{task} \, s_{time}} \]
      \[ b = r_{task,time} \, \frac{s_{time}}{s_{task}} \]
      \[ a = \overline{time} - b \cdot \overline{task} \]
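The line fit on this slide can be sketched directly from its definitions: sample standard deviations, the correlation r, then slope b and intercept a. The function names and the sample data are illustrative assumptions. The sketch needs at least two points and non-constant inputs, otherwise the standard deviations are zero and the result is NaN.

```javascript
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation (divides by N - 1, as on the slide).
function sampleStd(xs) {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1));
}

// Fit time = a + b * task by least squares via the correlation coefficient.
function fitLine(task, time) {
  if (task.length !== time.length || task.length < 2) {
    throw new Error('need two series of equal length >= 2');
  }
  const n = task.length;
  const meanTask = mean(task);
  const meanTime = mean(time);
  const sTask = sampleStd(task);
  const sTime = sampleStd(time);
  let covSum = 0;
  for (let i = 0; i < n; i++) covSum += (task[i] - meanTask) * (time[i] - meanTime);
  const r = covSum / ((n - 1) * sTask * sTime);
  const b = r * (sTime / sTask);
  const a = meanTime - b * meanTask;
  return { a, b, r };
}

// Hypothetical observations: completion time grows linearly with task index.
console.log(fitLine([1, 2, 3, 4], [3, 5, 7, 9]));
```

A slope b that keeps growing as new points arrive is the kind of signal that could flag a bottleneck before it becomes visible as an outage.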
  29. Bottleneck detection (simplified)
      - Order cars by arrival
      - Later cars are delayed more
  30. Python MapReduce bottleneck
      - Pipe synchronization
      - Works cross-language: C and Erlang implemented, Node.js soon
  31. The Bet
  32. Code for the bet…
  33. Demo
  34. Message passing between processes [Diagram labels: Inets, Mochiweb Socket, ChicagoBoss, Tiny MQ, Mochiweb Acceptor]
  35. Excessive memory usage
  36. Uneven CPU core utilization
      - Use the +sbt nnts flag to bind scheduler threads to cores!
  37. Try it yourself: Erlang!
      1. Add concurix_runtime to your rebar.config file:
         {concurix_runtime, ".*", {git, "https://github.com/Concurix/cx_runtime.git"}}
      2. Start the concurix_runtime system:
         concurix_runtime:start()
      3. Navigate to http://concurix.com/main/bench
  38. Try it yourself: Node.js! (Just launched!)
      1. Install the Concurix Node.js runtime:
         > npm install concurixjs
      2. Start the concurix_runtime system:
         var cx = require('concurixjs').cx.start();
         > node --enable-debug-as=v8debug app.js
      3. Navigate to http://concurix.com/main/bench
  39. Visit us at www.concurix.com! Q&A
  40. Almost… even very parallelizable workloads had trouble scaling
  41. After a bit of work…