GPARS: Lessons from the parallel universe - Itamar Tayer, CoolaData

As presented in CodeMotion Tel Aviv:
http://telaviv.codemotionworld.com

Published in: Technology


  1. GPARS: Lessons from the Parallel Universe
     November 2015
     Itamar Tayer, Otiot.org, Itamar.tayer@cooladata.com
     @CoolaData, LinkedIn.com/company/CoolaData, www.facebook.com/Cooladata, plus.google.com/+CooladataGplus/posts
  2. Agenda
     • Try Groovy!
     • Parallel programming crisis
     • What’s wrong with threads?
     • Enter GPars
     • Data parallelism & Fork/Join
     • Actors
     • Agents
     • Data flow
  3. Try Groovy
     • Almost zero learning curve for Java developers
     • Seamless integration with Java classes and libraries
     • Lots of good libraries: Grails, Gradle, GORM, Spock, JSON, and much more…
  4. Parallel Computing Crisis
     • Computational power now comes from increasing the number of cores on a chip rather than from making each core faster
     • More and more models use a (large) cluster of machines to solve complex problems
     • We need a computing paradigm that allows us to create reliable parallelism in a natural way
  5. What About 100 Cores?
  6. [image-only slide]
  7. What’s Wrong with Java Threads?
     • “Threads are evil”
     • Easy to get wrong, hard to get right
     • Hard to reason about the state of the model
     • Bugs (deadlocks, race conditions, etc.) are difficult to discover and track down
     • Application code is coupled with parallelism code
     • Threads are a limited, expensive, low-level resource
     • The threading model applies only to local parallelism, not to distribution
  8. Pointing to the Problem…
       class Point {
           private int x, y;
           public void setX(int newVal) { x = newVal; }
           public void setY(int newVal) { y = newVal; }
       }
       // Thread T1      // Thread T2
       p.setX(2);        p.setX(4);
       p.setY(2);        p.setY(4);
       // Depending on the interleaving, p can end up as (2,4) or (4,2).
  9. Pointing to the Problem…
       class Point {
           private int x, y;
           // Each setter is atomic on its own, but a setX/setY pair is not.
           public synchronized void setX(int newVal) { x = newVal; }
           public synchronized void setY(int newVal) { y = newVal; }
       }
  10. Pointing to the Problem…
        class Point {
            private int x, y;
            // Both fields now change under one lock, so they stay consistent.
            public synchronized void set(int newX, int newY) {
                x = newX;
                y = newY;
            }
        }
  11. Pointing to the Problem…
        class Point {
            private int x, y;
            // Passing null leaves that coordinate unchanged.
            public synchronized void set(Integer newX, Integer newY) {
                if (newX != null) x = newX;
                if (newY != null) y = newY;
            }
        }
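Slides 8 to 11 end with a single lock guarding both fields. A different way to keep x and y consistent, sketched below in plain Java, is to hold the pair in one immutable object and swap it atomically. This is not from the talk: the names `AtomicPoint`, `Coordinates` and `demo` are hypothetical, and the sketch only illustrates the idea.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration: x and y always change together because the
// pair is replaced as one immutable object.
final class AtomicPoint {
    // Immutable pair of coordinates.
    static final class Coordinates {
        final int x;
        final int y;
        Coordinates(int x, int y) { this.x = x; this.y = y; }
    }

    private final AtomicReference<Coordinates> state =
            new AtomicReference<>(new Coordinates(0, 0));

    // Replaces both coordinates in a single atomic step.
    void set(int newX, int newY) {
        state.set(new Coordinates(newX, newY));
    }

    // Changes only x; retried internally if another thread swapped the pair.
    void setX(int newX) {
        state.updateAndGet(c -> new Coordinates(newX, c.y));
    }

    // Returns a consistent snapshot of both coordinates.
    Coordinates get() {
        return state.get();
    }

    // Replays the T1/T2 scenario from slide 8 and reports whether every
    // snapshot read by the main thread had x == y.
    static boolean demo() {
        AtomicPoint p = new AtomicPoint();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) p.set(2, 2); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) p.set(4, 4); });
        t1.start();
        t2.start();
        boolean consistent = true;
        for (int i = 0; i < 10_000; i++) {
            Coordinates c = p.get();
            consistent &= (c.x == c.y);
        }
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consistent;
    }

    public static void main(String[] args) {
        System.out.println("consistent snapshots: " + demo());
    }
}
```

The single-coordinate update no longer needs the nullable parameters of slide 11, at the cost of allocating a new object on every change.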
  12. Another Observation
        public class ValueHolder {
            private List listeners = new LinkedList();
            private int value;
            public interface Listener {
                public void valueChanged(int newValue);
            }
            public void addListener(Listener listener) {
                listeners.add(listener);
            }
            public void setValue(int newValue) {
                value = newValue;
                Iterator i = listeners.iterator();
                while (i.hasNext()) {
                    ((Listener) i.next()).valueChanged(newValue);
                }
            }
        }
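The slide does not spell out the observation. One plausible reading, which is an assumption here, is that this observer is not thread safe: the listener list can change while it is being iterated, and simply marking both methods `synchronized` would call listener code while holding a lock. The sketch below shows one possible variant using `CopyOnWriteArrayList`; the names `SafeValueHolder` and `demo` are hypothetical and not from the talk.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant of ValueHolder that tolerates concurrent changes
// to the listener list.
final class SafeValueHolder {
    interface Listener {
        void valueChanged(int newValue);
    }

    // Iteration works on a snapshot, so adding a listener during
    // notification does not disturb the loop.
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();
    private volatile int value;

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    void setValue(int newValue) {
        value = newValue;
        // No lock is held while listener code runs.
        for (Listener listener : listeners) {
            listener.valueChanged(newValue);
        }
    }

    int getValue() {
        return value;
    }

    // A listener registers another listener while it is being notified.
    // With the LinkedList from the slide this pattern can fail with a
    // ConcurrentModificationException.
    static int demo() {
        SafeValueHolder holder = new SafeValueHolder();
        AtomicInteger seen = new AtomicInteger();
        holder.addListener(v -> holder.addListener(w -> seen.addAndGet(w)));
        holder.setValue(1); // the first listener adds a second one
        holder.setValue(5); // the second listener now receives 5
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("value seen by late listener: " + demo());
    }
}
```

This does not order notifications from two concurrent `setValue` calls, so listeners may still observe values in a different order than they were set.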
  13. We Need a Better Paradigm
      • Describe parallelism problems in an abstract, application-agnostic model
      • Solve concurrency issues at the infrastructure level
      • Allow developers to concentrate on application logic
      • Spare developers from dealing with concurrency primitives
      • Just like memory is handled in Java!
  14. Enter GPars
      • The Groovy library for parallelism
      • Bundled with Groovy since 1.8
      • Offers various parallelism models
      • Clean and concise code thanks to Groovy’s flexibility
  15. Data Parallelism
        import groovyx.gpars.ParallelEnhancer
        def range = 1..100
        println Runtime.getRuntime().availableProcessors()
        // sequential
        range.each { println it + ',' + Thread.currentThread() }
        // concurrent
        ParallelEnhancer.enhanceInstance(range)
        range.asConcurrent {
            range.each { println it + ',' + Thread.currentThread() }
        }
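  16. Data Parallelism
        import static groovyx.gpars.GParsPool.withPool
        def numbers = [10, 6, 3, 8, 1, 5, 7, 2, 4, 9]
        // Each call sleeps in proportion to its number, then prints it.
        def sleeper = { number -> sleep number * 100; print "${number}, " }
        withPool(numbers.size()) {
            def sorter = sleeper.asyncFun()
            numbers.each { sorter it }
        }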
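For comparison with the Java side of the talk, the same data-parallel idea can be sketched with Java parallel streams, which run on the common fork/join pool. This is a stand-in written with the standard library, not the GPars API; the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical Java counterpart of the data-parallelism slides.
final class ParallelRange {
    // Sums 1..n, splitting the range across the common fork/join pool.
    static int parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().sum();
    }

    // Visits 1..n in parallel and records the names of the threads that
    // did the work, similar to printing Thread.currentThread() on slide 15.
    static Set<String> workerThreads(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.rangeClosed(1, n)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println("sum: " + parallelSum(100));
        System.out.println("threads used: " + workerThreads(100));
    }
}
```

How many distinct threads show up depends on the machine and on how the runtime splits the range, so the thread set is not predictable.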
  17. Actor
      • A very common and reliable parallelism model
      • The system is built from independent actors that communicate with each other via messages
      • Every actor has a mailbox for messages, and messages are processed one by one
      • A scheduler provides threads to actors that need to process a message
      • The program is naturally concurrent; there is no need to consider concurrency in application code
  18. Actor Model
  19. Actor Case Study: Online Gaming Model (diagram: Zone 01, Zone 02, Zone 03)
  20. Groovy Actor
        import groovyx.gpars.actor.DynamicDispatchActor
        final class MyDDA extends DynamicDispatchActor {
            def myState = 0
            // Messages are dispatched by type; state is touched by one message at a time.
            void onMessage(String message) {
                myState += 1
                println "Received string ${myState}"
            }
            void onMessage(Integer message) {
                myState += 2
                println "Received integer ${myState}"
            }
        }
  21. Using The Actor
        def myActor = new MyDDA().start()
        final Thread t1 = Thread.start {
            while (true) {
                myActor.send "Hello"
                sleep 1000
            }
        }
        final Thread t2 = Thread.start {
            while (true) {
                myActor << 12
                sleep 1000
            }
        }
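The mailbox idea behind the actor slides can be approximated in plain Java with a single-threaded executor: messages are queued and handled one at a time, so the state needs no locks. This is a minimal sketch with standard-library classes, not the GPars implementation; `MiniActor` and its methods are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical actor: one mailbox thread processes messages in order.
final class MiniActor {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int myState = 0; // only ever touched by the mailbox thread

    // Queues a message; callers never block on processing.
    void send(Object message) {
        mailbox.execute(() -> onMessage(message));
    }

    // Dispatches on message type, like the two onMessage methods on slide 20.
    private void onMessage(Object message) {
        if (message instanceof String) {
            myState += 1;
        } else if (message instanceof Integer) {
            myState += 2;
        }
    }

    // Reads the state through the mailbox, after all earlier messages.
    int awaitState() {
        try {
            return mailbox.submit(() -> myState).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void stop() {
        mailbox.shutdown();
    }

    // Two threads send 1000 messages each; the final state is returned.
    static int demo() {
        MiniActor actor = new MiniActor();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) actor.send("Hello"); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) actor.send(12); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int result = actor.awaitState();
        actor.stop();
        return result;
    }

    public static void main(String[] args) {
        System.out.println("final state: " + demo());
    }
}
```

Unlike a real actor framework, this sketch dedicates one thread per actor instead of sharing a scheduler pool among many actors.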
  22. Actor-Based Concurrency
  23. Agents
      • Sometimes sharing memory does make sense
      • An agent offers another level of indirection between a reference and its actual state
      • The agent encapsulates the execution
      (diagram labels: Ref X, Agent X, X1, X2)
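  24. Groovy Agent
        import groovyx.gpars.agent.Agent
        class Point { int x, y }
        def agent = new Agent(new Point())
        // Updates are sent as closures and applied to the state one at a time.
        agent.send { it.x = 2; it.y = 2 }
        agent << { it.x = 4; it.y = 4 }
        sleep(1000)
        def value = agent.val
        println "$value.x"
        println "$value.y"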
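The agent idea, where callers send update functions instead of touching shared state directly, can be sketched in plain Java as well. The code below is an approximation built on a single-threaded executor, not the GPars `Agent` class; `MiniAgent`, `AgentPoint` and `demo` are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical mutable state, mirroring the Point on slide 24.
final class AgentPoint {
    int x;
    int y;
}

// Hypothetical agent: wraps a state object and applies updates serially.
final class MiniAgent<T> {
    private final ExecutorService updates = Executors.newSingleThreadExecutor();
    private final T state;

    MiniAgent(T initial) {
        this.state = initial;
    }

    // Queues an update function; it runs later on the agent's own thread.
    void send(Consumer<T> update) {
        updates.execute(() -> update.accept(state));
    }

    // Waits until every previously sent update has been applied.
    // Note: this hands out the live object, not a copy.
    T val() {
        try {
            return updates.submit(() -> state).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void stop() {
        updates.shutdown();
    }

    // Sends the two updates from the slide and returns the resulting point.
    static AgentPoint demo() {
        MiniAgent<AgentPoint> agent = new MiniAgent<>(new AgentPoint());
        agent.send(p -> { p.x = 2; p.y = 2; });
        agent.send(p -> { p.x = 4; p.y = 4; });
        AgentPoint result = agent.val();
        agent.stop();
        return result;
    }

    public static void main(String[] args) {
        AgentPoint p = demo();
        System.out.println(p.x + ", " + p.y);
    }
}
```

Because reading goes through the same queue as the updates, the `sleep(1000)` used on the slide is not needed in this sketch.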
  25. Data Flow
      • The dataflow model allows us to divide our problem into smaller parallel computations
      • Each subtask is written independently, and the framework takes care of putting it all together
      • Each dataflow variable can be given a value only once during its lifetime; reading the value blocks until it is ready
      • This lines up all the tasks in the right order and creates a completely deterministic flow
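  26. Data Flow
        import groovyx.gpars.dataflow.DataflowVariable
        import static groovyx.gpars.dataflow.Dataflow.task
        final def x = new DataflowVariable()
        final def y = new DataflowVariable()
        final def z = new DataflowVariable()
        // Blocks on x.val and y.val until both have been bound.
        task { z << x.val + y.val }
        task { x << 10 } // or get some value from the network
        task { y << 5 }  // or do a long computation
        println "Result: ${z.val}"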
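A rough Java analogue of the dataflow example uses `CompletableFuture`: each future is completed once, and a dependent computation runs only when its inputs are available. This is a sketch with the standard library rather than GPars, and `DataflowSketch` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical Java counterpart of the x, y, z example on slide 26.
final class DataflowSketch {
    static int compute() {
        CompletableFuture<Integer> x = new CompletableFuture<>();
        CompletableFuture<Integer> y = new CompletableFuture<>();

        // z becomes available only after both x and y have a value.
        CompletableFuture<Integer> z = x.thenCombine(y, Integer::sum);

        // The producers run on other threads, in no particular order.
        CompletableFuture.runAsync(() -> x.complete(10)); // or a network call
        CompletableFuture.runAsync(() -> y.complete(5));  // or a long computation

        // join() blocks until z is ready.
        return z.join();
    }

    // A future keeps its first value: complete() returns false the second time.
    static boolean secondWriteAccepted() {
        CompletableFuture<Integer> v = new CompletableFuture<>();
        v.complete(1);
        return v.complete(2);
    }

    public static void main(String[] args) {
        System.out.println("Result: " + compute());
    }
}
```

The result does not depend on which producer finishes first, which is the determinism the slide describes.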
  27. Build vs. Buy: CoolaData Offers the Best of Both Worlds
      (diagram labels: ETL, real-time processing, HBase, Couchbase, Cassandra, interactive processing, Exasol, Vertica, Redshift, batch processing, Hadoop, Hive, real-time processing (Storm, Kinesis), data visualization (Excel, Tableau, QlikView), structured and unstructured data (HDFS, S3))
      vs
      • Lower cost of ownership
      • Faster time to market
      • Future proof: customizable and open
      • Stronger analytical power enabling predictive and proactive analytics
      • Development requires specific know-how and experience, not core to the business
  28. Low Risk, Quick 2 Results
      • Grows with your organizational needs
      • Proven
