Getting Data (Analysis) to the User

Presentation given to the first Belgian Spark Meetup group about our work in the ExaScience Life Lab at VDA-lab with J&J R&D and Intel.

Published in: Data & Analytics

  1. 1. Getting Data (analysis) to the User Toni Verbeiren @tverbeiren
  2. 2. Historical Perspective Edward Charles Pickering 1846 - 1919
  3. 3. Visualization Distribution
  4. 4. Use-case: l1000 / ComPass http://medicablogs.diariomedico.com/laboratorio/2010/03/04/la-escritura-de-los-genes-i/
  5. 5. Effect of compounds on diseases? Chemical similarity Molecule structure Size Effect on genes http://pixabay.com/nl/photos/drug/
  6. 6. http://www.lincscloud.org/l1000/ l1000 data
  7. 7. t statistics p values ranks 1000 genes compounds
  8. 8. ordered unordered
  9. 9. Reference - Ranks: -10 4 1 5 -8 -2 0 9 -3 6 -7
        Query - Ranks: 2 1 -3 0 10 8 -4 -7 5 9 -6
        Cmax = 10^2 + 9^2 + … = 385
        Similarity = -0.25
  10. 10. Reference - Ranks: -10 4 1 5 -8 -2 0 9 -3 6 -7
          Query - Ordered Signature: -10 4 1 5 -8
          Query - Ranks: 0 3 0 0 4 2 0 0 -1 0 -5
          Cmax = 10*5 + 9*4 + 8*3 + 7*2 + 6*1 = 130
          Similarity = 0.11
  11. 11. Reference - Ranks: -10 4 1 5 -8 -2 0 9 -3 6 -7
          Query - Unordered Signature: -10 4 1 5 -8
          Query - Ranks: 0 1 0 0 1 1 0 0 -1 0 -1
          Cmax = 10 + 9 + 8 + 7 + 6 = 45
          Similarity = -0.4
  12. 12. https://www.flickr.com/photos/mkuram/4872078284/ The Challenge Part 1: Data Scientist
  13. 13. Data Scientist
  14. 14. Data Scientist cluster
  15. 15. Data Scientist Expensive cluster
  16. 16. Data Scientist Expensive
  17. 17. Data Scientist Expensive
  18. 18. Data Scientist Expensive
  19. 19. ? spark-shell spark-submit
  20. 20. Nx memory Nx pre-processing?
  21. 21. Zeppelin iPython (Thunder) Spark-notebook
  22. 22. iPython + PySpark + Matplotlib + Seaborn
  23. 23. iPython + PySpark + Matplotlib + Seaborn cmpd1 cmpd2 cmpd3 cmpd4 cmpd5
  24. 24. iPython + PySpark + Matplotlib + Seaborn
  25. 25. Zeppelin
  26. 26. Zeppelin
  27. 27. Zeppelin
  28. 28. @thomasjmoerman
  29. 29. Zeppelin iPython (Thunder) Spark-notebook Very good (binary) support Complete configuration Dependency management Very extensive examples Code completion Angular App Shared Spark Context %table display system
  30. 30. The Challenge Part 2: Technical Users https://hu.wikipedia.org/wiki/Thomas_Bayes
  31. 31. Nx memory Nx pre-processing? ! Shared context ?!
  32. 32. The Challenge Part 3: End Users https://commons.wikimedia.org/wiki/File:Child_and_Computer_08473.jpg
  33. 33. Apache Spark REST Interface
  34. 34. demo@herkulano
  35. 35. Apache Spark REST Interface Sourire* *https://github.com/tmoerman/sourire webcomponents Polymer D3
  36. 36. https://github.com/tverbeiren/minimal-spark-jobserver-project
  37. 37. object initialize extends SparkJob with NamedRddSupport {
            override def runJob(sc: SparkContext, config: Config): Any = {
              val location = Try(config.getString("location")).getOrElse(..)
              this.namedRdds.update("t_values", t_values.data)
  38. 38. curl -X DELETE $jobserver':8090/contexts/compass'
          curl --data-binary @target/scala-2.10/interface_2.10-0.1-SNAPSHOT.jar $jobserver:8090/jars/interface
          curl -d '' $jobserver':8090/contexts/compass?num-cpu-cores=4&memory-per-node=8g'
          curl -d 'location="s3n://jnj.exasci/L1000/"' $jobserver':8090/jobs?context=compass&appName=interface&classPath=l1000.initialize'
          > curl -d $'query= HSPA1A DNAJB1 BAG3 P4HA2 HSPA8 TMEM97 SPR DDIT4 HMOX1 -TSEN2 \n sorted=false' $jobserver':8090/jobs?context=compass&appName=interface&classPath=l1000.zhang&sync=true'
          "status": "OK",
          "result": [[0, 0.16061228299421318, 0], [1, -0.510957625536681, 1], [2, 0.9957252193391823, 2], [3, 0.4809034907597536, 3],
            [4, 0.14026507373529962, 4], [5, 0.033246219899197314, 5], [6, 0.29708792234459586, 6], [7, -0.1867649803994773, 7],
            [8, 0.0872316595109203, 8], [9, -0.42350196005226803, 9], [10, 0.04439051708045548, 10], [11, -0.33427291394437186, 11],
            [12, 0.11161097629270114, 12], [13, -0.5319955198805302, 13], [14, 0.03923837969012507, 14], [15, -0.11678178084748926, 15],
            [16, -0.19945865223072615, 16], [17, 0.21080828822101924, 17], [18, -0.22120589882396863, 18], [19, -0.19462385663617696, 19],
            [20, 0.18030614149710658, 20], [21, -0.24883330222139258, 21], [22, 0.1544894530520814, 22], [23, 0.21885383610229606, 23],
            [24, -0.15159604256113496, 24], [25, 0.18749299981332834, 25], [26, -0.2639163711032294, 26], [27, 0.9610416277767407, 27],
            [28, 0.1631883516893784, 28], [29, -0.13477692738473027, 29], [30, -0.023576628710098937, 30], [31, 0.0799141310434945, 31],
            [32, 0.42159790927758073, 32], [33, 0.9930371476572709, 33], [34, -0.13255553481426172, 34], [35, 0.15366809781594176, 35],
            [36, -0.3778794101176031, 36], [37, -0.402389397050588, 37], [38, -0.4551241366436438, 38], [39, 0.12350196005226806, 39],
            [40, 0.05962292327795408, 40], [41, 0.4972372596602576, 41], [42, -0.26344969199178647, 42], [43, -0.41638977039387715, 43],
            [44, -0.425499346649244, 44], [45, -0.4757700205338809, 45], [46, -0.1707298861302968, 46], [47, -0.4394250513347023, 47],
  39. 39. The Challenge Part 4: Binding together
  40. 40. Apache Spark REST Interface Sourire Common Functionality (lib)
  41. 41. https://github.com/tverbeiren/incubator-zeppelin
  42. 42. ? @herkulano @thomasjmoerman @tverbeiren
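
The rank arithmetic on slides 9 and 10 can be reproduced with a few lines of Python. This is only a sketch of how the numbers appear to be derived: the score is taken as the sum of reference rank times query rank, divided by Cmax, where Cmax pairs the largest absolute reference ranks with the largest absolute query ranks. The function names `cmax` and `similarity` are illustrative and do not come from the deck.

```python
def cmax(reference, query):
    """Largest attainable score: pair the biggest |reference| ranks
    with the biggest |query| ranks and sum the products."""
    ref = sorted((abs(r) for r in reference), reverse=True)
    qry = sorted((abs(q) for q in query), reverse=True)
    return sum(r * q for r, q in zip(ref, qry))


def similarity(reference, query):
    """Signed rank products summed over all genes, normalised by Cmax."""
    score = sum(r * q for r, q in zip(reference, query))
    return score / cmax(reference, query)


# Rank vectors copied from slides 9 and 10.
reference = [-10, 4, 1, 5, -8, -2, 0, 9, -3, 6, -7]
full_query = [2, 1, -3, 0, 10, 8, -4, -7, 5, 9, -6]
ordered_query = [0, 3, 0, 0, 4, 2, 0, 0, -1, 0, -5]

print(cmax(reference, full_query), round(similarity(reference, full_query), 2))
# 385 -0.25
print(cmax(reference, ordered_query), round(similarity(reference, ordered_query), 2))
# 130 0.11
```

An unordered signature would go through the same function with every query entry set to +1, -1 or 0, so only the direction of each gene contributes to the score.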

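The synchronous query on slide 38 returns rows of three values per compound. A front end might build the job URL and pick the strongest hits roughly as follows. This is a hedged sketch, not the code behind the demo: the helper names `job_url` and `top_hits` are hypothetical, the `http://` scheme and the enclosing JSON braces are assumed, and treating the second value of each row as the similarity score is an interpretation of the output shown on the slide.

```python
import json
from urllib.parse import urlencode


def job_url(jobserver, class_path, context="compass", app_name="interface", sync=True):
    """Build a /jobs URL using the query parameters seen in the curl calls."""
    params = {"context": context, "appName": app_name, "classPath": class_path}
    if sync:
        params["sync"] = "true"
    return f"http://{jobserver}:8090/jobs?{urlencode(params)}"


def top_hits(response_text, n=3):
    """Parse a response body and return the n rows with the highest score.

    Assumes each row is [index, score, index], as in the slide output.
    """
    payload = json.loads(response_text)
    if payload.get("status") != "OK":
        raise RuntimeError(f"job failed: {payload.get('status')}")
    return sorted(payload["result"], key=lambda row: row[1], reverse=True)[:n]


# First four rows of the slide output, wrapped in braces (assumed) to form valid JSON.
sample = (
    '{"status": "OK", "result": [[0, 0.16061228299421318, 0], '
    '[1, -0.510957625536681, 1], [2, 0.9957252193391823, 2], '
    '[3, 0.4809034907597536, 3]]}'
)

url = job_url("localhost", "l1000.zhang")
best = top_hits(sample, n=2)
print(url)
print(best)
```

Keeping URL construction and response parsing in small pure functions lets both be checked without a running job server; the actual HTTP call can be added around them with any client.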