
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger Menezes



In this talk, we’ll present techniques for visualizing large-scale machine learning systems in Spark. Netflix uses these techniques to understand and refine the machine learning models behind its recommender systems, which personalize the Netflix experience for its 99 million members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing MatPlotLib” for Spark/Scala. We’ll talk about the design of Vegas and its use in Scala notebooks to visualize machine learning models.

Published in: Data & Analytics


  1. Vegas: The Missing Matplotlib for Scala/Spark. DB Tsai, Roger Menezes
  2. Homepage, Kids Page, Downloads Page. Netflix Recommendations: every aspect of the experience is machine learned
  3. 2017 > 100M members > 190 countries
  4. Multiple Devices
  5. Genres: 23 rows/page average. Sims: 10 rows/page average
  6. My List: Continue Watching: Popular on Netflix: Trending Now: Watch It Again: Top Picks: Because You Watched: Genres: New Releases: Recently Added: Originals RowBillboard:
  7. Machine Learning at Netflix ● Optimize the Experimentation use case vs Productionization ● Experimentation ○ Opportunity sizing, Data Exploration ○ Feature Identification and Selection ○ Tweaks to ML algos ○ Model Evaluation
  8. Experimenter’s loop: Problem → Explore Data → Identify Features → Produce Model → Evaluate Model → Share Findings
  9. Notebooks ● Optimal for Experimentation ● Sharing reproducible research ○ Facilitates feedback loop with Product Managers ● End-to-end ML experiment ○ Interactivity drives productivity
  10. Python Notebooks
  11. Python Notebooks ● Seamless Experience - ML experimentation ● Well-known scientific computing libraries ● Huge catalog of visualization and plotting libraries ○ Matplotlib, Seaborn, Bokeh, BQPlot, Lightning, etc.
  12. Scala Notebooks ● Zeppelin, Jupyter, Databricks, Spark-Notebooks, ... ● Computing library gap filling up ● Lack of Visualization Libraries ○ Main friction point in adoption ○ End-to-end ML use case not convincing
  13. Introducing Vegas ● Visualization Library in Scala ● Mainly built for the notebook use case ● Scala wrapper around Vega-Lite ○ Missing MatPlotLib for the Scala/Spark world.
  14. DECLARATIVE STATISTICAL VISUALIZATION GRAMMAR IN SCALA You tell it WHAT should be done with the data, and it knows HOW to do it! Operations such as filtering, aggregation, and faceting are built into the visualization, rather than putting the burden on the user to massage the data into shape. Complex visualizations can be built with a few high-level abstractions: DATA, TRANSFORMS, SCALES, GUIDES, MARKS. cf. Altair talk by Brian Granger at PyData 2016 https://youtu.be/v5mrwq7yJc4
  15. Added Bonus of Declarative Visualizations: INTERACTIVITY! D3JS VEGAS VEGAS CODE EXPANDS OUT TO D3JS CODE!
  16. Anatomy of a plot: Channels. X/Y Channel, Shape Channel, Size Channel, Color Channel
  17. Features…
  18. 1. Supports most plot types
  19. 2. Trellis plots
  20. 3. Layers: Layer 1, Layer 2, Layer 3
  21. 4. Notebook and Consoles
  22. 5. Built-in Spark support: Vegas.withDataFrame(myDataFrame).encodeX("population").encodeY("age") Mapped columns. Pass in DF.
  23. 6. Visual statistics ● Advanced Binning ● Sorting ● Scaling ● Custom Transforms ● Time Series ● Aggregation ● Filtering ● Math functions (log, etc.) ● Descriptive Statistics
  24. How It Works!
  25. 1. Specify in Scala 2. Embed HTML (iFrame) 3. Render within iFrame using JS
  26. Abstraction stack, from least to most abstraction: D3JS, VEGA, VEGA-LITE*, VEGAS. Vegas: a Scala DSL for Vega-Lite; the Scala DSL emits type-checked Vega-Lite JSON. Vega-Lite converts internally to a Vega JSON spec. Vega translates JSON to D3JS code that can be very verbose. * Vega-Lite
  27. What’s coming
  28. 1. Interactive selections
  29. 2. Selections transforms
  30. Contributors: Aish, DB, Roger, Sudeep, Jeremy
  31. Thank you.
  32. @NetflixResearch @rogermenezes @dbtsai The missing MatPlotLib for Scala/Spark http://vegas-viz.org
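
The flow described on slides 25 and 26 (specify the plot in Scala, emit Vega-Lite JSON, embed it in HTML for rendering with JavaScript) can be illustrated without Vegas itself. The following is a minimal, dependency-free sketch, not the Vegas implementation: `VegaLiteSketch`, `Channel`, `BarSpec`, `toJson`, and `toHtml` are hypothetical names invented for this example, and the CDN script URLs and the sample data are assumptions. Only the JSON layout (`data.values`, `mark`, `encoding`) follows the public Vega-Lite specification format.

```scala
// Hypothetical sketch of the "Scala -> Vega-Lite JSON -> HTML" flow.
// None of these names come from Vegas; they exist only for illustration.
object VegaLiteSketch {

  // One encoding channel: the field it reads and its Vega-Lite data type
  // (for example "nominal" or "quantitative").
  final case class Channel(field: String, dataType: String)

  // A tiny bar-chart description: inline rows plus x and y channels.
  final case class BarSpec(rows: Seq[Map[String, Any]], x: Channel, y: Channel)

  // Escape backslashes and double quotes so the result is a valid JSON string.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Numbers stay bare; every other value is rendered as a quoted string.
  private def scalar(v: Any): String = v match {
    case n: Int    => n.toString
    case n: Double => n.toString
    case other     => quote(other.toString)
  }

  private def channelJson(c: Channel): String =
    s"""{"field":${quote(c.field)},"type":${quote(c.dataType)}}"""

  // Step 1: turn the Scala description into a Vega-Lite JSON string.
  def toJson(spec: BarSpec): String = {
    val values = spec.rows
      .map(_.map { case (k, v) => s"${quote(k)}:${scalar(v)}" }.mkString("{", ",", "}"))
      .mkString("[", ",", "]")
    s"""{"data":{"values":$values},"mark":"bar","encoding":{"x":${channelJson(spec.x)},"y":${channelJson(spec.y)}}}"""
  }

  // Steps 2 and 3: wrap the JSON in an HTML page that renders it in the browser.
  // The script URLs are assumed CDN locations for Vega, Vega-Lite, and vega-embed.
  def toHtml(json: String): String =
    s"""<!DOCTYPE html>
       |<html>
       |<head>
       |  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
       |  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
       |  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
       |</head>
       |<body>
       |  <div id="vis"></div>
       |  <script>vegaEmbed("#vis", $json);</script>
       |</body>
       |</html>""".stripMargin
}
```

As a usage example, `VegaLiteSketch.toHtml(VegaLiteSketch.toJson(spec))` yields a page that a notebook could place inside an iframe. A real DSL such as Vegas adds type checking and a much richer set of marks, channels, and transforms on top of this idea.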
