Hassle Free Data Science Apps with Bokeh Webinar

6,973 views

DOWNLOAD the video for this webinar here: http://go.continuum.io/hassle-free-data-science-apps/

Data visualization is where your work comes to fruition - without communication, your insights don't turn into action, and your organization won't realize the value of your analytical work.

But creating and deploying data science apps is hard. You're a data scientist - not a web developer or designer. There has to be a better way.

That's why we created Bokeh, an interactive visualization framework for Python. Over the past 6 months, we've added a ton of powerful features and dramatically improved ease of use. Continuum Analytics CTO Peter Wang & Bokeh lead developer Bryan Van de Ven show you how to create rich, interactive visualizations in the browser - without writing a line of JavaScript or HTML.

In the webinar, you'll learn to:

- Use the Bokeh Visualization Framework to Easily Make Data Science Apps
- Reproduce the Famous GapMinder Example - No JavaScript or HTML Required
- Transform & Visualize Streaming Data with Scikit-Learn and Bokeh

Published in: Data & Analytics

  1. 1. Hassle-Free Data Science Apps with Bokeh
  2. 2. Presenters Peter Wang is the CTO and Co-founder of Continuum Analytics and the creator of Bokeh. He has been developing commercial scientific computing and visualization software for over 15 years. As a creator of the PyData conference, he devotes time and energy to growing the Python data community, and advocating and teaching Python at conferences worldwide. Bryan Van de Ven is the lead developer on the Bokeh project. He holds an undergraduate degree in Computer Science & Mathematics from UT Austin, and a Master's degree in Physics from UCLA. Previously, Bryan developed data exploration and visualization software for sonar feature detection, financial risk modeling, and fluid mixing simulation.
  3. 3. Overview • What is Bokeh? • Overview and tour of major features • Demo 1: Scikit-learn clustering • Demo 2: Gapminder • Demo 3: Streaming data • Really big data: Preview of data shading • Q&A
  4. 4. Overview of Anaconda
  5. 5. Anaconda is the modern open source analytics platform, powered by Python, the fastest growing open data science language • Easy to Build, Maintain & Deploy Analytics • Talks with Everything, Runs Anywhere • High Performance, Scalable Analytics
  6. 6. Anaconda: Accelerating Adoption of Python for Enterprises • COLLABORATIVE NOTEBOOKS with publication, authentication, & search (Jupyter/IPython) • PYTHON & PACKAGE MANAGEMENT for Hadoop & Apache stack (Spark) • PERFORMANCE with compiled Python for lightning fast execution (Numba) • VISUAL APPS for interactivity, streaming, & Big (Bokeh) • SECURE & ROBUST REPOSITORY of data science libraries, scripts, & notebooks (Conda) • ENTERPRISE DATA INTEGRATION with optimized connectors & out-of-core processing (NumPy & Pandas)
  7. 7. Anaconda for Data Science: Empowering Everyone on the Team. Data Scientist: • Advanced analytics with Python & R • Simplified library management • Easily share data science notebooks & packages. Developer: • Support for common APIs & data formats • Common language with data scientists • Python extensibility with C, C++, etc. Business Analyst: • Collaborative interactive analytics with notebooks • Rich browser based visualizations • Powerful MS Excel integration. Data Engineer: • Powerful & efficient libraries for data transformations • Robust processing for noisy dirty data • Support for common APIs & data formats. Ops: • Validated source of up-to-date packages including indemnification • Agile Enterprise Package Management • Supported across platforms. Computational Scientist: • Rich set of advanced analytics • Trusted & production ready libraries for numerics • Simplified scale up & scale out on clusters & GPUs
  8. 8. Modern Analytics Stack
  9. 9. Write Once, Deploy Anywhere MANAGED PYTHON Explore & Visualize Python & R Advanced Analytics High Performance & Scalability Data Engineering & Analysis Collaboration & Integration Servers Linux, Windows OSX GPUs & High End Workstations Linux & Windows NVIDIA, AMD, X86/ARM Clusters Yarn, Mesos, MPI Power8, LSF, Sungrid Engine NoSQL MongoDB Cassandra / DataStax Hadoop Cloudera, Hortonworks Apache Hadoop & Spark Files Microsoft Excel Trifacta, Import.io DW & SQL Any SQL DB Any SQL DW, Impala
  10. 10. Bokeh Overview & Tour
  11. 11. Bokeh http://bokeh.pydata.org • Interactive visualization • Novel graphics • Streaming, dynamic, large data • For the browser, with or without a server • No need to write Javascript
  12. 12. Versatile Plots
  13. 13. Novel Graphics
  14. 14. Linked Plots (Notebook 2) • Easy to show multiple plots and link them • Easy to link data selections between plots • Can easily customize the kind of linkage straight from Python, without needing to fiddle around with JS
  15. 15. Flexible Tools (Notebook 3) • Many useful tools with built-in functionality • Easy to extend with Javascript, if so inclined
  16. 16. rBokeh http://hafen.github.io/rbokeh Plays well with R ecosystem: HTMLwidget, RMarkdown…
  17. 17. rBokeh with RStudio & Shiny
  18. 18. Architecture
  19. 19. Traditional Web Visualization • Data: CSV, SQL • Server-side Data Processing: Python, Java, etc. • HTML, CSS, Javascript • JavaScript Plotting library: D3, Highcharts, Flot, nvd3, dcjs • Tech: Python/R/Java; HTML & browser compat; CSS/LESS/Sass; JS plotting library API; Javascript; jQuery, underscore; svg, canvas2D; webGL, three.js; React; Angular; node.js, browserify, gulp, grunt, npm, …
  20. 20. Traditional Web Viz - Interaction (diagram: User; Browser with HTML, CSS, Javascript; Server with Python, Ruby, Java, .NET; Data) • Simple dashboard: Server language generating HTML, JS, CSS styling, subset of data • Handling user interaction: Custom Javascript, calling Server endpoint, which generates updated JSON or JS that gets pushed back to client via websocket
  21. 21. Bokeh Conceptual Architecture (diagram: User; Client running BokehJS (HTML, CSS); JSON; Server running Bokeh in Python, R, Scala; Data) • Simple dashboard: Single language, no need to write HTML, JS, CSS • Handling user interaction: Single language that you already know; interactive data updates feel seamless to the user
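The architecture on this slide can be pictured with a small plain-Python sketch: the plot is described as Python objects on the server side, serialized to JSON, and handed to a JavaScript client to draw. The `PlotSpec` and `GlyphSpec` classes and the JSON layout below are hypothetical illustrations, not Bokeh's actual object model or wire format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class GlyphSpec:
    """Hypothetical description of one glyph (not a real Bokeh class)."""
    kind: str
    x: list
    y: list
    color: str = "navy"


@dataclass
class PlotSpec:
    """Hypothetical plot description that lives on the Python side."""
    title: str
    glyphs: list = field(default_factory=list)

    def add(self, kind, x, y, **style):
        # Record a glyph; nothing is drawn in Python.
        self.glyphs.append(GlyphSpec(kind, list(x), list(y), **style))
        return self

    def to_json(self):
        # Only this JSON document would travel to the browser,
        # where a JavaScript library would do the rendering.
        return json.dumps(asdict(self), sort_keys=True)


spec = PlotSpec("demo").add("circle", [1, 2, 3], [4, 5, 6], color="red")
payload = spec.to_json()
decoded = json.loads(payload)  # what a client would see after parsing
```

The point of the sketch is the division of labor: the author only touches Python objects, and the browser side only needs to understand the JSON.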
  22. 22. Comparison Chart • Traditional stack (Client; CSS; Server in Python, Ruby, Java, .NET; Data): Skills required: 5-10 skills; Time to market: weeks to months; Server code: 100s to 1000s lines • Bokeh (Client with BokehJS; Bokeh Server; Python, R; Data): Skills required: ~1 skill; Time to market: minutes; Server code: 0
  23. 23. Some Bokeh Users
  24. 24. Community & Adoption • GitHub: 3500+ watchers, 680 forks • Mailing list: 400+ members, 150+ posts in November • Downloads: 21,500 / month (conda), 10,000 / month (pip)
  25. 25. Embeds Well http://cecp.mit.edu
  26. 26. Demo: Clustering with Scikit-learn
  27. 27. Demo Overview In this demo, we will build a basic application which lets us visualize different kinds of clustering approaches with Scikit-learn. • We will use a drop-down to select the algorithm • We will write a Python handler function which responds to the user action, and pushes an update to the plot in the browser. • Notebook for basic viz: ~25 LOC • Example app with 1 dropdown: < 100 LOC • Multiple dropdown and sliders: < 200 LOC
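The handler pattern described in this slide can be sketched without any libraries. This is a rough illustration only: `Dropdown`, its `on_change` method, and the two toy "algorithms" are hypothetical stand-ins for a Bokeh widget and Scikit-learn estimators, whose real code is not shown in the slides. The `(attr, old, new)` callback shape is chosen to resemble how Bokeh property callbacks are usually written.

```python
class Dropdown:
    """Hypothetical widget: stores a value and notifies registered handlers."""

    def __init__(self, value):
        self.value = value
        self._handlers = []

    def on_change(self, handler):
        self._handlers.append(handler)

    def select(self, new_value):
        # Simulates the user picking a new entry in the browser.
        old, self.value = self.value, new_value
        for handler in self._handlers:
            handler("value", old, new_value)


def threshold_clusters(points):
    # Toy stand-in: label by whether x is left or right of the mean x.
    mean_x = sum(x for x, _ in points) / len(points)
    return [0 if x < mean_x else 1 for x, _ in points]


def single_cluster(points):
    # Toy stand-in: put every point in cluster 0.
    return [0 for _ in points]


ALGORITHMS = {"threshold": threshold_clusters, "single": single_cluster}
points = [(0.0, 1.0), (0.2, 0.8), (5.0, 5.1), (5.3, 4.9)]
plot_data = {"points": points, "labels": single_cluster(points)}


def update(attr, old, new):
    # Handler: rerun the selected algorithm and replace the labels
    # that the plot would be colored by.
    plot_data["labels"] = ALGORITHMS[new](points)


menu = Dropdown("single")
menu.on_change(update)
menu.select("threshold")
```

Keeping the algorithms in a dictionary keyed by the dropdown value means adding another option is a one-line change, which is consistent with the small line counts quoted on the slide.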
  28. 28. Demo: Gapminder
  29. 29. Demo Overview This demo shows how we can embed a little bit of Javascript to make a server-less but very capable interactive visualization. • We will build up the visualization from the ground up, showing different kinds of Bokeh plotting primitives • We will do it inside the Jupyter Notebook, so we can see our changes immediately • Then we will wire up an interactive slider The resulting interactive visualization will be embedded in the browser, with no reliance on a server to handle user interactions.
  30. 30. Demo: Animation & Streaming example
  31. 31. Demo Overview In this demo, we will demonstrate how the Bokeh server makes it easy to visualize streaming and dynamic data. • A minimal example with < 50 LOC • Demonstrates ease of pushing data from Python code into the browser
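One way to picture pushing streaming data toward a plot is a fixed-size rolling buffer that the plot reads from. The sketch below works under that assumption in plain Python; `RollingSource` is a hypothetical class, not part of Bokeh, and a real application would send each update to the browser instead of only storing it.

```python
from collections import deque


class RollingSource:
    """Hypothetical column store that keeps only the newest `rollover` rows."""

    def __init__(self, columns, rollover):
        self.data = {name: deque(maxlen=rollover) for name in columns}

    def stream(self, new_rows):
        # Every column must receive the same number of new values.
        lengths = {len(values) for values in new_rows.values()}
        if set(new_rows) != set(self.data) or len(lengths) != 1:
            raise ValueError("new_rows must cover all columns equally")
        for name, values in new_rows.items():
            # deque(maxlen=...) silently discards the oldest entries.
            self.data[name].extend(values)


source = RollingSource(["t", "value"], rollover=3)
for t in range(5):
    source.stream({"t": [t], "value": [t * t]})
```

Bounding the buffer keeps memory use and the amount of data a client has to redraw constant, however long the stream runs.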
  32. 32. • Realtime audio sampling via PyAudio, realtime FFT via NumPy • 30 fps • ~200 lines of code
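The spectrum step can be illustrated with a naive discrete Fourier transform from the standard library. This is a sketch only: it uses an O(n²) DFT in place of NumPy's FFT and a synthetic sine wave in place of PyAudio input, and the sample rate and tone frequency are made-up values.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return magnitudes of the first n//2 DFT bins (naive O(n^2) version)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        total = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        mags.append(abs(total))
    return mags


SAMPLE_RATE = 64  # samples per second (assumed for the example)
TONE_HZ = 8       # frequency of the synthetic tone (assumed)

# One second of a pure sine tone standing in for a microphone frame.
samples = [
    math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE)
    for i in range(SAMPLE_RATE)
]

mags = dft_magnitudes(samples)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / len(samples)  # bin index -> frequency
```

In a live application each audio frame would go through this transform and the resulting magnitudes would be pushed to the plot; an FFT would be used instead of the naive loop to reach the frame rate quoted on the slide.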
  33. 33. Bokeh: Progress and Future
  34. 34. Visualizing Big Data: Preview of “Data Shading”
  35. 35. Billions and billions…
  36. 36. Data Shading Main Points • When trying to visualize millions of points, browser vs. rich client doesn’t really matter • Raft of common problems that are ignored: Overdraw, over- & under-saturation, clipping, coarse binning • Statistical transformations of data are a first-class aspect of the visualization • Rapid iteration of visual styles & configs, interactive selections and filtering are key concerns in data exploration When data is large, you don’t know when the viz is lying.
  37. 37. Data Shading Pipeline (diagram labels: Data; Project / Synthesize; Scene; Aggregates; Sample / Raster; Transfer; Image; Visual Abstraction; Data Transforms; Visual Mappings; View Transforms; Data Tables; Source Data; Views; Selection; Aggregation; Transfer; Significant Set; Aggregates)
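The aggregate-then-transfer idea in the pipeline can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation previewed in the talk: the grid size, the log-scale transfer function, and the function names `aggregate` and `transfer` are all choices made here.

```python
import math
import random


def aggregate(points, width, height, x_range, y_range):
    """Count points per pixel bin (an 'aggregates' stage)."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # drop points outside the viewport
        # Clamp to guard against floating-point rounding at the upper edge.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


def transfer(grid):
    """Map counts to 0-255 intensities on a log scale (a 'transfer' stage)."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(c) * scale) for c in row] for row in grid]


rng = random.Random(0)  # fixed seed so the example is repeatable
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]
counts = aggregate(points, 32, 32, (-4, 4), (-4, 4))
image = transfer(counts)
```

However many points go in, only a 32 by 32 grid comes out, so the cost of shipping and drawing the result does not grow with the data. The log scale is one possible answer to the over- and under-saturation problems listed on the previous slide; other transfer functions would slot into the same place.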
  38. 38. Anaconda Subscriptions and Resources
  39. 39. Anaconda Subscriptions • ANACONDA: Open Source Modern Analytics Platform Powered by Python; Community Support; FREE FOREVER; DOWNLOAD • ANACONDA PRO: Anaconda with Support & Indemnification; Priority 1 support; CONTACT US • ANACONDA WORKGROUP: Anaconda with High Performance and Team Collaboration; Priority 1 support; CONTACT US • ANACONDA ENTERPRISE: Anaconda with Scalable High Performance and Team Collaboration; Priority 1 support with Dedicated Customer Support Rep; CONTACT US • Prices shown: Starting at $10,000 per year + $1,000 per year for additional users; Starting at $30,000 per year + $3,000 per year for additional users; Starting at $60,000 per year + $6,000 per year for additional users
  40. 40. Contact Information and Additional Details • Contact sales@continuum.io for more information about Anaconda subscriptions, consulting, or training • View documentation and examples at bokeh.pydata.org • View demo notebooks on Anaconda Cloud notebooks.anaconda.org/pwang/
  41. 41. Thank you • Email: sales@continuum.io • Twitter: @ContinuumIO • Peter Wang, Twitter: @pwang • Bokeh, Twitter: @bokehplots