SlideShare a Scribd company logo
1 of 26
Download to read offline
© 2016 Continuum Analytics - Confidential & Proprietary© 2016 Continuum Analytics - Confidential & Proprietary
Interactive Visual Statistics
on Massive Datasets
Peter Wang
CTO, Co-founder Continuum Analytics
@pwang
© 2016 Continuum Analytics - Confidential & Proprietary 2
• Introductions
• Company Overview
• Goals of Analytics and IT teams
• Why Python for Data Science
• Anaconda - Making Python Better for Data Science
• Package Management
• Cluster Environment Management
• Notebook Computing
• Demonstrations
• Q&A / Next steps
AGENDA
© 2016 Continuum Analytics - Confidential & Proprietary
© 2016 Continuum Analytics - Confidential & Proprietary
What’s the Problem?
© 2016 Continuum Analytics - Confidential & Proprietary
Big data magnifies small problems
4
• Of course, big data presents storage and computation problems
• More importantly, standard plotting tools have problems that are
magnified by big data:
• Overdrawing/Overplotting
• Saturation
• Undersaturation
• Binning issues
• We’ll first explain these problems, and then present a new technique
called datashading to address them head-on.
© 2016 Continuum Analytics - Confidential & Proprietary
Overdrawing
5
• For a scatterplot, the order in
which points are drawn is very
important
• The same distribution can look
entirely different depending on
plotting order
• Last data plotted overplots
© 2016 Continuum Analytics - Confidential & Proprietary
Overdrawing
6
• Underlying issue is just
occlusion
• Same problem happens with
one category, but less
obvious
• Can prevent occlusion using
transparency
© 2016 Continuum Analytics - Confidential & Proprietary
Saturation
7
• E.g. for alpha = 0.1, up to 10 points can
overlap before saturating the available
brightness
• Now the order of plotting matters less
• After 10 points, first-plotted data still lost
• For one category, 10, 20, or 2000 points
overlapping will look identical
© 2016 Continuum Analytics - Confidential & Proprietary
Saturation
8
• Same alpha value, more points:
• Now is highly misleading
• alpha value depends on size, overlap of
dataset
• Difficult-to-set parameter, hard to know
when data is misrepresented
© 2016 Continuum Analytics - Confidential & Proprietary
Saturation
9
• Can try to reduce point size to reduce
overplotting and saturation
• Now points are hard to see, with no
guarantee of avoiding problems
• Another difficult-to-set parameter
• For really big data, scatterplots start to
become very inefficient, because there
are many datapoints per pixel — may
as well be binning by pixel
© 2016 Continuum Analytics - Confidential & Proprietary
Binning issues
10
• Can use heatmap instead
of scatter
• Avoids saturation by auto-
ranging on bins
• Result independent of data
size
• Here two merged normal
distributions look very
different at different binning
• Another difficult-to-set
parameter
© 2016 Continuum Analytics - Confidential & Proprietary
Plotting big data
11
• When exploring really big data, the visualization is all you have — there’s
no way to look at each of the individual data points
• Common plotting problems can lead to completely incorrect conclusions
based on misleading visualizations
• Slow processing makes trial and error approach ineffective
When data is large, you don’t know when the viz is lying.
© 2016 Continuum Analytics - Confidential & Proprietary
© 2016 Continuum Analytics - Confidential & Proprietary
Datashading
© 2016 Continuum Analytics - Confidential & Proprietary
Datashading
13
• Flexible, configurable pipeline for automatic plotting
• Provides flexible plugins for viz stages, like in graphics shaders
• Completely prevents overplotting, saturation, and undersaturation
• Mitigates binning issues by providing fully interactive exploration in web
browsers, even of very large datasets on ordinary machines
• Statistical transformations of data are a first-class aspect of the
visualization
• Allows rapid iteration of visual styles & configs, interactive selections and
filtering, to support data exploration
© 2016 Continuum Analytics - Confidential & Proprietary
Datashading Pipeline: Projection
14
Data
Project /
Synthesize
Scene
• Stage 1: select variables (columns) to project onto the screen
• Data often filtered at this stage
© 2016 Continuum Analytics - Confidential & Proprietary
Datashading Pipeline: Aggregation
15
Data
Project /
Synthesize
Scene Aggregates
Sample /
Raster
• Stage 2: Aggregate data into a fixed set of bins
• Each bin yields one or more scalars (total count, mean, stddev, etc.)
© 2016 Continuum Analytics - Confidential & Proprietary
Datashading Pipeline: Transfer
16
Data
Project /
Synthesize
Scene Aggregates
Sample /
Raster Transfer
Image
• Stage 3: Transform data using one or more transfer functions, culminating in a function
that yields a visible image
• Each stage can be replaced and configured separately
© 2016 Continuum Analytics - Confidential & Proprietary
© 2016 Continuum Analytics - Confidential & Proprietary
Demos
© 2016 Continuum Analytics - Confidential & Proprietary
© 2016 Continuum Analytics - Confidential & Proprietary
New Developments
© 2016 Continuum Analytics - Confidential & Proprietary
Flexible Statistics
19
Normalized Vegetation
Difference Index
© 2016 Continuum Analytics - Confidential & Proprietary
Flexible Statistics
20
Slope & Aspect Ratio
from pure Elevation
© 2016 Continuum Analytics - Confidential & Proprietary
© 2016 Continuum Analytics - Confidential & Proprietary
Anaconda
© 2016 Continuum Analytics - Confidential & Proprietary 22
• Simplify setup for non-engineers

• Enable easy development on and
deployment to multiple platforms.

• Enable data scientists to experiment
and iterate even more rapidly

• Eliminate the pains associated with
package and dependency management
Why Did We Create Anaconda?
To Enhance Python and Enable Data Scientist to Quickly Engage with Their Data
© 2016 Continuum Analytics - Confidential & Proprietary 23
Anaconda
Modern, Open-Source Analytics Platform
powered by Python
Quickly Engage w/ Your Data
• 500+ Popular Python Packages
• Optimized & Compiled
• Free for Everyone
• Extensible via Conda Package Manager
• Sandbox Packages & Libraries
• Cross-Platform – Windows, Linux, Mac
• Not just Python - over 230 R packages
• Foundation of our Enterprise Products
© 2016 Continuum Analytics - Confidential & Proprietary 24
On-premises package repository and sharing platform
• Governance for your analytics environment - maintain
control of the packages used by your analysts

• Easily replicate and share analysts’ environments

• Centrally store proprietary libraries and manage versioning
Cluster environment management
• Manages Python, R, Java, Scala packages

across the cluster

• Easily replicate analysts’ environments for different jobs/
users/groups

• Strong support for Hadoop & Spark
Anaconda Enterprise
© 2016 Continuum Analytics - Confidential & Proprietary 25
Anaconda Enterprise
Scalable Computing and Collaboration
• Multi-user notebook deployments
• Scalable notebook deployment model
• Project-based management
• Notebook versioning and locking
• Extended support for Hadoop Stack
(Storm, Spark Streaming, Kafka)
• Single sign-on support(PKI, Kerberos etc.)
• Burst Compute support
© 2016 Continuum Analytics - Confidential & Proprietary 26
Consulting
Customers include:
• JPL
• DARPA
• Sandia National Labs
• AMD
• Bank of America
• Bloomberg
We Will Help Design, Architect, and Build the Right Analytics For You
Leverage our Open-Source Projects

More Related Content

What's hot

Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info...
Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info...Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info...
Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info...DataStax
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseFormant
 
Delivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauDelivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauHarald Erb
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
 
A brief history of data warehousing
A brief history of data warehousingA brief history of data warehousing
A brief history of data warehousingRob Winters
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchCloudera, Inc.
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
 
Modernizing Data Management Through Metadata
Modernizing Data Management Through MetadataModernizing Data Management Through Metadata
Modernizing Data Management Through MetadataMANTA
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Mark Rittman
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesRob Winters
 
How to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeHow to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeAtScale
 
Altis AWS Snowflake Practice
Altis AWS Snowflake PracticeAltis AWS Snowflake Practice
Altis AWS Snowflake PracticeSamanthaSwain7
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Data Con LA
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldCloudera, Inc.
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionDataStax
 
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSyncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightCloudera, Inc.
 
Introduction to Cloud Applications
Introduction to Cloud ApplicationsIntroduction to Cloud Applications
Introduction to Cloud ApplicationsDataStax
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...Mark Rittman
 

What's hot (20)

Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info...
Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info...Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info...
Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info...
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
 
Delivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauDelivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and Tableau
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
A brief history of data warehousing
A brief history of data warehousingA brief history of data warehousing
A brief history of data warehousing
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science Workbench
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Modernizing Data Management Through Metadata
Modernizing Data Management Through MetadataModernizing Data Management Through Metadata
Modernizing Data Management Through Metadata
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil Games
 
How to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeHow to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on Snowflake
 
Altis AWS Snowflake Practice
Altis AWS Snowflake PracticeAltis AWS Snowflake Practice
Altis AWS Snowflake Practice
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data Solution
 
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSyncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
 
Introduction to Cloud Applications
Introduction to Cloud ApplicationsIntroduction to Cloud Applications
Introduction to Cloud Applications
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...
 

Viewers also liked

PLOTCON NYC: Building Products Out of Data
PLOTCON NYC:  Building Products Out of DataPLOTCON NYC:  Building Products Out of Data
PLOTCON NYC: Building Products Out of DataPlotly
 
PLOTCON NYC: The Future of Business Intelligence: Data Visualization
PLOTCON NYC:  The Future of Business Intelligence: Data VisualizationPLOTCON NYC:  The Future of Business Intelligence: Data Visualization
PLOTCON NYC: The Future of Business Intelligence: Data VisualizationPlotly
 
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPlotly
 
PLOTCON NYC: Custom Colormaps for Your Field
PLOTCON NYC: Custom Colormaps for Your FieldPLOTCON NYC: Custom Colormaps for Your Field
PLOTCON NYC: Custom Colormaps for Your FieldPlotly
 
PLOTCON NYC: Enterprise Dataviz' Unicorn Problem
PLOTCON NYC: Enterprise Dataviz' Unicorn ProblemPLOTCON NYC: Enterprise Dataviz' Unicorn Problem
PLOTCON NYC: Enterprise Dataviz' Unicorn ProblemPlotly
 
PLOTCON NYC: New Open Viz in R
PLOTCON NYC: New Open Viz in RPLOTCON NYC: New Open Viz in R
PLOTCON NYC: New Open Viz in RPlotly
 
PLOTCON NYC: Domain Specific Visualization
PLOTCON NYC: Domain Specific VisualizationPLOTCON NYC: Domain Specific Visualization
PLOTCON NYC: Domain Specific VisualizationPlotly
 
PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualizati...
PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualizati...PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualizati...
PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualizati...Plotly
 
PLOTCON NYC: New Data Viz in Data Journalism
PLOTCON NYC: New Data Viz in Data JournalismPLOTCON NYC: New Data Viz in Data Journalism
PLOTCON NYC: New Data Viz in Data JournalismPlotly
 
PLOTCON NYC: Data Science in the Enterprise From Concept to Execution
PLOTCON NYC: Data Science in the Enterprise From Concept to ExecutionPLOTCON NYC: Data Science in the Enterprise From Concept to Execution
PLOTCON NYC: Data Science in the Enterprise From Concept to ExecutionPlotly
 
PLOTCON NYC: Building a Flexible Analytics Stack
PLOTCON NYC: Building a Flexible Analytics StackPLOTCON NYC: Building a Flexible Analytics Stack
PLOTCON NYC: Building a Flexible Analytics StackPlotly
 
PLOTCON NYC: Mapping Networked Attention: What We Learn from Social Data
PLOTCON NYC: Mapping Networked Attention: What We Learn from Social DataPLOTCON NYC: Mapping Networked Attention: What We Learn from Social Data
PLOTCON NYC: Mapping Networked Attention: What We Learn from Social DataPlotly
 
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...Plotly
 
PLOTCON NYC: PlotlyJS.jl: Interactive plotting in Julia
PLOTCON NYC: PlotlyJS.jl: Interactive plotting in JuliaPLOTCON NYC: PlotlyJS.jl: Interactive plotting in Julia
PLOTCON NYC: PlotlyJS.jl: Interactive plotting in JuliaPlotly
 
PLOTCON NYC: Text is data! Analysis and Visualization Methods
PLOTCON NYC: Text is data! Analysis and Visualization MethodsPLOTCON NYC: Text is data! Analysis and Visualization Methods
PLOTCON NYC: Text is data! Analysis and Visualization MethodsPlotly
 
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016Tanya Cashorali
 
What’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackWhat’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackTuri, Inc.
 

Viewers also liked (17)

PLOTCON NYC: Building Products Out of Data
PLOTCON NYC:  Building Products Out of DataPLOTCON NYC:  Building Products Out of Data
PLOTCON NYC: Building Products Out of Data
 
PLOTCON NYC: The Future of Business Intelligence: Data Visualization
PLOTCON NYC:  The Future of Business Intelligence: Data VisualizationPLOTCON NYC:  The Future of Business Intelligence: Data Visualization
PLOTCON NYC: The Future of Business Intelligence: Data Visualization
 
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
 
PLOTCON NYC: Custom Colormaps for Your Field
PLOTCON NYC: Custom Colormaps for Your FieldPLOTCON NYC: Custom Colormaps for Your Field
PLOTCON NYC: Custom Colormaps for Your Field
 
PLOTCON NYC: Enterprise Dataviz' Unicorn Problem
PLOTCON NYC: Enterprise Dataviz' Unicorn ProblemPLOTCON NYC: Enterprise Dataviz' Unicorn Problem
PLOTCON NYC: Enterprise Dataviz' Unicorn Problem
 
PLOTCON NYC: New Open Viz in R
PLOTCON NYC: New Open Viz in RPLOTCON NYC: New Open Viz in R
PLOTCON NYC: New Open Viz in R
 
PLOTCON NYC: Domain Specific Visualization
PLOTCON NYC: Domain Specific VisualizationPLOTCON NYC: Domain Specific Visualization
PLOTCON NYC: Domain Specific Visualization
 
PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualizati...
PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualizati...PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualizati...
PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualizati...
 
PLOTCON NYC: New Data Viz in Data Journalism
PLOTCON NYC: New Data Viz in Data JournalismPLOTCON NYC: New Data Viz in Data Journalism
PLOTCON NYC: New Data Viz in Data Journalism
 
PLOTCON NYC: Data Science in the Enterprise From Concept to Execution
PLOTCON NYC: Data Science in the Enterprise From Concept to ExecutionPLOTCON NYC: Data Science in the Enterprise From Concept to Execution
PLOTCON NYC: Data Science in the Enterprise From Concept to Execution
 
PLOTCON NYC: Building a Flexible Analytics Stack
PLOTCON NYC: Building a Flexible Analytics StackPLOTCON NYC: Building a Flexible Analytics Stack
PLOTCON NYC: Building a Flexible Analytics Stack
 
PLOTCON NYC: Mapping Networked Attention: What We Learn from Social Data
PLOTCON NYC: Mapping Networked Attention: What We Learn from Social DataPLOTCON NYC: Mapping Networked Attention: What We Learn from Social Data
PLOTCON NYC: Mapping Networked Attention: What We Learn from Social Data
 
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...
 
PLOTCON NYC: PlotlyJS.jl: Interactive plotting in Julia
PLOTCON NYC: PlotlyJS.jl: Interactive plotting in JuliaPLOTCON NYC: PlotlyJS.jl: Interactive plotting in Julia
PLOTCON NYC: PlotlyJS.jl: Interactive plotting in Julia
 
PLOTCON NYC: Text is data! Analysis and Visualization Methods
PLOTCON NYC: Text is data! Analysis and Visualization MethodsPLOTCON NYC: Text is data! Analysis and Visualization Methods
PLOTCON NYC: Text is data! Analysis and Visualization Methods
 
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
 
What’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackWhat’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics Stack
 

Similar to PLOTCON NYC: Interactive Visual Statistics on Massive Datasets

PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona KeynoteTravis Oliphant
 
Talend 6.1 - What's New in Talend?
Talend 6.1 - What's New in Talend?Talend 6.1 - What's New in Talend?
Talend 6.1 - What's New in Talend?Talend
 
Semantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsSemantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsLinked Enterprise Date Services
 
Christian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web Applications
Christian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsChristian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web Applications
Christian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web Applicationssemanticsconference
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 
5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce AnalyticsInformatica Cloud
 
Cognos Analytics Release 6: March 2017 Enhancements
Cognos Analytics Release 6: March 2017 EnhancementsCognos Analytics Release 6: March 2017 Enhancements
Cognos Analytics Release 6: March 2017 EnhancementsSenturus
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data opsLars Albertsson
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?OVHcloud
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio, Inc.
 
Redefining the Role of IT in a Self-Help Data Integration Environment
Redefining the Role of IT in a Self-Help Data Integration EnvironmentRedefining the Role of IT in a Self-Help Data Integration Environment
Redefining the Role of IT in a Self-Help Data Integration EnvironmentUNIFI Software
 
OpenSource and the Cloud ApacheCon.pptx
OpenSource and the Cloud  ApacheCon.pptxOpenSource and the Cloud  ApacheCon.pptx
OpenSource and the Cloud ApacheCon.pptxlohitvijayarenu
 
Ken Czekaj & Robert Wright - Leveraging APM NPM Solutions to Compliment Cyber...
Ken Czekaj & Robert Wright - Leveraging APM NPM Solutions to Compliment Cyber...Ken Czekaj & Robert Wright - Leveraging APM NPM Solutions to Compliment Cyber...
Ken Czekaj & Robert Wright - Leveraging APM NPM Solutions to Compliment Cyber...centralohioissa
 
Presentación Paco Bermejo - La Noche del Sector Financiero
Presentación Paco Bermejo - La Noche del Sector FinancieroPresentación Paco Bermejo - La Noche del Sector Financiero
Presentación Paco Bermejo - La Noche del Sector FinancieroJorge Puebla Fernández
 
TidalScale Overview
TidalScale OverviewTidalScale Overview
TidalScale OverviewPete Jarvis
 
Difference between data warehouse and data mining
Difference between data warehouse and data miningDifference between data warehouse and data mining
Difference between data warehouse and data miningmaxonlinetr
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersRevolution Analytics
 

Similar to PLOTCON NYC: Interactive Visual Statistics on Massive Datasets (20)

PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
DataVirtulization
DataVirtulizationDataVirtulization
DataVirtulization
 
Talend 6.1 - What's New in Talend?
Talend 6.1 - What's New in Talend?Talend 6.1 - What's New in Talend?
Talend 6.1 - What's New in Talend?
 
Semantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsSemantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web Applications
 
Christian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web Applications
Christian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsChristian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web Applications
Christian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web Applications
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics
 
Cognos Analytics Release 6: March 2017 Enhancements
Cognos Analytics Release 6: March 2017 EnhancementsCognos Analytics Release 6: March 2017 Enhancements
Cognos Analytics Release 6: March 2017 Enhancements
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data ops
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future Directions
 
Redefining the Role of IT in a Self-Help Data Integration Environment
Redefining the Role of IT in a Self-Help Data Integration EnvironmentRedefining the Role of IT in a Self-Help Data Integration Environment
Redefining the Role of IT in a Self-Help Data Integration Environment
 
OpenSource and the Cloud ApacheCon.pptx
OpenSource and the Cloud  ApacheCon.pptxOpenSource and the Cloud  ApacheCon.pptx
OpenSource and the Cloud ApacheCon.pptx
 
Ken Czekaj & Robert Wright - Leveraging APM NPM Solutions to Compliment Cyber...
Ken Czekaj & Robert Wright - Leveraging APM NPM Solutions to Compliment Cyber...Ken Czekaj & Robert Wright - Leveraging APM NPM Solutions to Compliment Cyber...
Ken Czekaj & Robert Wright - Leveraging APM NPM Solutions to Compliment Cyber...
 
Presentación Paco Bermejo - La Noche del Sector Financiero
Presentación Paco Bermejo - La Noche del Sector FinancieroPresentación Paco Bermejo - La Noche del Sector Financiero
Presentación Paco Bermejo - La Noche del Sector Financiero
 
TidalScale Overview
TidalScale OverviewTidalScale Overview
TidalScale Overview
 
Difference between data warehouse and data mining
Difference between data warehouse and data miningDifference between data warehouse and data mining
Difference between data warehouse and data mining
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 

PLOTCON NYC: Interactive Visual Statistics on Massive Datasets

  • 1. © 2016 Continuum Analytics - Confidential & Proprietary© 2016 Continuum Analytics - Confidential & Proprietary Interactive Visual Statistics on Massive Datasets Peter Wang CTO, Co-founder Continuum Analytics @pwang
  • 2. © 2016 Continuum Analytics - Confidential & Proprietary 2 • Introductions • Company Overview • Goals of Analytics and IT teams • Why Python for Data Science • Anaconda - Making Python Better for Data Science • Package Management • Cluster Environment Management • Notebook Computing • Demonstrations • Q&A / Next steps AGENDA
  • 3. © 2016 Continuum Analytics - Confidential & Proprietary © 2016 Continuum Analytics - Confidential & Proprietary What’s the Problem?
  • 4. © 2016 Continuum Analytics - Confidential & Proprietary Big data magnifies small problems 4 • Of course, big data presents storage and computation problems • More importantly, standard plotting tools have problems that are magnified by big data: • Overdrawing/Overplotting • Saturation • Undersaturation • Binning issues • We’ll first explain these problems, and then present a new technique called datashading to address them head-on.
  • 5. © 2016 Continuum Analytics - Confidential & Proprietary Overdrawing 5 • For a scatterplot, the order in which points are drawn is very important • The same distribution can look entirely different depending on plotting order • Last data plotted overplots
  • 6. © 2016 Continuum Analytics - Confidential & Proprietary Overdrawing 6 • Underlying issue is just occlusion • Same problem happens with one category, but less obvious • Can prevent occlusion using transparency
  • 7. © 2016 Continuum Analytics - Confidential & Proprietary Saturation 7 • E.g. for alpha = 0.1, up to 10 points can overlap before saturating the available brightness • Now the order of plotting matters less • After 10 points, first-plotted data still lost • For one category, 10, 20, or 2000 points overlapping will look identical
  • 8. © 2016 Continuum Analytics - Confidential & Proprietary Saturation 8 • Same alpha value, more points: • Now is highly misleading • alpha value depends on size, overlap of dataset • Difficult-to-set parameter, hard to know when data is misrepresented
  • 9. © 2016 Continuum Analytics - Confidential & Proprietary Saturation 9 • Can try to reduce point size to reduce overplotting and saturation • Now points are hard to see, with no guarantee of avoiding problems • Another difficult-to-set parameter • For really big data, scatterplots start to become very inefficient, because there are many datapoints per pixel — may as well be binning by pixel
  • 10. © 2016 Continuum Analytics - Confidential & Proprietary Binning issues 10 • Can use heatmap instead of scatter • Avoids saturation by auto- ranging on bins • Result independent of data size • Here two merged normal distributions look very different at different binning • Another difficult-to-set parameter
  • 11. © 2016 Continuum Analytics - Confidential & Proprietary Plotting big data 11 • When exploring really big data, the visualization is all you have — there’s no way to look at each of the individual data points • Common plotting problems can lead to completely incorrect conclusions based on misleading visualizations • Slow processing makes trial and error approach ineffective When data is large, you don’t know when the viz is lying.
  • 12. © 2016 Continuum Analytics - Confidential & Proprietary © 2016 Continuum Analytics - Confidential & Proprietary Datashading
  • 13. © 2016 Continuum Analytics - Confidential & Proprietary Datashading 13 • Flexible, configurable pipeline for automatic plotting • Provides flexible plugins for viz stages, like in graphics shaders • Completely prevents overplotting, saturation, and undersaturation • Mitigates binning issues by providing fully interactive exploration in web browsers, even of very large datasets on ordinary machines • Statistical transformations of data are a first-class aspect of the visualization • Allows rapid iteration of visual styles & configs, interactive selections and filtering, to support data exploration
  • 14. © 2016 Continuum Analytics - Confidential & Proprietary Datashading Pipeline: Projection 14 Data Project / Synthesize Scene • Stage 1: select variables (columns) to project onto the screen • Data often filtered at this stage
  • 15. © 2016 Continuum Analytics - Confidential & Proprietary Datashading Pipeline: Aggregation 15 Data Project / Synthesize Scene Aggregates Sample / Raster • Stage 2: Aggregate data into a fixed set of bins • Each bin yields one or more scalars (total count, mean, stddev, etc.)
  • 16. © 2016 Continuum Analytics - Confidential & Proprietary Datashading Pipeline: Transfer 16 Data Project / Synthesize Scene Aggregates Sample / Raster Transfer Image • Stage 3: Transform data using one or more transfer functions, culminating in a function that yields a visible image • Each stage can be replaced and configured separately
  • 17. © 2016 Continuum Analytics - Confidential & Proprietary © 2016 Continuum Analytics - Confidential & Proprietary Demos
  • 18. © 2016 Continuum Analytics - Confidential & Proprietary © 2016 Continuum Analytics - Confidential & Proprietary New Developments
  • 19. © 2016 Continuum Analytics - Confidential & Proprietary Flexible Statistics 19 Normalized Vegetation Difference Index
  • 20. © 2016 Continuum Analytics - Confidential & Proprietary Flexible Statistics 20 Slope & Aspect Ratio from pure Elevation
  • 21. © 2016 Continuum Analytics - Confidential & Proprietary © 2016 Continuum Analytics - Confidential & Proprietary Anaconda
  • 22. © 2016 Continuum Analytics - Confidential & Proprietary 22 • Simplify setup for non-engineers
 • Enable easy development on and deployment to multiple platforms.
 • Enable data scientists to experiment and iterate even more rapidly
 • Eliminate the pains associated with package and dependency management Why Did We Create Anaconda? To Enhance Python and Enable Data Scientist to Quickly Engage with Their Data
  • 23. © 2016 Continuum Analytics - Confidential & Proprietary 23 Anaconda Modern, Open-Source Analytics Platform powered by Python Quickly Engage w/ Your Data • 500+ Popular Python Packages • Optimized & Compiled • Free for Everyone • Extensible via Conda Package Manager • Sandbox Packages & Libraries • Cross-Platform – Windows, Linux, Mac • Not just Python - over 230 R packages • Foundation of our Enterprise Products
  • 24. © 2016 Continuum Analytics - Confidential & Proprietary 24 On-premises package repository and sharing platform • Governance for your analytics environment - maintain control of the packages used by your analysts
 • Easily replicate and share analysts’ environments
 • Centrally store proprietary libraries and manage versioning Cluster environment management • Manages Python, R, Java, Scala packages
 across the cluster
 • Easily replicate analysts’ environments for different jobs/ users/groups
 • Strong support for Hadoop & Spark Anaconda Enterprise
  • 25. © 2016 Continuum Analytics - Confidential & Proprietary 25 Anaconda Enterprise Scalable Computing and Collaboration • Multi-user notebook deployments • Scalable notebook deployment model • Project-based management • Notebook versioning and locking • Extended support for Hadoop Stack (Storm, Spark Streaming, Kafka) • Single sign-on support(PKI, Kerberos etc.) • Burst Compute support
  • 26. © 2016 Continuum Analytics - Confidential & Proprietary 26 Consulting Customers include: • JPL • DARPA • Sandia National Labs • AMD • Bank of America • Bloomberg We Will Help Design, Architect, and Build the Right Analytics For You Leverage our Open-Source Projects