SlideShare a Scribd company logo
@DTAIEB55
Taking Jupyter Notebooks and
Apache Spark to the next level with
PixieDust
David Taieb
Distinguished Engineer
IBM Watson Data Platform, Developer Advocacy
@DTAIEB55
@DTAIEB55
•  More companies making bet-the-business data driven decisions
•  Good news: they are drowning in Data
•  Bad news: they are drowning in Data
•  Solving the Data problems of tomorrow cannot be done by data
scientists alone.
•  Developers are getting more involved with Data Science, moving
from stovepipe applications to “data pipelines” that integrate data
and analytics.
WHY ARE YOU HERE?
@DTAIEB55
Disclaimer: All characters and events depicted in this story are entirely
fictitious. Any similarity to actual use cases, events or persons is actually
intentional.
Let’s start with a story… we all know too well.
How do we blur the lines between
developers and data scientists?
@DTAIEB55
•  Hold a master degree in computer science
•  10 year experience, 6 years with the company
•  Languages of choice: Java, Node.js, HTML5/CSS3
•  Data: No SQL (Cloudant, Mongo), relational
•  No major experience with Big Data
T H E D E V E L O P E R
“The best line of code is the one I didn't have to write!”
MEET BEN
@DTAIEB55
•  Hold a PHD in data science
•  5 year experience, 2 years with the company
•  Experienced in Python and R
•  Expert in Machine Learning and Data visualization
•  Software engineering is not her thing
T H E D A T A S C I E N T I S T
“In God we trust. All others bring data.”
— W. Edwards Deming
MEET NATASHA
@DTAIEB55
SURPRISE MEETING
With the VP of Development
“We have an urgent need for our marketing department
to build an application that can provide real-time sentiment
analysis on Twitter data.”
https://unsplash.com/search/meeting?photo=3fPXt37X6UQ
@DTAIEB55
•  You only have 6 weeks to build the application
•  Target consumer is the business-focused user
•  Must be easy to use even for non technical people
•  It must scale out of the box
•  I want you to look at Apache Spark
KEY CONSTRAINTS
@DTAIEB55
— NATASHA
“What exactly is Apache Spark?”
SOME LEARNING TO DO…
@DTAIEB55
Great Question Natasha!
B e s t w a y t o a n s w e r i t i s t o a r r a n g e a t i c k e t t o t h e
S p a r k S u m m i t f o r y o u t o f i n d o u t
@DTAIEB55
Batch Job
(spark-submit)
Interactive
NotebookSpark
Application
(driver)
Master
(cluster manager)
Spark Cluster
Worker Node Worker Node
...
Notebook Server
Browser
Kernel
Master
(cluster manager)
Spark Cluster
Worker Node Worker Node
...
RDD Partitioning
Task packaging and
dispatching
Worker node scheduling
CONSUMING SPARK
@DTAIEB55
— NATASHA
“What exactly is a Notebook?”
— BEN
CAN WE COLLABORATE USING NOTEBOOKS?
@DTAIEB55
http://ia.media-imdb.com/images/M/MV5BMTk3OTM5Njg5M15BMl5BanBnXkFtZTYwMzA0ODI3._V1_SX640_SY720_.jpg
RYAN
GOSLING
MOVIE?
@DTAIEB55
WHAT IS A NOTEBOOK?
•  Web based UI for
running Apache
Spark console
commands
•  Easy, no install spark
accelerator
•  Best way to start
working with spark
•  Multiple flavors
•  Jupyter
•  Zeppelin
•  Local or cloud hosted
•  IBM Data Science
Experience
•  Databricks
Text
Annotations
Code
Data
Visualizations
Widgets
Output
@DTAIEB55
What is Jupyter?
"Open	source,	interac0ve	data	science	and	scien0fic	
compu0ng"	
– Formerly	IPython	
– Large,	open,	growing	community	and	ecosystem	
Very	popular	
– “~2	million	users	for	IPython”	[1]	
– $6m	in	funding	in	2015	[3]	
– 200	contributors	to	notebook	subproject	alone	[4]	
– 275,000	public	notebooks	on	GitHub	[2]	
	
Browser
Kernel
Code Output
https://www.bluetrack.com/uploads/items_images/kernel-of-corn-stress-balls1_thumb.jpg?r=1
@DTAIEB55
BIG DATA ANALYSIS
code
results
Kernel with Spark
support
Services:
Congitive, …
Libraries:
Statistics, Math,
Machine Learning,
Plotting,
Data
(flat files, relational database,
NoSQL database, …)
Worker
Worker
...
Worker
@DTAIEB55
— BEN
“But they seem complicated for
developers like me”
NOTEBOOKS ARE POWERFUL TOOLS FOR DATA SCIENTISTS
@DTAIEB55
•  Visualize data (e.g., Table, Charts, Map, etc)
•  Data Management with PixieApps
•  Download/export data (e.g., File, Cloudant, etc.)
•  Use Scala directly in a Python notebook
•  Install Spark packages into Python notebook
•  Spark job progress monitor
•  Extensible
Open Source Python helper library for Jupyter Notebooks
ENTER PIXIEDUST
https://github.com/ibm-cds-labs/pixiedust
@DTAIEB55
DEMO: PIXIEDUST DATA
VISUALIZATION
Easy Visualizations
https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/PixieDust%201%20-%20Easy%20Visualizations.ipynb
Working with External Data
https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/PixieDust%202%20-%20Working%20with%20External%20Data.ipynb
@DTAIEB55
— BEN
“But I am really more comfortable with Scala”
I AM OK TO USE PYTHON
@DTAIEB55
SCALA NOTEBOOKS
PixieDust also
works with Scala
Notebooks
Same PixieDust Scala
APIs as in Python
@DTAIEB55
— NATASHA
“Expressing everything in code is
nice but LOB users will not be
able to linearly run large number
of cells”
WHAT ABOUT THE LINE OF BUSINESS USER?
@DTAIEB55
Enter PixieApps
•  PixieApps are Python classes used to write UI for your analytics that runs directly in a Jupyter
Notebook
•  Easy to build: mostly HTML and CSS with some custom attributes (micro-format style)
•  Leverage PixieDust Display visualization for charting
•  With PixieApps you can:
•  Create different html views with routes to invoke them
•  Invoke Python Scripts from user interactions
•  Run in the notebook cell output or in a Dialog
•  and much more…
•  Use cases:
•  Dashboards
•  Data Browsers
•  Data Pipeline Management
@DTAIEB55
Demo: PeaceTech GroundTruth Global Dashboard
@DTAIEB55
Demo: PeaceTech GroundTruth Global Dashboard
@DTAIEB55
Demo: Data Browser for Cloudant/CouchDB
@DTAIEB55
— BEN
“Do I need to learn yet another framework?”
WHAT DOES IT TAKE TO BUILD A PIXIEAPP?
@DTAIEB55
PIXIEAPP HELLO WORLD
Import app package to start things off
Simple annotation to tell PixieDust it’s an app
Define the default route (no args).
Method will return the view’s html fragment
Html fragment for the view.
Allows Jinja2 template macros
Define a new route that triggers when option
clicked is set to true
set option clicked to true when button is
pushed, so that correct route is loaded next
Import app package to start things off
@DTAIEB55
PIXIEAPP HELLO WORLD WITH DATA
Placeholder div for displaying data
Display the output in the specified target
Entity binding: use the df passed by user
Allows binding of any entity created by the app
Specify Display options for visualization
Pass data to the app
@DTAIEB55
OK, I’M SOLD…
L E T ’ S A G R E E O N T H E A R C H I T E C T U R E
@DTAIEB55
•  I’ll work on data acquisition from Twitter
and enrichment with sentiment analysis
scores using Spark Streaming
•  I know Java very well, but I don’t have
time to learn Python.
•  However, I am willing to learn Scala if
that helps improve my productivity
S T A R T B R A I N S T O R M I N G
•  I’ll perform the data exploration and
analysis
•  I know Python and R, but I am not
familiar enough with Java or Scala
•  I like pandas and numpy. I’m ok to learn
Spark but expect the same level of apis
•  I need to work iteratively with the data
I’ll need to do some data exploration too. I’ll need APIs to access my data.
BEN and NATASHA
@DTAIEB55
•  Implement a Spark Streaming
connector to Twitter
•  Call Watson Tone Analyzer for each
tweets
•  Return a Spark DataFrame with the
tweets enriched with Tone scores
•  Code written in Scala, delivered as a
Jar
D I V I D I N G T H E T A S K S
BEN and NATASHA
•  Works in a Python Notebook
•  Using PixieDust PackageManager,
install the Scala library delivered by
Ben to load the twitter data with Tone
scores
•  Using PixieDust display() api, perform
the data exploration and analysis:
trending hashtags and sentiments
•  Produce visualizations to LOB Users
@DTAIEB55
http://www.ibm.com/watson/developercloud/tone-analyzer.html
WATSON TONE ANALYZER
•  Uses linguistic analysis to detect 3
types of tones
•  Emotion
•  Social Tendencies
•  Language Styles
•  Available as a cloud service on IBM
Bluemix
@DTAIEB55
DEMO
Twitter Sentiment Analysis Dashboard with PixieApp
Sentiment Analysis of Twitter Hashtags with Spark
https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/Twitter%20Sentiment%20with%20Watson%20and%20Pixiedust.ipynb
https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-spark-7ee6ca5c1585
@DTAIEB55
MEETING WITH THE VP
“SUCCESS!!”
https://unsplash.com/search/meeting?photo=3fPXt37X6UQ
@DTAIEB55
What’s next for PixieDust
•  Support Visualization for Streaming data
–  Start with Structured Streaming and IBM Streams
•  Ability to publish/embed PixieApps into Web Application (Nodejs to
begin with)
•  PixieDust visualization enhancements
–  Custom colors
–  Custom GeoJSON layers for maps
–  Sorting/filtering
–  More renderers: Brunel, ArcGIS, etc.
–  …
•  Ability to run Node.js code to load and visualize data
•  Support for Jupyter Labs and Jupyter Hub
@DTAIEB55
As always…
We look forward for your feedback
and pull requests on GitHub
https://github.com/ibm-cds-labs/pixiedust
@DTAIEB55
CONCLUSION
•  Solving the Data problems of tomorrow cannot be done by
data scientists alone.
•  Notebooks, considered by most to be the domain of data
scientists, can help break down traditional silos and help
team of all types who are working on data problems
Try it for yourself today:
•  IBM Data Science Experience
http://datascience.ibm.com/
•  Locally using PixieDust automated installer
https://ibm-cds-labs.github.io/pixiedust/install.html
@DTAIEB55
RESOURCES
•  https://github.com/ibm-cds-labs/pixiedust
•  https://ibm-cds-labs.github.io/pixiedust
•  https://medium.com/ibm-watson-data-lab/i-am-not-a-data-scientist-efe7ca6ceba2
•  https://spark.apache.org
•  https://www.ibm.com/us-en/marketplace/spark-as-a-service
•  http://datascience.ibm.com
•  https://www.ibm.com/watson/developercloud/tone-analyzer.html
•  https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-
spark-7ee6ca5c1585
•  https://ibm.biz/pixiedustvis
•  https://ibm.biz/pixiedustlab
@DTAIEB55
Questions

More Related Content

What's hot

Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Databricks
 
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Databricks
 
Open Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetOpen Source DataViz with Apache Superset
Open Source DataViz with Apache Superset
Carl W. Handlin
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
Jozo Kovac
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
Databricks
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
Prasad Wagle
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Databricks
 
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Databricks
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoMastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Spark Summit
 
Saving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AISaving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AI
Databricks
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
Albert Wong
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
Databricks
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Data Con LA
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High Gear
Spark Summit
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
Prasad Wagle
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
Databricks
 
Bring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science WorkflowsBring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science Workflows
Databricks
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Spark Summit
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
Jasjeet Thind
 

What's hot (20)

Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
 
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
 
Open Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetOpen Source DataViz with Apache Superset
Open Source DataViz with Apache Superset
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
 
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoMastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott Cordo
 
Saving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AISaving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AI
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High Gear
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Bring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science WorkflowsBring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science Workflows
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
 

Similar to Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with David Taieb

Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
Travis Oliphant
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Rehgan Avon
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
Krishna Sankar
 
Eye-catching science: free tools to create data visualizations and infographics
Eye-catching science: free tools to create data visualizations and infographicsEye-catching science: free tools to create data visualizations and infographics
Eye-catching science: free tools to create data visualizations and infographics
Future Earth
 
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Thiago de Faria
 
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Codemotion
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
Adam Muise
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
Big Data Spain
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
Paolo Platter
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
Peter Wang
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
Kenny Bastani
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
DataWorks Summit
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017
Rittman Analytics
 
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
Dataiku
 
Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)
stelligence
 
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014Tom LaGatta
 
Splunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data Science
Splunk
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
Simplilearn
 

Similar to Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with David Taieb (20)

Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
Eye-catching science: free tools to create data visualizations and infographics
Eye-catching science: free tools to create data visualizations and infographicsEye-catching science: free tools to create data visualizations and infographics
Eye-catching science: free tools to create data visualizations and infographics
 
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
 
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017
 
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)
 
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
 
Splunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data Science
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 

Recently uploaded (20)

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 

Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with David Taieb

  • 1. @DTAIEB55 Taking Jupyter Notebooks and Apache Spark to the next level with PixieDust David Taieb Distinguished Engineer IBM Watson Data Platform, Developer Advocacy @DTAIEB55
  • 2. @DTAIEB55 •  More companies making bet-the-business data driven decisions •  Good news: they are drowning in Data •  Bad news: they are drowning in Data •  Solving the Data problems of tomorrow cannot be done by data scientists alone. •  Developers are getting more involved with Data Science, moving from stovepipe applications to “data pipelines” that integrate data and analytics. WHY ARE YOU HERE?
  • 3. @DTAIEB55 Disclaimer: All characters and events depicted in this story are entirely fictitious. Any similarity to actual use cases, events or persons is actually intentional. Let’s start with a story… we all know too well. How do we blur the lines between developers and data scientists?
  • 4. @DTAIEB55 •  Hold a master degree in computer science •  10 year experience, 6 years with the company •  Languages of choice: Java, Node.js, HTML5/CSS3 •  Data: No SQL (Cloudant, Mongo), relational •  No major experience with Big Data T H E D E V E L O P E R “The best line of code is the one I didn't have to write!” MEET BEN
  • 5. @DTAIEB55 •  Hold a PHD in data science •  5 year experience, 2 years with the company •  Experienced in Python and R •  Expert in Machine Learning and Data visualization •  Software engineering is not her thing T H E D A T A S C I E N T I S T “In God we trust. All others bring data.” — W. Edwards Deming MEET NATASHA
  • 6. @DTAIEB55 SURPRISE MEETING With the VP of Development “We have an urgent need for our marketing department to build an application that can provide real-time sentiment analysis on Twitter data.” https://unsplash.com/search/meeting?photo=3fPXt37X6UQ
  • 7. @DTAIEB55 •  You only have 6 weeks to build the application •  Target consumer is the business-focused user •  Must be easy to use even for non technical people •  It must scale out of the box •  I want you to look at Apache Spark KEY CONSTRAINTS
  • 8. @DTAIEB55 — NATASHA “What exactly is Apache Spark?” SOME LEARNING TO DO…
  • 9. @DTAIEB55 Great Question Natasha! B e s t w a y t o a n s w e r i t i s t o a r r a n g e a t i c k e t t o t h e S p a r k S u m m i t f o r y o u t o f i n d o u t
  • 10. @DTAIEB55 Batch Job (spark-submit) Interactive NotebookSpark Application (driver) Master (cluster manager) Spark Cluster Worker Node Worker Node ... Notebook Server Browser Kernel Master (cluster manager) Spark Cluster Worker Node Worker Node ... RDD Partitioning Task packaging and dispatching Worker node scheduling CONSUMING SPARK
  • 11. @DTAIEB55 — NATASHA “What exactly is a Notebook?” — BEN CAN WE COLLABORATE USING NOTEBOOKS?
  • 13. @DTAIEB55 WHAT IS A NOTEBOOK? •  Web based UI for running Apache Spark console commands •  Easy, no install spark accelerator •  Best way to start working with spark •  Multiple flavors •  Jupyter •  Zeppelin •  Local or cloud hosted •  IBM Data Science Experience •  Databricks Text Annotations Code Data Visualizations Widgets Output
  • 15. @DTAIEB55 BIG DATA ANALYSIS code results Kernel with Spark support Services: Congitive, … Libraries: Statistics, Math, Machine Learning, Plotting, Data (flat files, relational database, NoSQL database, …) Worker Worker ... Worker
  • 16. @DTAIEB55 — BEN “But they seem complicated for developers like me” NOTEBOOKS ARE POWERFUL TOOLS FOR DATA SCIENTISTS
  • 17. @DTAIEB55 •  Visualize data (e.g., Table, Charts, Map, etc) •  Data Management with PixieApps •  Download/export data (e.g., File, Cloudant, etc.) •  Use Scala directly in a Python notebook •  Install Spark packages into Python notebook •  Spark job progress monitor •  Extensible Open Source Python helper library for Jupyter Notebooks ENTER PIXIEDUST https://github.com/ibm-cds-labs/pixiedust
  • 18. @DTAIEB55 DEMO: PIXIEDUST DATA VISUALIZATION Easy Visualizations https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/PixieDust%201%20-%20Easy%20Visualizations.ipynb Working with External Data https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/PixieDust%202%20-%20Working%20with%20External%20Data.ipynb
  • 19. @DTAIEB55 — BEN “But I am really more comfortable with Scala” I AM OK TO USE PYTHON
  • 20. @DTAIEB55 SCALA NOTEBOOKS PixieDust also works with Scala Notebooks Same PixieDust Scala APIs as in Python
  • 21. @DTAIEB55 — NATASHA “Expressing everything in code is nice but LOB users will not be able to linearly run large number of cells” WHAT ABOUT THE LINE OF BUSINESS USER?
  • 22. @DTAIEB55 Enter PixieApps •  PixieApps are Python classes used to write UI for your analytics that runs directly in a Jupyter Notebook •  Easy to build: mostly HTML and CSS with some custom attributes (micro-format style) •  Leverage PixieDust Display visualization for charting •  With PixieApps you can: •  Create different html views with routes to invoke them •  Invoke Python Scripts from user interactions •  Run in the notebook cell output or in a Dialog •  and much more… •  Use cases: •  Dashboards •  Data Browsers •  Data Pipeline Management
  • 25. @DTAIEB55 Demo: Data Browser for Cloudant/CouchDB
  • 26. @DTAIEB55 — BEN “Do I need to learn yet another framework?” WHAT DOES IT TAKE TO BUILD A PIXIEAPP?
  • 27. @DTAIEB55 PIXIEAPP HELLO WORLD Import app package to start things off Simple annotation to tell PixieDust it’s an app Define the default route (no args). Method will return the view’s html fragment Html fragment for the view. Allows Jinja2 template macros Define a new route that triggers when option clicked is set to true set option clicked to true when button is pushed, so that correct route is loaded next Import app package to start things off
  • 28. @DTAIEB55 PIXIEAPP HELLO WORLD WITH DATA Placeholder div for displaying data Display the output in the specified target Entity binding: use the df passed by user Allows binding of any entity created by the app Specify Display options for visualization Pass data to the app
  • 29. @DTAIEB55 OK, I’M SOLD… L E T ’ S A G R E E O N T H E A R C H I T E C T U R E
  • 30. @DTAIEB55 •  I’ll work on data acquisition from Twitter and enrichment with sentiment analysis scores using Spark Streaming •  I know Java very well, but I don’t have time to learn Python. •  However, I am willing to learn Scala if that helps improve my productivity S T A R T B R A I N S T O R M I N G •  I’ll perform the data exploration and analysis •  I know Python and R, but I am not familiar enough with Java or Scala •  I like pandas and numpy. I’m ok to learn Spark but expect the same level of apis •  I need to work iteratively with the data I’ll need to do some data exploration too. I’ll need APIs to access my data. BEN and NATASHA
  • 31. @DTAIEB55 •  Implement a Spark Streaming connector to Twitter •  Call Watson Tone Analyzer for each tweets •  Return a Spark DataFrame with the tweets enriched with Tone scores •  Code written in Scala, delivered as a Jar D I V I D I N G T H E T A S K S BEN and NATASHA •  Works in a Python Notebook •  Using PixieDust PackageManager, install the Scala library delivered by Ben to load the twitter data with Tone scores •  Using PixieDust display() api, perform the data exploration and analysis: trending hashtags and sentiments •  Produce visualizations to LOB Users
  • 32. @DTAIEB55 http://www.ibm.com/watson/developercloud/tone-analyzer.html WATSON TONE ANALYZER •  Uses linguistic analysis to detect 3 types of tones •  Emotion •  Social Tendencies •  Language Styles •  Available as a cloud service on IBM Bluemix
  • 33. @DTAIEB55 DEMO Twitter Sentiment Analysis Dashboard with PixieApp Sentiment Analysis of Twitter Hashtags with Spark https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/Twitter%20Sentiment%20with%20Watson%20and%20Pixiedust.ipynb https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-spark-7ee6ca5c1585
  • 34. @DTAIEB55 MEETING WITH THE VP “SUCCESS!!” https://unsplash.com/search/meeting?photo=3fPXt37X6UQ
  • 35. @DTAIEB55 What’s next for PixieDust •  Support Visualization for Streaming data –  Start with Structured Streaming and IBM Streams •  Ability to publish/embed PixieApps into Web Application (Nodejs to begin with) •  PixieDust visualization enhancements –  Custom colors –  Custom GeoJSON layers for maps –  Sorting/filtering –  More renderers: Brunel, ArcGIS, etc. –  … •  Ability to run Node.js code to load and visualize data •  Support for Jupyter Labs and Jupyter Hub
  • 36. @DTAIEB55 As always… We look forward for your feedback and pull requests on GitHub https://github.com/ibm-cds-labs/pixiedust
  • 37. @DTAIEB55 CONCLUSION •  Solving the Data problems of tomorrow cannot be done by data scientists alone. •  Notebooks, considered by most to be the domain of data scientists, can help break down traditional silos and help team of all types who are working on data problems Try it for yourself today: •  IBM Data Science Experience http://datascience.ibm.com/ •  Locally using PixieDust automated installer https://ibm-cds-labs.github.io/pixiedust/install.html
  • 38. @DTAIEB55 RESOURCES •  https://github.com/ibm-cds-labs/pixiedust •  https://ibm-cds-labs.github.io/pixiedust •  https://medium.com/ibm-watson-data-lab/i-am-not-a-data-scientist-efe7ca6ceba2 •  https://spark.apache.org •  https://www.ibm.com/us-en/marketplace/spark-as-a-service •  http://datascience.ibm.com •  https://www.ibm.com/watson/developercloud/tone-analyzer.html •  https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with- spark-7ee6ca5c1585 •  https://ibm.biz/pixiedustvis •  https://ibm.biz/pixiedustlab