SlideShare a Scribd company logo
What makes Data driven
environments more efficient and how to
build a data science toolchain around
Notebook technologies
Creator of Apache Zeppelin
Co-Founder, CTO
Moon soo Lee
moon@zepl.com
#GDSC 2018
Who am I
A true believer that data science notebook changes how
people collaborate
Creator of Apache Zeppelin
Co-founder
https://github.com/Leemoonsoo
#GDSC 2018
It was 2013, really wanted to have
interactive analytics interface for .
#GDSC 2018
Started an opensource project -
Zeppelin http://zeppelin-project.org/
data science notebook.Became an project in 2016.
http://zeppelin.apache.org
#GDSC 2018
Iterations REPL interface (2012)
Editor / Result interface (2013)
Notebook interface (2014)
#GDSC 2018
Pilot to Production in 1 day
Hey, take a look
I need an update every morning!
#GDSC 2018
More notebook consumers than producers
#GDSC 2018
At the same time
Opensource project receiving contributions like
Authentication
Access control
#GDSC 2018
Realized that notebook is a great collaboration tool
Why notebook?
#GDSC 2018
Notebook is
- Interactive
- Flexible
- Visualized
- Inline description
- Contain a story
- Shareable
#GDSC 2018
How to build collaborative environment
with notebook technology
Data sharing
Multi-user
environment
Notebook sharing
#GDSC 2018
Data scientist
Data engineer Data Analyst
Marketing
SW
engineer
Sales
Executive
You
Notebook Sharing
#GDSC 2018
You’re using only half of its
potential if not sharing
#GDSC 2018
Github
nbviewer
Zeppelin
Airbnb/knowledge-repo
Commercial services for notebook sharing
VCS
Open
source
Service
#GDSC 2018
Github
● Store notebook in github
● Versioning
● Github provides .ipynb viewer
● Fork / pull request / merge
● Private / Public / Team / Org
● Hard to apply Notebook level ACL
● Not easy for Non-engineers
#GDSC 2018
nbviewer
● Publishing notebook
● Share notebook by
sharing link
● Easy use
● No access control
Nbconvert (endering ipynb to static HTML) as a webservice
#GDSC 2018
Apache Zeppelin
● Share notebook with ACL, Read/Write/Execute
● In case of Jupyter notebook, need to convert .ipynb to
zeppelin format in command line.
#GDSC 2018
Airbnb/knowledge-repo
https://github.com/airbnb/knowledge-repo
● .ipynb, md as a post
● Git repo for version
control
● Feeds
● Search
● No access control
#GDSC 2018
Commercial services for notebook sharing
Google Colab
● Share notebook through google drive
● View/Edit/Run ipynb notebook using Colab
● Realtime collaboration
ZEPL
● Notebook level ACL
● View/Edit/Run .ipynb and Zeppelin notebook
● Realtime collaboration
● Import existing notebook from git/s3 storage
www.zepl.com
#GDSC 2018
Data Sharing
#GDSC 2018
DON’Ts
● Email attach
● Direct send
● Share through USB
● ...
Email attach
Local copy in laptop
USB drive
#GDSC 2018
DO’s
● Provide access to the same
dataset
● Access control capability
● Horizontal scalability
#GDSC 2018
Data catalog
● Provides location of data, what it means and how to load
○ e.g.
● Catalogue need to be accessible / searchable / annotatable
● Many different way to build depends on team / infra
○ Hive Metastore as a data catalog
○ Cloud infrastructure service (e.g. AWS glue data catalog, Azure data catalog)
○ Data catalog / publishing software (e.g. CKAN, DKAN)
○ Custom built on top of RDBMS, Nosql, Indexing engine
○ Build data catalog using Notebook
Dataset Location Schema Note
Activity s3://service/activity Date (DateTime), type (INT), action(String) Type is either RUN or STOP. ….
Images s3://service/images 512x256 pixel images Images are collected from profile photo...
#GDSC 2018
Build data catalog using Notebook
● Flexible enough to describe data
● Searchable, shareable, annotatable
● Programmatic generation
#GDSC 2018
Multi-user environment
#GDSC 2018
I like my notebook running on my laptop.
No you don’t.
#GDSC 2018
Sign in and Run
Install libraries and
Install notebook and
Configure driver, environments and
Request access to data and
Setup access to notebook repo and
….
Run
#GDSC 2018
Reverse Proxy
JupyterHub
/hub
Jupyter server
Kernel (Python, R)
Jupyter server
Kernel (Python, R)
/user/[name]
Authenticator
Spawner
Notebook
Storage
(Filesystem, Git, etc)
LDAP,
OAuth,
etc
Docker, k8s
Zeppelin Server
LDAP,
OAuth,
etc
Notebook
Storage
(Filesystem, Git, etc)
Interpreter Manager
Auth / ACL
Interpreter (kernel)
Interpreter (kernel)
Interpreter (kernel)
#GDSC 2018
● Easier to implement / manage
● Notebook sharing is decoupled with
execution environment
● Usually notebook sharing is basic or
restricted. (no notebook level ACL)
● e.g.
○ JupyterHub
○ AWS Sagemaker
Reverse Proxy
Single user
Notebook server
Kernel
Single user
Notebook server
Kernel
Notebook
Storage
Multi user
Notebook server
Notebook
Storage
Kernel Kernel Kernel
Browser
Browser
● More complex to implement / manage
● Notebook sharing is coupled with execution
environment
● Usually notebook sharing is more advanced
and fine grained
● e.g.
○ Apache Zeppelin
○ ZEPL
○ Google Colab
#GDSC 2018
Conclusion
Notebook Share
Data share
Multi-user environment
Collaboration
#GDSC 2018
Thanks

More Related Content

What's hot

Plotly dash and data visualisation in Python
Plotly dash and data visualisation in PythonPlotly dash and data visualisation in Python
Plotly dash and data visualisation in Python
Volodymyr Kazantsev
 
Deep dive into serverless on Google Cloud
Deep dive into serverless on Google CloudDeep dive into serverless on Google Cloud
Deep dive into serverless on Google Cloud
Bret McGowen - NYC Google Developer Advocate
 
Modular GraphQL with Schema Stitching
Modular GraphQL with Schema StitchingModular GraphQL with Schema Stitching
Modular GraphQL with Schema Stitching
Sashko Stubailo
 
Adding GraphQL to your existing architecture
Adding GraphQL to your existing architectureAdding GraphQL to your existing architecture
Adding GraphQL to your existing architecture
Sashko Stubailo
 
GraphQL + relay
GraphQL + relayGraphQL + relay
GraphQL + relay
Cédric GILLET
 
Meetup
MeetupMeetup
GraphQL in Production
GraphQL in ProductionGraphQL in Production
GraphQL in Production
Bogdan Nedelcu
 
20170927 py data_n3_bokeh_plotly
20170927 py data_n3_bokeh_plotly20170927 py data_n3_bokeh_plotly
20170927 py data_n3_bokeh_plotly
Andrey Vykhodtsev
 
GraphQL
GraphQLGraphQL
GraphQL
Joel Corrêa
 
Go lambda-presentation
Go lambda-presentationGo lambda-presentation
Go lambda-presentation
Steven White
 
Kubernetes Config Management Landscape
Kubernetes Config Management LandscapeKubernetes Config Management Landscape
Kubernetes Config Management Landscape
Tomasz Tarczyński
 
GraphQL in an Age of REST
GraphQL in an Age of RESTGraphQL in an Age of REST
GraphQL in an Age of REST
Yos Riady
 
Google cloud infrastructure workshop
Google cloud infrastructure workshopGoogle cloud infrastructure workshop
Google cloud infrastructure workshop
Akash Agrawal
 
GraphQL & Relay
GraphQL & RelayGraphQL & Relay
GraphQL & Relay
Viacheslav Slinko
 
Serverless with Google Cloud
Serverless with Google CloudServerless with Google Cloud
Serverless with Google Cloud
Bret McGowen - NYC Google Developer Advocate
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
Rodrigo Prates
 
月刊ライトニングトーク 2014/06-07: 前回からのダイジェスト
月刊ライトニングトーク 2014/06-07: 前回からのダイジェスト月刊ライトニングトーク 2014/06-07: 前回からのダイジェスト
月刊ライトニングトーク 2014/06-07: 前回からのダイジェスト
Seiya Konno
 
Firebase Code Lab - 2015 GDG Buffalo DevFest
Firebase Code Lab - 2015 GDG Buffalo DevFestFirebase Code Lab - 2015 GDG Buffalo DevFest
Firebase Code Lab - 2015 GDG Buffalo DevFest
Bret McGowen - NYC Google Developer Advocate
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
İlker Güller
 
How to GraphQL
How to GraphQLHow to GraphQL
How to GraphQL
Tomasz Bak
 

What's hot (20)

Plotly dash and data visualisation in Python
Plotly dash and data visualisation in PythonPlotly dash and data visualisation in Python
Plotly dash and data visualisation in Python
 
Deep dive into serverless on Google Cloud
Deep dive into serverless on Google CloudDeep dive into serverless on Google Cloud
Deep dive into serverless on Google Cloud
 
Modular GraphQL with Schema Stitching
Modular GraphQL with Schema StitchingModular GraphQL with Schema Stitching
Modular GraphQL with Schema Stitching
 
Adding GraphQL to your existing architecture
Adding GraphQL to your existing architectureAdding GraphQL to your existing architecture
Adding GraphQL to your existing architecture
 
GraphQL + relay
GraphQL + relayGraphQL + relay
GraphQL + relay
 
Meetup
MeetupMeetup
Meetup
 
GraphQL in Production
GraphQL in ProductionGraphQL in Production
GraphQL in Production
 
20170927 py data_n3_bokeh_plotly
20170927 py data_n3_bokeh_plotly20170927 py data_n3_bokeh_plotly
20170927 py data_n3_bokeh_plotly
 
GraphQL
GraphQLGraphQL
GraphQL
 
Go lambda-presentation
Go lambda-presentationGo lambda-presentation
Go lambda-presentation
 
Kubernetes Config Management Landscape
Kubernetes Config Management LandscapeKubernetes Config Management Landscape
Kubernetes Config Management Landscape
 
GraphQL in an Age of REST
GraphQL in an Age of RESTGraphQL in an Age of REST
GraphQL in an Age of REST
 
Google cloud infrastructure workshop
Google cloud infrastructure workshopGoogle cloud infrastructure workshop
Google cloud infrastructure workshop
 
GraphQL & Relay
GraphQL & RelayGraphQL & Relay
GraphQL & Relay
 
Serverless with Google Cloud
Serverless with Google CloudServerless with Google Cloud
Serverless with Google Cloud
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
 
月刊ライトニングトーク 2014/06-07: 前回からのダイジェスト
月刊ライトニングトーク 2014/06-07: 前回からのダイジェスト月刊ライトニングトーク 2014/06-07: 前回からのダイジェスト
月刊ライトニングトーク 2014/06-07: 前回からのダイジェスト
 
Firebase Code Lab - 2015 GDG Buffalo DevFest
Firebase Code Lab - 2015 GDG Buffalo DevFestFirebase Code Lab - 2015 GDG Buffalo DevFest
Firebase Code Lab - 2015 GDG Buffalo DevFest
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
 
How to GraphQL
How to GraphQLHow to GraphQL
How to GraphQL
 

Similar to Collaborative environment with data science notebook

AirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadAirBNB's ML platform - BigHead
AirBNB's ML platform - BigHead
Karthik Murugesan
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Databricks
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
OW2
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDoku
 
What cloud changes the developer
What cloud changes the developerWhat cloud changes the developer
What cloud changes the developer
Simon Su
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
carlostorres15106
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Andrey Dotsenko
 
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CDA GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
Julian Mazzitelli
 
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdfLupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
WolfgangZiegler6
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
Ahmed Ossama
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
Márton Kodok
 
Instant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositoriesInstant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositories
Yshay Yaacobi
 
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
Vladislav Supalov
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
SeungYong Oh
 
Introduction to serverless computing on Google Cloud
Introduction to serverless computing on Google CloudIntroduction to serverless computing on Google Cloud
Introduction to serverless computing on Google Cloud
wesley chun
 
From React to React Native - Things I wish I knew when I started
From React to React Native - Things I wish I knew when I startedFrom React to React Native - Things I wish I knew when I started
From React to React Native - Things I wish I knew when I started
sparkfabrik
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Kaxil Naik
 
AGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServerAGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServer
Ng'eno Victor
 
Simplified News Analytics in Presidential Election with Google Cloud Platform
Simplified News Analytics in Presidential Election with Google Cloud PlatformSimplified News Analytics in Presidential Election with Google Cloud Platform
Simplified News Analytics in Presidential Election with Google Cloud Platform
Imre Nagi
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
Yohei Onishi
 

Similar to Collaborative environment with data science notebook (20)

AirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadAirBNB's ML platform - BigHead
AirBNB's ML platform - BigHead
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winner
 
What cloud changes the developer
What cloud changes the developerWhat cloud changes the developer
What cloud changes the developer
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CDA GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
A GitOps Kubernetes Native CICD Solution with Argo Events, Workflows, and CD
 
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdfLupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
 
Instant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositoriesInstant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositories
 
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
Introduction to serverless computing on Google Cloud
Introduction to serverless computing on Google CloudIntroduction to serverless computing on Google Cloud
Introduction to serverless computing on Google Cloud
 
From React to React Native - Things I wish I knew when I started
From React to React Native - Things I wish I knew when I startedFrom React to React Native - Things I wish I knew when I started
From React to React Native - Things I wish I knew when I started
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
 
AGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServerAGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServer
 
Simplified News Analytics in Presidential Election with Google Cloud Platform
Simplified News Analytics in Presidential Election with Google Cloud PlatformSimplified News Analytics in Presidential Election with Google Cloud Platform
Simplified News Analytics in Presidential Election with Google Cloud Platform
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 

Recently uploaded

Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
ChristineTorrepenida1
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
symbo111
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSCW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
veerababupersonal22
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
BrazilAccount1
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
top1002
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 

Recently uploaded (20)

Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSCW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 

Collaborative environment with data science notebook

  • 1. What makes Data driven environments more efficient and how to build a data science toolchain around Notebook technologies Creator of Apache Zeppelin Co-Founder, CTO Moon soo Lee moon@zepl.com
  • 2. #GDSC 2018 Who am I A true believer that data science notebook changes how people collaborate Creator of Apache Zeppelin Co-founder https://github.com/Leemoonsoo
  • 3. #GDSC 2018 It was 2013, really wanted to have interactive analytics interface for .
  • 4. #GDSC 2018 Started an opensource project - Zeppelin http://zeppelin-project.org/ data science notebook.Became an project in 2016. http://zeppelin.apache.org
  • 5. #GDSC 2018 Iterations REPL interface (2012) Editor / Result interface (2013) Notebook interface (2014)
  • 6. #GDSC 2018 Pilot to Production in 1 day Hey, take a look I need an update every morning!
  • 7. #GDSC 2018 More notebook consumers than producers
  • 8. #GDSC 2018 At the same time Opensource project receiving contributions like Authentication Access control
  • 9. #GDSC 2018 Realized that notebook is a great collaboration tool Why notebook?
  • 10. #GDSC 2018 Notebook is - Interactive - Flexible - Visualized - Inline description - Contain a story - Shareable
  • 11. #GDSC 2018 How to build collaborative environment with notebook technology Data sharing Multi-user environment Notebook sharing
  • 12. #GDSC 2018 Data scientist Data engineer Data Analyst Marketing SW engineer Sales Executive You Notebook Sharing
  • 13. #GDSC 2018 You’re using only half of its potential if not sharing
  • 15. #GDSC 2018 Github ● Store notebook in github ● Versioning ● Github provides .ipynb viewer ● Fork / pull request / merge ● Private / Public / Team / Org ● Hard to apply Notebook level ACL ● Not easy for Non-engineers
  • 16. #GDSC 2018 nbviewer ● Publishing notebook ● Share notebook by sharing link ● Easy use ● No access control Nbconvert (endering ipynb to static HTML) as a webservice
  • 17. #GDSC 2018 Apache Zeppelin ● Share notebook with ACL, Read/Write/Execute ● In case of Jupyter notebook, need to convert .ipynb to zeppelin format in command line.
  • 18. #GDSC 2018 Airbnb/knowledge-repo https://github.com/airbnb/knowledge-repo ● .ipynb, md as a post ● Git repo for version control ● Feeds ● Search ● No access control
  • 19. #GDSC 2018 Commercial services for notebook sharing Google Colab ● Share notebook through google drive ● View/Edit/Run ipynb notebook using Colab ● Realtime collaboration ZEPL ● Notebook level ACL ● View/Edit/Run .ipynb and Zeppelin notebook ● Realtime collaboration ● Import existing notebook from git/s3 storage www.zepl.com
  • 21. #GDSC 2018 DON’Ts ● Email attach ● Direct send ● Share through USB ● ... Email attach Local copy in laptop USB drive
  • 22. #GDSC 2018 DO’s ● Provide access to the same dataset ● Access control capability ● Horizontal scalability
  • 23. #GDSC 2018 Data catalog ● Provides location of data, what it means and how to load ○ e.g. ● Catalogue need to be accessible / searchable / annotatable ● Many different way to build depends on team / infra ○ Hive Metastore as a data catalog ○ Cloud infrastructure service (e.g. AWS glue data catalog, Azure data catalog) ○ Data catalog / publishing software (e.g. CKAN, DKAN) ○ Custom built on top of RDBMS, Nosql, Indexing engine ○ Build data catalog using Notebook Dataset Location Schema Note Activity s3://service/activity Date (DateTime), type (INT), action(String) Type is either RUN or STOP. …. Images s3://service/images 512x256 pixel images Images are collected from profile photo...
  • 24. #GDSC 2018 Build data catalog using Notebook ● Flexible enough to describe data ● Searchable, shareable, annotatable ● Programmatic generation
  • 26. #GDSC 2018 I like my notebook running on my laptop. No you don’t.
  • 27. #GDSC 2018 Sign in and Run Install libraries and Install notebook and Configure driver, environments and Request access to data and Setup access to notebook repo and …. Run
  • 28. #GDSC 2018 Reverse Proxy JupyterHub /hub Jupyter server Kernel (Python, R) Jupyter server Kernel (Python, R) /user/[name] Authenticator Spawner Notebook Storage (Filesystem, Git, etc) LDAP, OAuth, etc Docker, k8s Zeppelin Server LDAP, OAuth, etc Notebook Storage (Filesystem, Git, etc) Interpreter Manager Auth / ACL Interpreter (kernel) Interpreter (kernel) Interpreter (kernel)
  • 29. #GDSC 2018 ● Easier to implement / manage ● Notebook sharing is decoupled with execution environment ● Usually notebook sharing is basic or restricted. (no notebook level ACL) ● e.g. ○ JupyterHub ○ AWS Sagemaker Reverse Proxy Single user Notebook server Kernel Single user Notebook server Kernel Notebook Storage Multi user Notebook server Notebook Storage Kernel Kernel Kernel Browser Browser ● More complex to implement / manage ● Notebook sharing is coupled with execution environment ● Usually notebook sharing is more advanced and fine grained ● e.g. ○ Apache Zeppelin ○ ZEPL ○ Google Colab
  • 30. #GDSC 2018 Conclusion Notebook Share Data share Multi-user environment Collaboration