SlideShare a Scribd company logo
1 of 24
PayPal Notebooks
Agenda
Introductions
PayPal Key Metrics and analytics ecosystem
Enabling data science at scale
Jupyter platform
PPMagics
Collaboration and deployment
Data access with Gimel
Big data enhancements
Open source plans
2© 2018 PayPal Inc. Confidential and proprietary.
Introductions
• Product manager, data processing products
at PayPal
• 20 years in data and analytics across
networking, semi-conductors, telecom,
security and fintech industries
• Data warehouse developer, BI program
manager, Data product manager
romehta@paypal.com
https://www.linkedin.com/in/romit-mehta
© 2018 PayPal Inc. Confidential and proprietary.
Romit Mehta
• Software engineer, Big data platform
engineering at PayPal
• 5 years in building distributed and scalable
applications
• PayPal Notebooks lead engineer
pkanamarlapudi@paypal.com
https://www.linkedin.com/in/praveenkanamarlapudi
Praveen Kanamarlapudi
3
PayPal Key Metrics
© 2018 PayPal Inc. Confidential and proprietary.
PayPal Customers, Transactions and Growth
From: PayPal’s Q2 2018 Investor
Update
5
PayPal Analytics Ecosystem
© 2018 PayPal Inc. Confidential and proprietary.
PayPal Big Data Platform
160+ PB Data
75,000+ YARN
jobs/day
One of the largest
Aerospike,
Teradata,
Hortonworks and
Oracle installations
Compute supported:
MR, Pig, Hive, Spark,
Beam
13 prod clusters, 12 non-
prod clusters
GPU co-located with
Hadoop
7
© 2018 PayPal Inc. Confidential and proprietary.
Developer Data scientist Analyst Operator
Gimel SDK Notebooks
UDC Data API
Infrastructure services leveraged for elasticity and redundancy
Multi-DC Public cloudPredictive resource allocation
Logging
Monitoring
Alerting
Security
Application
Lifecycle
Management
Compute
Frameworkand
APIs
GimelData
Platform
User
Experience
andAccess
R Studio BI tools
© 2018 PayPal Inc. Confidential and proprietary.
PayPal Notebooks Platform
Jupyter @
PayPal
PayPal Notebooks
Beta
PayPal Notebooks
Generally Available
PayPal Notebooks Today
Q3 2016 Q3 2017
~50 users
Feb 2018
~100 users
~1,300 users
SQL, Spark/PySpark,
Python, R
2016
Zeppelin
Individual use
9
Demo
Two customer segments
11
Tracking payments by geo
POC with static (csv) data
Collaborate with team
Visualize results
Switch to live data
Deploy/productionalize
Build and train models
Fetch data
Prep/cleanse data
Use algorithm to build model
Tweak model
Finalize model
Analyst Data scientist
11
© 2018 PayPal Inc. Confidential and proprietary.
Jupyter deployed as a platform
© 2018 PayPal Inc. Confidential and proprietary.
PayPal Notebooks Platform
Highly available
JupyterHub
GPU integration
Standalone
Docker
• Enable deep learning through
notebooks
• Distributed TensorFlow training enabled
with dynamic GPU resource
management
• Container image with all
PPExtensions
• Required to deploy across various
security zones at PayPal
• Foundation for open sourcing
PPExtensions
• Grid of JupyterHub hosts
• Highly available and distributed
SSO + 2FA
integrationKerberos + LDAP integration
13
PPExtensions
Set of extensions to improve user
experience and reduce time to market
© 2018 PayPal Inc. Confidential and proprietary.
PPMagics
• Query data from Hive (or Teradata)
• Insert data using csv/dataframes
• Publish to Tableau
%hive, %teradata
• Run any notebook from another notebook
• Run multiple notebooks in parallel
• Execute a pipeline of notebooks
%run, %run_pipeline
• Run SQL on csv files%csv
• Query data from Presto
• Publish to Tableau%presto
• Query data from Spark Thrift Server
• Includes progress bar for SQL execution%sts
15
© 2018 PayPal Inc. Confidential and proprietary.
Collaboration and deployment
Github sharing Project collaboration
Tableau publishing Deployment/scheduling
• Push notebook to
common org-wide repo
• View full fidelity notebook
on Github
• Share link to notebook
instead of .ipynb file or
code snippets
• Seamlessly publish to
Tableau
• Download TDE to use
Tableau Desktop, or directly
publish as a data source
• Share notebook to personal
and team repos
• Resolve conflicts between
remote and local notebooks
with nbdime
• Integrate with Airflow
• Set up frequency, alerts,
optionally push to Github
after every run
• Add Celery executor for
scalability
16
Gimel
Unified Data API to access any data store
© 2018 PayPal Inc. Confidential and proprietary.
Simplified access to big data systems with GSQL
• Single unified data API to access any data store
• SQL capabilities against any data store
• Switch between interactive, batch and streaming modes
• Centralized metadata catalog (PCatalog) to abstract the
physical complexities of accessing data
• Open sourced in April: gimel.io
• Integrated with Jupyter through GSQL
• Dataset browser in notebooks powered by PCatalog
18
Big data and machine learning
© 2018 PayPal Inc. Confidential and proprietary.
Big data and machine learning
Apache Spark
• Updates to sparkmagic
• Enabled progress bar for Spark
jobs
Apache Livy
• Enabled SQL session support
with Apache Livy
Tensorflow
• Integrated Tensorflow for
distributed model training
• Enabled Tensorflow with GPU
20
PayPal Notebooks Open Source Plans
Open sourcing PPExtensions
© 2018 PayPal Inc. Confidential and proprietary.
• PPMagics:
available now
• Gimel:
available now
• Airflow integration
• Github sharing
• Project
collaboration
• Tableau integration
• Config UI
• Dataset browser
• Pipelines
• Data Science
Workbench
22
Open source links and info
© 2018 PayPal Inc. Confidential and proprietary.
Install pip install ppextensions
Github http://ppextensions.io
Google Group https://groups.google.com/d/forum/ppextensions
Slack https://ppextensions.slack.com
Gimel http://gimel.io
23
Q&A

More Related Content

What's hot

Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCPAllCloud
 
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...SnapLogic
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Big Data Ingestion Using Hadoop - Capstone Presentation
Big Data Ingestion Using Hadoop - Capstone PresentationBig Data Ingestion Using Hadoop - Capstone Presentation
Big Data Ingestion Using Hadoop - Capstone PresentationSamkannan
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...DataWorks Summit
 
Choosing the Right Open Source Database
Choosing the Right Open Source DatabaseChoosing the Right Open Source Database
Choosing the Right Open Source DatabaseAll Things Open
 
How to leverage Kafka data streams with Neo4j
How to leverage Kafka data streams with Neo4jHow to leverage Kafka data streams with Neo4j
How to leverage Kafka data streams with Neo4jGraphRM
 
Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...
Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...
Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...SnapLogic
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Big Data Spain
 
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jUnified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jDeepak Chandramouli
 
Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0SnapLogic
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?Jeraldine Phneah
 
Webinar: BI in the Sky - The New Rules of Cloud Analytics
Webinar: BI in the Sky - The New Rules of Cloud AnalyticsWebinar: BI in the Sky - The New Rules of Cloud Analytics
Webinar: BI in the Sky - The New Rules of Cloud AnalyticsSnapLogic
 
Build and Run Streaming Applications with Apache Flink and Amazon Kinesis Dat...
Build and Run Streaming Applications with Apache Flink and Amazon Kinesis Dat...Build and Run Streaming Applications with Apache Flink and Amazon Kinesis Dat...
Build and Run Streaming Applications with Apache Flink and Amazon Kinesis Dat...Flink Forward
 
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...HostedbyConfluent
 
VP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraVP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraBig Data Spain
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demoDatabricks
 
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
 Migration and Coexistence between Relational and NoSQL Databases by Manuel H... Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...Big Data Spain
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarBig Data Spain
 

What's hot (20)

Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCP
 
Ford
FordFord
Ford
 
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Big Data Ingestion Using Hadoop - Capstone Presentation
Big Data Ingestion Using Hadoop - Capstone PresentationBig Data Ingestion Using Hadoop - Capstone Presentation
Big Data Ingestion Using Hadoop - Capstone Presentation
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
 
Choosing the Right Open Source Database
Choosing the Right Open Source DatabaseChoosing the Right Open Source Database
Choosing the Right Open Source Database
 
How to leverage Kafka data streams with Neo4j
How to leverage Kafka data streams with Neo4jHow to leverage Kafka data streams with Neo4j
How to leverage Kafka data streams with Neo4j
 
Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...
Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...
Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
 
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jUnified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
 
Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?
 
Webinar: BI in the Sky - The New Rules of Cloud Analytics
Webinar: BI in the Sky - The New Rules of Cloud AnalyticsWebinar: BI in the Sky - The New Rules of Cloud Analytics
Webinar: BI in the Sky - The New Rules of Cloud Analytics
 
Build and Run Streaming Applications with Apache Flink and Amazon Kinesis Dat...
Build and Run Streaming Applications with Apache Flink and Amazon Kinesis Dat...Build and Run Streaming Applications with Apache Flink and Amazon Kinesis Dat...
Build and Run Streaming Applications with Apache Flink and Amazon Kinesis Dat...
 
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
 
VP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraVP of WW Partners by Alan Chhabra
VP of WW Partners by Alan Chhabra
 
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
 
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
 Migration and Coexistence between Relational and NoSQL Databases by Manuel H... Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
 

Similar to PayPal Notebooks at Jupytercon 2018

Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Romit Mehta
 
Gimel at Teradata Analytics Universe 2018
Gimel at Teradata Analytics Universe 2018Gimel at Teradata Analytics Universe 2018
Gimel at Teradata Analytics Universe 2018Romit Mehta
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?SnapLogic
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderDataconomy Media
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkFlink Forward
 
Day 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers ProgramDay 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers ProgramFIWARE
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015Cloudera, Inc.
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Open Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeOpen Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeApache Geode
 
An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)Anthony Baker
 
Capstone presentation
Capstone presentationCapstone presentation
Capstone presentationVikal Gupta
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenWes McKinney
 
Advanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applicationsAdvanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applicationsRogue Wave Software
 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next GenerationWes McKinney
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapNeo4j
 
Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!KNIMESlides
 
Big Data as easy as 1, 2, 3, ... 4 ... with KNIME
Big Data as easy as 1, 2, 3, ... 4 ... with KNIMEBig Data as easy as 1, 2, 3, ... 4 ... with KNIME
Big Data as easy as 1, 2, 3, ... 4 ... with KNIMERosaria Silipo
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsDataKitchen
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOpsChristopher Bergh
 

Similar to PayPal Notebooks at Jupytercon 2018 (20)

Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
 
Gimel at Teradata Analytics Universe 2018
Gimel at Teradata Analytics Universe 2018Gimel at Teradata Analytics Universe 2018
Gimel at Teradata Analytics Universe 2018
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
 
Day 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers ProgramDay 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers Program
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Open Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeOpen Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache Geode
 
An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)
 
Capstone presentation
Capstone presentationCapstone presentation
Capstone presentation
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
 
Advanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applicationsAdvanced technologies and techniques for debugging HPC applications
Advanced technologies and techniques for debugging HPC applications
 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next Generation
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
 
Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!
 
Big Data as easy as 1, 2, 3, ... 4 ... with KNIME
Big Data as easy as 1, 2, 3, ... 4 ... with KNIMEBig Data as easy as 1, 2, 3, ... 4 ... with KNIME
Big Data as easy as 1, 2, 3, ... 4 ... with KNIME
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOps
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
 

Recently uploaded

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 

Recently uploaded (20)

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 

PayPal Notebooks at Jupytercon 2018

  • 2. Agenda Introductions PayPal Key Metrics and analytics ecosystem Enabling data science at scale Jupyter platform PPMagics Collaboration and deployment Data access with Gimel Big data enhancements Open source plans 2© 2018 PayPal Inc. Confidential and proprietary.
  • 3. Introductions • Product manager, data processing products at PayPal • 20 years in data and analytics across networking, semi-conductors, telecom, security and fintech industries • Data warehouse developer, BI program manager, Data product manager romehta@paypal.com https://www.linkedin.com/in/romit-mehta © 2018 PayPal Inc. Confidential and proprietary. Romit Mehta • Software engineer, Big data platform engineering at PayPal • 5 years in building distributed and scalable applications • PayPal Notebooks lead engineer pkanamarlapudi@paypal.com https://www.linkedin.com/in/praveenkanamarlapudi Praveen Kanamarlapudi 3
  • 5. © 2018 PayPal Inc. Confidential and proprietary. PayPal Customers, Transactions and Growth From: PayPal’s Q2 2018 Investor Update 5
  • 7. © 2018 PayPal Inc. Confidential and proprietary. PayPal Big Data Platform 160+ PB Data 75,000+ YARN jobs/day One of the largest Aerospike, Teradata, Hortonworks and Oracle installations Compute supported: MR, Pig, Hive, Spark, Beam 13 prod clusters, 12 non- prod clusters GPU co-located with Hadoop 7
  • 8. © 2018 PayPal Inc. Confidential and proprietary. Developer Data scientist Analyst Operator Gimel SDK Notebooks UDC Data API Infrastructure services leveraged for elasticity and redundancy Multi-DC Public cloudPredictive resource allocation Logging Monitoring Alerting Security Application Lifecycle Management Compute Frameworkand APIs GimelData Platform User Experience andAccess R Studio BI tools
  • 9. © 2018 PayPal Inc. Confidential and proprietary. PayPal Notebooks Platform Jupyter @ PayPal PayPal Notebooks Beta PayPal Notebooks Generally Available PayPal Notebooks Today Q3 2016 Q3 2017 ~50 users Feb 2018 ~100 users ~1,300 users SQL, Spark/PySpark, Python, R 2016 Zeppelin Individual use 9
  • 10. Demo
  • 11. Two customer segments 11 Tracking payments by geo POC with static (csv) data Collaborate with team Visualize results Switch to live data Deploy/productionalize Build and train models Fetch data Prep/cleanse data Use algorithm to build model Tweak model Finalize model Analyst Data scientist 11 © 2018 PayPal Inc. Confidential and proprietary.
  • 12. Jupyter deployed as a platform
  • 13. © 2018 PayPal Inc. Confidential and proprietary. PayPal Notebooks Platform Highly available JupyterHub GPU integration Standalone Docker • Enable deep learning through notebooks • Distributed TensorFlow training enabled with dynamic GPU resource management • Container image with all PPExtensions • Required to deploy across various security zones at PayPal • Foundation for open sourcing PPExtensions • Grid of JupyterHub hosts • Highly available and distributed SSO + 2FA integrationKerberos + LDAP integration 13
  • 14. PPExtensions Set of extensions to improve user experience and reduce time to market
  • 15. © 2018 PayPal Inc. Confidential and proprietary. PPMagics • Query data from Hive (or Teradata) • Insert data using csv/dataframes • Publish to Tableau %hive, %teradata • Run any notebook from another notebook • Run multiple notebooks in parallel • Execute a pipeline of notebooks %run, %run_pipeline • Run SQL on csv files%csv • Query data from Presto • Publish to Tableau%presto • Query data from Spark Thrift Server • Includes progress bar for SQL execution%sts 15
  • 16. © 2018 PayPal Inc. Confidential and proprietary. Collaboration and deployment Github sharing Project collaboration Tableau publishing Deployment/scheduling • Push notebook to common org-wide repo • View full fidelity notebook on Github • Share link to notebook instead of .ipynb file or code snippets • Seamlessly publish to Tableau • Download TDE to use Tableau Desktop, or directly publish as a data source • Share notebook to personal and team repos • Resolve conflicts between remote and local notebooks with nbdime • Integrate with Airflow • Set up frequency, alerts, optionally push to Github after every run • Add Celery executor for scalability 16
  • 17. Gimel Unified Data API to access any data store
  • 18. © 2018 PayPal Inc. Confidential and proprietary. Simplified access to big data systems with GSQL • Single unified data API to access any data store • SQL capabilities against any data store • Switch between interactive, batch and streaming modes • Centralized metadata catalog (PCatalog) to abstract the physical complexities of accessing data • Open sourced in April: gimel.io • Integrated with Jupyter through GSQL • Dataset browser in notebooks powered by PCatalog 18
  • 19. Big data and machine learning
  • 20. © 2018 PayPal Inc. Confidential and proprietary. Big data and machine learning Apache Spark • Updates to sparkmagic • Enabled progress bar for Spark jobs Apache Livy • Enabled SQL session support with Apache Livy Tensorflow • Integrated Tensorflow for distributed model training • Enabled Tensorflow with GPU 20
  • 21. PayPal Notebooks Open Source Plans
  • 22. Open sourcing PPExtensions © 2018 PayPal Inc. Confidential and proprietary. • PPMagics: available now • Gimel: available now • Airflow integration • Github sharing • Project collaboration • Tableau integration • Config UI • Dataset browser • Pipelines • Data Science Workbench 22
  • 23. Open source links and info © 2018 PayPal Inc. Confidential and proprietary. Install pip install ppextensions Github http://ppextensions.io Google Group https://groups.google.com/d/forum/ppextensions Slack https://ppextensions.slack.com Gimel http://gimel.io 23
  • 24. Q&A