SlideShare a Scribd company logo
1 of 16
Zeppelin at Twitter
Prasad Wagle
Technical Lead, Data Platform
twitter.com/prasadwagle
July 13, 2016
SF Data Science Meetup
Galvanize, San Francisco, CA
Twitter Data Pipeline Overview
Production
systems
Presto
Vertica
MySQL
Scalding
Spark
R
Custom Dashboards
Tableau
Zeppelin
Command line tools
HDFS
Analytics Tools
Analytics Front-ends
One company-wide server
860 notes
4000 paragraphs
1500 Vertica, 1500 Presto, 300 MySQL
400 Markdown
300 Scalding, Spark, etc
850 users
Zeppelin Usage Metrics
Field of (Data) Dreams
Started as hackweek project in Dec 2015, Beta: Jan 27
Number of notes created
Feb - 270, Mar - 350, Apr - 470, May - 560, Jun - 750
Report Creators
Product managers (dashboards, product analytics)
Data scientists
Sales analysts
Engineers and SREs
Report Viewers
Anyone in the company
Zeppelin Users
My mind is absolutely blown away by the ease of use, speed, and power of
Zeppelin. I've been wanting a tool like this at Twitter my entire time
working here.
started playing with @ApacheZeppelin. amazingly addictive!
Thanks for all the updates to Zeppelin - Fabric has fallen in love with it fast
(and we're even using it for daily tracking of our OKRs amongst all the
other metrics)
Zeppelin Testimonials
Very easy to create and share reports
Web based
Works seamlessly with analytics engines
JDBC
Non-JDBC - Scalding, Spark
Open source (easy to add features)
Reasons for adoption
Drag and drop report builder
Can create complex queries without SQL knowledge (e.g. Top N)
Polished UI
Filters and other transformations work on extracts
no new database queries (fast)
Row level permissions (for sales reports)
Tableau
Areas:
Security
Stability
Operations
Interpreter
Work Done Before Production
Authentication
Integrated with Twitter’s homegrown single sign-on system
SSL
Integrated with Twitter’s homegrown key distribution system
Notebook authorization
Data source authorization
Work Done (Security)
Websocket deadlock issue with Jetty 8
reduce communication
remove synchronized block (risky, will move to Jetty 9)
Monitoring
Standby server
Backups
Work Done (Stability, Operations)
Scalding
JDBC
Create JDBC connection for every query to avoid Vertica closed
connection issue
Work Done (Interpreter)
Notebook authorization
Data source authorization
Run scheduled notes with a user
Scalding interpreter
Reduce websocket communication
Paragraph footer
Row level permissions
Work Contributed to Apache Project
Stability, Scalability
Jetty9
Interpreters
Scalding, Spark, R
Multiuser scalability, authentication, integration with Twitter
sources
Use JDBC interpreter instead of Hive
Future / Work in Progress
Features and UX
Notebook organization (folders)
Email reports and alerts
Row level permissions like tableau
Operations
Monitoring (end-to-end query)
Admin (view/stop running jobs, resource usage)
Failover
Continuous Integration
Future / Work in Progress
There is a real need for Notebook style interface for data analysis
Zeppelin is enterprise ready, flexible and easy to use
Zeppelin user and dev community is awesome
Takeaways

More Related Content

What's hot

Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FutureDataWorks Summit
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...Spark Summit
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureDatabricks
 
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDeploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDatabricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 
Structured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at AppleStructured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at AppleDatabricks
 
Apache Spark & Scala
Apache Spark & ScalaApache Spark & Scala
Apache Spark & ScalaEdureka!
 
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @ShanghaiLuke Han
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016kbajda
 
Building your bi system-HadoopCon Taiwan 2015
Building your bi system-HadoopCon Taiwan 2015Building your bi system-HadoopCon Taiwan 2015
Building your bi system-HadoopCon Taiwan 2015Bryan Yang
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Lightbend
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be StreamedDatabricks
 
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin... Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...Jacek Laskowski
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...Databricks
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaSpark Summit
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkTaras Matyashovsky
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookDatabricks
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit
 
Keynote at spark summit east anjul
Keynote at spark summit east anjulKeynote at spark summit east anjul
Keynote at spark summit east anjulAnjul Bhambhri
 

What's hot (20)

Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDeploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Structured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at AppleStructured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at Apple
 
Apache Spark & Scala
Apache Spark & ScalaApache Spark & Scala
Apache Spark & Scala
 
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
Building your bi system-HadoopCon Taiwan 2015
Building your bi system-HadoopCon Taiwan 2015Building your bi system-HadoopCon Taiwan 2015
Building your bi system-HadoopCon Taiwan 2015
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be Streamed
 
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin... Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache Spark
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
 
Keynote at spark summit east anjul
Keynote at spark summit east anjulKeynote at spark summit east anjul
Keynote at spark summit east anjul
 

Viewers also liked

Explorez vos données avec apache zeppelin
Explorez vos données avec apache zeppelinExplorez vos données avec apache zeppelin
Explorez vos données avec apache zeppelinBruno Bonnin
 
사물인터넷 보안 사례 및 대응 방안 2016.11.09
사물인터넷 보안 사례 및 대응 방안   2016.11.09사물인터넷 보안 사례 및 대응 방안   2016.11.09
사물인터넷 보안 사례 및 대응 방안 2016.11.09Hakyong Kim
 
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095CBIZ, Inc.
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...Universitat Politècnica de Catalunya
 
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...Amazon Web Services Korea
 
Top 10 retail tech trends 2017
Top 10 retail tech trends   2017Top 10 retail tech trends   2017
Top 10 retail tech trends 2017ALTEN Calsoft Labs
 
애자일 스크럼과 JIRA
애자일 스크럼과 JIRA 애자일 스크럼과 JIRA
애자일 스크럼과 JIRA Terry Cho
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksTaegyun Jeon
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAmazon Web Services
 
Apache flink - prise en main rapide
Apache flink - prise en main rapideApache flink - prise en main rapide
Apache flink - prise en main rapideBilal Baltagi
 
Apache flink - retour d'expérience sur la conférence flink forward 2015
Apache flink - retour d'expérience sur la conférence flink forward 2015Apache flink - retour d'expérience sur la conférence flink forward 2015
Apache flink - retour d'expérience sur la conférence flink forward 2015Bilal Baltagi
 
Explorez vos données présentes dans MongoDB avec Apache Zeppelin
Explorez vos données présentes dans MongoDB avec Apache ZeppelinExplorez vos données présentes dans MongoDB avec Apache Zeppelin
Explorez vos données présentes dans MongoDB avec Apache ZeppelinBruno Bonnin
 

Viewers also liked (12)

Explorez vos données avec apache zeppelin
Explorez vos données avec apache zeppelinExplorez vos données avec apache zeppelin
Explorez vos données avec apache zeppelin
 
사물인터넷 보안 사례 및 대응 방안 2016.11.09
사물인터넷 보안 사례 및 대응 방안   2016.11.09사물인터넷 보안 사례 및 대응 방안   2016.11.09
사물인터넷 보안 사례 및 대응 방안 2016.11.09
 
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
 
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
 
Top 10 retail tech trends 2017
Top 10 retail tech trends   2017Top 10 retail tech trends   2017
Top 10 retail tech trends 2017
 
애자일 스크럼과 JIRA
애자일 스크럼과 JIRA 애자일 스크럼과 JIRA
애자일 스크럼과 JIRA
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural Networks
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
 
Apache flink - prise en main rapide
Apache flink - prise en main rapideApache flink - prise en main rapide
Apache flink - prise en main rapide
 
Apache flink - retour d'expérience sur la conférence flink forward 2015
Apache flink - retour d'expérience sur la conférence flink forward 2015Apache flink - retour d'expérience sur la conférence flink forward 2015
Apache flink - retour d'expérience sur la conférence flink forward 2015
 
Explorez vos données présentes dans MongoDB avec Apache Zeppelin
Explorez vos données présentes dans MongoDB avec Apache ZeppelinExplorez vos données présentes dans MongoDB avec Apache Zeppelin
Explorez vos données présentes dans MongoDB avec Apache Zeppelin
 

Similar to Zeppelin at twitter (sf data science meetup, july 2016)

Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Paulo Gutierrez
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustin Vannoy
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minssparkflows
 
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Erwin de Kreuk
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019Juan Fabian
 
IT Summit - Modernizing Enterprise Analytics: the IT Story
IT Summit - Modernizing Enterprise Analytics: the IT StoryIT Summit - Modernizing Enterprise Analytics: the IT Story
IT Summit - Modernizing Enterprise Analytics: the IT StoryTableau Software
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engineWalter Liu
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Zhenxiao Luo
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Running High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioRunning High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioiguazio
 
An Architect's guide to real time big data systems
An Architect's guide to real time big data systemsAn Architect's guide to real time big data systems
An Architect's guide to real time big data systemsRaja SP
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
PostgreSQL 10; Long Awaited Enterprise Solutions
PostgreSQL 10; Long Awaited Enterprise SolutionsPostgreSQL 10; Long Awaited Enterprise Solutions
PostgreSQL 10; Long Awaited Enterprise SolutionsJulyanto SUTANDANG
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in RealtimeDataWorks Summit
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 

Similar to Zeppelin at twitter (sf data science meetup, july 2016) (20)

Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
 
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019
 
IT Summit - Modernizing Enterprise Analytics: the IT Story
IT Summit - Modernizing Enterprise Analytics: the IT StoryIT Summit - Modernizing Enterprise Analytics: the IT Story
IT Summit - Modernizing Enterprise Analytics: the IT Story
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Running High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioRunning High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclio
 
An Architect's guide to real time big data systems
An Architect's guide to real time big data systemsAn Architect's guide to real time big data systems
An Architect's guide to real time big data systems
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Web Development In Oracle APEX
Web Development In Oracle APEXWeb Development In Oracle APEX
Web Development In Oracle APEX
 
PostgreSQL 10; Long Awaited Enterprise Solutions
PostgreSQL 10; Long Awaited Enterprise SolutionsPostgreSQL 10; Long Awaited Enterprise Solutions
PostgreSQL 10; Long Awaited Enterprise Solutions
 
Airavata_Architecture_xsede16
Airavata_Architecture_xsede16Airavata_Architecture_xsede16
Airavata_Architecture_xsede16
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 

Recently uploaded

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 

Recently uploaded (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Zeppelin at twitter (sf data science meetup, july 2016)

  • 1. Zeppelin at Twitter Prasad Wagle Technical Lead, Data Platform twitter.com/prasadwagle July 13, 2016 SF Data Science Meetup Galvanize, San Francisco, CA
  • 2. Twitter Data Pipeline Overview Production systems Presto Vertica MySQL Scalding Spark R Custom Dashboards Tableau Zeppelin Command line tools HDFS Analytics Tools Analytics Front-ends
  • 3. One company-wide server 860 notes 4000 paragraphs 1500 Vertica, 1500 Presto, 300 MySQL 400 Markdown 300 Scalding, Spark, etc 850 users Zeppelin Usage Metrics
  • 4. Field of (Data) Dreams Started as hackweek project in Dec 2015, Beta: Jan 27 Number of notes created Feb - 270, Mar - 350, Apr - 470, May - 560, Jun - 750
  • 5. Report Creators Product managers (dashboards, product analytics) Data scientists Sales analysts Engineers and SREs Report Viewers Anyone in the company Zeppelin Users
  • 6. My mind is absolutely blown away by the ease of use, speed, and power of Zeppelin. I've been wanting a tool like this at Twitter my entire time working here. started playing with @ApacheZeppelin. amazingly addictive! Thanks for all the updates to Zeppelin - Fabric has fallen in love with it fast (and we're even using it for daily tracking of our OKRs amongst all the other metrics) Zeppelin Testimonials
  • 7. Very easy to create and share reports Web based Works seamlessly with analytics engines JDBC Non-JDBC - Scalding, Spark Open source (easy to add features) Reasons for adoption
  • 8. Drag and drop report builder Can create complex queries without SQL knowledge (e.g. Top N) Polished UI Filters and other transformations work on extracts no new database queries (fast) Row level permissions (for sales reports) Tableau
  • 10. Authentication Integrated with Twitter’s homegrown single sign-on system SSL Integrated with Twitter’s homegrown key distribution system Notebook authorization Data source authorization Work Done (Security)
  • 11. Websocket deadlock issue with Jetty 8 reduce communication remove synchronized block (risky, will move to Jetty 9) Monitoring Standby server Backups Work Done (Stability, Operations)
  • 12. Scalding JDBC Create JDBC connection for every query to avoid Vertica closed connection issue Work Done (Interpreter)
  • 13. Notebook authorization Data source authorization Run scheduled notes with a user Scalding interpreter Reduce websocket communication Paragraph footer Row level permissions Work Contributed to Apache Project
  • 14. Stability, Scalability Jetty9 Interpreters Scalding, Spark, R Multiuser scalability, authentication, integration with Twitter sources Use JDBC interpreter instead of Hive Future / Work in Progress
  • 15. Features and UX Notebook organization (folders) Email reports and alerts Row level permissions like tableau Operations Monitoring (end-to-end query) Admin (view/stop running jobs, resource usage) Failover Continuous Integration Future / Work in Progress
  • 16. There is a real need for Notebook style interface for data analysis Zeppelin is enterprise ready, flexible and easy to use Zeppelin user and dev community is awesome Takeaways