
Serverless Data Architecture at scale on Google Cloud Platform

Lorenzo Ridi (Noovle) - A sentiment analysis use case on Google Cloud Platform

Serverless Data Architecture at scale on
Google Cloud Platform
Lorenzo Ridi
Machine Learning/Data Science Meetup
Rome, 02-02-2017
Hi, I’m Lorenzo!
I’ve been a Research Fellow @UniFI
I am a Software Engineer @Noovle
I am a Google Cloud Platform Qualified Developer
I am a Google Cloud Platform Authorized Trainer
Google’s Mission
“Organize the world’s information and make it universally accessible and useful.”
Google’s Data Research
[Timeline, 2002-2016: GFS, MapReduce, BigTable, Dremel, Colossus, Flume, Megastore, Spanner, Millwheel, PubSub, F1, TensorFlow]
Google’s Data Products
[Timeline, 2002-2016: ML, PubSub, DataFlow, DataStore, Cloud Storage, BigQuery, BigTable, DataProc]
Pre-Trained Machine Learning Models
Fully trained ML models from Google Cloud that allow a general developer to take advantage of rich machine learning capabilities with simple REST-based services.
Cloud Natural Language (GA)
Cloud Speech (Beta)
Cloud Translate (GA)
Cloud Vision (GA)
Stay tuned...
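To make "simple REST-based services" concrete, here is a minimal sketch of a sentiment analysis call to the Cloud Natural Language API, written with Python's standard library only. It is not taken from the slides: the helper names `build_sentiment_request` and `analyze_sentiment` are hypothetical, and authenticating with an API key in the query string is an assumption (a real project might use OAuth credentials or a client library instead).

```python
import json
import urllib.request

# REST endpoint of the Cloud Natural Language API (v1) for sentiment analysis.
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text, api_key):
    """Build (but do not send) an analyzeSentiment HTTP request.

    Hypothetical helper: `api_key` is assumed to be a URL-safe API key
    enabled for the Natural Language API.
    """
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return urllib.request.Request(
        url=f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_sentiment(text, api_key):
    """Send the request and return the document-level sentiment.

    Requires network access and a valid key. The returned dict carries
    the "score" and "magnitude" fields of the API response.
    """
    request = build_sentiment_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["documentSentiment"]
```

Calling `analyze_sentiment("I love this meetup", api_key)` with a valid key would return the document-level sentiment. Building the request separately from sending it keeps the payload easy to inspect without touching the network.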

Elena Gagliardoni - Neural Chatbot
 
Bruno Coletta - Data-Driven Creativity in Marketing and Advertising
Bruno Coletta - Data-Driven Creativity in Marketing and AdvertisingBruno Coletta - Data-Driven Creativity in Marketing and Advertising
Bruno Coletta - Data-Driven Creativity in Marketing and Advertising
 

Serverless Data Architecture at scale on Google Cloud Platform

  • 1. Serverless Data Architecture at scale on Google Cloud Platform Lorenzo Ridi Machine Learning/Data Science Meetup Rome, 02-02-2017
  • 2. I’ve been a Research Fellow @UniFI I am a Software Engineer @Noovle I am a Google Cloud Platform Qualified Developer I am a Google Cloud Platform Authorized Trainer Hi, I’m Lorenzo!
  • 3. Google’s Mission: “Organize the world’s information and make it universally accessible and useful.”
  • 4. Google’s Data Research (timeline, 2002 to 2016): GFS, MapReduce, BigTable, Dremel, Colossus, Flume, Megastore, Spanner, Millwheel, PubSub, F1, TensorFlow
  • 5. Google’s Data Products (timeline, 2002 to 2016): Cloud Storage, BigQuery, DataStore, BigTable, DataFlow, DataProc, PubSub, ML
  • 6. Pre-Trained Machine Learning Models: Cloud Natural Language (GA), Cloud Speech (Beta), Cloud Translate (GA), Cloud Vision (GA). Stay tuned... Fully trained ML models from Google Cloud that allow a general developer to take advantage of rich machine learning capabilities with simple REST based services.
  • 7. Use your own data to train models. TensorFlow (tensorflow.org, github.com/tensorflow): Open Source Software Library for Machine Learning. Cloud Machine Learning: managed service that enables you to easily build machine learning models that work on any type of data, of any size.
  • 8. Cracking Black Friday Adding Machine Learning to a serverless data analysis pipeline
  • 9. Black Friday (ˈblæk fraɪdɪ) noun The day following Thanksgiving Day in the United States. Since 1932, it has been regarded as the beginning of the Christmas shopping season.
  • 10. Black Friday in the US 2012 - 2016 source: Google Trends, November 23rd 2016
  • 11. Black Friday in Italy 2012 - 2016 source: Google Trends, November 23rd 2016
  • 12. What are we doing: Tweets about Black Friday → Processing + analytics → insights
  • 16. What is Google Cloud Pub/Sub? ● Google Cloud Pub/Sub is a fully-managed real-time messaging service. ○ Guaranteed delivery ■ “At least once” semantics ○ Reliable at scale ■ Messages are replicated in different zones
  • 17. From Twitter to Pub/Sub (shell):
      $ gcloud beta pubsub topics create blackfridaytweets
      Created topic [blackfridaytweets].
  • 18. From Twitter to Pub/Sub: ? → Pub/Sub Topic → Subscription A, Subscription B, Subscription C → Consumer A, Consumer B, Consumer C
  • 19. From Twitter to Pub/Sub ● Simple Python application using the TweePy library (Python):
      # somewhere in the code, track a given set of keywords
      stream = Stream(auth, listener)
      stream.filter(track=['blackfriday', [...]])
      [...]
      # somewhere else, write messages to Pub/Sub
      for line in data_lines:
          pub = base64.urlsafe_b64encode(line)
          messages.append({'data': pub})
      body = {'messages': messages}
      resp = client.projects().topics().publish(
          topic='blackfridaytweets', body=body).execute(num_retries=NUM_RETRIES)
  • 20. From Twitter to Pub/Sub App + Libs
  • 21. VM From Twitter to Pub/Sub App + Libs
  • 22. VM From Twitter to Pub/Sub App + Libs
  • 23. From Twitter to Pub/Sub App + Libs Container
  • 24. From Twitter to Pub/Sub: App + Libs, Container (Dockerfile):
      FROM google/python
      RUN pip install --upgrade pip
      RUN pip install pyopenssl ndg-httpsclient pyasn1
      RUN pip install tweepy
      RUN pip install --upgrade google-api-python-client
      RUN pip install python-dateutil
      ADD twitter-to-pubsub.py /twitter-to-pubsub.py
      ADD utils.py /utils.py
      CMD python twitter-to-pubsub.py
  • 25. From Twitter to Pub/Sub App + Libs Container
  • 26. From Twitter to Pub/Sub App + Libs Container Pod
  • 27. What is Kubernetes (K8S)? ● An orchestration tool for managing a cluster of containers across multiple hosts ○ Scaling, rolling upgrades, A/B testing, etc. ● Declarative – not procedural ○ Auto-scales and self-heals to desired state ● Supports multiple container runtimes, currently Docker and CoreOS Rkt ● Open-source: github.com/kubernetes
  • 28. From Twitter to Pub/Sub: App + Libs, Container, Pod (YAML):
      apiVersion: v1
      kind: ReplicationController
      metadata:
        [...]
      spec:
        replicas: 1
        template:
          metadata:
            labels:
              name: twitter-stream
          spec:
            containers:
            - name: twitter-to-pubsub
              image: gcr.io/codemotion-2016-demo/pubsub_pipeline
              env:
              - name: PUBSUB_TOPIC
                value: ...
  • 29. From Twitter to Pub/Sub App + Libs Container Pod
  • 30. From Twitter to Pub/Sub App + Libs Container Pod Node
  • 31. Node From Twitter to Pub/Sub Pod A Pod B
  • 32. From Twitter to Pub/Sub Node 1 Node 2
  • 33. From Twitter to Pub/Sub (shell):
      $ gcloud container clusters create codemotion-2016-demo-cluster
      Creating cluster cluster-1...done.
      Created [...projects/codemotion-2016-demo/.../clusters/codemotion-2016-demo-cluster].
      $ gcloud container clusters get-credentials codemotion-2016-demo-cluster
      Fetching cluster endpoint and auth data.
      kubeconfig entry generated for cluster-1.
      $ kubectl create -f ~/git/kube-pubsub-bq/pubsub/twitter-stream.yaml
      replicationcontroller "twitter-stream" created.
  • 38. What is Google Cloud Dataflow? ● Cloud Dataflow is a collection of open source SDKs to implement parallel processing pipelines. ○ same programming model for streaming and batch pipelines ● Cloud Dataflow is a managed service to run parallel processing pipelines on Google Cloud Platform
  • 39. What is Google BigQuery? ● Google BigQuery is a fully-managed Analytic Data Warehouse solution allowing real-time analysis of Petabyte-scale datasets. ● Enterprise-grade features ○ Batch and streaming (100K rows/sec) data ingestion ○ JDBC/ODBC connectors ○ Rich SQL-2011-compliant query language (new!) ○ Supports updates and deletes (new!)
  • 40. From Pub/Sub to BigQuery. Dataflow Pipeline: Pub/Sub Topic → Subscription → Read tweets from Pub/Sub → Format tweets for BigQuery → Write tweets on BigQuery → BigQuery Table
  • 41. From Pub/Sub to BigQuery ● A Dataflow pipeline is a Java program. (Java):
      // TwitterProcessor.java
      public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        PCollection<String> tweets = p.apply(PubsubIO.Read.topic("...blackfridaytweets"));
        PCollection<TableRow> formattedTweets = tweets.apply(ParDo.of(new DoFormat()));
        formattedTweets.apply(BigQueryIO.Write.to(tableReference));
        p.run();
      }
  • 42. From Pub/Sub to BigQuery ● A Dataflow pipeline is a Java program. (Java):
      // TwitterProcessor.java
      // Do Function (to be used within a ParDo)
      private static final class DoFormat extends DoFn<String, TableRow> {
        private static final long serialVersionUID = 1L;

        // Declares IOException because createTableRow can throw it
        @Override
        public void processElement(DoFn<String, TableRow>.ProcessContext c) throws IOException {
          c.output(createTableRow(c.element()));
        }
      }

      // Helper method: parses the tweet JSON into a BigQuery TableRow
      private static TableRow createTableRow(String tweet) throws IOException {
        return JacksonFactory.getDefaultInstance().fromString(tweet, TableRow.class);
      }
  • 43. From Pub/Sub to BigQuery ● Use Maven to build, deploy or update the Pipeline. (shell):
      $ mvn compile exec:java -Dexec.mainClass=it.noovle.dataflow.TwitterProcessor -Dexec.args="--streaming"
      [...]
      INFO: To cancel the job using the 'gcloud' tool, run:
      > gcloud alpha dataflow jobs --project=codemotion-2016-demo cancel 2016-11-19_15_49_53-5264074060979116717
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD SUCCESS
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time: 18.131s
      [INFO] Finished at: Sun Nov 20 00:49:54 CET 2016
      [INFO] Final Memory: 28M/362M
      [INFO] ------------------------------------------------------------------------
  • 44. From Pub/Sub to BigQuery ● You can monitor your pipelines from Cloud Console.
  • 45. From Pub/Sub to BigQuery ● Data start flowing into BigQuery tables. You can run queries from the CLI or the Web Interface.
  • 50. How we’re gonna do it: Pub/Sub, Kubernetes, Dataflow, BigQuery, Natural Language API, Data Studio
  • 51. Sentiment Analysis with Natural Language API: Text → Polarity: [-1,1], Magnitude: [0,+inf)
  • 52. Sentiment Analysis with Natural Language API: Text → Polarity: [-1,1], Magnitude: [0,+inf); sentiment = polarity x magnitude
  • 53. Sentiment Analysis with Natural Language API. Dataflow Pipeline: Pub/Sub Topic → Read tweets from Pub/Sub, then two branches into BigQuery Tables: (1) Format tweets for BigQuery → Write tweets on BigQuery; (2) Filter and Evaluate sentiment → Format tweets for BigQuery → Write tweets on BigQuery
  • 54. From Pub/Sub to BigQuery ● We just add the additional necessary steps. (Java):
      // TwitterProcessor.java
      public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        PCollection<String> tweets = p.apply(PubsubIO.Read.topic("...blackfridaytweets"));

        // New steps: sentiment branch
        PCollection<String> sentTweets = tweets.apply(ParDo.of(new DoFilterAndProcess()));
        PCollection<TableRow> formSentTweets = sentTweets.apply(ParDo.of(new DoFormat()));
        formSentTweets.apply(BigQueryIO.Write.to(sentTableReference));

        // Existing steps: raw tweets branch
        PCollection<TableRow> formattedTweets = tweets.apply(ParDo.of(new DoFormat()));
        formattedTweets.apply(BigQueryIO.Write.to(tableReference));
        p.run();
      }
  • 55. From Pub/Sub to BigQuery ● The update process preserves all in-flight data. (shell):
      $ mvn compile exec:java -Dexec.mainClass=it.noovle.dataflow.TwitterProcessor -Dexec.args="--streaming --update --jobName=twitterprocessor-lorenzo-1107222550"
      [...]
      INFO: To cancel the job using the 'gcloud' tool, run:
      > gcloud alpha dataflow jobs --project=codemotion-2016-demo cancel 2016-11-19_15_49_53-5264074060979116717
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD SUCCESS
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time: 18.131s
      [INFO] Finished at: Sun Nov 20 00:49:54 CET 2016
      [INFO] Final Memory: 28M/362M
      [INFO] ------------------------------------------------------------------------
  • 56. From Pub/Sub to BigQuery
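
Slide 19 packages each tweet as a base64-encoded Pub/Sub message before publishing. The following is a minimal Python 3 sketch of that packaging step only, with no network call. The helper names `build_publish_body` and `decode_message` are hypothetical and do not come from the talk; the body layout (`{'messages': [{'data': ...}]}`) mirrors the one shown on the slide.

```python
import base64
import json


def build_publish_body(data_lines):
    """Build a publish request body shaped like the one on slide 19.

    Hypothetical helper: the original app builds this inline.
    """
    messages = []
    for line in data_lines:
        # In Python 3, base64 works on bytes, so encode the text first and
        # turn the result back into a str so the body is JSON-serializable.
        encoded = base64.urlsafe_b64encode(line.encode("utf-8")).decode("ascii")
        messages.append({"data": encoded})
    return {"messages": messages}


def decode_message(message):
    """Reverse the encoding, as a consumer of the message would."""
    return base64.urlsafe_b64decode(message["data"]).decode("utf-8")


if __name__ == "__main__":
    body = build_publish_body(['{"text": "Great #blackfriday deals!"}'])
    print(json.dumps(body, indent=2))
    print(decode_message(body["messages"][0]))
```

The slide's code targets Python 2, where `urlsafe_b64encode` accepts a plain string; the explicit encode/decode steps here are what a Python 3 port would likely need.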
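
Slides 41 and 42 apply a `DoFormat` function to every tweet through a ParDo. The sketch below imitates that idea in plain Python so it can run without any SDK: it is not the Dataflow or Apache Beam API, and `par_do` and `process_element` are stand-in names chosen for illustration.

```python
import json


class DoFormat:
    """Stand-in for the DoFn on slide 42: one JSON tweet in, one row out."""

    def process_element(self, element):
        # Parse the raw tweet JSON into a dictionary (the "table row").
        return json.loads(element)


def par_do(do_fn, collection):
    """Apply do_fn to each element independently of the others."""
    return [do_fn.process_element(element) for element in collection]


if __name__ == "__main__":
    tweets = ['{"id": 1, "text": "deal!"}', '{"id": 2, "text": "meh"}']
    for row in par_do(DoFormat(), tweets):
        print(row)
```

Because each element is handled on its own, a real runner is free to spread the calls across many workers; the list comprehension here only shows the per-element contract.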

Editor's Notes

  1. Black Friday is the biggest sales event in the US, and since 1932 it has marked the beginning of the Christmas shopping season.
  2. According to Google Trends, interest in Black Friday in the US has remained unchanged in recent years.
  3. However, if we perform the same analysis for Italy, we can see that interest in Black Friday grew exponentially. That’s why no company, anywhere in the world, can ignore this day. Companies can take advantage of Black Friday to advertise themselves and sell more. We are going to step into the shoes of a company that wants to propose some deals specific to Black Friday, so the problem is: how, and on which channels, should we advertise the deals to maximize revenue?
  4. Social networks like Twitter can help a lot in analyzing people’s trends and opinions and in supporting us in making the right decision. So today we are focusing on Twitter. <selected hashtags>
  5. This is how we want to do this. The story is more or less always the same: we get some data; we process it (removing unnecessary things, transforming others); we store the data in a format that is good for analysis. Complexities: we do not have much time, and we have to make it work even if we don’t know the traffic we will have to handle (how high is the peak we saw before?).
  6. Our solution is to adopt a serverless architecture. We want to use services that allow us to concentrate on our solution, rather than on config files and boilerplate code. We do not want to configure or manage the infrastructure. We choose Google Cloud Platform because its Data Analytics offering is based exactly on these foundations. Today we are going to explore almost all the tools of GCP for Data Analytics. So, let’s start this whirlwind tour!
  7. Let’s start from the beginning. For the ingestion part we are going to use two technologies: Google Container Engine, the technology that powers Kubernetes-as-a-service on GCP (who knows Kubernetes? Containers/Docker?), and Google Cloud Pub/Sub, a middleware solution on the Cloud.
  8. Pub/Sub is a fully managed real-time messaging service. I create a topic and send messages to it; if I’m interested in a topic, I subscribe to it and start receiving messages. Nothing new, other technologies do this. However, Pub/Sub has a few strong points: it is a service, so I do not have to configure a cluster; it is reliable by design; and it keeps being reliable at scale.
  9. How do I create a Pub/Sub topic? Without going much into detail, it is a one-liner. gcloud is the command line tool that manages all Google Cloud Platform resources.
  10. This is how we are going to use Pub/Sub: we implement something that converts tweets into messages, and by means of Pub/Sub we can distribute these tweets to several subscribers with ease. Pub/Sub decouples producers and consumers: they do not have to know each other. It also improves the reliability of the overall system, acting as a shock absorber even if some parts of the downstream infrastructure have problems. We have a missing part here: how do we capture tweets and transform them into messages?
  11. We write a simple Python app that uses the TweePy library to interact with the Twitter Streaming API. Somewhere we use the stream.filter method to track a list of keywords; somewhere else (in the listener of TweePy events) we collect tweets, package them and send them out as Pub/Sub messages (note the Pub/Sub topic name).
  12. We wrote the app, we tested it. Now we have to deploy it (and its libraries) somewhere. Our first temptation would be...
  13. To start a Virtual Machine, install Python on it and run the app there. However...
  14. This is not the solution we want. It doesn’t scale. It is hard to make fault-tolerant (if the VM crashes, it doesn’t restart). It is difficult to deploy and to update (no rolling update).
  15. A much better solution is to use containers. Containers provide a higher level of abstraction (OS-level rather than HW-level), which allows us to create portable and isolated deployments that can be installed easily in on-prem or Cloud environments.
  16. We create a Docker image using a Dockerfile, which is a sequence of instructions that, starting from a base image, add some pieces to build our own solution. In this case we: install the necessary libraries; add our Python files; invoke our Python executable file (the container will run as long as this command does).
  17. We build an image based on the Dockerfile and we are done. However, a container solves the problem of deployment and portability, but not the one of scaling and management.
  18. We need a further layer of abstraction, and this level of abstraction is provided by Kubernetes.
  19. Kubernetes is an open source orchestration tool for managing clusters of containers. It introduces all those features that are missing from “standard” container deployments. A cool thing about Kubernetes is that it is completely declarative - you do not specify that you want one more node or one less pod, but you define a desired state and the Kubernetes Master works to reach and maintain that state.
  20. This is what we deploy on Kubernetes: a ReplicationController (or a ReplicaSet/Deployment in recent versions) is the definition of a group of container replicas that you want running concurrently. For the sake of our example we need only one replica, but even in this case a ReplicationController is useful, as it ensures that this single replica is always up and running.
  21. So we wrap our container into a Pod. The Pod is the replica unit of Kubernetes.
  22. Each Pod runs on a cluster node, but...
  23. ...more than one Pod can run on a single node. The allocation of Pods to nodes is managed by the Kubernetes Master, which is a particular cluster node. In Container Engine the K8S Master is completely managed (and free!)
  24. Since version 1.3 Kubernetes also supports autoscaling of nodes. If there aren’t sufficient resources available to keep up with Pod scaling, the node pool is enlarged.
  25. Creating a Kubernetes cluster is easy: 1) we create the cluster 2) we acquire Kubernetes credentials using gcloud 3) we use kubectl (open source CLI) to submit commands to the Kubernetes Master
  26. Once the cluster has been created, we can monitor all worker nodes from the Cloud Console. Here we have one node, which contains one Pod, which contains one Container, which contains our application, which is transforming tweets into Pub/Sub messages.
  27. Cool! We have implemented the first piece of our processing chain. What’s next?
  28. For the processing we want something equally scalable, so we are going to use a technology named Google Cloud Dataflow and...
  29. ...for the storage we are going to use Google BigQuery.
  30. Google Cloud Dataflow is two things. First, it is a collection of open source SDKs to implement parallel processing pipelines. The cool thing about being open source is that runners for Dataflow pipelines have already been implemented for other open source processing technologies, like Apache Spark or Apache Flink (all the code I’ve written for this demo could run in an open source environment). The project itself is now an Apache Incubator project called Apache Beam. Second, Cloud Dataflow is also a managed service on Google Cloud Platform that runs Apache Beam pipelines.
  31. Google BigQuery is an analytic data warehouse with impressive (almost magical) performance. It comes with a series of features that make it a valid choice as an enterprise-grade DWH: the ability to ingest streaming and batch data; JDBC and ODBC connectors to guarantee interoperability; a rich query language, which has now been renewed to support standard ANSI SQL-2011; a new Data Manipulation Language that supports updates and deletes.
  32. How are we going to make use of these tools? We will build a simple Dataflow pipeline that is composed of three steps: read tweets from Pub/Sub; transform tweets so that they conform to the BigQuery API; write tweets to BigQuery. By “tweet” I do not mean only the text, but all the information that is returned by the Twitter APIs (info about the user, etc.)
  33. The implementation is very easy: this is one of the best parts of Cloud Dataflow compared with existing processing technologies like MapReduce. First, we create a Pipeline object. The first operation is performed by invoking an apply method on the Pipeline object and using a Source to create collections of data called PCollections. In this case, we are using a PubSub Source to create a so-called unbounded PCollection (that is, a PCollection without a limited number of elements). All subsequent operations are performed by invoking apply methods on PCollections, which in turn generate other PCollections. The simplest operation you can apply to a PCollection is a ParDo (ParallelDo), which processes every element of the PCollection independently from the others. We write data by applying a transform. At the end, we tell the system to run the pipeline. The source (PubsubIO) determines whether the pipeline is a streaming or a batch one. All the other components (like BigQueryIO) adapt themselves accordingly, e.g. BigQueryIO uses Streaming APIs in streaming mode and Load Jobs in batch mode.
  34. The simplest operation you can apply to a PCollection is a ParDo (ParallelDo), which processes every element of the PCollection independently from the others. The argument of a ParDo is a DoFn object; we need to override the processElement method to instruct the system to do the right thing.
  35. The easiest way to deploy a Dataflow pipeline is to use Maven. (Some complexity is hidden here, like the choice of the runner and the staging location.)
  36. Once your pipeline is deployed, you can monitor its execution from the Cloud Console.
  37. You can check if data are actually being processed by querying the destination BigQuery table. It works! We built a very simple processing pipeline that streams data in real-time to our DWH and allows us to query results right as they are coming in. What now?
  38. Now we have to find some interesting analyses that we can run on our data, and represent them in a readable and shareable manner.
  39. Google Data Studio is a BI solution that allows the creation of dashboards and graphs from several sources, including BigQuery.
  40. Here you see an example showing the number of tweets per state in the US. Not very fancy. In fact, we soon realize that the information we have from raw data doesn’t give us very “smart” insights.
  41. We need to enrich our data model in some way. The good news is that Google released a series of APIs exposing ready-to-use Machine Learning algorithms and models. The one that seems to fit our case is...
  42. ...Natural Language APIs. These APIs can perform several different tasks on text strings: extract the syntactic structure of sentences, extract entities that are mentioned within a text, and even perform sentiment analysis.
  43. The Sentiment Analysis API takes a text as input and returns two float values. Polarity (a float ranging from -1 to 1) expresses the mood of the text: positive values denote positive moods. Magnitude (a float ranging from 0 to +inf) expresses the intensity of the feeling: higher values denote stronger feelings.
  44. Our personal simplistic definition of “sentiment” will be “polarity times magnitude”.
  45. Let’s modify our pipeline. For illustration purposes we will keep the old flow and add another one to implement the sentiment analysis. The sentiment will be evaluated only for a subset of tweets (those that explicitly contain the word “blackfriday”).
  46. How does this reflect on the Pipeline code? We only have to add three lines of code (I’m lying!) Note how we start from the “tweets” PCollection both for the processing and the write of raw data. Note also how we can reuse the DoFormat function for both flows.
  47. Updating a pipeline is easy if the update doesn’t modify the existing structure (we are only adding new pieces). We only have to provide the name of the job we want to update. Dataflow will take care of draining the existing pipeline before shutting it down.
  48. The Cloud Console shows the updated pipeline, and new “enriched” data is immediately available in a BigQuery table.
  49. We did it! We built a serverless scalable data solution based on Google Cloud Platform. One interesting aspect about this architecture is that it is completely no-ops, and...
  50. ...it has integrated logging, monitoring and alerting thanks to Google Stackdriver. And we didn’t have to do anything!
  51. Let me show you the final solution. We will see how easy it is to query data and monitor the infrastructure, and we will take a look at some dashboards.
  52. When you detect an anomaly in one of the trends, you can drill down in BigQuery to explore the reasons. Walmart’s popularity is not so high, mainly due to their decision to start Black Friday sales at 6 PM on Thanksgiving Day. Amazon’s popularity dropped right after they announced their first “Black Friday Week” deals, which apparently did not meet customers’ expectations (they are recovering, though :)
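
Note 10 describes how a topic fans tweets out to several subscribers, and slide 16 mentions "at least once" delivery. The toy in-memory model below sketches both ideas under simplifying assumptions: the `Topic` and `Subscription` classes are hypothetical and are not the Cloud Pub/Sub client API, and redelivery is reduced to "a message stays at the head of the queue until it is acknowledged".

```python
from collections import deque


class Subscription:
    """Keeps its own copy of each message until the consumer acknowledges it."""

    def __init__(self, name):
        self.name = name
        self._pending = deque()

    def deliver(self, message):
        self._pending.append(message)

    def pull(self):
        """Return the oldest unacknowledged message without removing it."""
        return self._pending[0] if self._pending else None

    def ack(self):
        """Acknowledge the oldest message so it is not delivered again."""
        if self._pending:
            self._pending.popleft()


class Topic:
    """Producers publish here and never see who is subscribed."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = []

    def subscribe(self, name):
        subscription = Subscription(name)
        self.subscriptions.append(subscription)
        return subscription

    def publish(self, message):
        # Every subscription gets its own copy of the message.
        for subscription in self.subscriptions:
            subscription.deliver(message)


if __name__ == "__main__":
    topic = Topic("blackfridaytweets")
    sub_a = topic.subscribe("A")
    sub_b = topic.subscribe("B")
    topic.publish("tweet 1")
    print(sub_a.pull(), sub_b.pull())
```

A consumer that crashes before calling `ack` would see the same message again on its next `pull`, which is why consumers of an at-least-once system usually need to tolerate duplicates.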
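
Note 19 stresses that Kubernetes is declarative: you state a desired number of replicas and the system works to reach and keep that state. A minimal sketch of that reconciliation idea follows, assuming a greatly simplified model in which Pods are just names in a list; the `reconcile` function and the naming scheme are hypothetical and do not reflect how the Kubernetes controllers are implemented.

```python
def reconcile(desired_replicas, running_pods, name_prefix="twitter-stream"):
    """Compute the actions that move the observed state to the desired state.

    Returns the resulting list of pods and the list of actions taken.
    """
    pods = list(running_pods)
    actions = []
    suffix = 0
    # Too few replicas: start new ones with unused names.
    while len(pods) < desired_replicas:
        candidate = f"{name_prefix}-{suffix}"
        suffix += 1
        if candidate in pods:
            continue
        pods.append(candidate)
        actions.append(("start", candidate))
    # Too many replicas: stop the most recently listed ones.
    while len(pods) > desired_replicas:
        actions.append(("stop", pods.pop()))
    return pods, actions


if __name__ == "__main__":
    # The single replica has crashed, so the observed state is empty.
    pods, actions = reconcile(desired_replicas=1, running_pods=[])
    print(pods, actions)
    # Running it again changes nothing: the desired state is already met.
    print(reconcile(desired_replicas=1, running_pods=pods))
```

This is also why note 20 keeps a ReplicationController even for one replica: the same loop that scales up is what restarts the Pod after a failure.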
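
Notes 43 to 45 define the sentiment score as polarity times magnitude and apply it only to tweets that contain the word "blackfriday". The sketch below puts those two rules together in plain Python. The `fake_analyze` function is a placeholder invented for this example: it returns made-up values and stands in for the real Natural Language API call, whose client code is not shown in the talk.

```python
def sentiment(polarity, magnitude):
    """Score used in the talk: polarity in [-1, 1] times magnitude in [0, +inf)."""
    return polarity * magnitude


def fake_analyze(text):
    """Placeholder for the Natural Language API; returns (polarity, magnitude).

    The values are invented for illustration only.
    """
    lowered = text.lower()
    if "love" in lowered:
        return 0.8, 1.5
    if "hate" in lowered:
        return -0.6, 2.0
    return 0.0, 0.0


def filter_and_process(tweets, keyword="blackfriday", analyze=fake_analyze):
    """Keep tweets mentioning the keyword and attach a sentiment score."""
    enriched = []
    for tweet in tweets:
        if keyword not in tweet["text"].lower():
            continue  # the sentiment branch ignores unrelated tweets
        polarity, magnitude = analyze(tweet["text"])
        enriched.append(dict(tweet, sentiment=sentiment(polarity, magnitude)))
    return enriched


if __name__ == "__main__":
    tweets = [
        {"text": "I love #BlackFriday"},
        {"text": "Nothing to see here"},
        {"text": "I hate blackfriday queues"},
    ]
    # The raw branch would store every tweet; this branch stores only two.
    for row in filter_and_process(tweets):
        print(row)
```

Passing the analyzer as a parameter keeps the filtering logic testable without network access; a real pipeline would supply a function that calls the API instead.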