SlideShare a Scribd company logo
Creating #serverless data analytics
system on GCP using BigQuery
Márton Kodok / @martonkodok
Google Developer Expert at REEA.net
March 2018 - Tirgu Mures, Romania
● Geek. Hiker. Do-er.
● Among the Top3 romanians on Stackoverflow 120k reputation
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net
● BigQuery/Redis and database engine expert
● Active in mentoring and IT community
Twitter: @martonkodok
StackOverflow: pentium10
Slideshare: martonkodok
GitHub: pentium10
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
About me
REEA.net uses GCP
Build on the same infrastructure
that powers Google
Google Cloud Platform (GCP)
Compute Big Data
BigQuery
Cloud
Dataflow
Cloud
Dataproc
Cloud
Datalab
Cloud
Pub/Sub
Genomics
Storage & Databases
Cloud
Storage
Cloud
Bigtable
Cloud
Datastore
Cloud SQL
Cloud
Spanner
Persistent
Disk
Machine Learning
Cloud Machine
Learning
Cloud
Vision API
Cloud
Speech API
Cloud Natural
Language API
Cloud
Translation
API
Cloud
Jobs API
Data
Studio
Cloud
Dataprep
Cloud Video
Intelligence
API
Advanced
Solutions Lab
Compute
Engine
App
Engine
Kubernetes
Engine
GPU
Cloud
Functions
Container-
Optimized OS
Identity & Security
Cloud IAM
Cloud Resource
Manager
Cloud Security
Scanner
Key
Management
Service
BeyondCorp
Data Loss
Prevention API
Identity-Aware
Proxy
Security Key
Enforcement
Internet of Things
Cloud IoT
Core
Transfer
Appliance
Google Cloud Platform (GCP)
Developer Tools
Cloud SDK
Cloud
Deployment
Manager
Cloud Source
Repositories
Cloud
Tools for
Android Studio
Cloud Tools
for IntelliJ
Cloud
Tools for
PowerShell
Cloud
Tools for
Visual Studio
Container
Registry
Google Plug-in
for Eclipse
Cloud Test
Lab
Networking
Virtual
Private Cloud
Cloud Load
Balancing
Cloud
CDN
Cloud
Interconnect
Cloud DNS
Cloud
Network
Cloud
External IP
Addresses
Cloud
Firewall Rules
Cloud
Routes
Cloud VPN
Management Tools
Stackdriver Monitoring Logging
Error
Reporting
Trace
Debugger
Cloud
Deployment
Manager
Cloud
Endpoints
Cloud
Console
Cloud
Shell
Cloud Mobile
App
Cloud
Billing API
Cloud
APIs
Cloud
Router
Dedicated
Interconnect
Container
Builder
Meet Serverless
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Meet Serverless
serverless data center depicted
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Event-driven serverless compute platform
Cloud
Services
Changes in data state
Business logic events
Integrations
Event Router
Gateway
HTTPS
Event Source
Multiple Platforms
Data Warehouse
Pub/Sub
Cloud Functions
Streaming
Business Value
Application
Task
Analysis
Serverless is about maximizing elasticity, cost
savings, and agility of cloud computing.
@martonkodok
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Crafting a solution for building high-performance,
petabyte scale data analytics, serverless
reporting system on Google Cloud Platform
Goal today
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Legacy Reporting System
App
Cloud Load
Balancing
NGINX
Compute Engine
10GB PD
2 1
Database Service (Master/Slave)
Compute Engine
10GB PD
4 1
Compute Engine
10GB PD
4 1
Compute Engine
10GB PD
4 1
Report & Share
Business Analysis
Scheduled
Tasks
Batch Processing
Compute Engine
Multiple Instances
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Serverless Reporting System
App
Cloud Load
Balancing
NGINX
Compute Engine
10GB PD
2 1
Database Service (Master/Slave)
Compute Engine
10GB PD
4 1
Compute Engine
10GB PD
4 1
Compute Engine
10GB PD
4 1
Report & Share
Business Analysis
Scheduled
Tasks
Batch Processing
Compute Engine
Multiple Instances
BigQuery Data Studio
Report & Share
Business Analysis
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Analytics-as-a-Service - Data Warehouse in the Cloud
Scales into Petabytes on Managed Google Infrastructure (US or EU zone)
SQL 2011 + Javascript UDF (User Defined Functions)
Familiar DB Structure (table, views, struct, nested, JSON)
Integrates with Google Sheets + Cloud Storage + Pub/Sub connectors
Decent pricing (queries $5/TB, storage: $20/TB cold: $10/TB) *Mar 2018
Open Interfaces (Web UI, BQ command line tool, REST, ODBC)
What is BigQuery?
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Columnar storage (max 10 000 columns in table)
Large files for loading: 5TB (CSV or JSON)
UDF in Javascript or SQL
Rich SQL 2011: JSON,IP,Math,RegExp,Geocode,Window functions
Modern data types: Record, Nested, Struct, Array.
Append-only tables prefered (DML syntax available)
Day column partitioned tables (select * from t where day=’2018-01-01’)
BigQuery: Convenience of SQL
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Architecting for The Cloud
BigQuery
On-Premises Servers
Pipelines
ETL
Engine
Event Sourcing
Frontend
Platform Services
Metrics / Logs/
Streaming
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
“ Our project generates many/big files.
How can I seamlessly ingest them?
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Serverless file ingest
BigQuery
On-Premises Servers
ApplicationEvent Sourcing
Frontend
Platform Services
Metrics / Logs/
Streaming
Cloud
Storage
Cloud
Functions
Triggered Code
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
“ Data needs to be processed in
multiple services.
How can we pipe to multiple places?
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Architecting for The Cloud
On-Premises Servers
Event Sourcing
Frontend
Platform Services
Analyze
Metrics / Logs/
Streaming
Cloud Storage
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Cloud
Dataflow
Process
BigQuery
Cloud SQL
Stream
Batch
Data
Studio
Third-Party
Tools
“ We have our app outside of GCP.
How can we use the benefits of BigQuery?
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Data Pipeline Integration at REEA.net
Analytics Backend
BigQuery
On-Premises Servers
Pipelines
FluentD
Event Sourcing
Frontend
Platform Services
Metrics / Logs/
Streaming
Development
Team
Data Analysts
Report & Share
Business Analysis
Tools
Tableau
QlikView
Data Studio
Internal
Dashboard
Database
SQL
Application
ServersServers
Cloud Storage
archive
Load
Export
Replay
Standard
Devices
HTTPS
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
The following slides will present a sample Fluentd configuration to:
1. Transform a record
2. Copy event to multiple outputs
3. Store event data in File (for backup/log purposes)
4. Stream to BigQuery (for immediate analyses)
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
<filter frontend.user.*>
@type record_transformer
</filter>
<match frontend.user.*>
@type copy
<store>
@type forest
subtype file
</store>
<store>
@type bigquery
</store>
…
</match>
Filter plugin mutates incoming data. Add/modify/delete
event data transform attributes without a code deploy.1
2
3
4
The copy output plugin copies events to multiple outputs.
File(s), multiple databases, DB engines.
Great to ship same event to multiple subsystems.
The Bigquery output plugin on the fly streams the event to
the BigQuery warehouse. No need to write integration.
Data is available immediately for querying.
Whenever needed other output plugins can be wired in:
Kafka, Google Cloud Storage output plugin.
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
record_transformer copy file BigQuery
<filter frontend.user.*>
@type record_transformer
enable_ruby
remove_keys host
<record>
bq {"insert_id":"${uid}","host":"${host}",
"created":"${time.to_i}"}
avg ${record["total"] / record["count"]}
</record>
</filter>
syntax: Ruby, easy to use.
Great for:
- date transformation,
- quick normalizations,
- calculating something on the fly,
and store in clear log/analytics db
- renaming without code deploy.
1 2 3 4
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
record_transformer copy file BigQuery
<match frontend.user.*>
@type copy
<store>
@type forest
subtype file
<template>
path /tank/storage/${tag}.*.log
time_slice_format %Y%m%d
</template>
</store>
</match>
1 2 3 4
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
record_transformer copy file BigQuery
<match frontend.user.*>
@type bigquery
method insert
auth_method json_key
json_key /etc/td-agent/keys/key-31da042be48c.json
time_field timestamp
time_slice_format %Y%m%d
table user$%{time_slice}
ignore_unknown_values
schema_path /etc/td-agent/schema/user_login.json
</match>
1 2 3 4
Connector uses:
- JSON key auth file
- JSON table schema
Pro features:
- streaming to Partitioned tables
- ignore unknown values
(not reflected in schema)
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
● On data that it is difficult to process/analyze using traditional databases
● Not a replacement to traditional DBs, but it compliments the system
● Major strength is handling Large datasets
● Applying Javascript UDF on columnar storage to resolve complex tasks
(eg: JS for natural language processing)
● On streams (forms, IoT, Kafka)
● On exploring unstructured data
Where to use BigQuery?
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
➢ Optimize product pages
➢ Email engagement
➢ Funnel Analysis
Achievements - goal reached by measuring everything
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
● Funnel Analysis
Achievements
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Funnel analysis: Time on upsell pages
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Example HITS chain:
● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
● page1 -> article2-> page3 -> orderpage2 -> ...
Attribute credit to first article visited on purchase
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
● Funnel Analysis
● Email URL click heatmap
● Email Health Dashboard (SPAM, ISP deferral, content
A/B split tests, trends or low open rate campaigns)
● Advanced segmentation (all raw data stored)
● Behavioral analytics - engaged users etc...
Achievements Continued
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
● SQL language to run BigData queries
● run raw ad-hoc queries (either by analysts/sales or Devs)
● no more throwing away-, expiring-, aggregating old data
● no provisioning/deploy
● no running out of resources
● no more focus on large scale execution plan
● no need to re-implement tricky concepts
(time windows / join streams)
Our benefits
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
● No manual sharding
● No capacity guessing
● No idle resources
● No maintenance windows
● No manual scaling
● No file mgmt
BigQuery: Serverless Data Warehouse
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
● No servers to provision or manage
● Abstract away the complexity
● Scales with usage (ready every time for viral spikes or #BlackFriday)
● Availability and fault tolerance built in
● No orchestration in code
● Never pay for idle
● Cost savings (ps: we don’t have the same budget for security like GCP or AWS)
● Decoupled: APIs as contracts
● Monitored: Metrics and logging are a universal right
● Think concurrent, stateless, queue, stream based.
Serverlessmeans
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Easily Build Custom Reports and Dashboards
Creating #serverless data analytics system on GCP using BigQuery @martonkodok
Thank you.
Slides available on:
slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by
creativity to deliver projects.

More Related Content

What's hot

Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
Alex Van Boxel
 
GDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud PlatformGDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud Platform
Márton Kodok
 
Google Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your ProductGoogle Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your Product
Sergey Smetanin
 
GDG Heraklion - Architecting for the Google Cloud Platform
GDG Heraklion - Architecting for the Google Cloud PlatformGDG Heraklion - Architecting for the Google Cloud Platform
GDG Heraklion - Architecting for the Google Cloud Platform
Márton Kodok
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 
Google Cloud Technologies Overview
Google Cloud Technologies OverviewGoogle Cloud Technologies Overview
Google Cloud Technologies Overview
Chris Schalk
 
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
Márton Kodok
 
Introduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIntroduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted Conf
In Marketing We Trust
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud event
PreetyKhatkar
 
Firebase Realtime Database and Remote Config in Practice - DroidCon Moscow 2016
Firebase Realtime Database and Remote Config in Practice - DroidCon Moscow 2016Firebase Realtime Database and Remote Config in Practice - DroidCon Moscow 2016
Firebase Realtime Database and Remote Config in Practice - DroidCon Moscow 2016
Sergey Smetanin
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Big Data Spain
 
Spark and MongoDB
Spark and MongoDBSpark and MongoDB
Spark and MongoDB
Norberto Leite
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
javier ramirez
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCP
AllCloud
 
#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity
Gera Shegalov
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
Guido Schmutz
 
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQLPGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC
 
Big query the first step - (MOSG)
Big query the first step - (MOSG)Big query the first step - (MOSG)
Big query the first step - (MOSG)
Soshi Nemoto
 
MongoDB and Spark
MongoDB and SparkMongoDB and Spark
MongoDB and Spark
Norberto Leite
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
MongoDB
 

What's hot (20)

Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
GDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud PlatformGDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud Platform
 
Google Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your ProductGoogle Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your Product
 
GDG Heraklion - Architecting for the Google Cloud Platform
GDG Heraklion - Architecting for the Google Cloud PlatformGDG Heraklion - Architecting for the Google Cloud Platform
GDG Heraklion - Architecting for the Google Cloud Platform
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Google Cloud Technologies Overview
Google Cloud Technologies OverviewGoogle Cloud Technologies Overview
Google Cloud Technologies Overview
 
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
 
Introduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIntroduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted Conf
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud event
 
Firebase Realtime Database and Remote Config in Practice - DroidCon Moscow 2016
Firebase Realtime Database and Remote Config in Practice - DroidCon Moscow 2016Firebase Realtime Database and Remote Config in Practice - DroidCon Moscow 2016
Firebase Realtime Database and Remote Config in Practice - DroidCon Moscow 2016
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
 
Spark and MongoDB
Spark and MongoDBSpark and MongoDB
Spark and MongoDB
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCP
 
#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
 
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQLPGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
 
Big query the first step - (MOSG)
Big query the first step - (MOSG)Big query the first step - (MOSG)
Big query the first step - (MOSG)
 
MongoDB and Spark
MongoDB and SparkMongoDB and Spark
MongoDB and Spark
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
 

Similar to CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery

VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
Márton Kodok
 
DevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryDevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQuery
Márton Kodok
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companies
Márton Kodok
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
Márton Kodok
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Christopher Gutknecht
 
Google Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery WebinarGoogle Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery Webinar
Rasel Rana
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoT
James Chittenden
 
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
Miho Yamamoto
 
Google Cloud Platform Solutions for DevOps Engineers
Google Cloud Platform Solutions  for DevOps EngineersGoogle Cloud Platform Solutions  for DevOps Engineers
Google Cloud Platform Solutions for DevOps Engineers
Márton Kodok
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline Patterns
Lynn Langit
 
Data Platform on GCP
Data Platform on GCPData Platform on GCP
Data Platform on GCP
Patrick Alexander
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
EDB
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Python
wesley chun
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute final
Avere Systems
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
GoDataDriven
 
The journey of Moving from AWS ELK to GCP Data Pipeline
The journey of Moving from AWS ELK to GCP Data PipelineThe journey of Moving from AWS ELK to GCP Data Pipeline
The journey of Moving from AWS ELK to GCP Data Pipeline
Randy Huang
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
HostedbyConfluent
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
Ido Green
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
UA DevOps Conference
 

Similar to CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery (20)

VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
DevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryDevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQuery
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companies
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
Google Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery WebinarGoogle Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery Webinar
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoT
 
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
 
Google Cloud Platform Solutions for DevOps Engineers
Google Cloud Platform Solutions  for DevOps EngineersGoogle Cloud Platform Solutions  for DevOps Engineers
Google Cloud Platform Solutions for DevOps Engineers
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline Patterns
 
Data Platform on GCP
Data Platform on GCPData Platform on GCP
Data Platform on GCP
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Python
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute final
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
The journey of Moving from AWS ELK to GCP Data Pipeline
The journey of Moving from AWS ELK to GCP Data PipelineThe journey of Moving from AWS ELK to GCP Data Pipeline
The journey of Moving from AWS ELK to GCP Data Pipeline
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
 

More from Márton Kodok

Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Márton Kodok
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
Márton Kodok
 
Discover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementDiscover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statement
Márton Kodok
 
Cloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerizationCloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerization
Márton Kodok
 
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
Márton Kodok
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Márton Kodok
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
Márton Kodok
 
Cloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automationCloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automation
Márton Kodok
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
Márton Kodok
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
Márton Kodok
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
Márton Kodok
 
BigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery MLBigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery ML
Márton Kodok
 
Applying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsApplying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analytics
Márton Kodok
 
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer ExpertigVibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Márton Kodok
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
Márton Kodok
 
Next18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to youNext18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to you
Márton Kodok
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
Márton Kodok
 
GCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatásokGCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatások
Márton Kodok
 
Efikot - Smart City, okos város - a jövőnk kulcsa
Efikot - Smart City, okos város - a jövőnk kulcsaEfikot - Smart City, okos város - a jövőnk kulcsa
Efikot - Smart City, okos város - a jövőnk kulcsa
Márton Kodok
 

More from Márton Kodok (19)

Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
 
Discover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementDiscover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statement
 
Cloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerizationCloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerization
 
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Cloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automationCloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automation
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
BigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery MLBigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery ML
 
Applying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsApplying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analytics
 
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer ExpertigVibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Next18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to youNext18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to you
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
 
GCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatásokGCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatások
 
Efikot - Smart City, okos város - a jövőnk kulcsa
Efikot - Smart City, okos város - a jövőnk kulcsaEfikot - Smart City, okos város - a jövőnk kulcsa
Efikot - Smart City, okos város - a jövőnk kulcsa
 

Recently uploaded

Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
ayushiqss
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
Sharepoint Designs
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 

Recently uploaded (20)

Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 

CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery

  • 1. Creating #serverless data analytics system on GCP using BigQuery Márton Kodok / @martonkodok Google Developer Expert at REEA.net March 2018 - Tirgu Mures, Romania
  • 2. ● Geek. Hiker. Do-er. ● Among the Top3 romanians on Stackoverflow 120k reputation ● Google Developer Expert on Cloud technologies ● Crafting Web/Mobile backends at REEA.net ● BigQuery/Redis and database engine expert ● Active in mentoring and IT community Twitter: @martonkodok StackOverflow: pentium10 Slideshare: martonkodok GitHub: pentium10 Creating #serverless data analytics system on GCP using BigQuery @martonkodok About me
  • 3. REEA.net uses GCP Build on the same infrastructure that powers Google
  • 4. Google Cloud Platform (GCP) Compute Big Data BigQuery Cloud Dataflow Cloud Dataproc Cloud Datalab Cloud Pub/Sub Genomics Storage & Databases Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL Cloud Spanner Persistent Disk Machine Learning Cloud Machine Learning Cloud Vision API Cloud Speech API Cloud Natural Language API Cloud Translation API Cloud Jobs API Data Studio Cloud Dataprep Cloud Video Intelligence API Advanced Solutions Lab Compute Engine App Engine Kubernetes Engine GPU Cloud Functions Container- Optimized OS Identity & Security Cloud IAM Cloud Resource Manager Cloud Security Scanner Key Management Service BeyondCorp Data Loss Prevention API Identity-Aware Proxy Security Key Enforcement Internet of Things Cloud IoT Core Transfer Appliance
  • 5. Google Cloud Platform (GCP) Developer Tools Cloud SDK Cloud Deployment Manager Cloud Source Repositories Cloud Tools for Android Studio Cloud Tools for IntelliJ Cloud Tools for PowerShell Cloud Tools for Visual Studio Container Registry Google Plug-in for Eclipse Cloud Test Lab Networking Virtual Private Cloud Cloud Load Balancing Cloud CDN Cloud Interconnect Cloud DNS Cloud Network Cloud External IP Addresses Cloud Firewall Rules Cloud Routes Cloud VPN Management Tools Stackdriver Monitoring Logging Error Reporting Trace Debugger Cloud Deployment Manager Cloud Endpoints Cloud Console Cloud Shell Cloud Mobile App Cloud Billing API Cloud APIs Cloud Router Dedicated Interconnect Container Builder
  • 6. Meet Serverless Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 7. Meet Serverless serverless data center depicted Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 8. Event-driven serverless compute platform Cloud Services Changes in data state Business logic events Integrations Event Router Gateway HTTPS Event Source Multiple Platforms Data Warehouse Pub/Sub Cloud Functions Streaming Business Value Application Task Analysis
  • 9. Serverless is about maximizing elasticity, cost savings, and agility of cloud computing. @martonkodok Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 10. Crafting a solution for building high-performance, petabyte scale data analytics, serverless reporting system on Google Cloud Platform Goal today Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 11. Legacy Reporting System App Cloud Load Balancing NGINX Compute Engine 10GB PD 2 1 Database Service (Master/Slave) Compute Engine 10GB PD 4 1 Compute Engine 10GB PD 4 1 Compute Engine 10GB PD 4 1 Report & Share Business Analysis Scheduled Tasks Batch Processing Compute Engine Multiple Instances Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 12. Serverless Reporting System App Cloud Load Balancing NGINX Compute Engine 10GB PD 2 1 Database Service (Master/Slave) Compute Engine 10GB PD 4 1 Compute Engine 10GB PD 4 1 Compute Engine 10GB PD 4 1 Report & Share Business Analysis Scheduled Tasks Batch Processing Compute Engine Multiple Instances BigQuery Data Studio Report & Share Business Analysis Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 13. Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 14. Analytics-as-a-Service - Data Warehouse in the Cloud Scales into Petabytes on Managed Google Infrastructure (US or EU zone) SQL 2011 + Javascript UDF (User Defined Functions) Familiar DB Structure (table, views, struct, nested, JSON) Integrates with Google Sheets + Cloud Storage + Pub/Sub connectors Decent pricing (queries $5/TB, storage: $20/TB cold: $10/TB) *Mar 2018 Open Interfaces (Web UI, BQ command line tool, REST, ODBC) What is BigQuery? Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 15. Columnar storage (max 10 000 columns in table) Large files for loading: 5TB (CSV or JSON) UDF in Javascript or SQL Rich SQL 2011: JSON,IP,Math,RegExp,Geocode,Window functions Modern data types: Record, Nested, Struct, Array. Append-only tables prefered (DML syntax available) Day column partitioned tables (select * from t where day=’2018-01-01’) BigQuery: Convenience of SQL Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 16. Architecting for The Cloud BigQuery On-Premises Servers Pipelines ETL Engine Event Sourcing Frontend Platform Services Metrics / Logs/ Streaming Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 17. “ Our project generates many/big files. How can I seamlessly ingest them? Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 18. Serverless file ingest BigQuery On-Premises Servers ApplicationEvent Sourcing Frontend Platform Services Metrics / Logs/ Streaming Cloud Storage Cloud Functions Triggered Code Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 19. “ Data needs to be processed in multiple services. How can we pipe to multiple places? Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 20. Architecting for The Cloud On-Premises Servers Event Sourcing Frontend Platform Services Analyze Metrics / Logs/ Streaming Cloud Storage Creating #serverless data analytics system on GCP using BigQuery @martonkodok Cloud Dataflow Process BigQuery Cloud SQL Stream Batch Data Studio Third-Party Tools
  • 21. “ We have our app outside of GCP. How can we use the benefits of BigQuery? Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 22. Data Pipeline Integration at REEA.net Analytics Backend BigQuery On-Premises Servers Pipelines FluentD Event Sourcing Frontend Platform Services Metrics / Logs/ Streaming Development Team Data Analysts Report & Share Business Analysis Tools Tableau QlikView Data Studio Internal Dashboard Database SQL Application ServersServers Cloud Storage archive Load Export Replay Standard Devices HTTPS Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 23. The following slides will present a sample Fluentd configuration to: 1. Transform a record 2. Copy event to multiple outputs 3. Store event data in File (for backup/log purposes) 4. Stream to BigQuery (for immediate analyses) Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 24. <filter frontend.user.*> @type record_transformer </filter> <match frontend.user.*> @type copy <store> @type forest subtype file </store> <store> @type bigquery </store> … </match> Filter plugin mutates incoming data. Add/modify/delete event data transform attributes without a code deploy.1 2 3 4 The copy output plugin copies events to multiple outputs. File(s), multiple databases, DB engines. Great to ship same event to multiple subsystems. The Bigquery output plugin on the fly streams the event to the BigQuery warehouse. No need to write integration. Data is available immediately for querying. Whenever needed other output plugins can be wired in: Kafka, Google Cloud Storage output plugin. Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 25. record_transformer copy file BigQuery <filter frontend.user.*> @type record_transformer enable_ruby remove_keys host <record> bq {"insert_id":"${uid}","host":"${host}", "created":"${time.to_i}"} avg ${record["total"] / record["count"]} </record> </filter> syntax: Ruby, easy to use. Great for: - date transformation, - quick normalizations, - calculating something on the fly, and store in clear log/analytics db - renaming without code deploy. 1 2 3 4 Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 26. record_transformer copy file BigQuery <match frontend.user.*> @type copy <store> @type forest subtype file <template> path /tank/storage/${tag}.*.log time_slice_format %Y%m%d </template> </store> </match> 1 2 3 4 Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 27. record_transformer copy file BigQuery <match frontend.user.*> @type bigquery method insert auth_method json_key json_key /etc/td-agent/keys/key-31da042be48c.json time_field timestamp time_slice_format %Y%m%d table user$%{time_slice} ignore_unknown_values schema_path /etc/td-agent/schema/user_login.json </match> 1 2 3 4 Connector uses: - JSON key auth file - JSON table schema Pro features: - streaming to Partitioned tables - ignore unknown values (not reflected in schema) Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 28. ● On data that it is difficult to process/analyze using traditional databases ● Not a replacement to traditional DBs, but it compliments the system ● Major strength is handling Large datasets ● Applying Javascript UDF on columnar storage to resolve complex tasks (eg: JS for natural language processing) ● On streams (forms, IoT, Kafka) ● On exploring unstructured data Where to use BigQuery? Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 29. ➢ Optimize product pages ➢ Email engagement ➢ Funnel Analysis Achievements - goal reached by measuring everything Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 30. ● Funnel Analysis Achievements Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 31. Funnel analysis: Time on upsell pages Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 32. Example HITS chain: ● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1 ● page1 -> article2-> page3 -> orderpage2 -> ... Attribute credit to first article visited on purchase Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 33. ● Funnel Analysis ● Email URL click heatmap ● Email Health Dashboard (SPAM, ISP deferral, content A/B split tests, trends or low open rate campaigns) ● Advanced segmentation (all raw data stored) ● Behavioral analytics - engaged users etc... Achievements Continued Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 34. ● SQL language to run BigData queries ● run raw ad-hoc queries (either by analysts/sales or Devs) ● no more throwing away-, expiring-, aggregating old data ● no provisioning/deploy ● no running out of resources ● no more focus on large scale execution plan ● no need to re-implement tricky concepts (time windows / join streams) Our benefits Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 35. ● No manual sharding ● No capacity guessing ● No idle resources ● No maintenance windows ● No manual scaling ● No file mgmt BigQuery: Serverless Data Warehouse Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 36. ● No servers to provision or manage ● Abstract away the complexity ● Scales with usage (ready every time for viral spikes or #BlackFriday) ● Availability and fault tolerance built in ● No orchestration in code ● Never pay for idle ● Cost savings (ps: we don’t have the same budget for security like GCP or AWS) ● Decoupled: APIs as contracts ● Monitored: Metrics and logging are a universal right ● Think concurrent, stateless, queue, stream based. Serverlessmeans Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 37. Easily Build Custom Reports and Dashboards Creating #serverless data analytics system on GCP using BigQuery @martonkodok
  • 38. Thank you. Slides available on: slideshare.net/martonkodok Reea.net - Integrated web solutions driven by creativity to deliver projects.