DevTalks Keynote Powering interactive data analysis with Google BigQuery

Powering Interactive Data Analysis with Google BigQuery
Márton Kodok / @martonkodok
Google Developer Expert at REEA.net, Targu Mures, Romania
17 May 2017, Cluj Napoca, Romania

● Geek. Hiker. Do-er.
● Among the top 3 Romanians on Stack Overflow
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net Targu Mures
● BigQuery and database engine expert
● Active in mentoring
Twitter: @martonkodok
StackOverflow: pentium10
Slideshare: martonkodok
GitHub: pentium10
About me

Theme 2017: Big Data for Big Impact
The theme for WTISD-17, "Big Data for Big Impact," focuses on
the power of Big Data for development and aims to explore
how to turn imperfect, complex, often unstructured data into
actionable information in a development context.
17 May 2017, World Telecommunication and Information Society Day

Romania needs to
accelerate open Big Data adoption

Let Education/Healthcare
benefit from talented Big Data experts

<Start-up Nation> projects should focus on going
- from paperless to open endpoints
- from small data to Big Data
- sustain a City by building Big Data Platform Hubs

Every company,
no matter how far from tech it is,
is evolving into a software company,
and by extension a data company.

For a small company it's important
to have access to modern Big Data tools
without running a dedicated team for it.

Agenda
● The Challenge
● Architecture Overview
● Strategy & Tricks
● Winning Solution

❏ Need backend/database to STORE, QUERY, EXTRACT data
❏ Deep analytics - large, multi-source, complex, unstructured
❏ Be real time
❏ Terabyte scale - Cost effective
❏ Run Ad-Hoc reports - Without Developer - interactive
❏ Minimal engineering efforts - no dedicated BigData team
❏ Simple query language (preferred: SQL / JavaScript)
Making analytics accessible to more companies

Legacy Business Reporting System
[Diagram: Web and Mobile clients; Web Server; Database (SQL, Cached); Platform Services; CMS/Framework; Scheduled Tasks; Batch Processing; Compute Engine (Multiple Instances); Report & Share; Business Analysis]

Behind the Scenes: Days to Insights

Legacy Business Reporting System (same diagram, annotated with delays)
● Minutes to kick in
● Hours to run batch processing
● Hours to clean and aggregate
● DAYS TO INSIGHTS

● Terabyte scalable storage
● Real-time row ingestion
● Ask sophisticated queries
● Query-performance
● Low-maintenance
● Cost effective
● Wire them up easily
Goal: Store everything accessible by SQL immediately.
Desired system/platform
Engines:
● MongoDB, Riak, Redis
● ELK Stack (Elasticsearch-Logstash-Kibana)
● Cassandra, Hive, Hadoop...
● Amazon Athena, Google BigQuery...

● Analytics-as-a-Service - Data Warehouse in the Cloud
● Fully-Managed by Google (US or EU zone)
● Scales into Petabytes
● Ridiculously fast
● SQL 2011 standard + JavaScript UDFs (User-Defined Functions)
● Familiar DB Structure (table, views, record, nested, JSON)
● Open Interfaces (REST, ODBC, Web UI, BQ command line tool)
● Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
● Client libraries available in YFL (your favorite languages)
● Decent pricing (queries: $5/TB, storage: $20/TB, cold storage: $10/TB) *May 2017
What is BigQuery?
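To make the May 2017 list prices above concrete, here is a minimal Python sketch of a bill estimate. The three rates come from the pricing bullet; the function name, the workload figures, and the reading of the storage rates as per-month charges are assumptions for illustration, and the sketch ignores free tiers and any other charges.

```python
# Rates quoted in the pricing bullet (May 2017); current pricing may differ.
QUERY_USD_PER_TB = 5.0          # per TB scanned by queries
STORAGE_USD_PER_TB = 20.0       # active storage (assumed to be per month)
COLD_STORAGE_USD_PER_TB = 10.0  # "cold" storage (assumed to be per month)


def estimate_monthly_cost(scanned_tb, active_tb, cold_tb):
    """Hypothetical helper: estimated monthly bill in USD."""
    return (scanned_tb * QUERY_USD_PER_TB
            + active_tb * STORAGE_USD_PER_TB
            + cold_tb * COLD_STORAGE_USD_PER_TB)


# Made-up workload: 10 TB scanned, 2 TB active storage, 3 TB cold storage.
print(estimate_monthly_cost(10, 2, 3))  # 10*5 + 2*20 + 3*10 = 120.0
```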

Architecting for The Cloud
[Diagram: Frontend; Platform Services; On-Premises Servers; Metrics / Logs / Streaming; Event Sourcing; Pipelines; ETL Engine; BigQuery]

Data Pipeline Integration
[Diagram: Standard Devices (HTTPS); Frontend; Platform Services; Application Servers; Database (SQL); On-Premises Servers; Metrics / Logs / Streaming; Event Sourcing; Pipelines (FluentD); Analytics Backend (BigQuery); Cloud Storage archive (Load, Export, Replay); Development Team; Data Analysts; Report & Share; Business Analysis; Tools: Tableau, QlikView, Data Studio, Internal Dashboard]

<filter frontend.user.*>
  # Add a "bq" field carrying the insert ID, host and creation time
  @type record_transformer
  enable_ruby
  remove_keys host
  <record>
    bq {"insert_id":"${uid}","host":"${host}","created":"${time.to_i}"}
  </record>
</filter>
# Copy every event to a local file archive and to BigQuery
<match frontend.user.*>
  @type copy
  <store>
    @type forest
    subtype file
    <template>
      path /tank/storage/${tag}.*.log
      time_slice_format %Y%m%d
      time_slice_wait 10m
    </template>
  </store>
  <store>
    @type bigquery
    method insert
    ...
  </store>
</match>
# ... bigquery <store> section continued ...
    auth_method json_key
    json_key /etc/td-agent/keys/key-31da042be48c.json
    project project_id
    dataset dataset_name
    time_field timestamp
    time_slice_format %Y%m%d
    table user$%{time_slice}
    ignore_unknown_values
    schema_path /etc/td-agent/schema/user_login.json
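As a rough illustration of what the configuration above asks the BigQuery output to do, the Python sketch below builds the day-partitioned table name (`user$YYYYMMDD`) and a request body shaped like the `tabledata.insertAll` streaming API. It is not the plugin's code. The helper names and the sample event are hypothetical, and it assumes the `bq` value arrives as a nested record, that its `insert_id` is used as the deduplication ID, and that time slices are computed in UTC.

```python
import json
import time


def partitioned_table(base, epoch_seconds, fmt="%Y%m%d"):
    """Mimic `table user$%{time_slice}`: append a day partition decorator."""
    day = time.strftime(fmt, time.gmtime(epoch_seconds))  # UTC is an assumption
    return f"{base}${day}"


def insert_all_body(records):
    """Build an insertAll-style body: one row per event, with a dedup ID."""
    rows = [{"insertId": record["bq"]["insert_id"], "json": record}
            for record in records]
    # Mirrors the `ignore_unknown_values` setting in the config.
    return {"ignoreUnknownValues": True, "rows": rows}


# Hypothetical event, shaped like the output of the record_transformer filter.
event = {
    "uid": "u-42",
    "action": "login",
    "bq": {"insert_id": "u-42", "host": "web1", "created": "1494979200"},
}

print(partitioned_table("user", 1494979200))  # user$20170517
print(json.dumps(insert_all_body([event]), indent=2))
```

Sending the same `insertId` again lets BigQuery deduplicate retried rows on a best-effort basis, which is presumably why the filter derives one per event.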

● On data that is difficult to process/analyze using traditional databases
● On exploring unstructured data
● Not a replacement for traditional DBs, but it complements the system
● Applying JavaScript UDFs on columnar storage to resolve complex tasks
(e.g. JS for natural language processing)
● On streams (form wizard ...)
● On IoT streams
● Major strength is handling large datasets
Where to use BigQuery?

Romanian stations that record the most days of snow

Mentions of RO politicians since ‘16 Nov in GDELT articles

● Optimize product pages
● Email engagement
● Funnel Analysis
Achievements

Funnel analysis: Time on upsell pages
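One common way to approximate time on page is to take the gap between a hit and the next hit in the same session. The Python sketch below shows that calculation under stated assumptions: the page names and timestamps are invented, and the last hit of a session gets no duration because nothing follows it. In practice this would be a query over the hits table rather than client-side code.

```python
def time_on_pages(hits):
    """hits: list of (page, epoch_seconds), time-ordered within one session."""
    durations = {}
    # Pair each hit with the next one; the gap is the time spent on the page.
    for (page, start), (_, next_start) in zip(hits, hits[1:]):
        durations[page] = durations.get(page, 0) + (next_start - start)
    return durations


# Invented session: two upsell pages between a product page and the thank-you page.
session = [("page1", 0), ("upsell1", 30), ("upsell2", 75), ("thankyoupage1", 80)]
print(time_on_pages(session))  # {'page1': 30, 'upsell1': 45, 'upsell2': 5}
```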

Example HITS chain:
● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
● page1 -> article2 -> page3 -> orderpage2 -> ...
Attribute credit to first article visited on purchase
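The attribution rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not the query used in the talk: it assumes hits can be classified by name prefix (`article`, `orderpage`), and it shortens the second chain to the hits shown before the elided part.

```python
from collections import Counter


def credited_article(chain):
    """Return the first article visited before a purchase, else None."""
    first = None
    for hit in chain:
        if first is None and hit.startswith("article"):
            first = hit
        elif hit.startswith("orderpage"):
            return first  # purchase reached: credit the first article seen
    return None  # no purchase in this chain, so nothing to credit


chains = [
    ["article1", "page2", "page3", "page4", "orderpage1", "thankyoupage1"],
    ["page1", "article2", "page3", "orderpage2"],
]

credits = Counter(a for a in map(credited_article, chains) if a is not None)
print(credits)  # Counter({'article1': 1, 'article2': 1})
```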

● No manual sharding
● No capacity guessing
● No idle resources
● No maintenance windows
● No manual scaling
● No file mgmt
BigQuery: Serverless Data Warehouse

● no provisioning/deploy
● no running out of resources
● no more focus on large-scale execution plans
● no need to re-implement tricky concepts
(time windows / join streams)
● pay only for the columns referenced in our queries
● run raw ad-hoc queries (either by analysts/sales or devs)
● no more throwing away, expiring, or aggregating old data
Our benefits
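The "pay only for the columns" point follows from columnar storage: a query reads just the columns it references. The Python sketch below shows the idea with made-up column sizes for a hypothetical table. The helper name is an assumption, and real billing has further details (such as minimum charges) that are ignored here.

```python
def scanned_gb(column_sizes_gb, referenced_columns):
    """Sum the sizes of only the columns a query references."""
    return sum(column_sizes_gb[name] for name in referenced_columns)


# Made-up per-column sizes (GB) for a hypothetical 1000 GB table.
table = {"user_id": 8, "created": 8, "url": 120, "payload": 864}

# A query touching two narrow columns reads a small slice of the table.
print(scanned_gb(table, ["user_id", "created"]))  # 16
# Selecting every column reads the whole table.
print(scanned_gb(table, table.keys()))  # 1000
```

This is why selecting only the needed columns, rather than every column, keeps ad-hoc queries cheap.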

Easily Build Custom Reports and Dashboards

Questions?
Thank you.
Slides available on: slideshare.net/martonkodok
