Making advanced analytics accessible to more companies

Márton Kodok / @martonkodok
Google Developer Expert at REEA.net, Targu Mures, Romania
24 May 2017 Tirgu Mures, Romania
Issue 59 - May 2017

About me
● Geek. Hiker. Do-er.
● Crafting web/mobile backends at REEA.net, Targu Mures
● Among the top 3 Romanians on Stackoverflow.com
● Google Developer Expert on Cloud technologies
● BigQuery and database engine expert
● Active in mentoring
Twitter: @martonkodok
StackOverflow: pentium10
Slideshare: martonkodok
GitHub: pentium10

Agenda
● The Challenge
● Architecture Overview
● Strategy & Tricks
● Winning Solution

3 principles to get from small data to BigData
Companies:
❏ must be able to identify, combine, and manage multiple sources of data
❏ should be able to obtain advanced analytics using concepts they are familiar with
❏ need to deploy the right technology architecture, matching their capabilities

[Figure: Legacy Business Reporting System. Web and mobile clients, web server (CMS/framework, platform services), SQL database with cache, scheduled tasks for batch processing on multiple Compute Engine instances, report & share for business analysis.]

[Figure: Behind the Scenes: Days to Insights. The legacy reporting system (web/mobile clients, web server, cached SQL database, scheduled batch processing on Compute Engine, report & share).]

[Figure: Legacy Business Reporting System, annotated: minutes for scheduled tasks to kick in, hours to run batch processing, hours to clean and aggregate; overall, days to insights.]

Desired system
❏ A backend/database to STORE, QUERY and EXTRACT data
❏ Deep analytics: large, multi-source, complex, unstructured data
❏ Real-time
❏ Terabyte scale, cost-effective
❏ Interactive ad-hoc reports, run without a developer
❏ Minimal engineering effort: no dedicated BigData team
❏ Simple query language (preferably SQL / JavaScript)

What is BigQuery?
● Analytics-as-a-Service: a data warehouse in the cloud
● Fully managed by Google (US or EU data location)
● Scales into petabytes
● Ridiculously fast
● SQL 2011 standard + JavaScript UDFs (User-Defined Functions)
● Familiar DB structure (tables, views, records, nested and JSON data)
● Integrates with Tableau and Google Sheets, plus Cloud Storage and Pub/Sub connectors
● Decent pricing (May 2017): queries $5/TB scanned; storage $20/TB per month, long-term (cold) storage $10/TB per month
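The pricing line lends itself to a back-of-the-envelope estimate. Below is a minimal Python sketch, assuming the May 2017 list prices quoted above, storage billed per TB per month, and no free-tier allowance; `estimate_monthly_cost` is a hypothetical helper, not part of any BigQuery API.

```python
# Assumed list prices from the slide (May 2017); check current pricing before relying on them.
QUERY_USD_PER_TB = 5.0
ACTIVE_STORAGE_USD_PER_TB_MONTH = 20.0
LONG_TERM_STORAGE_USD_PER_TB_MONTH = 10.0


def estimate_monthly_cost(tb_scanned: float, tb_active: float, tb_long_term: float = 0.0) -> float:
    """Hypothetical rough monthly bill: data scanned by queries plus data stored.

    Ignores any free tier, streaming-insert charges and rounding rules.
    """
    if min(tb_scanned, tb_active, tb_long_term) < 0:
        raise ValueError("sizes must be non-negative")
    query_cost = tb_scanned * QUERY_USD_PER_TB
    storage_cost = (tb_active * ACTIVE_STORAGE_USD_PER_TB_MONTH
                    + tb_long_term * LONG_TERM_STORAGE_USD_PER_TB_MONTH)
    return query_cost + storage_cost
```

For example, scanning 10 TB in a month while keeping 2 TB of active and 5 TB of long-term data comes to 10 × $5 + 2 × $20 + 5 × $10 = $140 at those prices, so the query volume usually matters as much as the stored volume.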

[Figure: Architecting for the Cloud. Event sourcing from the frontend, platform services, on-premises servers and metrics/logs/streaming; pipelines and an ETL engine; BigQuery.]

[Figure: Data Pipeline Integration. Sources: standard devices (HTTPS), frontend, platform services, application servers, on-premises servers, metrics/logs/streaming (event sourcing). Pipelines: FluentD. Analytics backend: BigQuery. Other components: SQL database; Cloud Storage archive (load, export, replay). Consumers: development team, data analysts, report & share / business analysis tools (Tableau, QlikView, Data Studio, internal dashboard).]

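# Filter: for frontend.user.* events, add a "bq" field
# (insert_id, host, created) and drop the top-level host key.
<filter frontend.user.*>
  @type record_transformer
  enable_ruby
  remove_keys host
  <record>
    bq {"insert_id":"${uid}","host":"${host}","created":"${time.to_i}"}
  </record>
</filter>

# Output: copy every event to two stores.
<match frontend.user.*>
  @type copy
  # Store 1: daily local log files, one set per tag (fluent-plugin-forest)
  <store>
    @type forest
    subtype file
    <template>
      path /tank/storage/${tag}.*.log
      time_slice_format %Y%m%d
      time_slice_wait 10m
    </template>
  </store>
  # Store 2: BigQuery streaming inserts (method insert)
  <store>
    @type bigquery
    method insert
    ...
  </store>
</match>

….bigquery section continued….
  # authenticate with a service-account JSON key
  auth_method json_key
  json_key /etc/td-agent/keys/key-31da042be48c.json
  project project_id
  dataset dataset_name
  # store the event time in the "timestamp" column
  time_field timestamp
  # write into the daily partition, e.g. user$20170524
  time_slice_format %Y%m%d
  table user$%{time_slice}
  # ignore fields that are not in the table schema
  ignore_unknown_values
  schema_path /etc/td-agent/schema/user_login.json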
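To make the filter and the partitioned table name concrete, here is a minimal Python sketch of the same transformation. It assumes each event is a dict with `uid` and `host` keys and that the `bq` template ends up as a nested record; `to_bq_row` and `partitioned_table` are hypothetical helpers, not part of Fluentd or fluent-plugin-bigquery.

```python
from datetime import datetime, timezone


def to_bq_row(record: dict, event_time: datetime) -> dict:
    """Hypothetical mirror of the record_transformer filter.

    Copies every key except `host` (remove_keys host) and adds a nested
    `bq` record with the insert id taken from `uid`, the original host and
    the event time in epoch seconds. Values are strings, as in the JSON template.
    """
    row = {key: value for key, value in record.items() if key != "host"}
    row["bq"] = {
        "insert_id": str(record["uid"]),
        "host": str(record.get("host", "")),
        "created": str(int(event_time.timestamp())),
    }
    return row


def partitioned_table(base: str, event_time: datetime) -> str:
    """Build a day-partition decorator such as 'user$20170524'.

    Uses the date fields of event_time as given; pass a UTC datetime
    if partitions should follow UTC days.
    """
    return f"{base}${event_time.strftime('%Y%m%d')}"


if __name__ == "__main__":
    t = datetime(2017, 5, 24, 12, 0, tzinfo=timezone.utc)
    print(to_bq_row({"uid": 42, "host": "web1", "action": "login"}, t))
    print(partitioned_table("user", t))
```

For an event on 24 May 2017 (UTC), `partitioned_table("user", t)` yields `user$20170524`, the same shape as the `table user$%{time_slice}` setting with `time_slice_format %Y%m%d`. Carrying a stable `insert_id` is what lets BigQuery's streaming inserts de-duplicate retried rows on a best-effort basis.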

Where to use BigQuery?
● On data that is difficult to process/analyze using traditional databases
● On exploring unstructured data
● Not a replacement for traditional DBs, but it complements them
● Applying JavaScript UDFs on columnar storage to solve complex tasks (e.g. JS for natural language processing)
● On streams (form wizard ...)
● On IoT streams
● Its major strength is handling large datasets
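BigQuery's JavaScript UDFs run inside SQL queries; the Python stand-in below only illustrates the idea of applying a scalar function to a single column of a column-oriented table, which is why a query touching one column does not need to read the others. `apply_udf`, `word_count` and the toy `Table` layout are hypothetical, and the word counter is a deliberately tiny example of an NLP-style task.

```python
import re
from typing import Callable, Dict, List

# Toy column-oriented table: each column is stored as its own list.
Table = Dict[str, List[object]]


def apply_udf(table: Table, column: str, udf: Callable[[object], object]) -> List[object]:
    """Apply a scalar UDF to every value of one column; other columns are never read."""
    return [udf(value) for value in table[column]]


def word_count(text: object) -> int:
    """Tiny NLP-style UDF: count word tokens; non-strings (e.g. NULLs) count as 0."""
    if not isinstance(text, str):
        return 0
    return len(re.findall(r"\w+", text))


if __name__ == "__main__":
    feedback: Table = {
        "user_id": [1, 2, 3],
        "comment": ["Great app!", "Slow checkout page", None],
    }
    print(apply_udf(feedback, "comment", word_count))
```

Here only the `comment` column is scanned; the `user_id` list is never touched, which mirrors how a columnar engine limits the data read (and, in BigQuery's case, billed) to the columns a query references.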

BigQuery Benefits: Serverless Data Warehouse
● no manual sharding
● no capacity guessing
● no idle resources
● no manual scaling
● no provisioning, deploying or running out of resources
● run raw ad-hoc queries (by analysts or sales)
● no more throwing away, expiring or aggregating old data

Easily Build Custom Reports and Dashboards

Thank you.
Slides available on: slideshare.net/martonkodok
