Making advanced analytics
accessible to more companies
Márton Kodok / @martonkodok
Google Developer Expert at REEA.net, Targu Mures, Romania
24 May 2017 Tirgu Mures, Romania
Issue 59 - May 2017
About me
● Geek. Hiker. Do-er.
● Crafting Web/Mobile backends at REEA.net, Targu Mures
● Among the top 3 Romanians on Stack Overflow
● Google Developer Expert on Cloud technologies
● BigQuery and database engine expert
● Active in mentoring
Twitter: @martonkodok
Stack Overflow: pentium10
Slideshare: martonkodok
GitHub: pentium10
Making advanced analytics accessible to more companies @martonkodok
Agenda
● The Challenge
● Architecture Overview
● Strategy & Tricks
● Winning Solution
3 principles to get from small data to BigData
Companies:
❏ must be able to identify, combine, and manage multiple sources of data
❏ should be able to obtain advanced analytics using concepts they are familiar with
❏ must deploy the right technology architecture, matching their capabilities
Legacy Business Reporting System
[Diagram: Web, Mobile, Web Server, Database (SQL, cached), Platform Services, CMS/Framework, Report & Share, Business Analysis, Scheduled Tasks, Batch Processing, Compute Engine (multiple instances)]
Behind the Scenes: Days to Insights
● Minutes to kick in
● Hours to run batch processing
● Hours to clean and aggregate
● Days to insights
Desired system
❏ Needs a backend/database to STORE, QUERY, and EXTRACT data
❏ Deep analytics: large, multi-source, complex, unstructured
❏ Real time
❏ Terabyte scale, cost effective
❏ Ad-hoc, interactive reports without a developer
❏ Minimal engineering effort: no dedicated BigData team
❏ Simple query language (preferably SQL / JavaScript)
What is BigQuery?
● Analytics-as-a-Service: Data Warehouse in the Cloud
● Fully managed by Google (US or EU zone)
● Scales into petabytes
● Ridiculously fast
● SQL:2011 standard + JavaScript UDFs (User Defined Functions)
● Familiar DB structure (tables, views, records, nested, JSON)
● Integrates with Tableau, Google Sheets + Cloud Storage + Pub/Sub connectors
● Decent pricing as of May 2017 (queries: $5/TB; storage: $20/TB per month, cold storage: $10/TB per month)
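To make the pricing bullet concrete, here is a minimal back-of-the-envelope sketch using only the May 2017 figures quoted on the slide. The function name and the example volumes are hypothetical, and the sketch assumes storage is billed per TB per month and queries per TB scanned; current prices may differ.

```python
# Prices quoted on the slide (May 2017); treat them as illustrative only.
QUERY_USD_PER_TB = 5.0            # per TB scanned by queries
ACTIVE_STORAGE_USD_PER_TB = 20.0  # assumed per TB per month
COLD_STORAGE_USD_PER_TB = 10.0    # assumed per TB per month


def estimate_monthly_cost(scanned_tb: float, active_tb: float, cold_tb: float) -> float:
    """Return a rough monthly bill in USD for the given data volumes."""
    query_cost = scanned_tb * QUERY_USD_PER_TB
    storage_cost = active_tb * ACTIVE_STORAGE_USD_PER_TB + cold_tb * COLD_STORAGE_USD_PER_TB
    return query_cost + storage_cost


# Hypothetical workload: 2 TB scanned, 1 TB active storage, 3 TB cold storage.
print(estimate_monthly_cost(scanned_tb=2, active_tb=1, cold_tb=3))
```

The point of the sketch is that cost scales with data scanned and stored, not with provisioned servers.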
Architecting for The Cloud
[Diagram: On-Premises Servers, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming; Pipelines (ETL Engine); BigQuery]
Data Pipeline Integration
[Diagram: On-Premises Servers, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming, Application Servers, Database (SQL), Standard Devices (HTTPS); Pipelines (FluentD); Analytics Backend (BigQuery); Cloud Storage archive (Load, Export, Replay); Development Team, Data Analysts; Report & Share / Business Analysis Tools (Tableau, QlikView, Data Studio, Internal Dashboard)]
# Fluentd (td-agent) configuration
# Add a `bq` field to each frontend.user.* event, then drop the original `host` key.
<filter frontend.user.*>
  @type record_transformer
  enable_ruby
  remove_keys host
  <record>
    bq {"insert_id":"${uid}","host":"${host}","created":"${time.to_i}"}
  </record>
</filter>

# Copy every event to two outputs: a local log file and BigQuery.
<match frontend.user.*>
  @type copy
  <store>
    # Local file archive, sliced per tag and per day
    @type forest
    subtype file
    <template>
      path /tank/storage/${tag}.*.log
      time_slice_format %Y%m%d
      time_slice_wait 10m
    </template>
  </store>
  <store>
    # Streaming inserts into BigQuery
    @type bigquery
    method insert
    ...
  </store>
</match>

# bigquery section, continued
auth_method json_key
json_key /etc/td-agent/keys/key-31da042be48c.json
project project_id
dataset dataset_name
time_field timestamp
time_slice_format %Y%m%d
# Target the daily partition of the `user` table, e.g. user$20170524
table user$%{time_slice}
# Skip fields that are not present in the schema
ignore_unknown_values
schema_path /etc/td-agent/schema/user_login.json
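As a rough illustration of two details in the configuration above, the following Python sketch mimics how `table user$%{time_slice}` combined with `time_slice_format %Y%m%d` resolves to a partition-decorated table name, and what the `bq` field built by the filter looks like. It is not Fluentd code: the helper names and sample values are hypothetical, and it assumes the time slice is computed in UTC and that `${uid}` and `${host}` refer to fields of the incoming record.

```python
from datetime import datetime, timezone


def partition_table(base_table: str, event_time: datetime, slice_format: str = "%Y%m%d") -> str:
    """Build a table name with a partition decorator, e.g. user$20170524."""
    return f"{base_table}${event_time.strftime(slice_format)}"


def build_bq_field(uid: str, host: str, event_time: datetime) -> dict:
    """Mirror the `bq` record from the filter: insert_id, host, and epoch seconds as a string."""
    return {
        "insert_id": uid,
        "host": host,
        "created": str(int(event_time.timestamp())),
    }


# Hypothetical event received on the day of the talk.
event_time = datetime(2017, 5, 24, 12, 0, tzinfo=timezone.utc)
print(partition_table("user", event_time))
print(build_bq_field("abc123", "web-1", event_time))
```

Writing to a dated partition keeps each day's rows together, so queries restricted to a date range scan less data.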
Where to use BigQuery?
● On data that is difficult to process/analyze using traditional databases
● On exploring unstructured data
● Not a replacement for traditional DBs, but it complements the system
● Applying JavaScript UDFs on columnar storage to resolve complex tasks (e.g. JS for natural language processing)
● On streams (form wizard ...)
● On IoT streams
● Major strength is handling large datasets
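The JavaScript UDF bullet can be sketched as follows. This Python helper only assembles the text of a BigQuery standard SQL query that defines a temporary JavaScript function and applies it to one column; it does not run anything against BigQuery. The function name `word_count`, the table, and the column are hypothetical, and a naive word count stands in for real natural language processing.

```python
def word_count_udf_query(table: str, column: str) -> str:
    """Return query text that applies a JavaScript UDF to every value of `column`.

    The table and column names are inserted as-is, so pass trusted values only.
    """
    return f'''CREATE TEMP FUNCTION word_count(text STRING)
RETURNS FLOAT64
LANGUAGE js AS """
  if (text === null) return 0;
  return text.split(" ").filter(Boolean).length;
""";

SELECT {column}, word_count({column}) AS words
FROM `{table}`;'''


# Hypothetical dataset and column names.
print(word_count_udf_query("my_dataset.comments", "body"))
```

The returned text could then be pasted into the BigQuery console or submitted through a client library.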
BigQuery Benefits: Serverless Data Warehouse
● no manual sharding
● no capacity guessing
● no idle resources
● no manual scaling
● no provisioning, deploying, or running out of resources
● run raw ad-hoc queries (by analysts or sales)
● no more throwing away, expiring, or aggregating old data
Easily Build Custom Reports and Dashboards
Thank you.
Slides available on: slideshare.net/martonkodok