I TAKE Unconference 2017 - Powering interactive data analysis with Google BigQuery

Every scientist who needs big data analytics to save millions of lives should have that power. Complex, interactive big data analytics solutions require a massive architecture and the know-how to build a fast, real-time computing system. BigQuery solves this problem by enabling super-fast SQL queries against petabytes of data using the processing power of Google’s infrastructure. We will cover its core features, working with BigQuery, streaming inserts, User Defined Functions in JavaScript, and several use cases for the everyday developer: funnel analytics, behavioral analytics, and exploring unstructured data.

  1. Powering Interactive Data Analysis with Google BigQuery
     Márton Kodok / @martonkodok, Google Developer Expert at REEA
     May 2017, Bucharest, Romania
  2. About me
     ● Geek. Hiker. Do-er.
     ● Among the top 3 Romanians on Stack Overflow
     ● Google Developer Expert on Cloud technologies
     ● Crafting web/mobile backends at REEA.net
     ● BigQuery and database engine expert
     ● Active in mentoring
     Twitter: @martonkodok | StackOverflow: pentium10 | Slideshare: martonkodok | GitHub: pentium10
  3. Agenda
     The Challenge
     Powering an Interactive Data Analysis/Reporting System
     Architecture Overview
     Strategy & Tricks
     Winning Solution
  4. The Challenge
     ❏ Need a backend/database to STORE, QUERY, EXTRACT data
     ❏ Deep analytics: large, multi-source, complex, unstructured
     ❏ Be real time
     ❏ Terabyte scale
     ❏ Cost effective
     ❏ Run ad-hoc reports, as the occasion requires
     ❏ Without a developer: interactive
     ❏ Minimal engineering effort
     ❏ Support streaming: data is generated on a continual basis
     ❏ Withstand #BlackFriday
     ❏ Simple query language (preferably SQL / JavaScript)
  5. The Challenge
     “We can't solve problems by using the same kind of thinking we used when we created them.” (Albert Einstein)
  6. Legacy Business Reporting System
     Diagram components: Web; Mobile; Web Server; Database (SQL); Cached Platform Services; CMS/Framework; Report & Share; Business Analysis; Scheduled Tasks; Batch Processing; Compute Engine (multiple instances)
  7. Behind the Scenes: Days to Insights
     Diagram components: Web; Mobile; Web Server; Database (SQL); Cached Platform Services; CMS/Framework; Report & Share; Business Analysis; Scheduled Tasks; Batch Processing; Compute Engine (multiple instances)
  8. Legacy Business Reporting System: DAYS TO INSIGHTS
     Diagram components: Web; Mobile; Web Server; Database (SQL); Cached Platform Services; CMS/Framework; Report & Share; Business Analysis; Scheduled Tasks; Batch Processing; Compute Engine (multiple instances)
     ● Minutes to kick in
     ● Hours to run batch processing
     ● Hours to clean and aggregate
  9. Desired system/platform
     ● Terabyte-scale storage
     ● Real-time row ingestion
     ● Ask sophisticated queries
     ● Query performance
     ● Low maintenance
     ● Cost effective
     ● Wire them up easily
     Goal: store everything, accessible by SQL immediately.
     Engines:
     ● MongoDB, Riak, Redis
     ● ELK Stack (Elasticsearch-Logstash-Kibana)
     ● Cassandra, Hive, Hadoop...
     ● Amazon Athena, Google BigQuery...
  10. (image slide)
  11. What is BigQuery?
      ● Analytics-as-a-Service: a data warehouse in the cloud
      ● Fully managed by Google (US or EU location)
      ● Scales into petabytes
      ● Ridiculously fast
      ● SQL 2011 standard + JavaScript UDFs (User Defined Functions)
      ● Familiar DB structure (tables, views, records, nested, JSON)
      ● Open interfaces (Web UI, bq command-line tool, REST, ODBC)
      ● Integrates with Google Sheets, Google Cloud Storage, and Pub/Sub connectors
      ● Client libraries available in YFL (your favorite languages)
      ● Decent pricing (queries: $5/TB; storage: $20/TB; cold storage: $10/TB), prices as of May 2017
  12. BigQuery: Convenience of SQL
      ● Columnar storage (max 10,000 columns per table)
      ● Batch load file size limit: 5 TB (CSV or JSON)
      ● User Defined Functions in SQL or JavaScript
      ● Rich SQL 2011: JSON, IP, Math, RegExp, Window functions
      ● Data types: String, Integer, Float, Boolean, Timestamp, Record, Nested, Struct, Array
      ● Append-only tables preferred (DML syntax available)
      ● Day-partitioned tables
      ● ACL: row-level access control (individual or group based)
  13. BigQuery Costs (May 2017)
      Queries:
      ➔ 1 TB per month free
      ➔ 5 USD per TB
      ➔ only pay for the columns you use in your query
      Storage:
      ➔ 20 USD per TB of frequently accessed data
      ➔ 10 USD per TB of long-term storage (tables not modified for 90 days)
      Ingestion:
      ➔ Batch load free (CSV/JSON)
      ➔ Exporting free
      ➔ Table copy free
      ➔ Streaming 50 USD per TB
      Estimate 1: storage 5 TB, streaming inserts 1 TB, queries 3 TB. Monthly total: $165
      Estimate 2: storage 25 TB, streaming inserts 1 TB, queries 50 TB. Monthly total: $788
      * 1 petabyte storage, 10 TB inserts, 100 TB queries => $22,000
  14. Architecting for the Cloud
      Diagram components: BigQuery; On-Premises Servers; Pipelines; ETL Engine; Event Sourcing; Frontend; Platform Services; Metrics / Logs / Streaming
  15. Access to Insights without Developer Support
      Diagram components: Analytics Backend (BigQuery); On-Premises Servers; Pipelines; ETL Engine; Event Sourcing; Frontend; Platform Services; Metrics / Logs / Streaming; Development Team; Data Analysts; Report & Share; Business Analysis Tools (Tableau, QlikView, Data Studio, Internal Dashboard); Database (SQL)
  16. Data Pipeline Integration
      Diagram components: Analytics Backend (BigQuery); On-Premises Servers; Pipelines; ETL; Database (SQL); Standard Devices; HTTPS; Ingest; Events; Monitoring; Logging; FluentD; Cloud Storage; Report & Share; Business Analysis; Firebase; Archive; Load; Export; Replay; Application Servers
  17. Fluentd configuration (copy user events to local log files and stream them into BigQuery)
      <filter frontend.user.*>
        @type record_transformer
        enable_ruby
        remove_keys host
        <record>
          bq {"insert_id":"${uid}","host":"${host}","created":"${time.to_i}"}
        </record>
      </filter>
      <match frontend.user.*>
        @type copy
        <store>
          @type forest
          subtype file
          <template>
            path /tank/storage/${tag}.*.log
            time_slice_format %Y%m%d
            time_slice_wait 10m
          </template>
        </store>
        <store>
          @type bigquery
          method insert
          ...
        </store>
      </match>
      ….bigquery section continued….
          auth_method json_key
          json_key /etc/td-agent/keys/key-31da042be48c.json
          project project_id
          dataset dataset_name
          time_field timestamp
          time_slice_format %Y%m%d
          table user$%{time_slice}
          ignore_unknown_values
          schema_path /etc/td-agent/schema/user_login.json
  18. Where to use BigQuery?
      ● On data that is difficult to process/analyze using traditional databases
      ● On exploring unstructured data
      ● Not a replacement for traditional DBs, but it complements the system
      ● Applying JavaScript UDFs on columnar storage to resolve complex tasks (e.g. JS for natural language processing)
      ● On streams (form wizards ...)
      ● On IoT streams
      ● Its major strength is handling large datasets
  19. Query a public dataset
      Go to the BigQuery web UI: https://bigquery.cloud.google.com/
  20. Romanian stations that record the most days of snow
  21. Mentions of RO politicians since ‘16 Nov in GDELT articles
  22. Achievements
      ● Funnel Analysis
  23. Funnel analysis: Time on upsell pages
  24. Attribute credit to the first article visited on purchase
      Example hit chains:
      ● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
      ● page1 -> article2 -> page3 -> orderpage2 -> ...
  25. Achievements
      ● Funnel Analysis
      ● Email URL click heatmap
  26. Email URL click heatmap
  27. Achievements (continued)
      ● Funnel Analysis
      ● Email URL click heatmap
      ● Email Health Dashboard (SPAM, ISP deferral, content A/B split tests, trends or low open rate campaigns)
      ● Advanced segmentation (all raw data stored)
      ● Behavioral analytics: engaged users, etc.
  28. Our benefits
      ● no provisioning/deployment
      ● no running out of resources
      ● no more focus on large-scale execution plans
      ● no need to re-implement tricky concepts (time windows / join streams)
      ● pay only for the columns used in our queries
      ● run raw ad-hoc queries (by analysts/sales or devs)
      ● no more throwing away, expiring, or aggregating old data
  29. BigQuery: Serverless Data Warehouse
      ● No manual sharding
      ● No capacity guessing
      ● No idle resources
      ● No maintenance windows
      ● No manual scaling
      ● No file management
  30. BigQuery: Sample projects to try out (1)
  31. BigQuery: Sample projects to try out (2)
  32. HttpArchive: multiple JS frameworks
  33. HttpArchive: multiple jQuery versions
  34. Easily Build Custom Reports and Dashboards
  35. Questions? Thank you. Slides available at: slideshare.net/martonkodok
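
Slides 12 and 13 say that storage is columnar and that you "only pay for the columns you use in your query". The minimal Python sketch below illustrates what that means for on-demand query cost: only the bytes of the referenced columns count, regardless of how many rows come back (a LIMIT clause does not reduce them). The table and its per-column sizes are hypothetical, and counting a terabyte as 2**40 bytes is an assumption about how BigQuery bills; the sketch also ignores the free monthly terabyte and partition pruning.

```python
from __future__ import annotations

QUERY_USD_PER_TB = 5.0  # on-demand query price quoted on slides 11 and 13 (May 2017)
TB = 2 ** 40  # assumption: a billed terabyte is 2**40 bytes


def query_cost_usd(column_bytes: dict[str, int], selected: list[str]) -> float:
    """Estimate the on-demand cost of a query that reads only `selected` columns.

    Simplified: ignores the free monthly terabyte, any per-query minimums,
    and partition pruning.
    """
    unknown = set(selected) - set(column_bytes)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    # Each referenced column is scanned once, however often it appears in the query.
    scanned = sum(column_bytes[name] for name in set(selected))
    return scanned / TB * QUERY_USD_PER_TB


# Hypothetical 5 TB events table: the wide 'payload' column dominates storage.
events_columns = {"user_id": TB // 4, "url": 3 * TB // 4, "payload": 4 * TB}
print(query_cost_usd(events_columns, ["user_id", "url"]))  # 5.0
print(query_cost_usd(events_columns, list(events_columns)))  # 25.0 (like SELECT *)
```

Selecting two narrow columns costs a fifth of `SELECT *` on the same table, which is why the slides favor referencing only the columns a report needs.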
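
The monthly estimates on slide 13 can be checked with the rates listed on the same slide. The sketch below is a rough calculator under simplifying assumptions: all storage is billed as frequently accessed unless `long_term_tb` is given, and batch loads, exports, and table copies are free. With these rates, Estimate 1 reproduces the slide's $165 only when the free 1 TB of queries is not deducted ($160 when it is). Estimate 2 comes to $800 ($795 with the free terabyte) rather than the slide's $788; the slide does not show how $788 was derived.

```python
# Rates as listed on slide 13 (May 2017), in USD per TB per month.
ACTIVE_STORAGE = 20.0
LONG_TERM_STORAGE = 10.0
STREAMING_INSERTS = 50.0
QUERIES = 5.0
FREE_QUERY_TB = 1.0  # "1 TB per month free"


def monthly_cost_usd(storage_tb: float, streaming_tb: float, query_tb: float,
                     long_term_tb: float = 0.0, free_tier: bool = True) -> float:
    """Rough monthly bill; batch loads, exports and table copies cost nothing."""
    billable_query_tb = max(0.0, query_tb - FREE_QUERY_TB) if free_tier else query_tb
    return (storage_tb * ACTIVE_STORAGE
            + long_term_tb * LONG_TERM_STORAGE
            + streaming_tb * STREAMING_INSERTS
            + billable_query_tb * QUERIES)


print(monthly_cost_usd(5, 1, 3, free_tier=False))    # 165.0 (Estimate 1 on the slide)
print(monthly_cost_usd(5, 1, 3))                     # 160.0
print(monthly_cost_usd(25, 1, 50, free_tier=False))  # 800.0 (the slide says $788)
```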
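
The Fluentd configuration on slide 17 relies on two BigQuery ideas: a per-row insert ID (built from `uid`) that lets BigQuery de-duplicate retried streaming inserts on a best-effort basis, and a day-partition decorator (`table user$%{time_slice}` with `time_slice_format %Y%m%d`) that routes each row to its day's partition. The Python sketch below is not the plugin's implementation. It only shows the same grouping: each event is assumed to carry a Unix `time` and a unique `uid`, and computing the partition day in UTC is an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timezone


def partition_table(base: str, event_time: datetime) -> str:
    """Day-partition decorator such as 'user$20170512' (UTC day assumed)."""
    return f"{base}${event_time.astimezone(timezone.utc):%Y%m%d}"


def batch_for_streaming(events: list[dict], base_table: str = "user") -> dict[str, dict]:
    """Group events by target day partition, keeping each uid as the row's insert ID."""
    batches: dict[str, dict] = defaultdict(lambda: {"row_ids": [], "rows": []})
    for event in events:
        ts = datetime.fromtimestamp(event["time"], tz=timezone.utc)
        batch = batches[partition_table(base_table, ts)]
        row = {k: v for k, v in event.items() if k != "time"}
        row["timestamp"] = ts.isoformat()  # mirrors `time_field timestamp`
        batch["row_ids"].append(str(event["uid"]))  # reused on retry for de-duplication
        batch["rows"].append(row)
    return dict(batches)


events = [
    {"uid": "a1", "time": 1494547200 + 3600, "action": "login"},  # 2017-05-12 01:00 UTC
    {"uid": "a2", "time": 1494547200 - 1, "action": "login"},     # 2017-05-11 23:59:59 UTC
]
for table, batch in batch_for_streaming(events).items():
    print(table, batch["row_ids"])
# user$20170512 ['a1']
# user$20170511 ['a2']
```

With the `google-cloud-bigquery` Python client, each batch could then be passed to `Client.insert_rows_json(table, batch["rows"], row_ids=batch["row_ids"])`. Whether a given client version accepts the `$` decorator in the table ID has not been verified here, so treat that step as an assumption.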
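
Slide 23 shows only the title "Funnel analysis: Time on upsell pages"; the query behind it is not in the slides. One common way to derive time on page from raw hits is to order each session's hits by timestamp and count a page's duration as the gap to the next hit. In BigQuery standard SQL that is the `LEAD` window function over `PARTITION BY session ORDER BY timestamp`. The Python sketch below does the same in memory. The `(session_id, timestamp, page)` tuple layout and the `upsell` page-name prefix are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def time_on_pages(hits: list[tuple[str, float, str]], prefix: str = "upsell") -> dict[str, float]:
    """Total seconds spent on pages whose name starts with `prefix`.

    Time on a page is the gap to the next hit in the same session, like
    LEAD(ts) OVER (PARTITION BY session ORDER BY ts) - ts in SQL. The last
    hit of a session has no successor, so it contributes nothing.
    """
    sessions: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for session_id, ts, page in hits:
        sessions[session_id].append((ts, page))
    totals: dict[str, float] = defaultdict(float)
    for visits in sessions.values():
        visits.sort()  # chronological order within the session
        for (ts, page), (next_ts, _next_page) in zip(visits, visits[1:]):
            if page.startswith(prefix):
                totals[page] += next_ts - ts
    return dict(totals)


hits = [
    ("s1", 0, "article1"), ("s1", 30, "upsell1"), ("s1", 75, "orderpage1"),
    ("s2", 10, "upsell1"), ("s2", 20, "upsell2"), ("s2", 50, "thankyoupage1"),
]
print(time_on_pages(hits))  # {'upsell1': 55.0, 'upsell2': 30.0}
```

Because a session's final hit has no successor, time spent on an exit page is unknown with this method. That is a known limitation of hit-based time-on-page metrics, not something specific to BigQuery.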
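
Slide 24 gives purchase credit to the first article visited in a hit chain. Below is a minimal first-touch attribution sketch over the slide's two example chains. It assumes, hypothetically, that article pages and order pages can be recognised by the `article` and `orderpage` name prefixes used on the slide. The trailing `...` of the second chain is left out.

```python
from __future__ import annotations

from collections import Counter


def first_article_credit(chains: list[list[str]],
                         article_prefix: str = "article",
                         order_prefix: str = "orderpage") -> Counter:
    """Count, per article, the purchases for which it was the first article seen."""
    credit: Counter = Counter()
    for chain in chains:
        # Position of the first order page; chains without a purchase earn no credit.
        order_at = next((i for i, page in enumerate(chain)
                         if page.startswith(order_prefix)), None)
        if order_at is None:
            continue
        # First-touch rule: only the earliest article before the order page counts.
        first = next((page for page in chain[:order_at]
                      if page.startswith(article_prefix)), None)
        if first is not None:
            credit[first] += 1
    return credit


chains = [
    ["article1", "page2", "page3", "page4", "orderpage1", "thankyoupage1"],
    ["page1", "article2", "page3", "orderpage2"],
]
print(first_article_credit(chains))  # Counter({'article1': 1, 'article2': 1})
```

First-touch is only one attribution rule. Crediting the last article before the order page would change only the second `next(...)` lookup.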
