Google BigQuery is the future of Analytics!
Why BigQuery
1. Generating big data reports has required expensive servers and skilled database administrators
2. Interacting with big data has been expensive, slow, and inefficient
3. BigQuery changes all that, reducing the time and expense of querying data
4. Super-fast SQL queries: run queries on terabyte data sets in seconds
5. Scalable: i) store hundreds of terabytes ii) pay only for what you use
6. A service for interactive analysis of massive datasets:
a) Query billions of rows: seconds to write, seconds to return
b) Uses a SQL-style query syntax
c) It's a service, accessed through a RESTful API
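As a taste of the SQL-style syntax, here is a minimal query against Google's free public sample table bigquery-public-data.samples.shakespeare:

-- total words per Shakespeare corpus, from a public sample table
SELECT corpus, SUM(word_count) AS total_words
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus
ORDER BY total_words DESC
LIMIT 5;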
This Talk
● From zero to business impact
● Cut analysis time by up to 90% with BigQuery
● Advanced visualization with data analysis tools
● A few best practices
Turning data into Analytics
● Every company is evolving into a tech solution and, by extension, a data company
● For a small company it's important to have access to big data tools without running a dedicated team for them
Legacy Business Analysis
Diagram: Web and Mobile clients → Web Server / Platform Services → Scheduled Tasks → Batch Processing (Compute Engine) → Database → Business Analysis
Behind The Scenes: Days To Insights
Diagram: Web and Mobile clients → Web Server / Platform Services → Scheduled Tasks → Batch Processing → Database → Business Analysis
Timings: minutes for scheduled tasks to kick in, hours to run batch processing, hours to clean and aggregate; days to insights overall
Architecture For The Cloud
Diagram: a Frontend Service and Event Sourcing feed Metrics/Logs into an ETL Engine and Pipelines, spanning Google Cloud Platform and On-Premises
Data Pipeline Integration at LCP
How Big is B-I-G
● Google: YouTube media data, 15+ exabytes (2017); Gmail alone, 18.5+ petabytes (2018)
● Amazon: inventory & customer data, 42 terabytes (2014)
● Wikipedia: English articles, 10+ terabytes (2013)
Improve The Performance of BigQuery
● Geography Types
● Partition pruning with subqueries
● Using Nested fields
● Clustering
Partition pruning with subqueries
Execution time: 30.0 sec before pruning; 9.6 sec with partition pruning via subqueries
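A minimal sketch of the idea, assuming a hypothetical mydataset.events table partitioned on a DATE column event_date. BigQuery prunes partitions only when the filter is a constant, so a subquery result can be materialized into a scripting variable first:

DECLARE latest_date DATE;
-- materialize the subquery result so the partition filter below is a constant
SET latest_date = (SELECT MAX(event_date) FROM mydataset.events);
-- this scan now reads only the single matching partition
SELECT COUNT(*) AS events
FROM mydataset.events
WHERE event_date = latest_date;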
Geography Types
~12% speedup in query performance possible (use UNNEST)
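A hedged sketch combining the GEOGRAPHY type with UNNEST; the table mydataset.tracks and its repeated points(lat, lng) field are hypothetical:

-- find track points within 1 km of a reference point (Singapore)
SELECT t.track_id, p.lat, p.lng
FROM mydataset.tracks AS t, UNNEST(t.points) AS p  -- flatten the repeated field
WHERE ST_DWITHIN(ST_GEOGPOINT(p.lng, p.lat), ST_GEOGPOINT(103.85, 1.29), 1000);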
Using Nested fields
Execution time: 1 min 17 sec
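Nested (STRUCT) and repeated (ARRAY) fields keep child rows inside the parent row, so no join is needed at query time. A minimal sketch, assuming a hypothetical mydataset.orders table with a repeated items field:

-- flatten the repeated items field without a join
SELECT o.order_id, item.sku, item.qty
FROM mydataset.orders AS o, UNNEST(o.items) AS item
WHERE item.qty > 1;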
Clustering
Execution time: 0.8 sec with 43 MB scanned, versus 20 sec and 576 MB: a 10x speedup
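Clustering co-locates rows with similar key values, so filters on the cluster keys scan fewer bytes. A sketch of the DDL, with hypothetical table and column names:

-- cluster within daily partitions; filters on country/user_id scan less data
CREATE TABLE mydataset.events_clustered
PARTITION BY DATE(event_ts)
CLUSTER BY country, user_id
AS SELECT * FROM mydataset.events;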
Advanced visualization tools for data analysis
1. Data Studio
2. Tableau
3. Power BI
4. QlikView
5. Metric Insights
6. Supermetrics
7. Bime
Data Format & Accessing BigQuery
BigQuery supports the following formats for data loading: Avro, CSV, TSV, JSON, ORC, Parquet, Cloud Datastore exports, and Cloud Firestore exports
Diagram: BigQuery tools, a web browser, or the API all connect to BigQuery
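Loads are usually submitted as jobs through the web UI, the API, or the bq CLI; newer BigQuery releases also accept a LOAD DATA SQL statement. A sketch with a hypothetical table and bucket path:

LOAD DATA INTO mydataset.mytable
FROM FILES (
  format = 'CSV',                           -- any supported format above
  uris = ['gs://my-bucket/exports/*.csv'],  -- hypothetical bucket
  skip_leading_rows = 1                     -- skip the header row
);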
Insights Analysis Using Google Data Studio
● Country name
● State name
● Record count
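Those three fields map to a simple aggregation that Data Studio can chart directly; a sketch against a hypothetical mydataset.customers table:

-- one row per country/state pair, ready for a Data Studio chart
SELECT country_name, state_name, COUNT(*) AS record_count
FROM mydataset.customers
GROUP BY country_name, state_name
ORDER BY record_count DESC;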
A Few Best Practices
• CSV/JSON files must be split into chunks of less than 1 TB
• Split into smaller files: easier error recovery, and smaller data units (day or month instead of year)
• Split tables by dates: minimizes the cost of data scanned and minimizes query time
• Denormalize or pre-join where possible
• Query only the columns you need (SELECT name) instead of selecting everything (SELECT *); see the sketch after this list
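A sketch of the last two points combined, assuming a hypothetical mydataset.sales table partitioned by sale_date:

-- reads two columns and one month of partitions, not the whole table
SELECT name, amount
FROM mydataset.sales
WHERE sale_date BETWEEN DATE '2019-01-01' AND DATE '2019-01-31';
-- avoid: SELECT * FROM mydataset.sales (scans every column and partition)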
BigQuery Data Load
• 1,000 import jobs per table per day
• 10,000 import jobs per project per day
• File size limits (for both CSV and JSON): 1 GB compressed, 1 TB uncompressed
• 10,000 files per import job
• 1 TB per import job