
BigQuery Basics

The 'macro view' of BigQuery:
We start with an overview and some typical uses, then move on to the project hierarchy, access control, and security.
At the end we touch on tools and demos.


1. BigQuery Basics, Paris 2014
2. Who? Why?
Ido Green, Solutions Architect
plus.google.com/greenido
greenido.wordpress.com
3. Topics we cover in this lesson
● BigQuery Overview
● Typical Uses
● Project Hierarchy
● Access Control and Security
● Datasets and Tables
● Tools
● Demos
4. How does BigQuery fit in the analytics landscape?
● MapReduce-based analysis can be slow for ad hoc queries
● Managing data centers and tuning software takes time and money
● Analytics tools should be services
5. Why BigQuery?
● Generating big-data reports requires expensive servers and skilled database administrators
● Interacting with big data has been expensive, slow, and inefficient
● BigQuery changes all that
  ○ Reduces the time and expense of querying data
6. What's BigQuery?
● Service for interactive analysis of massive datasets (TBs)
  ○ Query billions of rows: seconds to write, seconds to return
  ○ Uses a SQL-style query syntax
  ○ It's a service, accessed through a RESTful API
● Reliable and secure
  ○ Replicated across multiple sites
  ○ Secured through Access Control Lists
● Scalable
  ○ Store hundreds of terabytes
  ○ Pay only for what you use
● Fast (really)
  ○ Run ad hoc queries on multi-terabyte datasets in seconds
7. Analyzing Large Amounts of Data... at High Speed
demobigquery.appspot.com
8. Uses
9. Typical Uses
Analyzing query results using a visualization library such as the Google Charts Tools API
10. Typical Uses
Another way to analyze query results: Google Spreadsheets
  ○ greenido.wordpress.com/2013/12/16/big-query-and-google-spreadsheet-intergration/
  ○ greenido.wordpress.com/2013/07/24/big-query-power-with-javascript/
11. BigQuery Use Cases
● Log analysis: making sense of computer-generated records
● Retail: using data to forecast product sales
● Ad targeting: reaching the right customer segments
● Sensor data: collecting and visualizing ambient data
● Data mashups: querying terabytes of heterogeneous data
12. Some Customer Case Studies
● Uses BigQuery to hone ad targeting and gain insights into their business
● Dashboards using BigQuery to analyze booking and inventory data
● Uses BigQuery to give their customers ways to expand game engagement and find new channels for monetization
● Used BigQuery, App Engine, and the Visualization API to build a business intelligence solution
13. BigQuery Basic Technical Details
14. Project Hierarchy
● Project: all data in BigQuery belongs inside a project
  ○ Set of users, APIs, authentication, and billing information
● Dataset: holds one or more tables
  ○ Lowest access control unit (the level at which ACLs are applied)
● Table: row-column structure that contains the actual data
● Job: used to start potentially long-running queries
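A minimal sketch of the hierarchy above, assuming a toy in-memory model: the class names, the `readers` set, and `can_read` are hypothetical and are not BigQuery API types. It shows the consequence of the dataset being the lowest access-control unit: a user's access is decided once per dataset and then applies to every table in it.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of the hierarchy on this slide. The class and
# field names are illustrative only; they are not BigQuery API types.


@dataclass
class Table:
    name: str  # in the real service this holds the row-column data


@dataclass
class Dataset:
    name: str
    readers: set = field(default_factory=set)  # the ACL lives on the dataset
    tables: dict = field(default_factory=dict)


@dataclass
class Project:
    project_id: str  # users, APIs, authentication and billing hang off this
    datasets: dict = field(default_factory=dict)


def can_read(project: Project, user: str, dataset: str, table: str) -> bool:
    """Decide access from the dataset's ACL; tables carry no ACL of their own."""
    ds = project.datasets.get(dataset)
    if ds is None or table not in ds.tables:
        return False
    return user in ds.readers


# Usage: alice is on the dataset's ACL, so she can read every table in it.
proj = Project("my-project")
samples = Dataset("samples", readers={"alice@example.com"})
samples.tables["names"] = Table("names")
proj.datasets["samples"] = samples
```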
15. Datasets and Tables
Table names are written as follows:
● Current project: <dataset>.<table name>
● Different project: <project>:<dataset>.<table>
  ○ e.g. publicdata:samples.wikipedia
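The naming rule on this slide can be captured in a few lines. This is a hypothetical helper, not part of any Google library, and it is deliberately simplified: it does not attempt to cover every valid project ID format. A reference without the `<project>:` prefix resolves against the current project.

```python
import re

# Hypothetical helper: split a table reference of the form shown on the slide.
# A missing "<project>:" prefix means "the current project". Simplified: it
# does not try to cover every valid project ID format.
_TABLE_REF = re.compile(
    r"^(?:(?P<project>[^:]+):)?(?P<dataset>[^.:]+)\.(?P<table>[^.:]+)$"
)


def parse_table_ref(ref: str, current_project: str) -> tuple:
    match = _TABLE_REF.match(ref)
    if match is None:
        raise ValueError(f"not a <dataset>.<table> reference: {ref!r}")
    project = match.group("project") or current_project
    return project, match.group("dataset"), match.group("table")


print(parse_table_ref("publicdata:samples.wikipedia", "my-project"))
# -> ('publicdata', 'samples', 'wikipedia')
print(parse_table_ref("mydataset.mytable", "my-project"))
# -> ('my-project', 'mydataset', 'mytable')
```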
16. Schema Example
● Schema for a table of name-occurrence demographics:
  name:string,gender:string,count:integer
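The compact `column_name:type` schema string above is easy to validate before starting a load. The sketch below is a hypothetical helper: it splits the string into (column, type) pairs and checks each type against the list on the Data Types slide; it is not the parser the bq tool uses.

```python
# Hypothetical helper: turn the "name:type,..." schema string used on the
# slide into (column, type) pairs. The accepted types are the ones listed on
# the Data Types slide; this is not the bq tool's own parser.
KNOWN_TYPES = {"string", "integer", "float", "boolean", "timestamp"}


def parse_schema(spec: str) -> list:
    columns = []
    for part in spec.split(","):
        name, sep, col_type = part.strip().partition(":")
        if not sep or not name:
            raise ValueError(f"expected column_name:type, got {part!r}")
        col_type = col_type.lower()
        if col_type not in KNOWN_TYPES:
            raise ValueError(f"unknown type {col_type!r} for column {name!r}")
        columns.append((name, col_type))
    return columns


print(parse_schema("name:string,gender:string,count:integer"))
# -> [('name', 'string'), ('gender', 'string'), ('count', 'integer')]
```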
17. Data Types
● String
  ○ UTF-8 encoded, < 64 kB
● Integer
  ○ 64-bit signed
● Float
● Boolean
  ○ "true" or "false", case-insensitive
● Timestamp
  ○ String format
    ■ YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
  ○ Numeric format (seconds from the UNIX epoch)
    ■ 1234567890, 1.234567890123456E9
(*) Max row size: 64 kB
Dates are supported as timestamps
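The two timestamp representations above describe the same instant in different ways. As a sketch, assuming a string without an offset is UTC, the function below normalizes both forms to seconds since the UNIX epoch; the parsing rules are read off the format strings on the slide and are not BigQuery's own parser.

```python
import re
from datetime import datetime, timedelta, timezone

# Sketch: normalize both timestamp forms from the slide to seconds since the
# UNIX epoch. Assumption: a string without [+/-][HH:MM] is UTC. This follows
# the format strings on the slide, not BigQuery's own parser.
_TS = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(\.\d+)?"
    r"(?:\s*([+-])(\d{2}):(\d{2}))?$"
)


def to_epoch_seconds(value) -> float:
    if isinstance(value, (int, float)):
        return float(value)  # numeric form is already seconds from the epoch
    m = _TS.match(value.strip())
    if m is None:
        return float(value)  # e.g. "1.234567890123456E9"; raises if invalid
    y, mo, d, h, mi, s = (int(g) for g in m.groups()[:6])
    frac = float(m.group(7)) if m.group(7) else 0.0
    offset = timedelta(0)
    if m.group(8):
        offset = timedelta(hours=int(m.group(9)), minutes=int(m.group(10)))
        if m.group(8) == "-":
            offset = -offset
    dt = datetime(y, mo, d, h, mi, s, tzinfo=timezone(offset))
    return dt.timestamp() + frac


print(to_epoch_seconds("2009-02-13 23:31:30"))         # -> 1234567890.0
print(to_epoch_seconds("2009-02-14 00:31:30 +01:00"))  # -> 1234567890.0
```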
18. Data Format
BigQuery supports the following formats for loading data:
1. Comma-separated values (CSV)
2. JSON
   a. BigQuery can load data faster, embedded newlines.
   b. Supports nested/repeated data fields if your data con…
19. Repeated and Nested Fields
Loading data with repeated and nested fields is supported by the JSON data format only.
Schema example:
[
  {
    "fields": [
      {
        "mode": "nullable",
        "name": "country",
        "type": "string"
      },
      {
        "mode": "nullable",
        "name": "city",
        "type": "string"
      }
    ],
    "mode": "repeated",
    "name": "location",
    "type": "record"
  },
  ...
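To make the repeated/nested schema concrete, here is a sketch of data that would match the "location" record above, written as newline-delimited JSON (one record per line). The top-level "name" field and all values are hypothetical, standing in for the fields elided from the schema.

```python
import json

# Sketch: serialize records matching the "location" schema above as
# newline-delimited JSON. The "name" field and all values are hypothetical.
records = [
    {"name": "alice", "location": [
        {"country": "France", "city": "Paris"},
        {"country": "Japan", "city": "Tokyo"},
    ]},
    {"name": "bob", "location": [{"country": "Italy", "city": "Rome"}]},
]


def to_ndjson(rows) -> str:
    # json.dumps escapes newlines inside strings as "\n", so every record
    # occupies exactly one physical line.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


payload = to_ndjson(records)
print(payload, end="")
```

Because `json.dumps` escapes embedded newlines, a string value containing a line break still yields one line per record, which is what the newline-delimited format relies on.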
20. Accessing BigQuery
● BigQuery Web browser
  ○ Imports/exports data, runs queries
● bq command-line tool
  ○ Performs operations from the command line
● Service API
  ○ RESTful API to access BigQuery programmatically
  ○ Requires authorization via OAuth2
  ○ Google client libraries for Python, Java, JavaScript, PHP, ...
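A sketch of what "RESTful API with OAuth2" means at the HTTP level, assuming the v2 `jobs.query` method (POST to `/projects/{projectId}/queries`). The host name, project ID, and token are assumptions or placeholders; check the current REST reference before relying on them. Obtaining a real OAuth2 access token is left to a client library, and the request is only built here, not sent.

```python
import json
import urllib.request

# Sketch of a raw REST call to the BigQuery v2 jobs.query method. The host
# name is an assumption (check the current REST reference); project ID and
# token are placeholders. The request is built but not sent.
API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, token: str) -> urllib.request.Request:
    # useLegacySql=True matches the [project:dataset.table] syntax in this deck.
    body = json.dumps({"query": sql, "useLegacySql": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/projects/{project_id}/queries",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # OAuth2 access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_query_request(
    "my-project",
    "SELECT * FROM [data-sensing-lab:io_sensor_data.moscone_io13] LIMIT 10",
    "ACCESS_TOKEN",
)
# Sending it would be: urllib.request.urlopen(req)
```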
21. Third-party Tools
● ETL tools for loading data into BigQuery
● Visualization and business intelligence
22. Example of Visualization Tools
Using commercial visualization tools to graph the query results
23. Loading Data Using the Web Browser
● Upload from local disk or from Cloud Storage
● Start the Web browser
● Select a dataset
● Create a table and follow the wizard steps
24. Loading Data Using the bq Tool
The "bq load" command
Syntax:
bq load [--source_format=NEWLINE_DELIMITED_JSON|CSV] destination_table data_source_uri table_schema
● If not specified, the default file format is CSV (comma-separated values)
● The files can also use newline-delimited JSON format
● Schema
  ○ Either a filename or a comma-separated list of column_name:datatype pairs that describe the file format
● The data source may be on the local machine or on Cloud Storage
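When loads are scripted, it can help to assemble the command as an argument list rather than a hand-built string. The sketch below follows only the syntax shown on this slide (the single `--source_format` flag and the three positional arguments); the helper name and the dataset, table, bucket, and file names are hypothetical.

```python
import shlex

# Sketch: assemble a "bq load" argument list following the syntax on this
# slide. Only the --source_format flag shown there is used; the dataset,
# table, bucket and file names below are hypothetical.
SOURCE_FORMATS = ("CSV", "NEWLINE_DELIMITED_JSON")


def bq_load_command(destination_table, source_uri, schema, source_format=None):
    cmd = ["bq", "load"]
    if source_format is not None:  # omitted: bq falls back to CSV
        if source_format not in SOURCE_FORMATS:
            raise ValueError(f"unsupported source format: {source_format}")
        cmd.append(f"--source_format={source_format}")
    cmd += [destination_table, source_uri, schema]
    return cmd


cmd = bq_load_command(
    "mydataset.names",
    "gs://my-bucket/names.csv",
    "name:string,gender:string,count:integer",
)
print(shlex.join(cmd))
# -> bq load mydataset.names gs://my-bucket/names.csv name:string,gender:string,count:integer
# To actually run it (requires an installed, authenticated bq):
# subprocess.run(cmd, check=True)
```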
25. Load Limitations
● 1,000 import jobs per table per day
● 10,000 import jobs per project per day
● File size (for both CSV and JSON)
  ○ 1 GB for a compressed file
  ○ 1 TB for an uncompressed file
    ■ 4 GB for uncompressed CSV with newlines in strings
● 10,000 files per import job
● 1 TB per import job
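The per-job limits interact: a large load may have to be spread over several import jobs. As a sketch, assuming the 2014 limits on this slide (which may have changed since) and reading "1 TB" as 10**12 bytes, the function below greedily groups file sizes into jobs. The grouping strategy is an illustrative choice, not how BigQuery schedules loads.

```python
# Sketch: group files into import jobs that stay within the per-job limits on
# this slide (10,000 files and 1 TB per job). The limits come from this 2014
# deck and may have changed; "1 TB" is taken as 10**12 bytes, and the greedy
# grouping is an illustrative choice, not how BigQuery schedules loads.
MAX_FILES_PER_JOB = 10_000
MAX_BYTES_PER_JOB = 10**12
MAX_JOBS_PER_TABLE_PER_DAY = 1_000


def plan_import_jobs(file_sizes):
    jobs, current, current_bytes = [], [], 0
    for size in file_sizes:
        if size > MAX_BYTES_PER_JOB:
            raise ValueError("split this file before loading it")
        full = len(current) == MAX_FILES_PER_JOB
        if current and (full or current_bytes + size > MAX_BYTES_PER_JOB):
            jobs.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        jobs.append(current)
    if len(jobs) > MAX_JOBS_PER_TABLE_PER_DAY:
        raise ValueError("more import jobs than one table allows per day")
    return jobs


# Three 600 GB files cannot share a 1 TB job, so each gets its own job.
print(len(plan_import_jobs([600 * 10**9] * 3)))  # -> 3
```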
26. A Few Best Practices
CSV/JSON files must be split into chunks smaller than 1 TB
● Use the "split" command with the --line-bytes option
● Split into smaller files
  ○ Easier error recovery
  ○ Smaller data units (day or month instead of year)
● Uploading to Cloud Storage first is recommended (Cloud Storage → BigQuery)
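The point of `split --line-bytes` rather than a plain byte split is that no CSV/JSON record gets cut in half. The sketch below reproduces that idea in Python; the function name is hypothetical, and unlike the real tool this sketch simply rejects a single line longer than the chunk size.

```python
from typing import Iterable, Iterator

# Sketch of the idea behind "split --line-bytes=SIZE": pack whole lines into
# chunks of at most SIZE bytes without cutting a line in half. This sketch
# simply rejects a single line longer than SIZE.
def split_line_bytes(lines: Iterable[bytes], max_bytes: int) -> Iterator[bytes]:
    current, size = [], 0
    for line in lines:
        if len(line) > max_bytes:
            raise ValueError("a single line is longer than the chunk size")
        if current and size + len(line) > max_bytes:
            yield b"".join(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        yield b"".join(current)


chunks = list(split_line_bytes([b"aaa\n", b"bb\n", b"cccc\n"], max_bytes=8))
print(chunks)  # -> [b'aaa\nbb\n', b'cccc\n']
```

Iterating over a file opened with `open(path, "rb")` yields lines with their trailing newline, so the same function can feed chunks to separate output files.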
27. A Few Best Practices
● Split tables by date
  ○ Minimizes the cost of data scanned
  ○ Minimizes query time
● Upload multiple files to Cloud Storage
  ○ Allows parallel loading into BigQuery
● Denormalize your data
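Splitting tables by date means a query over a few days only touches the matching daily tables, so less data is scanned. A minimal sketch, assuming a hypothetical `events` prefix, a `<prefix>_YYYYMMDD` naming scheme, and a `ts` field holding seconds since the epoch:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Sketch: route rows into one table per UTC day, named <prefix>_YYYYMMDD. The
# "events" prefix, the naming scheme and the "ts" field (seconds since the
# epoch) are hypothetical.
def shard_by_day(rows, prefix="events", ts_field="ts"):
    shards = defaultdict(list)
    for row in rows:
        day = datetime.fromtimestamp(row[ts_field], tz=timezone.utc)
        shards[f"{prefix}_{day:%Y%m%d}"].append(row)
    return dict(shards)


shards = shard_by_day([{"ts": 1234567890}, {"ts": 1234567890 + 3600}])
print(sorted(shards))  # -> ['events_20090213', 'events_20090214']
```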
28. Exercise & Questions
29. Exercise
Work through BigQuery Exercise 1: Basics
● Use the BigQuery UI
● Use the bq command-line tool
● Upload a dataset
You will query the public sample GSOD (Global Surface Summary of the Day) weather dataset. You will also download earthquake data and upload it.
30. Questions
● What are the different ways to load data into BigQuery?
● What is the maximum size of data in a BigQuery table?
● How can we import data into BigQuery?
  ○ What are the limitations?
  ○ What formats does BigQuery accept?
31. Google I/O Data Sensing
● Start the BigQuery Web browser
● Click Display Project in the project chooser dialog window
● Enter data-sensing-lab when prompted
● In the dataset data-sensing-lab:io_sensor_data, select the table moscone_io13
● In the New Query box, enter the following query:
  SELECT * FROM [data-sensing-lab:io_sensor_data.moscone_io13] LIMIT 10
● Click the Run Query button
● Scroll to see the relevant results
32. Data Structure
● Define the table schema when creating a table
● Data is stored in a per-column structure
● Each column is handled separately and combined only when necessary
Advantages of this data structure:
● No need to define indexes in advance
● Only the relevant columns are loaded
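A toy illustration of why per-column storage needs no indexes and reads only the relevant columns. The rows below are made-up examples using the name/gender/count schema from the schema example slide, and "bytes scanned" is a crude estimate (the length of each value as text), not BigQuery's billing formula.

```python
# Sketch contrasting row and column layouts. The rows are made-up examples
# using the name/gender/count schema, and "bytes scanned" is a crude estimate
# (length of each value as text), not BigQuery's billing formula.
rows = [
    {"name": "Anna", "gender": "F", "count": 10},
    {"name": "Paul", "gender": "M", "count": 20},
]


def to_columns(rows):
    """Pivot row records into one list of values per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def bytes_scanned(columns, referenced):
    """Only the columns a query references are read."""
    return sum(len(str(v)) for name in referenced for v in columns[name])


columns = to_columns(rows)
total = sum(columns["count"])  # like SELECT SUM(count): touches one column
print(total, bytes_scanned(columns, ["count"]), bytes_scanned(columns, list(columns)))
# -> 30 4 14
```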
33. Thank you! Questions?
