BigQuery implementation

Google BigQuery technical presentation for getting started with BigQuery

Google BigQuery
Google BigQuery: big data with a SQL-like query language, but fast.

BigQuery Features
● TB-level data analysis
● Fast response for data mining
● SQL-like query language
● Interactive queries across multiple datasets
● Cheap, pay-per-use pricing
● Offline job support

Getting Started

BigQuery Web UI
https://bigquery.cloud.google.com/

BigQuery structure
● Project
● Dataset
● Table
● Job

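These objects nest: a project holds datasets, a dataset holds tables, and jobs (loads, queries) run within a project. The bq commands later in this deck name a table as [project]:[dataset].[table], with the project part optional (as in testbq.test). A minimal sketch of splitting such an identifier, assuming that colon/dot convention; the function name parseTableId is hypothetical:

```javascript
// Hypothetical helper: split a "[project]:[dataset].[table]" identifier.
// The project part is optional, as in "testbq.test".
function parseTableId(id) {
  // Use the LAST colon: domain-scoped project IDs can contain ':' themselves.
  const colon = id.lastIndexOf(':');
  const project = colon >= 0 ? id.slice(0, colon) : null;
  const rest = colon >= 0 ? id.slice(colon + 1) : id;
  const dot = rest.indexOf('.');
  if (dot <= 0 || dot === rest.length - 1) {
    throw new Error('Expected [project:]dataset.table, got: ' + id);
  }
  return {
    project: project,
    dataset: rest.slice(0, dot),
    table: rest.slice(dot + 1),
  };
}

console.log(parseTableId('testbq.test'));
// { project: null, dataset: 'testbq', table: 'test' }
console.log(parseTableId('publicdata:samples.shakespeare'));
```

Throwing on a missing dot makes a half-typed destination fail loudly instead of silently loading into the wrong place.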
Hands-on: Import

The easy way: Import Wizard

Load Data to BigQuery from the Command Line
CSV / JSON → Cloud Storage → BigQuery

Load CSV to BigQuery
First copy the file to Cloud Storage:
    gsutil cp [source] gs://[bucket-name]
    # gsutil cp ~/Desktop/log.csv gs://your-bucket/
    Copying file:///Users/simonsu/Desktop/log.csv [Content-Type=text/csv]...
    Uploading: 4.59 MB/36.76 MB
Then load it into a table, giving the schema inline as name:TYPE pairs:
    bq load [project]:[dataset].[table] gs://[bucket]/[csv path] [schema]
    # bq load project:dataset.table gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING
    Waiting on bqjob_rf4f3f1d9e2366a6_00000142c1bdd36f_1 ... (24s) Current status: DONE

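The inline schema is a comma-separated list of name:TYPE pairs, one per CSV column in file order. A minimal sketch that builds the schema string and the argument lists for the two commands above, assuming the column names and types are known up front; the function name buildCsvLoad and the path, bucket, and table names are placeholders, and nothing is executed:

```javascript
// Hypothetical helper: build gsutil/bq argument lists for a CSV load.
// Column order must match the column order in the CSV file.
function buildCsvLoad(localPath, bucket, destTable, columns) {
  const fileName = localPath.split('/').pop();
  const gcsUri = 'gs://' + bucket + '/' + fileName;
  // [['IP', 'STRING'], ['DNS', 'STRING']] -> "IP:STRING,DNS:STRING"
  const schema = columns.map(([name, type]) => name + ':' + type).join(',');
  return {
    copy: ['gsutil', 'cp', localPath, gcsUri],
    load: ['bq', 'load', destTable, gcsUri, schema],
  };
}

const cmds = buildCsvLoad('/tmp/log.csv', 'your-bucket', 'project:dataset.table', [
  ['IP', 'STRING'], ['DNS', 'STRING'], ['TS', 'STRING'], ['URL', 'STRING'],
]);
console.log(cmds.copy.join(' ')); // gsutil cp /tmp/log.csv gs://your-bucket/log.csv
console.log(cmds.load.join(' ')); // bq load project:dataset.table gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING
```

Keeping the commands as argument arrays lets them be passed straight to child_process.execFile without shell quoting. If the CSV has a header row, bq load's --skip_leading_rows=1 flag skips it.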
Load JSON to BigQuery
    bq load --source_format NEWLINE_DELIMITED_JSON [project]:[dataset].[table] [json file] [schema file]
The source can be a local file:
    # bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest ./sample.json ./schema.json
    Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
or a file in Cloud Storage:
    # bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest gs://your-bucket/sample.json ./schema.json
    Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE

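NEWLINE_DELIMITED_JSON means one complete JSON object per line, not a single JSON array, and the schema file is a JSON array of field definitions (name, type, and optionally mode). A minimal sketch that writes a sample.json and schema.json pair into a temporary directory; the records and field names are hypothetical:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sample records; each becomes one line of sample.json.
const records = [
  { name: 'alice', age: 30, tags: ['a', 'b'] },
  { name: 'bob', age: 25, tags: [] },
];

// Schema file: an array of {name, type, mode} field definitions.
const schema = [
  { name: 'name', type: 'STRING', mode: 'NULLABLE' },
  { name: 'age', type: 'INTEGER', mode: 'NULLABLE' },
  { name: 'tags', type: 'STRING', mode: 'REPEATED' },
];

// JSON.stringify (without indentation) never emits raw newlines,
// so joining with '\n' yields exactly one record per line.
function toNdjson(rows) {
  return rows.map((r) => JSON.stringify(r)).join('\n') + '\n';
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bq-'));
fs.writeFileSync(path.join(dir, 'sample.json'), toNdjson(records));
fs.writeFileSync(path.join(dir, 'schema.json'), JSON.stringify(schema, null, 2));
console.log('Wrote sample.json and schema.json to ' + dir);
```

Only the data file must be newline-delimited; the schema file is ordinary JSON, so pretty-printing it is harmless.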
Hands-on: Query

Web way: Query Console

Shell way: bq command
Install the Google Cloud SDK (https://developers.google.com/cloud/sdk/); it includes the bq command-line tool.

Shell way: bq command
    bq query <sql_query>
    # bq query 'select charge_unit,charge_desc,one_charge from testbq.test'

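The same command can be driven from a Node.js script. A minimal sketch, assuming bq is installed and authenticated: --format=json and --quiet are global bq flags (so they precede the query command), and --use_legacy_sql=true pins the legacy SQL dialect these slides use. The helper names bqQueryArgs and runBqQuery are hypothetical, and the JSON output is assumed to be an array of row objects:

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: argument list for `bq query` with JSON output.
// Global flags (--format, --quiet) must come before the command name.
function bqQueryArgs(sql) {
  return ['--format=json', '--quiet', 'query', '--use_legacy_sql=true', sql];
}

// Runs the query and passes the parsed rows to callback(err, rows).
// Requires the Cloud SDK's bq tool on PATH and prior authentication.
function runBqQuery(sql, callback) {
  const opts = { maxBuffer: 64 * 1024 * 1024 }; // large results overflow the default buffer
  execFile('bq', bqQueryArgs(sql), opts, (err, stdout, stderr) => {
    if (err) return callback(new Error(stderr || err.message));
    let rows;
    try {
      rows = JSON.parse(stdout);
    } catch (parseErr) {
      return callback(parseErr);
    }
    callback(null, rows);
  });
}

// Usage (not run here):
// runBqQuery('select charge_unit,charge_desc,one_charge from testbq.test',
//   (err, rows) => (err ? console.error(err) : console.log(rows)));
```

execFile passes the SQL as a single argument, so quotes inside the query need no shell escaping.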
BigQuery Query Language

Query support
Query syntax:
● SELECT
● WITHIN
● FROM
● FLATTEN
● JOIN
● WHERE
● GROUP BY
● HAVING
● ORDER BY
● LIMIT
Supported functions and operators:
● Aggregate functions
● Arithmetic operators
● Bitwise operators
● Casting functions
● Comparison functions
● Date and time functions
● IP functions
● JSON functions
● Logical operators
● Mathematical functions
● Regular expression functions
● String functions
● Table wildcard functions
● URL functions
● Window functions
● Other functions

Select
    select charge_unit,charge_desc,one_charge from testbq.test
    +-------------+-------------------+------------+
    | charge_unit | charge_desc       | one_charge |
    +-------------+-------------------+------------+
    | M           | Billed monthly    |          0 |
    | D           | Billed daily      |          0 |
    | HH          | Billed hourly     |          0 |
    | T           | Billed per minute |          0 |
    | SS          | Billed per use    |          1 |
    +-------------+-------------------+------------+

Join
    SELECT a.THEID, a.THENAME, b.DESCRIPITON
    FROM user01.USER_MST a LEFT JOIN user01.USER_DETAIL_MST b
    ON a.THEID = b.THEID
    LIMIT 10
    +---------+------------------+-----------------------------+
    | a_THEID | a_THENAME        | b_DESCRIPITON               |
    +---------+------------------+-----------------------------+
    | 2       | About items      | Organize items under Items. |
    | 2       | About items      | Gems.                       |
    | 1       | About companions | Courage awakening.          |
    | 1       | About companions | Edit the quest party.       |
    | 1       | About companions | Several different types     |
    +---------+------------------+-----------------------------+

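A LEFT JOIN keeps every row of the left table. Right-side columns come back as NULL when nothing matches, and a left row that matches several right rows appears once per match, which is why the same THEID shows up on several result rows. A minimal in-memory sketch of that semantics using a hash index on the join key; the rows and descriptions are hypothetical:

```javascript
// In-memory LEFT JOIN of two arrays of row objects on a shared key.
function leftJoin(left, right, key) {
  // Index the right side: key value -> array of matching rows.
  const index = new Map();
  for (const r of right) {
    if (!index.has(r[key])) index.set(r[key], []);
    index.get(r[key]).push(r);
  }
  const out = [];
  for (const l of left) {
    // As in SQL, a NULL key never matches anything.
    const matches = l[key] == null ? undefined : index.get(l[key]);
    if (matches) {
      for (const r of matches) out.push({ left: l, right: r });
    } else {
      out.push({ left: l, right: null }); // unmatched left row survives with a NULL right side
    }
  }
  return out;
}

// Hypothetical rows shaped like USER_MST and USER_DETAIL_MST.
const users = [{ THEID: 1, THENAME: 'name1' }, { THEID: 3, THENAME: 'name3' }];
const details = [{ THEID: 1, DESCRIPITON: 'desc A' }, { THEID: 1, DESCRIPITON: 'desc B' }];

for (const { left, right } of leftJoin(users, details, 'THEID')) {
  console.log(left.THEID, left.THENAME, right ? right.DESCRIPITON : null);
}
```

Building the index once keeps the join linear in the total number of rows instead of comparing every pair.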
Flatten
    SELECT
      fullName,
      age,
      gender,
      citiesLived.place
    FROM (FLATTEN([dataset.tableId], children))
    WHERE
      (citiesLived.yearsLived > 1995) AND
      (children.age > 3)
    GROUP BY fullName, age, gender, citiesLived.place
    +------------+-----+--------+--------------------+
    | fullName   | age | gender | citiesLived_place  |
    +------------+-----+--------+--------------------+
    | John Doe   |  22 | Male   | Stockholm          |
    | Mike Jones |  35 | Male   | Los Angeles        |
    | Mike Jones |  35 | Male   | Washington DC      |
    | Mike Jones |  35 | Male   | Portland           |
    | Mike Jones |  35 | Male   | Austin             |
    +------------+-----+--------+--------------------+

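FLATTEN turns a repeated field into ordinary rows: each element of children becomes its own output row with the record's other fields copied alongside, so children.age can then be tested row by row in WHERE. A minimal sketch of that idea on hypothetical records; it does not model how BigQuery treats a record whose repeated field is empty (this sketch simply emits no row for it):

```javascript
// Sketch of FLATTEN on one repeated field: one output row per element,
// with the record's other fields copied onto each row.
function flatten(records, field) {
  const out = [];
  for (const rec of records) {
    for (const item of rec[field] || []) {
      out.push({ ...rec, [field]: item });
    }
  }
  return out;
}

// Hypothetical records loosely modeled on the person example above.
const people = [
  { fullName: 'John Doe', children: [{ name: 'Jane', age: 6 }, { name: 'John', age: 2 }] },
  { fullName: 'Mike Jones', children: [{ name: 'Earl', age: 10 }] },
];

// Equivalent of WHERE children.age > 3 after flattening.
const rows = flatten(people, 'children').filter((r) => r.children.age > 3);
console.log(rows.map((r) => r.fullName + ' / ' + r.children.name));
// [ 'John Doe / Jane', 'Mike Jones / Earl' ]
```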
Regular Expression
    SELECT
      word,
      COUNT(word) AS count
    FROM
      publicdata:samples.shakespeare
    WHERE
      (REGEXP_MATCH(word, r'\w\w\'\w\w'))
    GROUP BY word
    ORDER BY count DESC
    LIMIT 3;
    +-------+-------+
    | word  | count |
    +-------+-------+
    | ne'er |    42 |
    | we'll |    35 |
    | We'll |    33 |
    +-------+-------+

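In the raw string r'\w\w\'\w\w', the pattern is two word characters, an apostrophe (escaped so it does not end the string literal), and two more word characters. A minimal JavaScript sketch of the same filter, group, count, and sort over a hypothetical word list; like RegExp.test, it treats a match anywhere in the word as a hit:

```javascript
// Keep words matching two word chars, an apostrophe, two word chars;
// count them and return the top n (ORDER BY count DESC LIMIT n).
function topApostropheWords(words, n) {
  const pattern = /\w\w'\w\w/; // JavaScript counterpart of r'\w\w\'\w\w'
  const counts = new Map();
  for (const w of words) {
    if (pattern.test(w)) counts.set(w, (counts.get(w) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}

// Hypothetical sample. Grouping is case-sensitive, so "we'll" and "We'll"
// are counted separately, as in the result table above.
const sample = ["ne'er", "we'll", "We'll", "ne'er", "o'er", 'the'];
console.log(topApostropheWords(sample, 3));
```

"o'er" is excluded because only one word character precedes its apostrophe.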
Time Function
    SELECT
      TOP (FORMAT_UTC_USEC(timestamp * 1000000), 5) AS top_revision_time,
      COUNT (*) AS revision_count
    FROM [publicdata:samples.wikipedia];
    +----------------------------+----------------+
    | top_revision_time          | revision_count |
    +----------------------------+----------------+
    | 2002-02-25 15:51:15.000000 |          20971 |
    | 2002-02-25 15:43:11.000000 |          15955 |
    | 2010-01-14 15:52:34.000000 |              3 |
    | 2009-12-31 19:29:19.000000 |              3 |
    | 2009-12-28 18:55:12.000000 |              3 |
    +----------------------------+----------------+

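The timestamp column holds seconds, while FORMAT_UTC_USEC expects microseconds since the Unix epoch, hence the * 1000000; the result is rendered as YYYY-MM-DD HH:MM:SS.uuuuuu in UTC, as the output shows. TOP(expr, 5) then returns the five most frequent values alongside COUNT(*). A small sketch of the conversion and formatting in JavaScript; formatUtcUsec is our own function named after the SQL one, and it handles non-negative values only:

```javascript
// Sketch of FORMAT_UTC_USEC: microseconds since the epoch ->
// "YYYY-MM-DD HH:MM:SS.uuuuuu" in UTC. Non-negative input only.
function formatUtcUsec(usec) {
  const ms = Math.floor(usec / 1000);
  const micros = usec % 1000000;
  const iso = new Date(ms).toISOString(); // e.g. "2002-02-25T15:51:15.000Z"
  return iso.slice(0, 19).replace('T', ' ') + '.' + String(micros).padStart(6, '0');
}

// `timestamp` holds seconds, so multiply by 1,000,000 as the query does.
const seconds = Date.UTC(2002, 1, 25, 15, 51, 15) / 1000; // month 1 = February
console.log(formatUtcUsec(seconds * 1000000)); // 2002-02-25 15:51:15.000000
```

Values stay within Number's exact-integer range (about 9e15) for dates up to the year 2255, so plain numbers suffice here.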
URL Function
    SELECT
      DOMAIN(repository_homepage) AS user_domain,
      COUNT(*) AS activity_count
    FROM [publicdata:samples.github_timeline]
    GROUP BY user_domain
    HAVING user_domain IS NOT NULL AND user_domain != ''
    ORDER BY activity_count DESC
    LIMIT 5;
    +-----------------+----------------+
    | user_domain     | activity_count |
    +-----------------+----------------+
    | github.com      |         281879 |
    | google.com      |          34769 |
    | khanacademy.org |          17316 |
    | sourceforge.net |          15103 |
    | mozilla.org     |          14091 |
    +-----------------+----------------+

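DOMAIN() reduces a URL to its domain; the rest of the query groups by that domain, uses HAVING to drop NULL and empty domains after grouping, and keeps the five largest groups with ORDER BY ... DESC LIMIT 5. A rough JavaScript sketch of the pipeline; naiveDomain is a hypothetical stand-in that keeps only the last two labels of the host, so it mishandles multi-part suffixes such as co.uk, and the input URLs are made up:

```javascript
// Naive stand-in for DOMAIN(): keep the last two labels of the URL's host.
// Returns null for missing or unparsable URLs.
function naiveDomain(url) {
  if (!url) return null;
  try {
    const labels = new URL(url).hostname.split('.');
    return labels.slice(-2).join('.');
  } catch (e) {
    return null; // not an absolute URL
  }
}

// GROUP BY domain, HAVING domain IS NOT NULL AND domain != '',
// ORDER BY count DESC, LIMIT n.
function topDomains(urls, n) {
  const counts = new Map();
  for (const u of urls) {
    const d = naiveDomain(u);
    counts.set(d, (counts.get(d) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([d]) => d !== null && d !== '')
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}

// Hypothetical repository_homepage values.
const homepages = ['https://github.com/a/b', 'http://www.github.com', 'http://code.google.com/p/x', '', null];
console.log(topDomains(homepages, 5)); // [ [ 'github.com', 2 ], [ 'google.com', 1 ] ]
```

Filtering after counting mirrors HAVING, which applies to groups rather than to input rows.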
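Hands-on: Programming

Prepare
● Prepare a Google Cloud Platform project
● Create a service account
● Generate a P12 key for the service account

Google Service Account
Web server application vs. service account

Prepare Authentication
Convert the P12 key to PEM keys:
    $ openssl pkcs12 -in privatekey.p12 -out privatekey.pem -nocerts
    $ openssl rsa -in privatekey.pem -out key.pem
The first command extracts the private key (the import password for Google-generated .p12 keys is notasecret) and encrypts it with a PEM pass phrase you choose; the second writes an unencrypted copy of the key to key.pem.
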
Node.js: the bigquery module
    var bq = require('bigquery')
      , prjId = 'your-bigquery-project-id';

    bq.init({
      client_secret: '/path-to-client_secret.json',
      // the two PEM files produced by the openssl commands above
      privatekey_pem: '/path-to-privatekey.pem',
      key_pem: '/path-to-key.pem'
    });

    bq.job.listds(prjId, function (e, r, d) {
      if (e) return console.error(e); // stop here instead of printing undefined data
      console.log(JSON.stringify(d));
    });
Operations are performed by calling the functions under bq.job.
bigquery module documentation: https://github.com/peihsinsu/bigquery

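Google Drive way: Apps Script
    /* Ref: https://developers.google.com/apps-script/advanced/bigquery */
    var projectId = 'your-project-id'; // placeholder: replace with your project ID
    var request = {
      query: 'SELECT TOP(word, 30) AS word, COUNT(*) AS word_count ' +
             'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
    };
    var queryResults = BigQuery.Jobs.query(request, projectId);
    var jobId = queryResults.jobReference.jobId;

    // Wait until the query job has finished
    var sleepTimeMs = 500;
    while (!queryResults.jobComplete) {
      Utilities.sleep(sleepTimeMs);
      sleepTimeMs *= 2;
      queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
    }

    // Collect every page of results
    var rows = queryResults.rows;
    while (queryResults.pageToken) {
      queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
        pageToken: queryResults.pageToken
      });
      rows = rows.concat(queryResults.rows);
    }
The BigQuery advanced service must be enabled for the script project.
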
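The two loops follow the usual pattern for BigQuery query results: poll until jobComplete is true, then keep requesting pages while the response carries a pageToken. The pattern can be exercised outside Apps Script against a stand-in. In this sketch, fetchPage is a hypothetical function returning { jobComplete, rows, pageToken } objects, the fake pages are made up, and the real code's sleep between polls is omitted:

```javascript
// Collect all rows from a paged query result.
// fetchPage(pageToken) is a hypothetical stand-in for getQueryResults.
function collectRows(fetchPage, maxPolls) {
  let res = fetchPage(undefined);
  // Poll until the job reports completion, bounded so it cannot spin forever.
  for (let i = 0; !res.jobComplete; i++) {
    if (i >= maxPolls) throw new Error('query did not complete');
    res = fetchPage(undefined);
  }
  // A page may carry no `rows` at all, so default to an empty array.
  let rows = res.rows || [];
  while (res.pageToken) {
    res = fetchPage(res.pageToken);
    rows = rows.concat(res.rows || []);
  }
  return rows;
}

// Fake service: reports "not complete" once, then serves three pages.
let polls = 0;
const pages = {
  first: { jobComplete: true, rows: [1, 2], pageToken: 't2' },
  t2: { jobComplete: true, rows: [3], pageToken: 't3' },
  t3: { jobComplete: true, rows: [4, 5] },
};
function fakeFetchPage(token) {
  if (!token) return polls++ === 0 ? { jobComplete: false } : pages.first;
  return pages[token];
}

console.log(collectRows(fakeFetchPage, 5)); // [ 1, 2, 3, 4, 5 ]
```

The bounded poll count plays the role that a timeout would play in production code.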
References
● Features: https://cloud.google.com/products/bigquery#features
● Case Studies: https://cloud.google.com/products/bigquery#case-studies
● Pricing: https://cloud.google.com/products/bigquery#pricing
● Documentation: https://cloud.google.com/products/bigquery#documentation
● Query Reference: https://developers.google.com/bigquery/query-reference