BigQuery implementation

Google BigQuery
Google BigQuery - big data analysis with a SQL-like query language, and fast.

BigQuery Features
● Terabyte-scale data analysis
● Fast query response
● SQL-like query language
● Interactive queries across multiple datasets
● Low cost, pay per use
● Offline job support

BigQuery Web UI
https://bigquery.cloud.google.com/

BigQuery structure
● Project
● Dataset
● Table
● Job
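The first three levels nest: a project holds datasets, and a dataset holds tables, while jobs are the load and query operations run within a project. As a minimal sketch, assuming the bracketed `[project:dataset.table]` notation used by the sample queries in this deck, the levels combine into a table reference like this. `tableRef` is a hypothetical helper, not part of any BigQuery tool.

```javascript
// Hypothetical helper: joins the project, dataset, and table levels
// into the bracketed table reference used in the sample queries.
function tableRef(project, dataset, table) {
  return '[' + project + ':' + dataset + '.' + table + ']';
}

console.log(tableRef('publicdata', 'samples', 'shakespeare'));
```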

The easy way - Import Wizard

Load Data to BigQuery from the command line
CSV / JSON → Cloud Storage → BigQuery

Load CSV to BigQuery
gsutil cp [source] gs://[bucket-name]
# gsutil cp ~/Desktop/log.csv gs://your-bucket/
Copying file:///Users/simonsu/Desktop/log.csv [Content-Type=text/csv]...
Uploading: 4.59 MB/36.76 MB
bq load [project]:[dataset].[table] gs://[bucket]/[csv path] [schema]
# bq load project:dataset.table gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING
Waiting on bqjob_rf4f3f1d9e2366a6_00000142c1bdd36f_1 ... (24s) Current status: DONE
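The inline schema passed to `bq load` is a comma-separated list of `NAME:TYPE` pairs. The sketch below only shows how that string and the rest of the argument list could be assembled from a field list. It does not run `bq`. The helper names `inlineSchema` and `bqLoadArgs` and the table ID are illustrative assumptions.

```javascript
// Builds the inline schema argument ("NAME:TYPE,NAME:TYPE,...") for bq load.
function inlineSchema(fields) {
  return fields.map(function (f) { return f.name + ':' + f.type; }).join(',');
}

// Assembles the argument list for `bq load`; nothing is executed here.
function bqLoadArgs(tableId, gcsUri, fields) {
  return ['load', tableId, gcsUri, inlineSchema(fields)];
}

// Field list taken from the CSV example above.
var fields = [
  { name: 'IP', type: 'STRING' },
  { name: 'DNS', type: 'STRING' },
  { name: 'TS', type: 'STRING' },
  { name: 'URL', type: 'STRING' }
];

// 'your-project:your_dataset.your_table' is a placeholder table ID.
var args = bqLoadArgs('your-project:your_dataset.your_table', 'gs://your-bucket/log.csv', fields);
console.log('bq ' + args.join(' '));
```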

Load JSON to BigQuery
bq load --source_format NEWLINE_DELIMITED_JSON
[project]:[dataset].[table] [json file] [schema file]
# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest ./sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest gs://your-bucket/sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
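The deck does not show the contents of sample.json or schema.json. As a rough sketch, a newline-delimited JSON file holds one complete JSON object per line with no enclosing array, and a schema file is a JSON array of field definitions. The rows, field names, and types below are illustrative assumptions, not the actual files used above.

```javascript
// Hypothetical rows; field names and values are for illustration only.
var rows = [
  { charge_unit: 'M', one_charge: 0 },
  { charge_unit: 'SS', one_charge: 1 }
];

// Newline-delimited JSON: one JSON object per line, no enclosing array.
function toNdjson(records) {
  return records.map(function (r) { return JSON.stringify(r); }).join('\n') + '\n';
}

// Schema file: a JSON array with one definition per field (assumed types).
var schema = [
  { name: 'charge_unit', type: 'STRING' },
  { name: 'one_charge', type: 'INTEGER' }
];

console.log(toNdjson(rows));                   // would be written to sample.json
console.log(JSON.stringify(schema, null, 2));  // would be written to schema.json
```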

Shell way - bq command
Install google_cloud_sdk (https://developers.google.com/cloud/sdk/)

Shell way - bq command
bq query <sql_query>
# bq query 'select charge_unit,charge_desc,one_charge from testbq.test'

Query support
Query syntax
● SELECT
● WITHIN
● FROM
● FLATTEN
● JOIN
● WHERE
● GROUP BY
● HAVING
● ORDER BY
● LIMIT
Supported functions and operators
● Aggregate functions
● Arithmetic operators
● Bitwise operators
● Casting functions
● Comparison functions
● Date and time functions
● IP functions
● JSON functions
● Logical operators
● Mathematical functions
● Regular expression functions
● String functions
● Table wildcard functions
● URL functions
● Window functions
● Other functions

Select
select charge_unit,charge_desc,one_charge from testbq.test
+-----------------+----------------+--------------------+
| charge_unit | charge_desc | one_charge |
+-----------------+----------------+--------------------+
| M | 按月計費 |0 |
| D | 按日計費 |0 |
| HH | 小時計費 |0 |
| T | 分計費 |0 |
| SS | 按次計費 |1 |
+-----------------+----------------+--------------------+

Regular Expression
SELECT
word,
COUNT(word) AS count
FROM
publicdata:samples.shakespeare
WHERE
(REGEXP_MATCH(word,r'\w\w\'\w\w'))
GROUP BY word
ORDER BY count DESC
LIMIT 3;
+-----------------+----------------+
| word | count |
+-----------------+----------------+
| ne'er | 42 |
| we'll | 35 |
| We'll | 33 |
+-----------------+----------------+
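The pattern in this query is meant to match two word characters, an apostrophe, and two more word characters. To see locally which words would pass the filter, here is a small sketch using a JavaScript regular expression as a stand-in, assuming that REGEXP_MATCH accepts a match anywhere in the string. The word list is made up for illustration.

```javascript
// Two word characters, an apostrophe, then two word characters.
var pattern = /\w\w'\w\w/;

// Illustrative input; only the first three contain the pattern.
var words = ["ne'er", "we'll", "We'll", "the", "o'er"];

var matches = words.filter(function (w) { return pattern.test(w); });
console.log(matches);
```

"o'er" is rejected because only one word character precedes the apostrophe.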

Time Function
SELECT
TOP (FORMAT_UTC_USEC(timestamp * 1000000), 5)
AS top_revision_time,
COUNT (*) AS revision_count
FROM
[publicdata:samples.wikipedia];
+----------------------------+----------------+
| top_revision_time | revision_count |
+----------------------------+----------------+
| 2002-02-25 15:51:15.000000 | 20971 |
| 2002-02-25 15:43:11.000000 | 15955 |
| 2010-01-14 15:52:34.000000 | 3 |
| 2009-12-31 19:29:19.000000 | 3 |
| 2009-12-28 18:55:12.000000 | 3 |
+----------------------------+----------------+
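In the query above, the timestamp column is multiplied by 1000000 to turn seconds into microseconds before FORMAT_UTC_USEC renders it as a UTC string. The conversion can be approximated locally as below. `formatUtcUsec` is a hypothetical JavaScript stand-in written for this sketch, not BigQuery's implementation.

```javascript
// Formats microseconds since the Unix epoch as "YYYY-MM-DD HH:MM:SS.uuuuuu" in UTC.
function formatUtcUsec(usec) {
  var ms = Math.floor(usec / 1000);
  // Microsecond part within the current second (kept non-negative).
  var micros = ((usec % 1000000) + 1000000) % 1000000;
  var iso = new Date(ms).toISOString(); // e.g. 2002-02-25T15:51:15.000Z
  return iso.slice(0, 10) + ' ' + iso.slice(11, 19) + '.' +
         ('000000' + micros).slice(-6);
}

// Seconds since the epoch, as stored in the table, scaled to microseconds.
var seconds = Date.UTC(2002, 1, 25, 15, 51, 15) / 1000;
console.log(formatUtcUsec(seconds * 1000000));
```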

URL Function
SELECT
DOMAIN(repository_homepage) AS user_domain,
COUNT(*) AS activity_count
FROM
[publicdata:samples.github_timeline]
GROUP BY
user_domain
HAVING
user_domain IS NOT NULL AND user_domain != ''
ORDER BY
activity_count DESC
LIMIT 5;
+-----------------+----------------+
| user_domain | activity_count |
+-----------------+----------------+
| github.com | 281879 |
| google.com | 34769 |
| khanacademy.org | 17316 |
| sourceforge.net | 15103 |
| mozilla.org | 14091 |
+-----------------+----------------+

Prepare
● Prepare a Google Cloud Platform project
● Create a Service Account
● Generate a pem key from the Service Account p12 key

Google Service Account
Web server application vs. service account

Prepare Authentication
Convert the p12 key to a pem key
$ openssl pkcs12 -in privatekey.p12 -out privatekey.pem -nocerts
$ openssl rsa -in privatekey.pem -out key.pem

Node.js - bigquery module
var bq = require('bigquery')
  , prjId = 'your-bigquery-project-id';
// Point the module at the service account credentials.
bq.init({
  client_secret: '/path-to-client_secret.json',
  privatekey_pem: '/path-to-privatekey.pem',
  key_pem: '/path-to-key.pem'
});
// List the datasets in the project and print the result.
bq.job.listds(prjId, function(e,r,d){
  if(e) return console.log(e);
  console.log(JSON.stringify(d));
});
To perform operations, call the functions under bq.job.
For the bigquery module, see: https://github.com/peihsinsu/bigquery

Google Drive way - Apps Script
/* Ref: https://developers.google.com/apps-script/advanced/bigquery */
var projectId = 'your-bigquery-project-id'; // placeholder: replace with your own project ID
var request = {
  query: 'SELECT TOP(word, 30) AS word, COUNT(*) AS word_count ' +
         'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
};
// Run the query, then read the first page of results.
var queryResults = BigQuery.Jobs.query(request, projectId);
var jobId = queryResults.jobReference.jobId;
queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
var rows = queryResults.rows;
// Follow pageToken until every page has been collected.
while (queryResults.pageToken) {
  queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
    pageToken: queryResults.pageToken
  });
  rows = rows.concat(queryResults.rows);
}
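The loop above keeps requesting pages until a response arrives without a pageToken. The same pattern can be tried outside Apps Script with a stand-in for the service call. In this sketch `getQueryResults` and the `pages` data are mocks invented for illustration, not the BigQuery API.

```javascript
// Mock pages: each response carries some rows and, except the last, a pageToken.
var pages = {
  start: { rows: [1, 2], pageToken: 'p2' },
  p2: { rows: [3, 4], pageToken: 'p3' },
  p3: { rows: [5] } // no pageToken: this is the last page
};

// Stand-in for the real call; returns the page named by the token.
function getQueryResults(options) {
  return pages[(options && options.pageToken) || 'start'];
}

// Collects rows from every page by following pageToken until it is absent.
function fetchAllRows() {
  var result = getQueryResults();
  var rows = result.rows || [];
  while (result.pageToken) {
    result = getQueryResults({ pageToken: result.pageToken });
    rows = rows.concat(result.rows || []);
  }
  return rows;
}

console.log(fetchAllRows());
```

Guarding with `|| []` is a choice made for this sketch so that a page without rows does not break the concatenation.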

References
● Features: https://cloud.google.com/products/bigquery#features
● Case Studies: https://cloud.google.com/products/bigquery#case-studies
● Pricing: https://cloud.google.com/products/bigquery#pricing
● Documentation: https://cloud.google.com/products/bigquery#documentation
● Query Reference: https://developers.google.com/bigquery/query-reference
