Workshop 20140522 BigQuery Implementation

A BigQuery starter guide to loading data in CSV or JSON format, and a query guide...

Published in: Technology, News & Politics

  1. 1. MiTAC MiCloud - Google Cloud Platform Partner @ APAC. 2014 Q2 BigQuery Workshop. Google BigQuery: big data with SQL-like query features, but fast... http://goo.gl/XZmqgN
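  2. 2. Preface: ● This is a hands-on session. If you want to follow along, please open your laptop... ● Have you created a GCP project? ● Have you enabled billing? ● Have you installed google_cloud_sdk? ● Wireless AP here: ○ Account: ○ Password: (Diagram labels: RESTful, GCE LB, Frontend Services, Backend Services, Data Access, Big Data Access)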
  3. 3. BigQuery is... ● TB-level data analysis ● Fast mining response ● SQL-like query language ● Multi-dataset interactive support ● Cheap, pay-per-use ● Offline job support
  4. 4. Getting Started
  5. 5. BigQuery Web UI https://bigquery.cloud.google.com/
  6. 6. BigQuery structure ● Project ● Dataset ● Table ● Job
  7. 7. Hands-on - Import
  8. 8. Sample Data...
  9. 9. The easy way - Import Wizard
  10. 10. JCMB_2014.csv Schema
      date_time:String,atmospheric_pressure:float,rainfall:float,wind_speed:float,wind_direction:float,surface_temperature:float,relative_humidity:float,solar_flux:float,battery:float
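The inline schema on this slide is a comma-separated list of name:type pairs. As a rough sketch, the same information can be turned into an array of field objects of the kind used in a JSON schema file (such as the schema.json passed to bq load on the JSON slide). The helper name parseInlineSchema is hypothetical and not part of any Google tool, and uppercasing the type names is a choice made for this sketch.

```javascript
// Hypothetical helper (not part of bq or any Google library): converts an
// inline "name:type,name:type" schema string into an array of field objects.
function parseInlineSchema(schema) {
  return schema.split(',').map((field) => {
    // Trim each part so stray spaces around ":" or "," are tolerated.
    const parts = field.split(':').map((part) => part.trim());
    if (parts.length !== 2 || !parts[0] || !parts[1]) {
      throw new Error(`malformed schema field: "${field}"`);
    }
    // Uppercasing the type is a convention chosen for this sketch.
    return { name: parts[0], type: parts[1].toUpperCase() };
  });
}

// The schema string from the JCMB_2014.csv slide.
const jcmbSchema =
  'date_time:String,atmospheric_pressure:float,rainfall:float,' +
  'wind_speed:float,wind_direction:float,surface_temperature:float,' +
  'relative_humidity:float,solar_flux:float,battery:float';

console.log(JSON.stringify(parseInlineSchema(jcmbSchema), null, 2));
```

The printed JSON could be saved to a file and used wherever a schema file is expected, assuming the field types are ones the table accepts.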
  11. 11. Load data into BigQuery from the command line: CSV / JSON → Cloud Storage → BigQuery
  12. 12. Load CSV to BigQuery
      gsutil cp [source] gs://[bucket-name]
      # gsutil cp ~/Desktop/log.csv gs://your-bucket/
      Copying file:///Users/simonsu/Desktop/log.csv [Content-Type=text/csv]...
      Uploading: 4.59 MB/36.76 MB
      bq load [project]:[dataset].[table] gs://[bucket]/[csv path] [schema]
      # bq load project.dataset gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING
      Waiting on bqjob_rf4f3f1d9e2366a6_00000142c1bdd36f_1 ... (24s) Current status: DONE
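When the load step is scripted, it can help to assemble the bq load arguments from their parts instead of concatenating strings by hand. The sketch below follows the template on this slide ([project]:[dataset].[table], then the Cloud Storage path, then the schema). The helper buildBqLoadArgs and the project, dataset, and table names are placeholders invented for illustration. The code only builds and prints the command and does not run it.

```javascript
// Hypothetical helper: builds the argument list for `bq load` following the
// slide's template. It does not execute anything.
function buildBqLoadArgs({ project, dataset, table, sourceUri, schema }) {
  const required = { project, dataset, table, sourceUri, schema };
  for (const [key, value] of Object.entries(required)) {
    if (!value) throw new Error(`missing ${key}`);
  }
  // Table reference in the [project]:[dataset].[table] form.
  const tableRef = `${project}:${dataset}.${table}`;
  return ['load', tableRef, sourceUri, schema];
}

const loadArgs = buildBqLoadArgs({
  project: 'your-project',  // placeholder
  dataset: 'your_dataset',  // placeholder
  table: 'access_log',      // placeholder
  sourceUri: 'gs://your-bucket/log.csv',
  schema: 'IP:STRING,DNS:STRING,TS:STRING,URL:STRING',
});

console.log(['bq', ...loadArgs].join(' '));
```

Keeping the arguments as an array means they could be handed to a process-spawning API without any shell quoting.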
  13. 13. Load JSON to BigQuery
      bq load --source_format NEWLINE_DELIMITED_JSON [project]:[dataset].[table] [json file] [schema file]
      # bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest ./sample.json ./schema.json
      Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
      # bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest gs://your-bucket/sample.json ./schema.json
      Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
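NEWLINE_DELIMITED_JSON means one complete JSON object per line, not a single JSON array. A minimal sketch of producing such content from in-memory records follows. The records and the matching schema are made up for illustration, and toNewlineDelimitedJson is a hypothetical helper name.

```javascript
// Hypothetical helper: serializes records as newline-delimited JSON,
// one object per line. JSON.stringify escapes embedded newlines, so each
// record always stays on a single line.
function toNewlineDelimitedJson(records) {
  if (records.length === 0) return '';
  return records.map((record) => JSON.stringify(record)).join('\n') + '\n';
}

// Made-up sample data, not taken from the workshop files.
const sampleRecords = [
  { name: 'alice', visits: 3 },
  { name: 'bob', visits: 5 },
];

// A matching schema in the array-of-fields shape used by JSON schema files.
const sampleSchema = [
  { name: 'name', type: 'STRING' },
  { name: 'visits', type: 'INTEGER' },
];

const ndjson = toNewlineDelimitedJson(sampleRecords);
console.log(ndjson);
console.log(JSON.stringify(sampleSchema, null, 2));
```

Writing the first output to ./sample.json and the second to ./schema.json would give files in the shape the commands on this slide expect.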
  14. 14. Hands-on - Query
  15. 15. Web way - Query Console
  16. 16. Shell way - bq command: install google_cloud_sdk (https://developers.google.com/cloud/sdk/)
  17. 17. Shell way - bq command
      bq query <sql_query>
      # bq query 'select charge_unit,charge_desc,one_charge from testbq.test'
  18. 18. BigQuery - Query Language
  19. 19. Query support. Query syntax: ● SELECT ● WITHIN ● FROM ● FLATTEN ● JOIN ● WHERE ● GROUP BY ● HAVING ● ORDER BY ● LIMIT. Supported functions and operators: ● Aggregate functions ● Arithmetic operators ● Bitwise operators ● Casting functions ● Comparison functions ● Date and time functions ● IP functions ● JSON functions ● Logical operators ● Mathematical functions ● Regular expression functions ● String functions ● Table wildcard functions ● URL functions ● Window functions ● Other functions
  20. 20. Select
      select charge_unit,charge_desc,one_charge from testbq.test
      +-------------+-------------+------------+
      | charge_unit | charge_desc | one_charge |
      +-------------+-------------+------------+
      | M           | 按月計費    | 0          |
      | D           | 按日計費    | 0          |
      | HH          | 小時計費    | 0          |
      | T           | 分計費      | 0          |
      | SS          | 按次計費    | 1          |
      +-------------+-------------+------------+
      (charge_desc values in English: billed monthly, billed daily, billed hourly, billed per minute, billed per use.)
  21. 21. Join
      SELECT a.order_id,a.sales,b.begin_use_date
      FROM testbq.order_master a
      LEFT JOIN testbq.order_detail b ON a.order_id = b.order_id
      +------------+---------+-------------------------+
      | a_order_id | a_sales | b_begin_use_date        |
      +------------+---------+-------------------------+
      | OM2003     | D589    | 2011-11-01 17:43:00 UTC |
      | OM2004     | D589    | 2011-11-01 09:43:00 UTC |
      | OM2005     | D589    | 2011-11-01 17:55:00 UTC |
      | OM2006     | D589    | 2011-11-01 17:54:00 UTC |
      | OM2007     | D589    | 2011-11-03 16:31:00 UTC |
      +------------+---------+-------------------------+
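What distinguishes a LEFT JOIN is that every row of the left table appears in the result, even when the right table has no match. The in-memory sketch below illustrates that behavior on a few rows. It is only a model of the semantics, not how BigQuery executes joins. The OM9999 row is made up to show the unmatched case, and leftJoin is a hypothetical helper.

```javascript
// Hypothetical helper: in-memory left join on a shared key.
function leftJoin(leftRows, rightRows, key) {
  // Index the right side by key so each lookup is a map access.
  const index = new Map();
  for (const row of rightRows) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  }
  // Every left row is kept; unmatched rows are paired with null.
  return leftRows.flatMap((left) => {
    const matches = index.get(left[key]) || [null];
    return matches.map((right) => ({ left, right }));
  });
}

const orderMaster = [
  { order_id: 'OM2003', sales: 'D589' },
  { order_id: 'OM2004', sales: 'D589' },
  { order_id: 'OM9999', sales: 'D000' }, // made-up row with no detail record
];
const orderDetail = [
  { order_id: 'OM2003', begin_use_date: '2011-11-01 17:43:00 UTC' },
  { order_id: 'OM2004', begin_use_date: '2011-11-01 09:43:00 UTC' },
];

const joined = leftJoin(orderMaster, orderDetail, 'order_id').map(({ left, right }) => ({
  a_order_id: left.order_id,
  a_sales: left.sales,
  b_begin_use_date: right ? right.begin_use_date : null,
}));

console.log(joined);
```

The third output row carries a null b_begin_use_date, which is the in-memory counterpart of the NULL a left join produces for a missing match.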
  22. 22. Flatten
      SELECT fullName, age, gender, citiesLived.place
      FROM (FLATTEN([dataset.tableId], children))
      WHERE (citiesLived.yearsLived > 1995) AND (children.age > 3)
      GROUP BY fullName, age, gender, citiesLived.place
      +------------+-----+--------+-------------------+
      | fullName   | age | gender | citiesLived_place |
      +------------+-----+--------+-------------------+
      | John Doe   | 22  | Male   | Stockholm         |
      | Mike Jones | 35  | Male   | Los Angeles       |
      | Mike Jones | 35  | Male   | Washington DC     |
      | Mike Jones | 35  | Male   | Portland          |
      | Mike Jones | 35  | Male   | Austin            |
      +------------+-----+--------+-------------------+
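Flattening a repeated field turns one record holding a list into several rows, one per list element, with the parent's other fields copied onto each row. The sketch below models that idea in memory. The sample people and children are made up, flattenOn is a hypothetical helper, and how BigQuery itself treats records whose repeated field is empty is outside the scope of this sketch (here such records simply produce no rows).

```javascript
// Hypothetical helper: emits one row per element of a repeated field,
// copying the parent's other fields onto each row.
function flattenOn(records, field) {
  return records.flatMap((record) =>
    (record[field] || []).map((item) => ({ ...record, [field]: item }))
  );
}

// Made-up sample records with a repeated "children" field.
const people = [
  {
    fullName: 'John Doe',
    age: 22,
    children: [{ name: 'Jane', age: 6 }, { name: 'John', age: 15 }],
  },
  {
    fullName: 'Mike Jones',
    age: 35,
    children: [{ name: 'Earl', age: 2 }],
  },
];

const flatRows = flattenOn(people, 'children');

// After flattening, a filter like "children.age > 3" applies per row.
const olderThanThree = flatRows.filter((row) => row.children.age > 3);

console.log(flatRows.length, olderThanThree.map((row) => row.children.name));
```

After flattening, children is a single object on each row, which is why a per-child condition can be evaluated row by row.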
  23. 23. Regular Expression
      SELECT word, COUNT(word) AS count
      FROM publicdata:samples.shakespeare
      WHERE (REGEXP_MATCH(word,r'\w\w\'\w\w'))
      GROUP BY word ORDER BY count DESC LIMIT 3;
      +-------+-------+
      | word  | count |
      +-------+-------+
      | ne'er | 42    |
      | we'll | 35    |
      | We'll | 33    |
      +-------+-------+
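To see which words a pattern like this selects, it can be tried locally first. The sketch assumes the slide's pattern is two word characters, an apostrophe, then two more word characters, written `\w\w'\w\w`, and checks it with JavaScript's regular expression engine. BigQuery uses its own regex engine, so this is only an approximation for simple patterns, and the word list is made up.

```javascript
// Two word characters, an apostrophe, then two more word characters.
// No "g" flag, so test() keeps no state between calls.
const contractionPattern = /\w\w'\w\w/;

// Made-up sample words.
const candidateWords = ["ne'er", "we'll", "We'll", "it's", "o'er", "don't", 'king'];

// Keep the words in which the pattern matches anywhere.
const matchingWords = candidateWords.filter((word) => contractionPattern.test(word));

console.log(matchingWords); // [ "ne'er", "we'll", "We'll" ]
```

Words such as "it's" and "don't" are rejected because only one word character follows the apostrophe, and "o'er" because only one precedes it.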
  24. 24. Time Function
      SELECT TOP(FORMAT_UTC_USEC(timestamp * 1000000), 5) AS top_revision_time, COUNT(*) AS revision_count
      FROM [publicdata:samples.wikipedia];
      +----------------------------+----------------+
      | top_revision_time          | revision_count |
      +----------------------------+----------------+
      | 2002-02-25 15:51:15.000000 | 20971          |
      | 2002-02-25 15:43:11.000000 | 15955          |
      | 2010-01-14 15:52:34.000000 | 3              |
      | 2009-12-31 19:29:19.000000 | 3              |
      | 2009-12-28 18:55:12.000000 | 3              |
      +----------------------------+----------------+
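The query multiplies timestamp by 1000000 before formatting, which suggests the column holds seconds since the Unix epoch while the formatting function expects microseconds. The sketch below reproduces that conversion and the output layout seen in the result table (YYYY-MM-DD HH:MM:SS.uuuuuu in UTC). The function formatUtcUsec is a hypothetical local stand-in written for this sketch, not BigQuery's implementation.

```javascript
// Hypothetical stand-in: formats microseconds since the Unix epoch as
// "YYYY-MM-DD HH:MM:SS.uuuuuu" in UTC. Supports non-negative safe integers
// and four-digit years only.
function formatUtcUsec(usec) {
  if (!Number.isSafeInteger(usec) || usec < 0) {
    throw new RangeError('usec must be a non-negative safe integer');
  }
  const millis = Math.floor(usec / 1000); // Date works in milliseconds
  const micros = usec % 1000000;          // fractional second, in microseconds
  const iso = new Date(millis).toISOString(); // e.g. 2002-02-25T15:51:15.000Z
  return iso.slice(0, 19).replace('T', ' ') + '.' + String(micros).padStart(6, '0');
}

// Seconds since the epoch for 2002-02-25 15:51:15 UTC (month index 1 = February).
const revisionSeconds = Date.UTC(2002, 1, 25, 15, 51, 15) / 1000;

// Same conversion as the query: seconds * 1000000 gives microseconds.
console.log(formatUtcUsec(revisionSeconds * 1000000)); // 2002-02-25 15:51:15.000000
```

Passing the seconds value without the multiplication would be read as microseconds and land within the first hour of 1970, which is the mistake the `* 1000000` in the query avoids.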
  25. 25. URL Function
      SELECT DOMAIN(repository_homepage) AS user_domain, COUNT(*) AS activity_count
      FROM [publicdata:samples.github_timeline]
      GROUP BY user_domain
      HAVING user_domain IS NOT NULL AND user_domain != ''
      ORDER BY activity_count DESC LIMIT 5;
      +-----------------+----------------+
      | user_domain     | activity_count |
      +-----------------+----------------+
      | github.com      | 281879         |
      | google.com      | 34769          |
      | khanacademy.org | 17316          |
      | sourceforge.net | 15103          |
      | mozilla.org     | 14091          |
      +-----------------+----------------+
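The query groups rows by the domain of each homepage URL, drops empty results, and sorts by count. A rough local model of that pipeline is sketched below. The naiveDomain helper is an assumption made for illustration: it keeps only the last two labels of the hostname, so it gives wrong answers for multi-part suffixes such as co.uk and for IP-address hosts, and it should not be read as how BigQuery's DOMAIN function works. The URLs are made up.

```javascript
// Naive stand-in for a domain extractor: last two labels of the hostname.
// Known limitations: wrong for suffixes like "co.uk" and for IP addresses.
function naiveDomain(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (err) {
    return null; // not a parseable absolute URL
  }
  const labels = hostname.split('.').filter(Boolean);
  if (labels.length < 2) return null;
  return labels.slice(-2).join('.');
}

// Group by domain, skip missing domains, and sort by count descending.
function countByDomain(urls) {
  const counts = new Map();
  for (const url of urls) {
    const domain = naiveDomain(url);
    if (!domain) continue;
    counts.set(domain, (counts.get(domain) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([user_domain, activity_count]) => ({ user_domain, activity_count }))
    .sort((a, b) => b.activity_count - a.activity_count);
}

// Made-up homepage values, including an empty and an invalid one.
const homepages = [
  'https://github.com/a',
  'http://www.github.com/b',
  'http://code.google.com/p/x',
  '',
  'not a url',
];

console.log(countByDomain(homepages));
```

Skipping null domains plays the role of the HAVING clause, and the final sort corresponds to ORDER BY activity_count DESC.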
  26. 26. Hands-on - Programming
  27. 27. Prepare ● Prepare a Google Cloud Platform project ● Create a Service Account ● Generate a p12 key from the Service Account
  28. 28. Google Service Account: web server application vs. service account
  29. 29. Prepare authentication: convert the p12 key to a pem key
      $ openssl pkcs12 -in privatekey.p12 -out privatekey.pem -nocerts
      $ openssl rsa -in privatekey.pem -out key.pem
  30. 30. Node.js - bigquery module
      var bq = require('bigquery')
        , prjId = 'your-bigquery-project-id';
      bq.init({
        client_secret: '/path/to/client_secret.json',
        key_pem: '/path/to/key.pem'
      });
      bq.job.listds(prjId, function(e, r, d) {
        // Stop on error so the result is not printed as well.
        if (e) return console.log(e);
        console.log(JSON.stringify(d));
      });
      To operate on BigQuery, call the functions under bq.job.
      For the bigquery module, see: https://github.com/peihsinsu/bigquery
  31. 31. Google Drive way - Apps Script
      /* Ref: https://developers.google.com/apps-script/advanced/bigquery */
      // projectId must hold your Google Cloud project ID before this runs.
      var request = {
        query: 'SELECT TOP(word, 30) AS word, COUNT(*) AS word_count ' +
               'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
      };
      var queryResults = BigQuery.Jobs.query(request, projectId);
      var jobId = queryResults.jobReference.jobId;
      queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
      var rows = queryResults.rows;
      // Keep fetching pages while the response carries a pageToken.
      while (queryResults.pageToken) {
        queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
          pageToken: queryResults.pageToken
        });
        rows = rows.concat(queryResults.rows);
      }
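The loop on this slide is a standard page-token pattern: fetch a page, append its rows, and repeat while the response still carries a pageToken. The sketch below exercises the same loop against a fake in-memory results function so it can be run without a BigQuery project. The fake API, its numeric tokens, and the sample rows are all invented for illustration, and real page tokens are opaque strings.

```javascript
// Fake results API for illustration only: serves `allRows` in pages of
// `pageSize` (must be >= 1) and returns a pageToken while more rows remain.
function makeFakeResultsApi(allRows, pageSize) {
  return function getQueryResults(pageToken) {
    const start = pageToken ? Number(pageToken) : 0;
    const end = Math.min(start + pageSize, allRows.length);
    const page = { rows: allRows.slice(start, end) };
    if (end < allRows.length) page.pageToken = String(end);
    return page;
  };
}

// Same shape as the loop on the slide: keep requesting pages while a
// pageToken is present, concatenating rows as they arrive.
function collectAllRows(getQueryResults) {
  let results = getQueryResults(undefined);
  let rows = results.rows;
  while (results.pageToken) {
    results = getQueryResults(results.pageToken);
    rows = rows.concat(results.rows);
  }
  return rows;
}

const fakeRows = ['r1', 'r2', 'r3', 'r4', 'r5'];
const collectedRows = collectAllRows(makeFakeResultsApi(fakeRows, 2));

console.log(collectedRows.length); // 5
```

The loop ends on the first response without a pageToken, so a result that fits in one page never enters the loop body.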
  32. 32. References ● Features: https://cloud.google.com/products/bigquery#features ● Case Studies: https://cloud.google.com/products/bigquery#case-studies ● Pricing: https://cloud.google.com/products/bigquery#pricing ● Documentation: https://cloud.google.com/products/bigquery#documentation ● Query Reference: https://developers.google.com/bigquery/query-reference
  33. 33. http://goo.gl/LD4RN4
