BigQuery - Command line tools and Tips - (MOSG)

BigQuery =Command line tools and Tips for business use=
Mulodo Open Study Group (MOSG) @Ho Chi Minh, Vietnam
http://www.meetup.com/Open-Study-Group-Saigon/events/231504491/

  1. 1. Google BigQuery - Command line and Tips - 2016/06/08 Mulodo Vietnam Co., Ltd.
  2. 2. What’s BigQuery Official site : https://cloud.google.com/bigquery/docs/ BigQuery is Google's fully managed, petabyte scale, low cost analytics data warehouse. BigQuery is NoOps—there is no infrastructure to manage and you don't need a database administrator—so you can focus on analyzing data to find meaningful insights, use familiar SQL, and take advantage of our pay-as-you-go model. → DWH: SQL like (easy to use), Petabyte scale(for Huge data)
  3. 3. Previous study “BigQuery - The First Step -“ (2016/05/26) • Just getting started with Google BigQuery • Using queries on the Google Cloud Platform console • Create your own Dataset and Table • Using queries on your own table in the GCP console http://www.meetup.com/Open-Study-Group-Saigon/events/231233151/ http://www.slideshare.net/nemo-mulodo/big-query-the-first-step-mosg c.f. “Big Data - Overview -” http://www.slideshare.net/nemo-mulodo/big-data-overview-mosg http://www.meetup.com/Open-Study-Group-Saigon/events/229243903/
  4. 4. Command line tools and Tips 1. Preparation (install SDK and settings) 2. Try command line tools: create datasets, tables and insert data. 3. Tips for business use: how you are charged, and tips to reduce cost.
  5. 5. 1. Preparation steps
  6. 6. Preparation steps 1. Create a “Google Cloud Platform (GCP)” account and enable BigQuery. (See the previous slides.) 2. Install the GCP SDK on your PC. (Using Ubuntu on Vagrant) 1. Installation 2. Activate your account 3. Set accounts for the GCP SDK.
  7. 7. 2. Install GCP SDK 1. Installation
  8. 8. Install SDK to your PC. (1) nemo@ubuntu-14:~$ curl https://sdk.cloud.google.com | bash : Installation directory (this will create a google-cloud-sdk subdirectory) (/home/nemo): <-- just press Enter (or type the path you want) : Do you want to help improve the Google Cloud SDK (Y/n)? y : ! BigQuery Command Line Tool ! 2.0.24 ! < 1 MiB ! ! BigQuery Command Line Tool (Platform Specific)! 2.0.24 ! < 1 MiB ! : Modify profile to update your $PATH and enable shell command completion? (Y/n)? y (or as you prefer) : For more information on how to get started, please visit: https://cloud.google.com/sdk/#Getting_Started nemo@ubuntu-14:~$ . ~/.bashrc <-- reload your bash environment nemo@ubuntu-14:~$
  9. 9. Install SDK to your PC. (2) // check the commands nemo@ubuntu-14:~$ which bq /home/nemo/google-cloud-sdk/bin/bq nemo@ubuntu-14:~$ which gcloud /home/nemo/google-cloud-sdk/bin/gcloud
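As an extra sanity check (a minimal sketch; the exact version numbers reported will differ per install), both tools can also print their versions once they are on your $PATH:
    nemo@ubuntu-14:~$ gcloud version
    nemo@ubuntu-14:~$ bq version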
  10. 10. 2. Install GCP SDK 2. Activate your account
  11. 11. Activate your GCP account (1) 1. Preparation (create account) 2. Go to Google Cloud Platform (if you have no account yet) 3. “Try It Free” https://cloud.google.com nemo@ubuntu-14:~$ gcloud init Welcome! This command will take you through the configuration of gcloud. Your current configuration has been set to: [default] To continue, you must log in. Would you like to log in (Y/n)? Go to the following link in your browser: https://accounts.google.com/o/oauth2/auth?redirect_uri=ur&xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx access_type=offline Enter verification code: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx You are now logged in as: [xxxx@example.com] This account has no projects. Please create one in developers console (https:// console.developers.google.com/project) before running this command. nemo@ubuntu-14:~$
  12. 12. nemo@ubuntu-14:~$ gcloud init Welcome! This command will take you through the configuration of gcloud. Your current configuration has been set to: [default] To continue, you must log in. Would you like to log in (Y/n)? Go to the following link in your browser: https://accounts.google.com/o/oauth2/auth?redirect_uri=ur&xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx access_type=offline Enter verification code: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx You are now logged in as: [xxxx@example.com] This account has no projects. Please create one in developers console (https:// console.developers.google.com/project) before running this command. nemo@ubuntu-14:~$ Activate your GCP account (2)
  13. 13. https://accounts.google.com/o/oauth2/auth?redirect_uri=ur&xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx access_type=offline Launch the browser. Select an account (if you are already logged in with multiple accounts). Activate your GCP account (3)
  14. 14. https://accounts.google.com/o/oauth2/auth?redirect_uri=ur&xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx access_type=offline Accept the permissions. Activate your GCP account (4)
  15. 15. https://accounts.google.com/o/oauth2/auth?redirect_uri=ur&xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx access_type=offline Get the verification code. Activate your GCP account (5)
  16. 16. https://accounts.google.com/o/oauth2/auth?redirect_uri=ur&xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx access_type=offline Enter verification code: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx You are now logged in as: [xxxx@example.com] Paste the code. Activate your GCP account (6)
  17. 17. https://accounts.google.com/o/oauth2/auth?redirect_uri=ur&xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx access_type=offline Enter verification code: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx You are now logged in as: [xxxx@example.com] Check the logged-in account. Activate your GCP account (7)
  18. 18. Activate your GCP account (8) // set Project ID nemo@ubuntu-14:~$ gcloud config set project {{PROJECT_ID}} nemo@ubuntu-14:~$ // check the accounts nemo@ubuntu-14:~$ gcloud auth list - xxx@example.com (active) To set the active account, run: $ gcloud config set account ``ACCOUNT'' nemo@ubuntu-14:~$
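To double-check that everything stuck, gcloud can also print the whole active configuration (a small sketch; the exact properties listed depend on your SDK version):
    nemo@ubuntu-14:~$ gcloud config list
    // the [core] section should show your account and the project ID you just set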
  19. 19. What a pain! AWS is much easier...
  20. 20. 2. Try command line tools
  21. 21. Try Public data (1) nemo@ubuntu-14:~$ bq show publicdata:samples.shakespeare Table publicdata:samples.shakespeare Last modified Schema Total Rows Total Bytes Expiration ----------------- ------------------------------------ ------------ ------------- ------------ 26 Aug 21:43:49 |- word: string (required) 164656 6432064 |- word_count: integer (required) |- corpus: string (required) |- corpus_date: integer (required) publicdata : samples . shakespeare {PROJECT_ID} : {DATASET} . {TABLE}
  22. 22. Try Public data (2) nemo@ubuntu-14:~$ bq query "SELECT word, COUNT(word) as count FROM publicdata:samples.shakespeare WHERE word CONTAINS 'raisin' GROUP BY word" Waiting on bqjob_r5e78fd2c80d5923c_000001554d1c4acc_1 ... (0s) Current status: DONE +---------------+-------+ | word | count | +---------------+-------+ | raising | 5 | | dispraising | 2 | | Praising | 4 | | praising | 7 | | dispraisingly | 1 | | raisins | 1 | +---------------+-------+ nemo@ubuntu-14:~$
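If you just want to peek at a few rows without running (and paying for) a query, the bq tool also has a head command; a sketch using the same public table:
    nemo@ubuntu-14:~$ bq head -n 5 publicdata:samples.shakespeare
    // prints the first 5 rows of the table without starting a query job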
  23. 23. Create Dataset (1) nemo@ubuntu-14:~$ bq ls <--- no dataset nemo@ubuntu-14:~$ bq mk saigon_engineers Dataset 'open-study-group-saigon:saigon_engineers' successfully created. nemo@ubuntu-14:~$ bq ls datasetId ------------------ <-- created!! saigon_engineers nemo@ubuntu-14:~$
  24. 24. Create Dataset (2) nemo@ubuntu-14:~$ bq ls <--- no dataset nemo@ubuntu-14:~$ bq mk saigon_engineers Dataset 'open-study-group-saigon:saigon_engineers' successfully created. nemo@ubuntu-14:~$ bq ls datasetId ------------------ <-- created!! saigon_engineers nemo@ubuntu-14:~$ Added!! -->
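For cleanup later: when you are completely done with the exercise, the dataset can be deleted again; with -r the tables inside it are removed as well (a sketch; bq asks for confirmation unless you also pass -f):
    nemo@ubuntu-14:~$ bq rm -r saigon_engineers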
  25. 25. Create table and import data (1) Schema: id INTEGER, name STRING, engineer_type INTEGER. Data: (id, name, engineer_type) = (1, nemo, 1), (2, miki, 1)
  26. 26. Create table and import data (2) Schema (schema.json) [ { "name":"id", "type":"INTEGER" }, { "name":"name", "type":"STRING" }, { "name":"engineer_type", "type":"INTEGER" } ]
  27. 27. Create table and import data (3) Data (data.json) {"id":1,"name":"nemo","engineer_type":1} {"id":2,"name":"miki","engineer_type":1}
  28. 28. Create table and import data (4) nemo@ubuntu-14:~$ bq load --source_format=NEWLINE_DELIMITED_JSON saigon_engineers.engineer_list data.json schema.json Upload complete. Waiting on bqjob_r23b898932d75d49a_000001554e5cae2f_1 ... (1s) Current status: DONE nemo@ubuntu-14:~$ bq load {PROJECT_ID}:{DATASET}.{TABLE} {data} {schema} Create table and import data https://cloud.google.com/bigquery/loading-data
  29. 29. Create table and import data (5) nemo@ubuntu-14:~$ bq load --source_format=NEWLINE_DELIMITED_JSON saigon_engineers.engineer_list data.json id:integer,name:string,engineer_type:integer Upload complete. Waiting on bqjob_r33b7802ea96b2c5d_000001554e4d21d5_1 ... (2s) Current status: DONE nemo@ubuntu-14:~$ Create table and import data : Another way (the schema is given inline instead of in schema.json)
  30. 30. Create table and import data (6) nemo@ubuntu-14:~$ bq mk open-study-group-saigon:saigon_engineers.engineer_list schema.json nemo@ubuntu-14:~$ Create table bq mk {PROJECT_ID}:{DATASET}.{TABLE} {schema}
  31. 31. Create table and import data (7) nemo@ubuntu-14:~$ bq load --source_format=NEWLINE_DELIMITED_JSON saigon_engineers.engineer_list data.json Upload complete. Waiting on bqjob_r13717485c2c472e3_000001554e5b3ca3_1 ... (2s) Current status: DONE nemo@ubuntu-14:~$ Import data into the table bq load {PROJECT_ID}:{DATASET}.{TABLE} {data}
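The same table could also be filled from CSV instead of newline-delimited JSON; a sketch with a hypothetical data.csv whose first line is a header row:
    nemo@ubuntu-14:~$ bq load --source_format=CSV --skip_leading_rows=1 saigon_engineers.engineer_list data.csv id:integer,name:string,engineer_type:integer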
  32. 32. Query (1) nemo@ubuntu-14:~$ bq show saigon_engineers.engineer_list Last modified Schema Total Rows Total Bytes Expiration ----------------- --------------------------- ------------ ------------- ------------ 14 Jun 10:02:35 |- id: integer 2 44 |- name: string |- engineer_type: integer nemo@ubuntu-14:~$
  33. 33. Query (2) nemo@ubuntu-14:~$ bq query "SELECT name FROM saigon_engineers.engineer_list" Waiting on bqjob_r12185d1aa88d92c8_0000015552d709d2_1 ... (0s) Current status: DONE +------+ | name | +------+ | nemo | | miki | +------+ nemo@ubuntu-14:~$
  34. 34. Query (3) nemo@ubuntu-14:~$ bq query --dry_run "SELECT name FROM saigon_engineers.engineer_list" Query successfully validated. Assuming the tables are not modified, running this query will process 12 bytes of data. nemo@ubuntu-14:~$ bq query --dry_run “QUERY” - shows how many bytes the query would process, before you actually run it.
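A related guard worth knowing (assuming your bq release supports it; check bq query --help): --maximum_bytes_billed makes a query fail outright instead of billing you once it would scan more than the given limit.
    nemo@ubuntu-14:~$ bq query --maximum_bytes_billed=10000000 "SELECT name FROM saigon_engineers.engineer_list"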
  35. 35. Hmm. (finished??) A bit more
  36. 36. 3. Tips for business use
  37. 37. Pricing
      Storage: $0.02 per GB, per month
      Long Term Storage: $0.01 per GB, per month
      Streaming Inserts: $0.01 per 200 MB
      Queries: $5 per TB (first 1 TB per month is free), subject to query pricing details
      Loading data: Free
      Copying data: Free
      Exporting data: Free
      Metadata operations (list, get, patch, update and delete calls): Free
      It seems very cheap!!?
  38. 38. Pricing (the same price list as the previous slide) BigQuery is for BIG DATA
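A rough, illustrative calculation with the prices above (a hypothetical workload, counting only storage and queries): a 500 GB table costs about 500 GB x $0.02 = $10/month to store; 30 queries that each scan the full 500 GB read 15 TB, so (15 TB - 1 TB free) x $5 = $70/month in query charges. The bill stays small only as long as each query does not scan the whole table.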
  39. 39. Column oriented (1) Sample case : database of Books ID (indexed) title (indexed) contents 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 3 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) select id, title from books where title = ‘The Cat’
  40. 40. Column oriented (2) ID (indexed) title (indexed) contents 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 3 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) select * from books where title = ‘The Cat’ @RDBMS index (name) hash data hash data hash data data in databaseIndexes scanned data
  41. 41. Column oriented (3) ID (indexed) title (indexed) contents 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 3 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) select * from books where title = ‘The Cat’ @BigQuery data in database scanned data
  42. 42. Column oriented (3) ID (indexed) title (indexed) contents 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 3 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) select * from books where title = ‘The Cat’ @BigQuery data in database scanned data Full-scan ANYTIME!!
  43. 43. Column oriented (4) ID (indexed) title (indexed) contents 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 3 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) select * from books where title = ‘The Cat’ @BigQuery data in database If your database is Tera-byte scale, $5 per query !!!!
  44. 44. Column oriented (5) ID (indexed) title (indexed) contents 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 3 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) select id, title from books where title = ‘The Cat’ @RDBMS index (name) hash data hash data hash data data in databaseIndexes scanned data
  45. 45. Column oriented (6) ID (indexed) title (indexed) contents 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 3 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) select id, title from books where title = ‘The Cat’ @BigQuery data in database scanned data
  46. 46. Column oriented (6) ID (indexed) title (indexed) contents 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 3 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) select id, title from books where title = ‘The Cat’ @BigQuery data in database scanned data Column Oriented
  47. 47. It's really dangerous! Please, Please set columns in queries.
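One way to see the difference yourself is to dry-run the same query twice against a (hypothetical) mydataset.books table like the one in these slides, once with * and once with an explicit column list; the reported byte count should drop sharply when the big contents column is left out:
    nemo@ubuntu-14:~$ bq query --dry_run "SELECT * FROM mydataset.books WHERE title = 'The Cat'"
    nemo@ubuntu-14:~$ bq query --dry_run "SELECT id, title FROM mydataset.books WHERE title = 'The Cat'"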
  48. 48. Table division Sample case : database of Books select id, title from books where time in ‘2016/06/17’ : : : : ID (indexed) title (indexed) contents time (indexed) 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2016/01/01 00:00:00 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 2016/01/01 00:01:23 353485397 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) 2016/06/17 00:01:46
  49. 49. Table division (1) index (time) hash data hash data hash data data in databaseIndexes scanned data : : : : ID (indexed) title (indexed) contents time (indexed) 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 2016/01/01 00:00:00 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 2016/01/01 00:01:23 353485397 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) 2016/06/17 00:01:46 select id, title from books where time in ‘2016/06/17’ @RDBMS
  50. 50. Table division (2) data in database scanned data : : : : ID (indexed) title (indexed) contents time (indexed) 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2016/01/01 00:00:00 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 2016/01/01 00:01:23 353485397 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) 2016/06/17 00:01:46 select id, title from books where time in ‘2016/06/17’ @BigQuery Huge size
  51. 51. Table division (2) data in database scanned data : : : : ID (indexed) title (indexed) contents time (indexed) 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2016/01/01 00:00:00 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 2016/01/01 00:01:23 353485397 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) 2016/06/17 00:01:46 select id, title from books where time in ‘2016/06/17’ @BigQuery Huge size
  52. 52. Table division (3) ID (indexed) title (indexed) contents time (indexed) 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2016/01/01 00:00:00 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 2016/01/01 00:01:23 ID (indexed) title (indexed) contents time (indexed) 353485397 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) 2016/06/17 00:01:46 : Tables books_20160101 : books_20160617 Divide tables for each day.
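Each of these daily tables is just an ordinary table whose name carries the date suffix, so the split can be done at load time; a sketch (dataset, file and schema names are hypothetical):
    nemo@ubuntu-14:~$ bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.books_20160617 books_20160617.json book_schema.json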
  53. 53. Table division (4) ID (indexed) title (indexed) contents time (indexed) 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2016/01/01 00:00:00 2 Cats are love Lorem ipsum dolor sit amet, consectetur (... 1.5MB) 2016/01/01 00:01:23 ID (indexed) title (indexed) contents time (indexed) 353485397 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 0.8MB) 2016/06/17 00:01:46 : books_20160101 : books_20160617 select id, title from books where time in ‘2016/06/17’ @BigQuery
  54. 54. Table division (5) ID (indexed) title (indexed) contents time (indexed) 1 The Cat Lorem ipsum dolor sit amet, consectetur (... 1.2MB) 2016/01/01 00:00:00 books_20160101 :: ID (indexed) title (indexed) contents time (indexed) 353485397 The Great Catsby Lorem ipsum dolor sit amet, consectetur (... 0.8MB) 2016/06/16 00:01:46 books_20160616 select id, title from books where time in ‘2016/06/16 - 2016/06/17’ @BigQuery ID (indexed) title (indexed) contents time (indexed) 353485397 Littul Kittons Lorem ipsum dolor sit amet, consectetur (... 2016/06/17 00:01:46 books_20160617
  55. 55. Table division (6) select id, title from books where time in ‘2016/06/16 - 2016/06/17’ @BigQuery SELECT id, title FROM ( TABLE_DATE_RANGE(books_, TIMESTAMP(‘2016-06-16'), TIMESTAMP(‘2016-06-17') ) )
  56. 56. Table division (7) Other ways to divide tables. Table decorator - https://cloud.google.com/bigquery/table-decorators “TABLE_QUERY” - https://cloud.google.com/bigquery/query-reference Other tips: “Import from GCS is much faster than from local” 1. put the data into GCS (Google Cloud Storage ≒ S3 ??) 2. import the data from GCS.
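The GCS tip from the slide above as a two-step sketch (the bucket name is hypothetical; gsutil is installed together with the same Cloud SDK):
    nemo@ubuntu-14:~$ gsutil cp data.json gs://my-study-bucket/data.json
    nemo@ubuntu-14:~$ bq load --source_format=NEWLINE_DELIMITED_JSON saigon_engineers.engineer_list gs://my-study-bucket/data.json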
  57. 57. BigQuery is Fast Easy Cheap if it is used properly.
  58. 58. BigQuery is Fast Easy Cheap if it is used properly. Remember “--dry_run”
  59. 59. Thank you!
