Hadoop Conf 2014 - Hadoop BigQuery Connector



Hadoop Conference Taiwan 2014 Presentation.

Published in: Technology


  1. Hadoop BigQuery Connector. Simon Su & Sunny Hu @ MiCloud
  2. I am Simon Su. var simon = {}; simon.aboutme = 'http://about.me/peihsinsu'; simon.nodejs = 'http://opennodes.arecord.us'; simon.googleshare = 'http://gappsnews.blogspot.tw'; simon.nodejsblog = 'http://nodejs-in-example.blogspot.tw'; simon.blog = 'http://peihsinsu.blogspot.com'; simon.slideshare = 'http://slideshare.net/peihsinsu/'; simon.email = 'simonsu.mail@gmail.com'; simon.say('Good luck to everybody!');
  3. I am Sunny Hu. var sunny = {}; sunny.aboutme = 'https://plus.google.com/u/0/+sunnyHU/posts'; sunny.email = 'sunnyhu@mitac.com.tw'; sunny.language = ['Java', '.NET', 'NodeJS', 'SQL']; sunny.skill = ['Project management', 'System analysis', 'System design', 'Car ho lan']; sunny.say('Writing code gets dreary, so keep your mood sunny');
  4. ● We are the Su and Hu duo ...
  5. We are MiCloud ● 2011/11 MiCloud Launch ● 2013/2 Google Apps Partner ● 2013/9 Google Cloud Partner ● 2014/4 Google Cloud Launch
  6. Background ● Dremel (BigQuery) can deliver high-volume, stable service ● 2013, average daily service volume: 5,922,000,000 visits ● 2012, average daily service volume: 5,134,000,000 visits ● 2011, average daily service volume: 4,717,000,000 visits ● 2010, average daily service volume: 3,627,000,000 visits ● 2009, average daily service volume: 2,610,000,000 visits ● 2008, average daily service volume: 1,745,000,000 visits
  7. What are the components of Hadoop? ● Strategy: your idea for filtering information from the given datasets ● MapReduce: massive computing power to load and process the work in parallel ● HDFS: persistent storage for parallel access, ideally with good performance...
  8. You have a better choice in the cloud... ● Strategy: nothing can replace a good idea…, but fast... ● MapReduce: cloud machines with effectively unlimited resources, ideally with lower and scalable pricing... ● HDFS: object storage services, such as Google Cloud Storage and AWS S3...
  9. ● The fast way to run Hadoop: Docker
  10. Resources Google provides
  11. ● GCE Hadoop Utility
  12. ● GCE Cluster Tool - bdutil
  13. Before the demo, prepare: 1. Install google_cloud_sdk 2. Install bdutil
  14. Google Cloud SDK: curl https://sdk.cloud.google.com | bash
  15. ● Authenticate the gcloud utility
  16. ● Set up the default project ● Test the configuration...
  17. Using bdutil... https://developers.google.com/hadoop/setting-up-a-hadoop-cluster
  18. bdutil scope ● Designed to create a Hadoop cluster quickly ● Quickly run a Hadoop task ● Quickly integrate Google's resources ● Quickly clean up finished resources
  19. Demo first...
  20. ● Configure your bdutil environment.
  21. ● bdutil deploy -e bigquery_env.sh
  22. ● Check the result...
  23. ● The Administration console
  24. TeraSort https://www.mapr.com/fr/company/press/mapr-and-google-compute-engine-set-new-world-record-hadoop-terasort
  25. You can win the game, too... (skip)
  26. BigQuery Connector https://developers.google.com/hadoop/running-with-bigquery-connector
  27. hadoop-w-0 hadoop-m hadoop-w-1
  28. Demo first...
  29. Run a BigQuery Connector job...
  30. Workflow... 1. Dump sample data from [publicdata:samples.shakespeare] 2. Run MapReduce to count word occurrences 3. Write the result to a specified BigQuery table
  31. Look into the source code... ● BigQueryInputFormat class ● Input parameters ● Mapper ● BigQueryOutputFormat class ● Output parameters ● Reducer
  32. BigQueryInputFormat ● Uses a user-specified query to select the appropriate BigQuery objects. ● Splits the results of the query evenly among the Hadoop nodes. ● Parses the splits into Java objects to pass to the mapper.
  33. Input parameters ● Project ID: GCP project ID, e.g. hadoop-conf-2014 ● Input table ID: [optional projectId]:[datasetId].[table id]
  34. BigQueryOutputFormat class ● Provides Hadoop with the ability to write JsonObject values directly into a BigQuery table ● An extension of the Hadoop OutputFormat class
  35. Output parameters ● Project ID: GCP project ID, e.g. hadoop-conf-2014 ● Output table ID: [optional projectId]:[datasetId].[table id] ● Output table schema: [{'name': 'Name','type': 'STRING'}, {'name': 'Number','type': 'INTEGER'}]
  36. bdutil housekeeping... https://developers.google.com/hadoop/setting-up-a-hadoop-cluster
  37. ● Game over - Delete the Hadoop cluster
  38. ● Check the project...
  39. Your cost in this lab... VM (n1-standard-1) * machines * hours: $0.070 USD/hour * 24 * 1
  40. Today's demo: using Docker...
  41. ● Using a Google-optimized Docker container VM: localhost:~$ gcloud compute instances create simon-docker --image https://www.googleapis.com/compute/v1/projects/google-containers/global/images/container-vm-v20140522 --zone asia-east1-a --machine-type f1-micro localhost:~$ gcloud compute ssh simon-docker simonsu@simon-docker:~$ sudo docker search bdutil simonsu@simon-docker:~$ docker run -it peihsinsu/bdutil bash
  42. Other connectors ● BigQuery connector for Hadoop: $ ./bdutil deploy -e bigquery_env.sh ● Datastore connector for Hadoop: $ ./bdutil deploy -e datastore_env.sh ● To use both BigQuery and Datastore: $ ./bdutil deploy -e datastore_env.sh,bigquery_env.sh
  43. http://goo.gl/PbHdDc
  44. http://micloud.tw
  45. http://jsdc-tw.kktix.cc/events/jsdc2014
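
The word-count workflow and the table ID and schema parameters from the slides can be sketched in plain Python, without Hadoop or BigQuery. This is only an illustration under stated assumptions: the helper names (`parse_table_id`, `map_words`, `reduce_counts`) are hypothetical, the input rows are made up, and the `word` and `word_count` field names are assumed for the sample table. The real connector is a Java library built around BigQueryInputFormat and BigQueryOutputFormat, which this sketch does not call.

```python
from collections import defaultdict

# Output schema as listed on the "Output parameters" slide.
OUTPUT_SCHEMA = [
    {"name": "Name", "type": "STRING"},
    {"name": "Number", "type": "INTEGER"},
]


def parse_table_id(table_id):
    """Split '[optional projectId]:[datasetId].[table id]' into its parts.

    Hypothetical helper: returns (project or None, dataset, table).
    """
    project = None
    rest = table_id
    if ":" in table_id:
        project, rest = table_id.split(":", 1)
    dataset, table = rest.split(".", 1)
    return project, dataset, table


def map_words(rows):
    """Mapper step: emit (word, count) pairs from input rows.

    Assumes each row has 'word' and 'word_count' fields.
    """
    for row in rows:
        yield row["word"].lower(), int(row["word_count"])


def reduce_counts(pairs):
    """Reducer step: sum counts per word, shaped to match OUTPUT_SCHEMA."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return [{"Name": w, "Number": n} for w, n in sorted(totals.items())]


# Made-up sample rows standing in for the dumped table.
sample_rows = [
    {"word": "king", "word_count": 3},
    {"word": "King", "word_count": 2},
    {"word": "queen", "word_count": 1},
]
result = reduce_counts(map_words(sample_rows))
print(parse_table_id("publicdata:samples.shakespeare"))
print(result)
```

In the actual job, the mapper would receive rows from the input format and the reducer would hand its rows to the output format; here both ends are ordinary Python lists so the logic can be checked locally.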
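
The lab cost formula from the slides (hourly VM price * machines * hours) works out as follows. This is a minimal sketch: the function name `lab_cost` is hypothetical, and the slide does not make clear which of 24 and 1 is the machine count and which is the hour count, though the product is the same either way.

```python
def lab_cost(hourly_rate_usd, machines, hours):
    """Total cost in USD: hourly VM price * number of machines * hours."""
    return round(hourly_rate_usd * machines * hours, 2)


# Figures from the slide: $0.070 USD/hour for n1-standard-1, then 24 and 1.
cost = lab_cost(0.070, 24, 1)
print(f"${cost:.2f}")  # prints "$1.68"
```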
