Document Title: GCP Data Ingestion – History data migration (One time load)
Version: 1.0
Document Summary: This document describes how to migrate Hive data to BigQuery (BQ).
Team: GDIA – ENOP
Copy Hive data to BQ
Description:
Regardless of the source format, each Hive table is first converted to an external Hive table in ORC format.
The ORC files are then copied from HDFS to GCS using DistCp. A BQ load job picks up the files from GCS and
loads the data into BQ tables.
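The copy and load stages described above can be sketched as follows. This is a minimal dry-run sketch, not the actual history load script: the function names `distcp_cmd` and `bq_load_cmd` and the sample paths, bucket, dataset and table names are all hypothetical, and it assumes the Hadoop cluster has the GCS connector configured so that DistCp can write to gs:// paths. The helpers only print the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical dry-run helpers: they print commands instead of running them.

# Print the DistCp command that copies an HDFS directory to a GCS path.
# Assumption: the cluster's GCS connector is set up for gs:// URIs.
distcp_cmd() {
    # $1 = HDFS directory of the interim ORC table, $2 = GCS destination
    printf 'hadoop distcp %s %s\n' "$1" "$2"
}

# Print the BigQuery load command for the ORC files under a GCS path.
bq_load_cmd() {
    # $1 = BQ dataset, $2 = BQ table, $3 = GCS path holding the ORC files
    printf 'bq load --source_format=ORC %s.%s %s/*\n' "$1" "$2" "$3"
}

# Example values below are placeholders, not real paths.
distcp_cmd /data/interim/orders gs://example-bucket/orders
bq_load_cmd example_ds orders gs://example-bucket/orders
```

When running the printed bq command by hand, quote the gs:// wildcard so the local shell does not try to expand it.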
Approach:
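Step 1: Convert the source Hive table (any format) to an interim Hive table (external, ORC).
Step 2: Copy the ORC files of the interim Hive table from HDFS to the GCS bucket using DistCp.
Step 3: Load the files from the GCS bucket into BigQuery using BQ load.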
Additional Artifacts:
Log file generation, mail alert in case of success/failure scenarios.
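Log file generation and the mail alert could look roughly like the sketch below. The function names, the default log location and the subject wording are assumptions made for illustration; the real script may do this differently. It also assumes a `mailx` client is available on the host and falls back to a log entry if it is not.

```shell
#!/bin/sh
# Hypothetical default; the real script expects a site-specific log path.
LOG_FILE="${LOG_FILE:-/tmp/history_load.log}"

# Append a timestamped message to the log file.
log_msg() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Build the alert subject for a table and an exit status (0 = success).
alert_subject() {
    # $1 = table name, $2 = exit status of the load
    if [ "$2" -eq 0 ]; then
        printf 'History Load SUCCESS: %s\n' "$1"
    else
        printf 'History Load FAILURE: %s\n' "$1"
    fi
}

# Mail the last log lines to the recipient, if mailx is installed.
send_alert() {
    # $1 = recipient email id, $2 = table name, $3 = exit status
    if command -v mailx >/dev/null 2>&1; then
        tail -n 20 "$LOG_FILE" | mailx -s "$(alert_subject "$2" "$3")" "$1"
    else
        log_msg "mailx not found; alert for $2 not sent"
    fi
}
```

A caller would typically run `log_msg` after each stage and call `send_alert` once per table with the final exit status.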
Pre-requisite:
1. The gcloud utility must be installed on the HPC host (accessed via PuTTY). Set the proxy variables and run the installer as follows:
export no_proxy="localhost, 127.0.0.1, .ford.com"
export https_proxy=http://internet.ford.com:83
export http_proxy=http://internet.ford.com:83
export HTTPS_PROXY=http://internet.ford.com:83
export HTTP_PROXY=http://internet.ford.com:83
curl https://sdk.cloud.google.com | bash
2. The Hive database must be created in advance at the appropriate path.
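Before starting a load, it can help to confirm that the required command-line tools are on the PATH. The helper below is a hypothetical sketch (the name `missing_tools` and the list of tools checked are assumptions, not part of the original script).

```shell
#!/bin/sh
# Print the names of the given commands that are not found on PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing the list.
    printf '%s\n' "${missing# }"
}

# Prints an empty line when every tool is available.
missing_tools gcloud bq hadoop hive
```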
Input:
A config file with 7 comma-separated values, in this order: Hive target DB, Hive source DB, Hive table name,
HDFS path, GCS path, BQ dataset name, BQ table name.
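A config line in this layout could be split into its seven fields as sketched below. The variable names and the sample values are hypothetical, and the sketch assumes that no field contains a comma.

```shell
#!/bin/sh
# Split one config line into seven variables; fail if the count is wrong.
parse_config_line() {
    # Field order follows the Input description:
    # target DB, source DB, table, HDFS path, GCS path, dataset, BQ table.
    IFS=',' read -r tgt_db src_db hive_table hdfs_path gcs_path bq_dataset bq_table extra <<EOF
$1
EOF
    # A missing 7th field or any leftover text means a malformed line.
    if [ -z "$bq_table" ] || [ -n "$extra" ]; then
        echo "expected 7 comma-separated values: $1" >&2
        return 1
    fi
}

# Placeholder values for illustration only.
parse_config_line 'interim_db,sales_db,orders,/data/interim/orders,gs://example-bucket/orders,example_ds,orders_hist' &&
    echo "load $src_db.$hive_table into $bq_dataset.$bq_table via $gcs_path"
```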
Flow diagram: Source Hive Table (any format) -> Interim Hive Table (external, ORC) -> DistCp -> GCS Bucket -> BQ Load -> BigQuery
Sample config file:
Script: history load.txt
Script Execution:
Three parameters are passed to the shell script as command-line arguments:
1. Input config file -> Created manually by the user. Its contents are described in the Input section
above.
2. Email id -> Address to which the success/failure alert is sent.
3. JSON key file -> RM has access to generate the vault key for each environment.
Modification to be done in script:
In line #4, update the log path: replace <path> with the Unix directory where the log file should be
saved.
Command to execute the script using putty:
sh onprem_hist.sh path/config.txt email_id path/gcp_key.json
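A script invoked this way could check its three arguments up front, as in the sketch below. The function name `validate_args`, the messages and the return codes are assumptions for illustration and are not taken from onprem_hist.sh.

```shell
#!/bin/sh
# Validate the three command-line arguments before any work starts.
validate_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: sh onprem_hist.sh <config_file> <email_id> <key_file>" >&2
        return 2
    fi
    # The config file and the JSON key file must both be readable.
    [ -r "$1" ] || { echo "config file not readable: $1" >&2; return 1; }
    [ -r "$3" ] || { echo "key file not readable: $3" >&2; return 1; }
    # Very loose sanity check on the email id.
    case "$2" in
        *@*) ;;
        *) echo "email id looks invalid: $2" >&2; return 1 ;;
    esac
}
```

Inside the script this would be called as `validate_args "$@" || exit $?` before the first Hive step.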
Output:
Log file:
Mail Alert: History Load Status.msg
Note:
Be sure to drop the interim database and the tables that were created as part of the history load.
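The cleanup could be done with a single Hive statement, sketched below as a dry-run helper. The function name and the database name are hypothetical; the helper only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Print the Hive command that drops the interim database and its tables.
drop_interim_cmd() {
    # $1 = interim (target) Hive database name
    printf 'hive -e "DROP DATABASE IF EXISTS %s CASCADE;"\n' "$1"
}

# Placeholder database name for illustration only.
drop_interim_cmd interim_db
```

Because the interim tables are external, dropping them normally leaves the ORC files in HDFS, so the interim HDFS path may need to be removed separately.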
