Google Certified Professional - Data Engineer
Job Role Description
A Google Certified Professional - Data Engineer enables data-driven decision making by collecting,
transforming, and visualizing data. The data engineer should be able to design, build, maintain, and
troubleshoot data processing systems with a particular emphasis on the security, reliability,
fault-tolerance, scalability, fidelity, and efficiency of such systems. The data engineer should also be able
to analyze data to gain insight into business outcomes, build statistical models to support
decision-making, and create machine learning models to automate and simplify key business processes.
Certification Exam Guide
Section 1: Designing data processing systems
1.1 Designing flexible data representations. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● distributed systems
● schema design
1.2 Designing data pipelines. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● system availability
● distributed systems
● schema design
● common sources of error (e.g., removing selection bias)
1.3 Designing data processing infrastructure. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● system availability
● distributed systems
● schema design
● capacity planning
● different types of architectures: message brokers, message queues, middleware,
service-oriented
Section 2: Building and maintaining data structures and databases
2.1 Building and maintaining flexible data representations
2.2 Building and maintaining pipelines. Considerations include:
● data cleansing
● batch and streaming
● transformation
● acquire and import data
● testing and quality control
● connecting to new data sources
2.3 Building and maintaining processing infrastructure. Considerations include:
● provisioning resources
● monitoring pipelines
● adjusting pipelines
● testing and quality control
Section 3: Analyzing data and enabling machine learning
3.1 Analyzing data. Considerations include:
● data profiling
● data correlation
● patterns and insights
● anomaly detection
● statistical models
● machine learning
● assessing the statistical relevance of conclusions
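As an illustration of the anomaly detection item in 3.1, the following is a minimal sketch of one simple statistical approach, a z-score threshold. It is not taken from the exam guide; the function name `zscore_anomalies`, the sample readings, and the threshold of 2.0 are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return (index, value) pairs whose |z-score| exceeds the threshold.

    Hypothetical helper for illustration; uses the population standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be flagged.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up sample data: seven readings near 10 and one obvious outlier.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 50.0]
outliers = zscore_anomalies(readings, threshold=2.0)
print(outliers)
```

With very small samples a single outlier inflates the standard deviation, so a lower threshold (2.0 here rather than the common 3.0) is needed for it to be flagged; more robust methods exist and may be preferable in practice.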
3.2 Transforming data to enable machine learning and pattern discovery. Considerations
include:
● repeatability
● generalization
● distributed computing
● improved model accuracy
3.3 Identifying or building data visualization and reporting tools. Considerations include:
● automation
● decision support
● data summarization
● enabling patterns and insights
Section 4: Modeling business processes for analysis and optimization
4.1 Mapping business requirements to data representations. Considerations include:
● working with business users
● gathering business requirements
4.2 Optimizing data representations, data infrastructure performance and cost.
Considerations include:
● resizing and scaling resources
● data cleansing, distributed systems
● high performance algorithms
● common sources of error (e.g., removing selection bias)
Section 5: Ensuring reliability
5.1 Performing quality control. Considerations include:
● verification
● building and running test suites
● pipeline monitoring
5.2 Assessing, troubleshooting, and improving data representations and data processing
infrastructure.
5.3 Recovering data. Considerations include:
● planning (e.g., fault tolerance)
● executing (e.g., rerunning failed jobs, performing retrospective re-analysis)
● stress testing data recovery plans and processes
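The "rerunning failed jobs" item in 5.3 can be sketched as a bounded retry loop with exponential backoff. This is a minimal sketch under stated assumptions, not a prescribed recovery method: `run_with_retries`, `flaky_job`, and the parameter names are hypothetical, and the job is assumed to be idempotent so that rerunning it is safe.

```python
import time


def run_with_retries(job, max_attempts=3, base_delay=0.0):
    """Call job() until it succeeds or max_attempts is reached.

    Hypothetical helper: waits base_delay * 2**(attempt - 1) seconds
    between attempts and raises RuntimeError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


# Simulated job that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "done"


result = run_with_retries(flaky_job, max_attempts=5)
print(result, calls["count"])
```

Capping the number of attempts keeps a permanently failing job from retrying forever, and the chained exception preserves the last underlying error for troubleshooting.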
Section 6: Visualizing data and advocating policy
6.1 Building (or selecting) data visualization and reporting tools. Considerations include:
● automation
● decision support
● data summarization (e.g., translation up the chain, fidelity, trackability, integrity)
6.2 Advocating policies and publishing data and reports.
Section 7: Designing for security and compliance
7.1 Designing secure data infrastructure and processes. Considerations include:
● Identity and Access Management (IAM)
● data security
● penetration testing
● Separation of Duties (SoD)
● security control
7.2 Designing for legal compliance. Considerations include:
● Health Insurance Portability and Accountability Act (HIPAA), Children’s Online
Privacy Protection Act (COPPA), etc.
● audits
