
Google certified-professional-data-engineer

Google Certified Professional Data Engineer - Certification Exam Guide

Published in: Technology

Google Certified Professional - Data Engineer

Job Role Description

A Google Certified Professional - Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The data engineer should be able to design, build, maintain, and troubleshoot data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems. The data engineer should also be able to analyze data to gain insight into business outcomes, build statistical models to support decision-making, and create machine learning models to automate and simplify key business processes.

Certification Exam Guide

Section 1: Designing data processing systems

1.1 Designing flexible data representations. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● distributed systems
● schema design

1.2 Designing data pipelines. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● system availability
● distributed systems
● schema design
● common sources of error (e.g., removing selection bias)

1.3 Designing data processing infrastructure. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● system availability
● distributed systems
● schema design
● capacity planning
● different types of architectures: message brokers, message queues, middleware, service-oriented

Section 2: Building and maintaining data structures and databases

2.1 Building and maintaining flexible data representations

2.2 Building and maintaining pipelines. Considerations include:
● data cleansing
● batch and streaming
● transformation
● acquire and import data
● testing and quality control
● connecting to new data sources

2.3 Building and maintaining processing infrastructure. Considerations include:
● provisioning resources
● monitoring pipelines
● adjusting pipelines
● testing and quality control

Section 3: Analyzing data and enabling machine learning

3.1 Analyzing data. Considerations include:
● data profiling
● data correlation
● patterns and insights
● anomaly detection
● statistical models
● machine learning
● assessing the statistical relevance of conclusions

3.2 Transforming data to enable machine learning and pattern discovery. Considerations include:
● repeatability
● generalization
● distributed computing
● improved model accuracy

3.3 Identifying or building data visualization and reporting tools. Considerations include:
● automation
● decision support
● data summarization
● enabling patterns and insights
Section 4: Modeling business processes for analysis and optimization

4.1 Mapping business requirements to data representations. Considerations include:
● working with business users
● gathering business requirements

4.2 Optimizing data representations, data infrastructure performance and cost. Considerations include:
● resizing and scaling resources
● data cleansing, distributed systems
● high performance algorithms
● common sources of error (e.g., removing selection bias)

Section 5: Ensuring reliability

5.1 Performing quality control. Considerations include:
● verification
● building and running test suites
● pipeline monitoring

5.2 Assessing, troubleshooting, and improving data representations and data processing infrastructure.

5.3 Recovering data. Considerations include:
● planning (e.g., fault-tolerance)
● executing (e.g., rerunning failed jobs, performing retrospective re-analysis)
● stress testing data recovery plans and processes

Section 6: Visualizing data and advocating policy

6.1 Building (or selecting) data visualization and reporting tools. Considerations include:
● automation
● decision support
● data summarization (e.g., translation up the chain, fidelity, trackability, integrity)

6.2 Advocating policies and publishing data and reports.

Section 7: Designing for security and compliance

7.1 Designing secure data infrastructure and processes. Considerations include:
● Identity and Access Management (IAM)
● data security
● penetration testing
● Separation of Duties (SoD)
● security control

7.2 Designing for legal compliance. Considerations include:
● Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), etc.
● audits
