Data Engineering Roles
Adam Doyle
12/1/2021
Data Store Engineer
• Store Data, Retrieve Data, Optimize Data
• SQL (all flavors)
• Data Warehouse
• Data Lake
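The store/retrieve/optimize cycle can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for any SQL engine. The table, column, and index names are hypothetical. Real warehouse or lake tuning (partitioning, clustering, statistics) goes well beyond a single index.

```python
import sqlite3


def demo() -> tuple[list[tuple[int, float]], str]:
    """Store rows, add an index, retrieve by key, and return the query plan."""
    conn = sqlite3.connect(":memory:")
    # Store: create a hypothetical table and load a few rows
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(1, 20.0), (2, 35.5), (1, 12.25), (3, 8.0)],
    )
    # Optimize: index the column used for lookups
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    # Retrieve: fetch one customer's orders
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id", (1,)
    ).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description
    plan = " ".join(
        str(row[-1])
        for row in conn.execute(
            "EXPLAIN QUERY PLAN SELECT id, total FROM orders WHERE customer_id = ?", (1,)
        )
    )
    conn.close()
    return rows, plan


rows, plan = demo()
print(rows)  # [(1, 20.0), (3, 12.25)]
print(plan)  # mentions idx_orders_customer
```

After adding an index, checking the query plan is a quick way to confirm that the engine actually uses it.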
ETL Engineer
• Retrieve data from remote sources and move the data into a data store.
• Data Enrichment
• Tool-based ETL products
• Programmatic ETL development
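Programmatic ETL with an enrichment step might look like the minimal sketch below. It makes several assumptions:
- The "remote source" is simulated by an inline CSV string.
- The enrichment is a hypothetical country-to-region lookup.
- The target data store is an in-memory SQLite database.

Tool-based ETL products express the same extract, transform, and load stages through their own designers.

```python
import csv
import io
import sqlite3

# Hypothetical remote payload; in practice this would come from an API, file drop, or database
RAW_CSV = """order_id,country,amount
1,US,10.50
2,DE,7.25
3,FR,3.00
"""

# Hypothetical reference data used for enrichment
REGION_BY_COUNTRY = {"US": "AMER", "DE": "EMEA", "FR": "EMEA"}


def extract(text):
    """Parse the raw CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Cast types and enrich each record with a region lookup."""
    return [
        (
            int(r["order_id"]),
            r["country"],
            REGION_BY_COUNTRY.get(r["country"], "UNKNOWN"),
            float(r["amount"]),
        )
        for r in rows
    ]


def load(conn, records):
    """Write records into the target store; re-running replaces rows with the same key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, country TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    load(conn, transform(extract(RAW_CSV)))


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
print(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall())  # [('AMER', 10.5), ('EMEA', 10.25)]
```

Keying the load on `order_id` keeps the pipeline safe to re-run. This is a common design choice when a remote extract may be retried.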
Stream Engineer
• Retrieve data from streaming data sources
• Handle multi-source and late-arriving data
• Stream data sources (Kafka, RabbitMQ)
• Programmatic processing (Spark)
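Handling late-arriving data usually means aggregating by event time and using a watermark to decide when a window is closed. Spark Structured Streaming exposes this through `withWatermark`. The plain-Python sketch below is not Spark's API; it only illustrates the idea with the following assumptions:
- Events from several sources have already been merged into one arrival-ordered stream.
- Windows are 10-second tumbling windows.
- The lateness allowance is a hypothetical 5 seconds.

```python
from collections import defaultdict

WINDOW = 10            # tumbling window size in seconds (assumed)
ALLOWED_LATENESS = 5   # how far behind the newest event time data is still accepted (assumed)


def process(events):
    """events: iterable of (source, event_time, value) in arrival order.

    Returns (window_sums, dropped): window_sums maps window start -> summed value,
    dropped lists events that arrived after their window was closed.
    """
    sums = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")
    for source, event_time, value in events:
        # The watermark trails the newest event time seen so far
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS
        window_start = event_time - event_time % WINDOW
        # Once the watermark passes a window's end, that window accepts no more data
        if window_start + WINDOW <= watermark:
            dropped.append((source, event_time, value))
            continue
        sums[window_start] += value
    return dict(sums), dropped


events = [
    ("kafka", 1, 5),
    ("rabbitmq", 12, 3),
    ("kafka", 8, 2),      # late, but still within the allowed lateness
    ("rabbitmq", 25, 4),
    ("kafka", 9, 7),      # too late: window [0, 10) is already closed
]
print(process(events))  # ({0: 7, 10: 3, 20: 4}, [('kafka', 9, 7)])
```

A larger lateness allowance accepts more late data but keeps window state in memory longer. That trade-off is the main tuning decision.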
Data Quality Engineer
• Profile and check for outliers
• Handle data quality issues
• Data Quality Tools (Informatica DQ, Great Expectations)
• Data Analysis/Profiling (SQL)
• Programmatic Adjustments
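A basic profile with an outlier check can be sketched in plain Python rather than with Informatica DQ or Great Expectations. This sketch assumes the common 1.5 × IQR rule for flagging outliers, with missing values represented as `None`.

```python
import statistics


def profile(values):
    """Return simple profile statistics and IQR-based outliers for a numeric column."""
    present = [v for v in values if v is not None]
    # quantiles(n=4) returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "outliers": [v for v in present if v < low or v > high],
    }


print(profile([10, 12, None, 11, 13, 12, 95, 11, 10]))
# count 9, nulls 1, min 10, max 95, outliers [95]
```

Flagged values would then feed the "handle data quality issues" step, for example review, correction, or exclusion under whatever rules the business defines.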
Visualization Engineer
• Develop internal data models within data visualization tools
• Create dashboards
• Data Visualization tools (Tableau, Power BI)
• Data analysis (SQL)
Deployment Engineer
• Deploy processes to production
• DevOps, CI/CD (Ansible/Terraform)
• Source Control (Git)
• Data deployment (Liquibase)
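Tools such as Liquibase deploy database changes as ordered changesets and record which ones have been applied, so each environment gets only what it is missing. The sketch below illustrates that idea with SQLite and a hypothetical changelog table and changeset IDs. It does not use Liquibase's own changelog formats or commands.

```python
import sqlite3

# Hypothetical ordered changesets: (id, SQL)
CHANGESETS = [
    ("001-create-customers", "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002-add-email", "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def migrate(conn, changesets=CHANGESETS):
    """Apply changesets not yet recorded in the changelog; return the IDs applied now."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changelog "
        "(id TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT id FROM changelog")}
    newly_applied = []
    for change_id, sql in changesets:
        if change_id in applied:
            continue  # already deployed to this database
        with conn:  # commit after the change is applied and recorded
            conn.execute(sql)
            conn.execute("INSERT INTO changelog (id) VALUES (?)", (change_id,))
        newly_applied.append(change_id)
    return newly_applied


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # ['001-create-customers', '002-add-email']
print(migrate(conn))  # [] (nothing left to deploy)
```

Because the changelog lives in the target database, the same changeset list can be promoted through dev, test, and production from source control.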
Operations Engineer
• Monitor data applications
• Troubleshoot production issues
• Data Analysis (SQL)
• Root Cause Analysis (Splunk)
Production Engineer
• Ensure that application code is ready to go to production
• Test Harness (SoapUI)
• Programming languages
• Understanding of Machine Learning processes
Cluster Engineer
• Work with clustered hardware and software to ensure deployment and scalability.
• Cluster software (Hadoop, Kubernetes)
• Log Monitoring (Splunk)
Cloud Engineer
• Implement solutions in the cloud, using both cloud-native technology and conversions of on-premises solutions
• Cloud Platforms (Azure, AWS, GCP)
• Infrastructure as Code (Terraform)
Machine Learning Engineer
• Adapt Machine Learning models for production deployment, with an emphasis on performance and scalability
• Machine Learning Platforms (Spark)
• Programming languages (Python, Scala)
• Performance tuning
Feature Engineer
• Create informational features for data science models, at scale and at speed
• Extract information from data
• Aggregate data into information
• Apply business rules
• Data analysis (SQL)
• Programming languages (Python, Scala)
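The three feature steps (extract, aggregate, apply business rules) can be sketched for a single small batch in plain Python. At scale the same logic would typically run in Spark. The input layout, the feature names, and the 100.0 "high value" threshold are all hypothetical.

```python
from collections import defaultdict
from datetime import date

HIGH_VALUE_THRESHOLD = 100.0  # hypothetical business rule


def build_features(transactions, as_of):
    """transactions: iterable of (customer_id, order_date, amount); returns features per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    last_order = {}
    # Aggregate raw transactions per customer
    for customer_id, order_date, amount in transactions:
        totals[customer_id] += amount
        counts[customer_id] += 1
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
    features = {}
    for customer_id in totals:
        features[customer_id] = {
            "order_count": counts[customer_id],
            "total_spend": totals[customer_id],
            "avg_order_value": totals[customer_id] / counts[customer_id],
            # Extract information: recency derived from raw dates
            "days_since_last_order": (as_of - last_order[customer_id]).days,
            # Apply a business rule
            "is_high_value": totals[customer_id] >= HIGH_VALUE_THRESHOLD,
        }
    return features


txns = [
    ("a", date(2021, 11, 1), 60.0),
    ("a", date(2021, 11, 20), 50.0),
    ("b", date(2021, 10, 15), 30.0),
]
print(build_features(txns, as_of=date(2021, 12, 1)))
```

Passing `as_of` explicitly, instead of reading today's date, keeps the features reproducible when a model is retrained on historical data.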
[Chart: Team Size vs. Number of Roles]