Empowering Zillow’s Developers with Self-Service ETL

As the amount of data and the number of unique data sources within an organization grow, handling the volume of new pipeline requests becomes difficult. Not all new pipeline requests are created equal — some are for business-critical datasets, others are for routine data preparation, and others are for experimental transformations that allow data scientists to iterate quickly on their solutions.

To meet the growing demand for new data pipelines, Zillow created multiple self-service solutions that enable any team to build, maintain, and monitor their data pipelines. These tools abstract away the orchestration, deployment, and Apache Spark processing implementation from their respective users. In this talk, Zillow engineers discuss two internal platforms they created to address the specific needs of two distinct user groups: data analysts and data producers. Each platform addresses the use cases of its intended user, leverages internal services through its modular design, and empowers users to create their own ETL without having to worry about how the ETL is implemented.

Members of Zillow’s data engineering team discuss:

Why they created two separate user interfaces to meet the needs of different user groups
What degree of abstraction from orchestration, deployment, processing, and other ancillary tasks they chose for each user group
How they leveraged internal services and packages, including Pipeler, their Apache Spark package, to democratize the creation of high-quality, reliable pipelines within Zillow
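
To make the idea of abstracting away orchestration and processing concrete, here is a minimal sketch of what a user-facing pipeline request with automated validation might look like. Everything in it is an assumption made for illustration: `PipelineRequest`, its fields, `ALLOWED_SCHEDULES`, and `validate` are hypothetical names and do not come from Zetlas, Zagger, or Pipeler.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineRequest:
    """Hypothetical user-facing request: states what is wanted, not how it runs."""
    name: str
    owner: str
    sql: str
    schedule: str = "daily"
    alert_emails: List[str] = field(default_factory=list)


# Assumed set of schedules; a real platform would define its own.
ALLOWED_SCHEDULES = {"hourly", "daily", "weekly"}


def validate(request: PipelineRequest) -> List[str]:
    """Return human-readable problems; an empty list means the request is valid."""
    problems = []
    if not request.name.replace("_", "").isalnum():
        problems.append("name must contain only letters, digits, and underscores")
    if not request.sql.strip().lower().startswith("select"):
        problems.append("sql must start with SELECT")
    if request.schedule not in ALLOWED_SCHEDULES:
        problems.append(f"schedule must be one of {sorted(ALLOWED_SCHEDULES)}")
    return problems
```

The request carries no orchestration or Spark details, so in a design like this the platform stays free to decide how the work is scheduled and executed.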

  1. Empowering Zillow’s Developers with Self-Service ETL. Derek Gorthy, Senior Software Development Engineer; Yuan Feng, Software Development Engineer
  2. Who We Are: Zillow Offers Data Engineering Team @ Zillow. Derek Gorthy, Senior Software Development Engineer, Big Data; Yuan Feng, Software Development Engineer, Big Data
  3. Agenda ● How We Think About Self-Service ETL ● Core Components ● Self-Service ETL in Action at Zillow ○ Zetlas ○ Zagger ● Next Steps and Takeaways
  4. Zillow
  5. About Zillow ● Reimagining real estate to make it easier to unlock life’s next chapter ● Offer customers an on-demand experience for selling, buying, renting and financing with transparency and nearly seamless end-to-end service ● Most-visited real estate website in the United States * As of Q4-2020
  6. How We Think About Self-Service ETL
  7. Zagger Integrations [diagram labels: Zagger Pipeline Utilities Package, User Interaction, Zagger Managed Service, Integrations, Execution, Zetlas, DQ Module, API, Parser 1 ... Parser N, Airflow Renderer ... Kafka Renderer]
  8. What Is Self-Service ETL? [diagram labels: User Interaction, ?, Pipeline Configuration File]
  9. How We Think About Self-Service ETL [diagram labels: User Interaction, Interpret, Pipeline Metadata, Render, Pipeline Configuration File; Opinionated, Unopinionated]
  10. Core Components
  11. User Interaction [same diagram as slide 9]
  12. Interpret User Input [same diagram as slide 9]
  13. Pipeline Metadata [same diagram as slide 9]
  14. Render Pipeline [same diagram as slide 9]
  15. Data Pipeline & Shared Integrations [same diagram as slide 9]
  16. Self-Service ETL in Action at Zillow
  17. Applied Self-Service ETL - Zetlas [columns: Motivation, Features, Target Users] ● Modernized and reliable self-service tool to automate SQL-based workflows ● No coding experience needed to create ETL workflows ● UI-driven ● Rapid prototyping and deployment ● Job monitoring/alerting ● Automated validation ● Integration with multiple internal services ● Scalable and expandable ● Data scientists ● Data analysts
  18. Zetlas UX Design
  19. Applied Self-Service ETL - Zagger [columns: Motivation, Features, Target Users] ● Provide a developer-friendly abstraction from ETL tools ● Create a service that automates data engineering ancillary tasks ● Create common processing patterns for fast pipeline development ● Integrates with Terraform ● Exposes create/delete endpoints for other access patterns ● Allows for custom interpreter creation ● Integration with multiple internal services ● Data engineers ● Data producer teams
  20. Zagger Integrations [same diagram as slide 7]
  21. Next Steps and Takeaways
  22. Development Timeline (2019, 2020, 2021) ● Pipeler shared Spark processing library development ● Zetlas official launch in Zillow ● Zagger Managed Service and Pipeline Utilities Package library ● User Growth for Zagger and Zetlas ● ZETL retirement ● Zetlas and Zagger backend unification
  23. Takeaways ● UI must be designed to meet the needs of its users ● Self-service ETL isn’t just for non-data engineers ● Modular platform design allows for capabilities to be developed piecemeal ● Abstraction from tool-specific implementation gives flexibility
  24. More From Zillow ● Democratizing Data Quality Through a Centralized Platform, 5/27 @ 3:15 PM PST ● Scaling AutoML-Driven Anomaly Detection With Luminaire, 5/27 @ 5:00 PM PST
  25. Questions? Thank you! https://www.zillow.com/careers/
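
The flow named on the slides (user interaction, interpret, pipeline metadata, render, configuration file) can be sketched roughly as follows. This is an illustrative sketch only, not Zillow's implementation: `PipelineMetadata`, the parser and renderer functions, the registry names, and the JSON layouts are all hypothetical, and the rendered output is plain JSON text rather than anything Airflow or Kafka consumes directly.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class PipelineMetadata:
    """Tool-agnostic description of a pipeline (hypothetical fields)."""
    name: str
    schedule: str
    steps: List[str]


# Interpreters ("parsers"): each turns one kind of user input into metadata.
def parse_ui_form(form: Dict[str, Any]) -> PipelineMetadata:
    return PipelineMetadata(
        name=form["name"],
        schedule=form.get("schedule", "daily"),
        steps=[form["sql"]],
    )


def parse_api_payload(payload: Dict[str, Any]) -> PipelineMetadata:
    return PipelineMetadata(
        name=payload["pipeline"],
        schedule=payload["cron"],
        steps=list(payload["steps"]),
    )


# Renderers: each turns metadata into a tool-specific configuration file body.
def render_airflow(meta: PipelineMetadata) -> str:
    config = {"dag_id": meta.name, "schedule": meta.schedule, "tasks": meta.steps}
    return json.dumps(config, indent=2)


def render_kafka(meta: PipelineMetadata) -> str:
    config = {"consumer_group": meta.name, "processors": meta.steps}
    return json.dumps(config, indent=2)


PARSERS: Dict[str, Callable[[Dict[str, Any]], PipelineMetadata]] = {
    "ui": parse_ui_form,
    "api": parse_api_payload,
}
RENDERERS: Dict[str, Callable[[PipelineMetadata], str]] = {
    "airflow": render_airflow,
    "kafka": render_kafka,
}


def build_config(source: str, target: str, user_input: Dict[str, Any]) -> str:
    """Interpret user input into metadata, then render it for the target tool."""
    meta = PARSERS[source](user_input)
    return RENDERERS[target](meta)
```

Because parsers and renderers meet only at the metadata type, a new front end or a new execution tool can be added in this sketch by registering one more function, which is one way to read the takeaway about modular design and abstraction from tool-specific implementation.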