The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analytics Engineer

Welcome
Jeremy Cohen
Associate Product Manager
he/him
jeremy@fishtownanalytics.com
@jerco (community.getdbt.com)

The modern data team
Data Engineer
▪ Custom ingestion
▪ Orchestration
▪ ML endpoints
▪ Platform, architecture, tooling: inform build vs. buy
Analytics Engineer
▪ Provide clean, transformed data ready for analysis
▪ Apply SWE practices to analytics code
▪ Maintain data documentation
Data Analyst
▪ Deep insights & forecasting
▪ Close partnership with business users
▪ Build & guarantee critical reporting

What is dbt?
A. A Python program
B. The heart of the modern data stack
C. An analytics engineer’s best friend
D. A community of top-class data professionals
E. All of the above

What is dbt, actually?
▪ Define, test, document, and reuse complex data transformation logic, just by writing SQL (and a little bit of YAML).
▪ dbt infers a DAG of transformations and runs models in order.
▪ Auto-generated documentation site, built from the same code as your transformations.
The power of a framework, not the limitations of a GUI.
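
To make the DAG bullet concrete, here is a minimal Python sketch of the idea, not dbt's actual implementation. It assumes models declare their upstream dependencies through dbt's `ref()` function; the model names and SQL are hypothetical, and the regular expression is a simplification of real Jinja parsing.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: name -> SQL text. A model selects from another
# model by calling ref() inside Jinja delimiters.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
    "revenue_report": "select * from {{ ref('customer_orders') }}",
}

# Simplified matcher for {{ ref('model_name') }}; real parsing is more involved.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def infer_dag(models):
    """Map each model to the set of models it references."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models):
    """Return an order in which every model comes after its dependencies."""
    return list(TopologicalSorter(infer_dag(models)).static_order())


if __name__ == "__main__":
    for model in run_order(MODELS):
        print("run", model)
```

The point of the sketch is that nobody writes the execution order by hand: it falls out of the references inside the SQL itself.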

Extending SQL with Jinja
▪ Loops
▪ Macros
▪ Packages
A Pythonic templating engine for writing DRYer code and leveraging open source innovations.
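
As a rough illustration of what a Jinja loop buys you, the sketch below shows a loop-based template (kept as a reference string only) next to a plain-Python function that builds the SQL such a loop would expand to. This emulates the expansion rather than rendering through Jinja, so it runs without extra dependencies. The `payments` table, its columns, and the function name are assumptions made for the example.

```python
# A Jinja loop of the kind a dbt model might contain. Shown for reference
# only; this string is not rendered by the code below.
JINJA_EXAMPLE = """
select
  order_id,
  {% for method in payment_methods %}
  sum(case when payment_method = '{{ method }}' then amount end) as {{ method }}_amount
  {%- if not loop.last %},{% endif %}
  {% endfor %}
from payments
group by 1
"""


def render_payment_pivot(payment_methods):
    """Build the SQL the loop above would expand to: one column per method."""
    columns = [
        f"  sum(case when payment_method = '{m}' then amount end) as {m}_amount"
        for m in payment_methods
    ]
    lines = [
        "select",
        "  order_id,",
        ",\n".join(columns),  # commas between columns, none after the last
        "from payments",
        "group by 1",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_payment_pivot(["card", "cash", "voucher"]))
```

Adding a payment method then means adding one list item instead of copying another `sum(case ...)` line, which is the DRY benefit the slide refers to.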

The dbt community, by the numbers
▪ 2800+ companies running dbt in production across 12+ databases
▪ 48 open source packages of reusable macros and models
▪ 23k views of our opinionated best practices for dbt project design
▪ 7k data professionals at the top of their game in dbt Slack

dbt + Spark
▪ Open source plugin
▪ pip install dbt-spark
▪ Write business logic in Spark SQL
▪ Dynamically template repetitive SQL with Jinja
▪ Connect to any Spark cluster + dbt run

Analytics engineering meets Delta Lake
▪ Access all core dbt features when you materialize models as Delta tables
▪ Use merge to build incremental models + snapshot slowly changing dimensions
▪ Run OPTIMIZE and ZORDER BY with hooks, operations, macros...
The power of a data lake, the flexibility of a modern data warehouse, the intuition of a common modeling framework.
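
The merge behavior behind incremental models can be sketched in a few lines of Python. This only emulates the semantics of a merge keyed on a unique column (update the row when the key matches, insert it otherwise); it is not the SQL that dbt or Delta Lake actually executes, and the function name, parameters, and sample rows are hypothetical.

```python
def merge_incremental(target, new_rows, unique_key):
    """Upsert new_rows into target, matching rows on unique_key.

    target and new_rows are lists of dicts standing in for table rows.
    Returns a new list; the inputs are left unchanged.
    """
    merged = {row[unique_key]: dict(row) for row in target}
    for row in new_rows:
        # Key already present -> the row is replaced (update);
        # key absent -> the row is appended (insert).
        merged[row[unique_key]] = dict(row)
    return list(merged.values())


if __name__ == "__main__":
    existing = [
        {"order_id": 1, "status": "placed"},
        {"order_id": 2, "status": "placed"},
    ]
    latest_batch = [
        {"order_id": 2, "status": "shipped"},  # changed since the last run
        {"order_id": 3, "status": "placed"},   # new since the last run
    ]
    for row in merge_incremental(existing, latest_batch, "order_id"):
        print(row)
```

Each run processes only the new batch rather than rebuilding the whole table, which is the motivation for incremental models.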

Announcing: dbt Cloud + Databricks
Develop
▪ Hosted IDE
▪ Compile + run SQL in real time
▪ Straightforward git flow
▪ No installation hassle
Deploy
▪ Configurable job scheduler
▪ Continuous integration
▪ Host data documentation
▪ Persist dbt artifacts
Now in closed beta

How to deploy dbt?
▪ SaaS: up & running in minutes
▪ Enterprise: Fishtown-managed VPC, client-managed VPC, air-gapped on-prem, …
▪ You! dbt, the Spark plugin, the documentation site: it’s all open source and can be deployed using standard infrastructure.
Build, buy, or balance

