DBT Introduction
Software Development Business Line
S/W-Dev
Preparation
Environment
python3 -m venv dbt-env
source dbt-env/bin/activate # activate the environment for Mac and Linux
alias env_dbt='source <PATH_TO_VIRTUAL_ENV_CONFIG>/bin/activate'
Install an adapter
dbt-mysql:
dbt-postgres: pip install dbt-postgres
This will install dbt-core and dbt-postgres only:
(dbt-mysql-py3.9) ➜ jaffle_shop git:(main) ✗ dbt --version
Core:
- installed: 1.7.2
- latest: 1.7.2 - Up to date!
Plugins:
- mysql5: 1.7.0 - Could not determine latest version
- mariadb: 1.7.0 - Could not determine latest version
- mysql: 1.7.0 - Ahead of latest version!
Common commands
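dbt debug: checks the connection configuration and the environment
dbt run: runs the models you defined in your project
dbt build: builds and tests your selected resources, such as models, seeds, snapshots, and tests
dbt test: executes the tests you defined for your project
dbt show: previews the results of a model or an inline query
dbt list: lists the resources in your project
dbt deps: installs the packages listed in packages.yml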
1. dbt init
Creates a new dbt project
Three configuration files
dbt_project.yml
packages.yml
profiles.yml
Config file
profiles.yml: configures the target data warehouse
jaffle_shop:
  target: dev
  outputs:
    dev:
      # threads: 2
      type: mysql
      server: localhost
      port: 3306
      database: test
      schema: test
      username: root
      password: '123456'
dbt_project.yml: configures project directories and other project-level settings
name: jaffle_shop
config-version: 2
...
models:
  jaffle_shop: # this matches the `name:` config
    +materialized: view # this applies to all models in the current project
    marts:
      +materialized: table # this applies to all models in the `marts/` directory
      marketing:
        +schema: marketing # this applies to all models in the `marts/marketing/` directory
{{ config(
    materialized="view",
    schema="marketing"
) }}
with customer_orders as ...
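The layering above (project-wide default, directory override, in-file config() block) can be sketched in Python. This is a hypothetical illustration of the precedence rule only, not dbt's actual implementation; PROJECT_CONFIG and resolve_config are names made up for this sketch.

```python
# Hypothetical sketch of dbt-style config precedence; not dbt's real code.

# Mirrors the `models:` block of dbt_project.yml: keys starting with "+" are
# configs, every other key is a subdirectory.
PROJECT_CONFIG = {
    "+materialized": "view",
    "marts": {
        "+materialized": "table",
        "marketing": {"+schema": "marketing"},
    },
}


def resolve_config(model_dirs, in_file_config=None):
    """Merge configs from the project root down to the model's directory,
    then apply the model's own config() block last."""
    resolved = {}
    node = PROJECT_CONFIG
    # Visit the root first (None), then each directory on the model's path.
    for directory in [None, *model_dirs]:
        if directory is not None:
            node = node.get(directory)
            if not isinstance(node, dict):
                break  # nothing is configured for this directory or below
        for key, value in node.items():
            if key.startswith("+"):
                resolved[key[1:]] = value  # deeper levels overwrite shallower
    resolved.update(in_file_config or {})  # in-file config() wins
    return resolved


print(resolve_config(["staging"]))
print(resolve_config(["marts", "marketing"]))
print(resolve_config(["marts", "marketing"], {"materialized": "view"}))
```

A model under marts/marketing/ picks up the table materialization from marts/ and the marketing schema, unless its own config() block says otherwise.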
packages file
packages.yml: configures the project's package dependencies
packages:
  - package: dbt-labs/codegen
    version: 0.12.1
dbt deps # install the dependencies
dbt --version
Model dependencies
with customers as (
    select * from {{ ref('stg_customers') }}
),
orders as (
    select * from {{ ref('stg_orders') }}
),
...
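Each ref() call declares a dependency, which is how dbt knows the order in which to build models. A rough Python sketch of that idea follows. The MODELS dictionary, the regular expression, and the helper names are assumptions made for illustration (the staging models are assumed to read from seeds of the same name); dbt's own parser works differently.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical model sources keyed by model name.
MODELS = {
    "stg_customers": "select * from {{ ref('raw_customers') }}",
    "stg_orders": "select * from {{ ref('raw_orders') }}",
    "customers": """
        with customers as (select * from {{ ref('stg_customers') }}),
        orders as (select * from {{ ref('stg_orders') }})
        select * from customers
    """,
}

# Matches {{ ref('name') }} or {{ ref("name") }} and captures the name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql):
    """Return the set of model names referenced through ref()."""
    return set(REF_PATTERN.findall(sql))


def build_order(models):
    """Return one valid build order: every model comes after its refs."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(order)
```

Several orders can be valid; the only guarantee is that a model never appears before anything it references.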
2. dbt debug
Checks that the database connection configuration and the environment are correct
3. dbt seed
Loads CSV seed files
Usually not needed: dbt runs inside the target database, where the source data normally already exists
(dbt-mysql-py3.9) ➜ jaffle_shop git:(main) dbt seed
00:57:11 Running with dbt=1.7.2
00:57:11 Registered adapter: mysql=1.7.0
00:57:11 Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 373 macros, 0 groups, 0 semantic models
00:57:11
00:57:11 Concurrency: 1 threads (target='dev')
00:57:11
00:57:11 1 of 3 START seed file test.raw_customers ...................................... [RUN]
00:57:11 1 of 3 OK loaded seed file test.raw_customers .................................. [INSERT 100 in 0.23s]
00:57:11 2 of 3 START seed file test.raw_orders ......................................... [RUN]
00:57:11 2 of 3 OK loaded seed file test.raw_orders ..................................... [INSERT 99 in 0.12s]
00:57:11 3 of 3 START seed file test.raw_payments ....................................... [RUN]
00:57:11 3 of 3 OK loaded seed file test.raw_payments ................................... [INSERT 113 in 0.13s]
00:57:11
00:57:11 Finished running 3 seeds in 0 hours 0 minutes and 0.61 seconds (0.61s).
00:57:11
00:57:11 Completed successfully
00:57:11
00:57:11 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
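Conceptually, seeding a file amounts to creating a table from the CSV header and inserting the rows. The sketch below shows that idea with Python's standard library and an in-memory SQLite database. It is an illustration only: the sample rows and the load_seed helper are hypothetical, and dropping and recreating the table is a simplification of what dbt does.

```python
import csv
import io
import sqlite3

# Hypothetical three-row stand-in for a seed file such as raw_customers.csv.
SEED_CSV = "id,first_name,last_name\n1,Ada,L.\n2,Grace,H.\n3,Linus,T.\n"


def load_seed(conn, table, csv_text):
    """Create `table` from the CSV header and insert every data row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'drop table if exists "{table}"')
    conn.execute(f'create table "{table}" ({columns})')
    conn.executemany(f'insert into "{table}" values ({placeholders})', data)
    return len(data)


conn = sqlite3.connect(":memory:")
inserted = load_seed(conn, "raw_customers", SEED_CSV)
print(f"INSERT {inserted}")
```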
dbt run
Runs the models: dbt run, or dbt run -d (debug output)
dbt test
Data checks
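A dbt data test is essentially a query that selects the failing rows; the test passes when the query returns nothing. The Python sketch below mimics two common checks, not_null and unique, against an in-memory SQLite table. The table contents and the helper functions are hypothetical, and the SQL is a simplified approximation rather than the exact queries dbt generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customers (customer_id integer, first_name text)")
conn.executemany(
    "insert into customers values (?, ?)",
    # Deliberately dirty data: a duplicated id and a missing id.
    [(1, "Ada"), (2, "Grace"), (2, "Linus"), (None, "Ken")],
)


def not_null_failures(conn, table, column):
    """Rows where `column` is null; an empty result means the check passes."""
    query = f"select {column} from {table} where {column} is null"
    return conn.execute(query).fetchall()


def unique_failures(conn, table, column):
    """Values of `column` that occur more than once."""
    query = (
        f"select {column} from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    )
    return conn.execute(query).fetchall()


for name, check in [("not_null", not_null_failures), ("unique", unique_failures)]:
    failures = check(conn, "customers", "customer_id")
    print(name, "PASS" if not failures else f"FAIL {len(failures)}")
```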
dbt docs
dbt docs generate
dbt docs serve
1.1. Testing issues
Software development process
Requirements engineering
Design
Implementation
Testing and release

dbt data engineering introduction data build tools