© 2023 Snowflake Inc. All Rights Reserved
FROM RAW DATA TO INTERACTIVE DATA APP!
Powered by Snowpark Python
Challenges in Developing Data Pipelines
- Troubleshooting & debugging failed jobs
- Multi-page stack trace
- Capacity management & resource sizing
- Setting up Infrastructure and Configs
- Executor memory
- Driver memory
- # of executors
- Z-ordering, V-ordering, ABC-ordering
- Partitioning, Bucketing, Salting
Challenges Today
- Troubleshooting a failed Spark job
- Multi-page stack trace
- Setting up Infrastructure and Configs
- Executor memory
- Driver memory
- # of executors
- Z-ordering, V-ordering, ABC-ordering
- Partitioning, Bucketing, Salting
ENTER SNOWPARK
Snowpark for Python
PYTHON • JAVA • SCALA
CLIENT SIDE LIBRARIES: DataFrame API
SERVER SIDE RUNTIMES: UDFs, Stored Procedures
Warehouses (Standard & Snowpark-Optimized)
Snowpark: Secure Deployment & Processing Of Non-SQL Code
CLIENT SIDE LIBRARIES
- Snowflake Connector for Python
- Object Serializer
- Query Translator
- Python Functions & SProcs: @udf def detect_fraud() is sent as Python Bytecode
- DataFrame API: df.filter(df.state == 'WA') is sent as a SQL Query
SERVER SIDE RUNTIMES
- Processing Engine: SQL Engine, Python Secure Sandbox
- Built-in Anaconda Packages & more
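The slide pairs a DataFrame call, df.filter(df.state == 'WA'), with a SQL query: the call is translated rather than executed on the client. As a rough illustration of that idea only, the toy translator below records a filter as data and renders it as SQL text. ToyDataFrame, Column, Condition, to_sql and the customers table are hypothetical names invented for this sketch; it is not Snowpark's implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """A comparison captured as a SQL fragment instead of a Python bool."""
    sql: str


class Column:
    """Hypothetical column reference; comparing it builds a Condition."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Quote string literals (doubling embedded quotes); pass others through.
        if isinstance(other, str):
            literal = "'" + other.replace("'", "''") + "'"
        else:
            literal = str(other)
        return Condition(f"{self.name} = {literal}")


class ToyDataFrame:
    """Hypothetical lazy DataFrame: it only accumulates filter conditions."""

    def __init__(self, table, conditions=()):
        self._table = table
        self._conditions = tuple(conditions)

    def __getattr__(self, name):
        # Only called for attributes that do not exist, e.g. df.state.
        if name.startswith("_"):
            raise AttributeError(name)
        return Column(name)

    def filter(self, condition):
        # Return a new object; nothing is executed here.
        return ToyDataFrame(self._table, self._conditions + (condition,))

    def to_sql(self):
        sql = f"SELECT * FROM {self._table}"
        if self._conditions:
            sql += " WHERE " + " AND ".join(c.sql for c in self._conditions)
        return sql


df = ToyDataFrame("customers")
print(df.filter(df.state == "WA").to_sql())
```

Because filter() only accumulates conditions, no work happens until the SQL is rendered and sent, which is the property that lets the heavy lifting stay inside the warehouse.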
DATA STREAMING WITH DYNAMIC TABLES
Streaming in Snowflake
PAIN POINTS: BEFORE
- COMPLEXITY: Managing dependencies, scheduling, and orchestration.
- INEFFICIENCY: Rebuilding tables completely, no incremental materialization.
- MANAGEABILITY: Brittle pipelines unable to react to changes upstream.
BENEFITS: AFTER
- SIMPLIFIED PIPELINES: Native support for streaming and continuous batch data pipelines. Easy, declarative semantics, no orchestration required, and no infrastructure management.
- COST EFFECTIVE: Streaming ingest as much as 50% cheaper than files. Continuous incremental processing reduces wasted compute.
- NATIVE TO DATA CLOUD: Expanding ecosystem in the Data Cloud with consistent and strong security, governance, and scalability.
Streaming ≠ Instantaneous
[Chart: VALUE (value to business) plotted against TIME (time between event creation and action; axis marks at 1 sec, 1+ minutes, 6+ hours). Labeled regions: SUMMIT OF NOW, PEAK OF SOON AFTER, MOUNTAIN OF WISDOM, VALLEY OF IRRELEVANCE.]
Streaming ≠ Instantaneous
[Chart: COST (cost to business) plotted against TIME (time between event creation and action; axis marks at 1 sec, 1+ minutes, 6+ hours). Labeled regions: SUMMIT OF NOW, PEAK OF SOON AFTER, MOUNTAIN OF WISDOM, VALLEY OF IRRELEVANCE. Callouts: HIGH COST, LOW RETURN; LOW COST, UNTAPPED POTENTIAL.]
Streaming Pipelines at a Glance
[Diagram]
Stages: INGEST, TRANSFORM, DELIVER
Foundation: STORAGE, SCHEDULING, PROCESSING, GOVERNANCE
Sources: Apps & Services, OLTP, IoT, Kafka (Rows, Files)
Ingest: Snowpipe Auto-Ingest & Streaming
Transform: Tables, Dynamic Tables*
Deliver: Sharing, Replication, Native Apps, Worksheets, Dashboards, Serving, Unload
Languages: Python, Java, Scala, SQL
Status legend: In Dev, Private, Public*, GA
Ingestion Options
COPY
- Efficient bulk loading of files
- Control your own compute resources
- Deterministic latency
SNOWPIPE
- Continuous ingestion of files
- Serverless
- Median latency ~30s
SNOWPIPE STREAMING
- Near real-time ingestion of rowsets
- Client application needed
- < 5s median latency
In Dev Private Public GA
SNOWPIPE: FILES & STREAMING
In Dev Private Public GA
[Diagram labels: APPS & SERVICES, OLTP; BUSINESS INTELLIGENCE, MACHINE LEARNING, SHARING]
BATCH: Files, via COPY & Snowpipe
STREAMING: Rowsets and Kafka Topics, via Snowpipe Streaming* & Kafka Connector
SNOWPIPE
• Designed for batched rowsets as files
• Auto-scaled ingestion (10M files/10TB per hr)
• Deduplication with file tracking
SNOWPIPE STREAMING
• For rowsets with variable arrival frequency: insertRows()
• Focus on lower latency & cost
• Ordered ingestion within a channel
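To make "ordered ingestion within a channel" concrete, here is a minimal in-memory sketch. It assumes each batch carries an increasing integer offset and that a client resumes from the last offset the channel accepted, so a replay after a crash does not duplicate rows. ToyChannel, insert_rows, latest_offset and send_from are hypothetical names for this illustration; the slide only names insertRows(), and this is not the Snowflake ingest SDK.

```python
class ToyChannel:
    """Hypothetical in-memory stand-in for one ingestion channel."""

    def __init__(self):
        self.rows = []             # rows in the order they were accepted
        self.latest_offset = None  # offset of the last accepted batch

    def insert_rows(self, rows, offset):
        # Within one channel, offsets must strictly increase.
        if self.latest_offset is not None and offset <= self.latest_offset:
            raise ValueError("offsets must increase within a channel")
        self.rows.extend(rows)
        self.latest_offset = offset


def send_from(channel, batches):
    """Send (offset, rows) batches in offset order, skipping accepted ones.

    Returns the number of batches actually sent.
    """
    sent = 0
    for offset, rows in sorted(batches, key=lambda batch: batch[0]):
        if channel.latest_offset is not None and offset <= channel.latest_offset:
            continue  # already ingested before the restart
        channel.insert_rows(rows, offset)
        sent += 1
    return sent


batches = [(1, [{"id": "a"}]), (2, [{"id": "b"}]), (3, [{"id": "c"}])]
channel = ToyChannel()
send_from(channel, batches[:2])  # client stops after two batches
send_from(channel, batches)      # on restart, only batch 3 is sent
print([row["id"] for row in channel.rows])
```

The design choice illustrated is that ordering and resume state live per channel, so independent channels can ingest in parallel without coordinating with each other.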
Streaming Use Cases
Use existing event hubs to source data
Flexible latency-cost profiles
Run transformations with all reference data instead of just single row transforms (ELT & ETL)
Powered by Snowflake apps
Add full power of Snowflake analytics from day 1
One place to query latest window of data & full history + reference data
Proprietary (ISV-built) pipelines for continuous analysis
CONNECTORS KAFKA / KINESIS SOURCES
SECURITY & LOG ANALYTICS
Aggregated logs from devices
No need to add event hubs if not needed
Simple post-ingestion cleanup
IOT / DEVICE LOGS
Ingest CDC streams with lower latency
Ensure exactly-once semantics
Sourced from OLTP DBs, SaaS apps
Serverless so no clusters / stages to manage
Dynamic Tables Overview
CREATE DYNAMIC TABLE <name>
TARGET_LAG = <duration>
WAREHOUSE = <warehouse_name>
AS <select>
SELECT * FROM <name>
- Store Results
- Automatic Refreshes
- Any Query!
NEW TABLE TYPE THAT AUTOMATICALLY AND CONTINUOUSLY MATERIALIZES THE RESULTS OF A QUERY
In Dev Private Public GA
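As a hedged sketch of how the CREATE DYNAMIC TABLE template on this slide might be filled in from Python, the helper below renders a statement from its four parts. dynamic_table_ddl and the object names ad_spend_by_channel, demo_wh and raw_ad_spend are hypothetical, and writing the lag as a quoted string such as '1 minute' is an assumption about the duration format; check the current Snowflake documentation before relying on it.

```python
import re

# Assumed duration format: "<number> <unit>", e.g. "1 minute" or "6 hours".
_LAG_PATTERN = re.compile(r"^\d+ (second|minute|hour|day)s?$")
# Only simple unquoted identifiers are accepted in this sketch.
_IDENT_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def dynamic_table_ddl(name, target_lag, warehouse, select):
    """Render the slide's template as a single SQL statement string."""
    if not _LAG_PATTERN.match(target_lag):
        raise ValueError(f"unsupported target lag: {target_lag!r}")
    for identifier in (name, warehouse):
        if not _IDENT_PATTERN.match(identifier):
            raise ValueError(f"unsupported identifier: {identifier!r}")
    # Drop surrounding whitespace and a trailing semicolon from the query.
    query = select.strip().rstrip(";")
    return (
        f"CREATE DYNAMIC TABLE {name}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  AS {query}"
    )


print(
    dynamic_table_ddl(
        name="ad_spend_by_channel",
        target_lag="1 minute",
        warehouse="demo_wh",
        select="SELECT channel, SUM(spend) AS total_spend "
               "FROM raw_ad_spend GROUP BY channel;",
    )
)
```

The resulting string would then be run through whatever SQL client is in use; once the table exists, reading it is the plain SELECT * FROM <name> shown on the slide.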
Dynamic Tables Overview
CONSISTENTLY FAST TO QUERY
- Immediate results
- Freshness within LAG
- Snapshot isolation
In Dev Private Public GA
CREATE DYNAMIC TABLE <name>
TARGET_LAG = <duration>
WAREHOUSE = <warehouse_name>
AS <select>
SELECT * FROM <name>
Key Features
In Dev Private Public GA
DECLARATIVE DATA PIPELINES
Continuous data pipelines as easy as SELECT. Complex pipelines with hundreds of branches. Dynamic Tables manage the scheduling and orchestration.
SQL SUPPORT
Use any core SQL syntax to define transformations, including joins, unions, aggregations, window functions, group bys, filters, etc.
USER-DEFINED FRESHNESS
Controlled by a target lag for each table, to reduce cost and improve performance. Data freshness as low as 1 minute.
AUTOMATIC INCREMENTAL REFRESHES
Refresh only what's changed, even for complex queries, automatically (yes, including UPDATEs and DELETEs!).
SNAPSHOT ISOLATION
All Dynamic Tables in a DAG are refreshed consistently from aligned snapshots.
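The idea of refreshing only what has changed can be sketched with a toy aggregate. The example assumes a SUM grouped by a key, with an UPDATE modeled as a DELETE of the old row followed by an INSERT of the new one. IncrementalSum, full_recompute and the sample channels are hypothetical; this is a conceptual illustration, not how Snowflake implements Dynamic Table refreshes.

```python
class IncrementalSum:
    """Toy materialized SUM(amount) GROUP BY key, maintained from changes."""

    def __init__(self):
        self.totals = {}  # key -> running sum
        self.counts = {}  # key -> number of live rows, to drop empty groups

    def apply(self, changes):
        """Apply (op, key, amount) changes; op is 'INSERT' or 'DELETE'."""
        for op, key, amount in changes:
            if op not in ("INSERT", "DELETE"):
                raise ValueError(f"unknown operation: {op!r}")
            sign = 1 if op == "INSERT" else -1
            self.totals[key] = self.totals.get(key, 0) + sign * amount
            self.counts[key] = self.counts.get(key, 0) + sign
            if self.counts[key] == 0:
                # The last row of the group was deleted: remove the group.
                del self.totals[key]
                del self.counts[key]

    def result(self):
        return dict(self.totals)


def full_recompute(rows):
    """Baseline: rebuild the aggregate from every (key, amount) row."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


view = IncrementalSum()
view.apply([("INSERT", "search", 100), ("INSERT", "social", 40),
            ("INSERT", "search", 60)])
# UPDATE social 40 -> 55, then DELETE the 60 search row.
view.apply([("DELETE", "social", 40), ("INSERT", "social", 55),
            ("DELETE", "search", 60)])
print(view.result() == full_recompute([("search", 100), ("social", 55)]))
```

Each refresh touches only the changed rows, while the baseline rereads the whole input; both arrive at the same result.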
FULL STACK DATA ENGINEERING WITH SNOWPARK
Full Stack DE with Python
Let’s build a Data App!
Ad Spend Optimizer for Ski Gear Co.
THANK YOU!

From Raw Data to an Interactive Data App in an Hour: Powered by Snowpark Python
