4. Challenges in the world of analytics
Rapidly expanding data
New users and use cases
Diverse tools and applications
ORGANIZATIONS ARE STRUGGLING TO SUPPORT THEIR ANALYTICS NEEDS
13. Multi-Table Insert
• Allows loading several tables from one source
• In one statement
-- Load the hub and its satellite from staging in a single statement
INSERT ALL
  -- Business key not yet in the hub: add a hub row
  WHEN (SELECT COUNT(*)
        FROM dv.hub_country hc
        WHERE hc.hub_country_key = stg.hash_key) = 0
  THEN
    INTO dv.hub_country (hub_country_key, country_abbrv, hub_load_dts,
                         hub_rec_src)
    VALUES (stg.hash_key, stg.country_abbrv, stg.load_dts, stg.rec_src)
  -- No satellite row with this key and hash diff: add a satellite row
  WHEN (SELECT COUNT(*)
        FROM dv.sat_countries sc
        WHERE sc.hub_country_key = stg.hash_key AND
              sc.hash_diff = stg.hash_diff) = 0
  THEN
    INTO dv.sat_countries (hub_country_key, sat_load_dts,
                           hash_diff, sat_rec_src, country_name)
    VALUES (stg.hash_key, stg.load_dts,
            stg.hash_diff, stg.rec_src, stg.country_name)
SELECT MD5(country_abbrv) AS hash_key, country_abbrv, country_name,
       MD5(country_name) AS hash_diff,
       CURRENT_TIMESTAMP AS load_dts,
       stage_rec_src AS rec_src
FROM stage.country stg;
14. Hashing
• Snowflake supports the MD5, SHA1, and SHA2 algorithms in standard,
HEX, and binary formats
• e.g. SHA2(), SHA2_HEX(), or SHA2_BINARY()
SELECT SHA2_BINARY('Snowflake', 384);
736BD8A53845348830B1EE63A8CD3972F031F13B111F66FFDEC2271A7AE709662E503A0CA305BD50DA8D1CED48CD45D9
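As a rough illustration of how these functions might be used when building Data Vault keys, the sketch below hashes columns from the stage.country table shown on the Multi-Table Insert slide. Normalizing the business key with UPPER(TRIM(...)) before hashing and the choice of a 256-bit SHA2 digest are assumptions made for this example; the slides do not prescribe them.

```sql
-- Sketch only: the UPPER(TRIM(...)) normalization and the 256-bit digest
-- size are assumptions, not something the slides specify.
SELECT
    MD5(UPPER(TRIM(country_abbrv)))        AS hub_key_md5,        -- 32-character hex string
    MD5_BINARY(UPPER(TRIM(country_abbrv))) AS hub_key_md5_bin,    -- 16-byte BINARY value
    SHA1(UPPER(TRIM(country_abbrv)))       AS hub_key_sha1,       -- 40-character hex string
    SHA2(country_name, 256)                AS hash_diff_sha2,     -- 64-character hex string
    SHA2_BINARY(country_name, 256)         AS hash_diff_sha2_bin  -- 32-byte BINARY value
FROM stage.country;
```

The binary variants hold the same digest in half the bytes of the hex strings, which may matter when the hash is stored as a key and joined on frequently.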
15. Perhaps You Wish to Split for Security Reasons?
From This
DV Physical Partitioning
Single Snowflake Database
Schema layout
In Snowflake, you can easily split for security reasons
To THIS!
DV Logical Partitioning
Snowflake DB 1: Non-sensitive data
Snowflake DB 2: PII or sensitive data with a Link to non-sensitive data
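One way such a split could look in practice is sketched below. All object names here (the databases dv_public and dv_pii, the roles general_analyst and pii_analyst, and the tables hub_customer and sat_customer_pii) are hypothetical and are assumed to exist where the script does not create them; the slides do not name any of them.

```sql
-- Sketch only: every database, role, and table name below is hypothetical.
CREATE DATABASE IF NOT EXISTS dv_public;  -- non-sensitive hubs, links, and sats
CREATE DATABASE IF NOT EXISTS dv_pii;     -- PII or otherwise sensitive sats

-- General analysts can read only the non-sensitive database.
GRANT USAGE  ON DATABASE dv_public                TO ROLE general_analyst;
GRANT USAGE  ON ALL SCHEMAS IN DATABASE dv_public TO ROLE general_analyst;
GRANT SELECT ON ALL TABLES  IN DATABASE dv_public TO ROLE general_analyst;

-- A more restricted role can read both databases.
GRANT USAGE  ON DATABASE dv_public                TO ROLE pii_analyst;
GRANT USAGE  ON ALL SCHEMAS IN DATABASE dv_public TO ROLE pii_analyst;
GRANT SELECT ON ALL TABLES  IN DATABASE dv_public TO ROLE pii_analyst;
GRANT USAGE  ON DATABASE dv_pii                   TO ROLE pii_analyst;
GRANT USAGE  ON ALL SCHEMAS IN DATABASE dv_pii    TO ROLE pii_analyst;
GRANT SELECT ON ALL TABLES  IN DATABASE dv_pii    TO ROLE pii_analyst;

-- With access to both, a single query joins across the two databases
-- using fully qualified names (database.schema.table).
SELECT h.customer_id,
       s.email
FROM dv_public.dv.hub_customer  h
JOIN dv_pii.dv.sat_customer_pii s
  ON s.hub_customer_key = h.hub_customer_key;
```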
21. Applying Virtual Warehouses to Data Vault
1. Use a Multi-Cluster Warehouse for Source to Stage loads
2. Use a separate Warehouse for each DV entity type
   • Zero contention
   • Maximum parallelism
   • Maximum flexibility - each can be independently sized
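The layout above could be set up roughly as follows. The warehouse names, sizes, cluster counts, and suspend timeout are assumptions chosen for illustration, and multi-cluster warehouses require Snowflake Enterprise Edition or higher.

```sql
-- Sketch only: names, sizes, and cluster counts are illustrative assumptions.

-- 1. One multi-cluster warehouse for Source to Stage loads
CREATE WAREHOUSE IF NOT EXISTS stage_load_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  AUTO_SUSPEND      = 60      -- seconds of inactivity before suspending
  AUTO_RESUME       = TRUE;

-- 2. One warehouse per DV entity type, each sized independently
CREATE WAREHOUSE IF NOT EXISTS hub_load_wh
  WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE IF NOT EXISTS link_load_wh
  WAREHOUSE_SIZE = 'SMALL'  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE IF NOT EXISTS sat_load_wh
  WAREHOUSE_SIZE = 'LARGE'  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- Each load session then selects the warehouse for its entity type.
USE WAREHOUSE hub_load_wh;
```

Because each load runs on its own compute, hub, link, and sat loads can execute in parallel without competing for resources.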
22. Agile Warehouse Scaling – Best Practices
• Anticipated surges
  • Explicitly increase WH nodes (T-shirt size) when expecting more data
  • Explicitly increase MCWH minimum clusters when expecting more queries
  • Can do both at once with ALTER WAREHOUSE
  • Use cron or other scheduling/orchestration tool
• Unanticipated surges
  • Rely on MCWH maximum clusters for some extra headroom
• Maximize Business Agility!
  • Responsiveness for users
  • Throughput and value extracted from variable compute power
• Minimize
  • Cost and administrative overhead
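A minimal sketch of these practices, assuming a multi-cluster warehouse named stage_load_wh already exists (the name and all sizes and counts are hypothetical). An external scheduler such as cron would run the surge statements before and after the expected peak.

```sql
-- Sketch only: the warehouse name, sizes, and cluster counts are assumptions.

-- Unanticipated surges: leave headroom in the maximum cluster count.
ALTER WAREHOUSE stage_load_wh SET MAX_CLUSTER_COUNT = 6;

-- Anticipated surge: raise the T-shirt size (more data) and the minimum
-- cluster count (more concurrent queries) in one statement.
ALTER WAREHOUSE stage_load_wh SET
  WAREHOUSE_SIZE    = 'LARGE'
  MIN_CLUSTER_COUNT = 2;

-- After the surge: scale back down to limit cost.
ALTER WAREHOUSE stage_load_wh SET
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1;
```

The minimum cluster count cannot exceed the maximum, so the headroom statement has to come first if the current maximum is lower than the new minimum.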
26. JSON Support with SQL - Variant
Apple 101.12 250 FIH-2316
Pear 56.22 202 IHO-6912
Orange 98.21 600 WHQ-6090
Structured data (CSV)
Semi-structured data (e.g. JSON, Avro, XML)
{ "firstName": "John",
"lastName": "Smith",
"height_cm": 167.64,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{ "type": "home", "number": "212 555-1234" },
{ "type": "office", "number": "646 555-4567" }
]
}
select v:lastName::string as last_name
from json_demo;
All Your Data!
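The same path notation reaches nested objects and arrays. The sketch below assumes, as in the query above, that the JSON document is stored in a VARIANT column named v in the table json_demo; the output column aliases are made up for illustration.

```sql
-- Sketch only: assumes the JSON document above is stored in column v of json_demo.
SELECT
    v:firstName::string          AS first_name,
    v:lastName::string           AS last_name,
    v:address.city::string       AS city,          -- dot notation walks into the nested object
    v:address.postalCode::string AS postal_code,
    p.value:type::string         AS phone_type,    -- one output row per phoneNumbers element
    p.value['number']::string    AS phone_number   -- bracket notation is an alternative to the colon form
FROM json_demo,
     LATERAL FLATTEN(input => v:phoneNumbers) p;
```

Key names in the path are case-sensitive, so v:lastName and v:lastname refer to different attributes.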
27. Combine Data Vault & Schema-on-Read
• Create Sats using a VARIANT
  • Hub Key
  • Load DTS
  • Rec Source
  • VARIANT
• Load JSON doc into VARIANT column
• Create a View on the Sat for Business Vault
  • View presents only the JSON attributes needed for current use case
Benefits
• No change to Sat definitions if structure changes
• Business Vault view is unaffected
• Change or add new BV views over time as needed
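The pattern above might be sketched as follows. The schema names dv and bv, the table and view names, the column names, and the sample document are all assumptions for illustration; only the four-column Sat shape (Hub Key, Load DTS, Rec Source, VARIANT) comes from the slide.

```sql
-- Sketch only: schema, table, view, and column names are hypothetical.
CREATE SCHEMA IF NOT EXISTS dv;
CREATE SCHEMA IF NOT EXISTS bv;

-- Sat with the four columns listed on the slide
CREATE TABLE IF NOT EXISTS dv.sat_customer_json (
    hub_customer_key VARCHAR(32)   NOT NULL,  -- Hub Key (MD5 hex string)
    sat_load_dts     TIMESTAMP_LTZ NOT NULL,  -- Load DTS
    sat_rec_src      VARCHAR       NOT NULL,  -- Rec Source
    customer_doc     VARIANT                  -- the JSON document as loaded
);

-- Load a JSON doc into the VARIANT column (PARSE_JSON is used in a SELECT)
INSERT INTO dv.sat_customer_json
SELECT MD5('SMITH-JOHN'),
       CURRENT_TIMESTAMP,
       'demo',
       PARSE_JSON('{"firstName": "John", "lastName": "Smith", "address": {"city": "New York"}}');

-- Business Vault view exposing only the attributes the current use case needs
CREATE OR REPLACE VIEW bv.customer_location AS
SELECT hub_customer_key,
       sat_load_dts,
       customer_doc:lastName::string     AS last_name,
       customer_doc:address.city::string AS city
FROM dv.sat_customer_json;
```

If the source later adds attributes to the JSON document, neither the table nor this view needs to change; a new view can expose the new attributes when a use case calls for them.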
36. Aptus Health
Founded in 2008, Aptus Health connects health and life sciences companies with healthcare professionals, healthcare consumers, and other members of the healthcare ecosystem.
The company creates multichannel marketing campaigns aimed at physicians and healthcare consumers through the many digital channels the company owns and operates.
37. Where They Started
Mobile App
Campaign Management
Sample Requests
Daily ETL
Apply Business Rules
Data Warehouse
- User Activity
- Campaign Activity
- Sample Request Activity
- User Data
- Mobile App Activity
Cube
Reports
Web & Mobile Analytics
Extracts
Fill Stage
Load
[Diagram: one table labeled F and eight tables labeled D]
Engagement Platform
38. What did the business need?
• A business-driven roadmap in collaboration with Engineering to drive priorities
• Central availability of data related to Healthcare Providers across all platforms
• Ability to efficiently answer business questions: new or previous
• New types of analysis should not require weeks of data preparation
• Do not lose data that could have future analytical value
• Ability to analyze historic events
• Clear documentation of all calculations/derivations of data for complete transparency in meaning of information & to instill confidence
• Master Data Management: stop passing around spreadsheets
• Framework that will be extensible to both US and international data
• Timely data: minutes, not days
• Business reporting enabled with limited impact to data management projects
39. What was IT asking for?
• Breaking down data silos
• Integrate physically and logically disparate data
• Acceleration of integrating new data sets
• Enable broad use of data sets
• Ease integration of datasets across AH avoiding point-to-point integrations
• Scale, Elasticity, and Cost-effectiveness