Prepared by
AWS/Snowflake Practice
Delivering tangible business outcomes on AWS/Snowflake
July 2019
Altis Consulting
Company overview
• Established in 1998
• Offices in Sydney, Melbourne, Canberra, Auckland and London
• Vendor independent
• Largest ANZ specialist Data & Analytics Consultancy
100
Introduction
About Altis Consulting
Who knows about Altis?
Gartner recognition 2nd year running
What do we do? Consulting Services
Our information management model
AWS Experience – Sample Snowflake Projects
• Strategy/roadmap.
• Data platform architecture design.
• Implementation of greenfield data warehouse
• Snowflake design
• Production readiness assessment
• Implementation of real-time web analytics
• Migration from Redshift
• Data platform architecture design.
• Implementation of greenfield data warehouse
• Data Lake integration
• Strategy/roadmap.
• Data platform architecture design.
• Implementation scoping
• Architecture Guidance.
• Data Warehouse Implementation
• Data platform architecture design.
• Proof of concept
• Data Migration from legacy DW
Company 1
Company 2
Company 3
Company 4
Company 5
Company 6
Sample Architectures
Integrated Data Hub
Use Case: Data & Analytics Platform including Self-serve
Customer Challenge: Siloed data, multiple datamarts and multiple versions of the truth
Solution:
- Data sources: Operational Data Sources
- Data ingestion: Dell Boomi
- Data Store: S3, Snowflake
- Data Processing: Matillion
- Data Presentation: Tableau
Benefits:
• Certified gas production dashboards distributed to the field, allowing near-real-time input and visibility of field commentary back at head office
• Centralised, certified copy of the truth for data & analytics purposes across all subject areas
• Support for Advanced Analytics capabilities (e.g. predictive maintenance)
• Geospatial planning of work activities
Architecture diagram (labels recovered from the slide):
Source Systems: Well view, MDM, OpsDB, Others...
DataLake
- An exact copy of source-system data, unchanged
Extract
- Transient
- Required by Matillion
- Only the latest data from S3
Stage
- Persisted
- Not integrated
- Volatile (data gets updated)
- Best practice for data loading flexibility
ODS
- Persisted
- Integrated
- Volatile (data gets updated)
- Most recent view of IG data
- PPDM aligned
DW
- Persisted
- Integrated
- Non-Volatile
- Current and historical view of IG data
- Subject Oriented
Transformation Layer
- Transient
Other labels: Staging, DWH Users, Batched integrations, Time-series, Near real-time, Advanced Analytics / Data Science Use-Cases, Advanced Analytics, Custom Application Use Cases, Data Integration
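The ODS and DW layers on this slide differ mainly in volatility: the ODS keeps only the most recent view of each record, while the DW keeps current and historical views. The following is a minimal Python sketch of that difference, not the project's actual implementation (which used Matillion and Snowflake). The field names `well_id` and `status`, the load dates, and both function names are hypothetical.

```python
from datetime import date


def load_ods(ods, batch):
    """Volatile load: each key keeps only its most recent record."""
    for row in batch:
        ods[row["well_id"]] = row  # overwrite any earlier version
    return ods


def load_dw(dw, batch, load_date):
    """Non-volatile load: append every version, never overwrite."""
    for row in batch:
        dw.append({**row, "load_date": load_date})
    return dw


# Two hypothetical daily batches for the same well.
day1 = [{"well_id": "W1", "status": "drilling"}]
day2 = [{"well_id": "W1", "status": "producing"}]

ods, dw = {}, []
for load_date, batch in [(date(2019, 7, 1), day1), (date(2019, 7, 2), day2)]:
    load_ods(ods, batch)
    load_dw(dw, batch, load_date)

print(ods["W1"]["status"])  # the ODS holds the latest status only
print(len(dw))              # the DW holds one row per loaded version
```

After both batches the ODS holds a single row for the well, while the DW holds both versions with their load dates, which is what makes historical reporting possible.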
DW Modernization
Use Case: Greenfield Data Warehouse
Customer Challenge: No framework for warehousing and reporting on sales. Reports generated directly from source system databases via stored procedures. Limited ability to utilize modern analytics and artificial intelligence technologies to improve decision making.
Solution:
• Data Sources: RDS / Flat Files
• Data Store: Amazon S3
• ETL: Matillion
• Data Warehouse: Snowflake
• Reporting and analytics: Tableau
Benefits:
• Established Data Warehouse according to best practice
• Created metadata-driven ETL framework which supports rapid integration of new data sources
• Built sales subject area allowing reproduction of existing reports and adding self-service reports in Tableau
• Business well positioned to leverage Analytics and AI technologies
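The idea behind a metadata-driven ETL framework can be sketched as follows: each source is described by a small metadata record, and the load statements are generated from it, so adding a source means adding a record rather than writing a new job. This is only an illustration in plain Python, not the Matillion framework built on the project. The `SourceSpec` class, the stage name `landing_stage`, and all table and path names are hypothetical; the generated text follows Snowflake's `COPY INTO ... FROM @stage FILE_FORMAT = (TYPE = ...)` form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    """Metadata describing one source feed (all values are hypothetical)."""
    name: str
    s3_prefix: str      # path under the external stage
    target_table: str   # fully qualified staging table
    file_type: str      # e.g. "CSV" or "JSON"


def copy_statement(spec: SourceSpec, stage: str = "landing_stage") -> str:
    """Build a Snowflake COPY INTO statement from a metadata record."""
    return (
        f"COPY INTO {spec.target_table} "
        f"FROM @{stage}/{spec.s3_prefix} "
        f"FILE_FORMAT = (TYPE = '{spec.file_type}')"
    )


# Adding a new source is a one-line metadata change.
SOURCES = [
    SourceSpec("orders", "sales/orders/", "stage.sales_orders", "CSV"),
    SourceSpec("gift_cards", "retail/gift_cards/", "stage.gift_cards", "JSON"),
]

for spec in SOURCES:
    print(copy_statement(spec))
```

In practice the metadata would more likely live in a control table than in code, but the principle is the same.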
Architecture diagram (labels recovered from the slide):
Stages: Data Acquisition, Data Transformation, Data Warehouse
Components: AWS Source Systems, Core DB, Retail Gift Cards, Batch Data Extract, Landing (Amazon S3 Object Store), Batch Data Processing, Processed (Amazon S3), Amazon S3, Snowflake, Presentation
Digital Data Analytics
Use Case: Web Analytics
Customer Challenge: Server stats data could not be accessed in near real-time.
Solution:
- Data sources: Web server stats
- Data Ingestion: Kinesis Streams and Firehose
- Data Store: S3, Snowflake
- Data Processing: Python and Luigi
- Reporting and analytics: Tableau
Benefits:
- Real-time data ingestion
- JSON parsing allowing extraction of relevant data
- Web server stats can be accessed soon after the event
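The JSON parsing step named in the benefits could look roughly like the sketch below, assuming the stream delivers newline-delimited JSON records to S3. The field names in `RELEVANT_FIELDS` and the sample record are hypothetical, since the deck does not describe the actual web server stats schema. Only the Python standard library is used.

```python
import json

# Hypothetical subset of fields worth loading into the warehouse.
RELEVANT_FIELDS = ("timestamp", "path", "status", "response_ms")


def extract_relevant(lines):
    """Parse newline-delimited JSON and keep only the relevant fields."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of failing the batch
        if not isinstance(record, dict):
            continue  # only JSON objects carry named fields
        rows.append({field: record.get(field) for field in RELEVANT_FIELDS})
    return rows


sample = [
    '{"timestamp": "2019-07-01T10:00:00Z", "path": "/home", '
    '"status": 200, "response_ms": 12, "agent": "example"}',
    "not json",
    "",
]
rows = extract_relevant(sample)
print(rows)
```

Skipping malformed lines keeps one bad record from blocking the rest of a batch; a production pipeline would normally also log or quarantine the rejected lines.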
Architecture diagram (labels recovered from the slide):
Stages: Data Acquisition, Data Transformation, Data Warehouse
Components: Web Servers, Amazon Kinesis Streams, Amazon Kinesis Firehose, Staging (Amazon S3 Object Store), ETL (EC2), Snowflake, Presentation
Connecting with courage, heart and insight
Tel +61 2 9211 1522
OfficeSYD@altis.com.au
www.altis.com.au

Altis AWS Snowflake Practice


Editor's Notes

  • #8
    Source Systems
    - Modelled in 3NF
    - Current view of IG
    - Kept in sync with MDM (hopefully!)
    - Operational use / Reporting
    ODS
    - Modelled in 3NF (just like source systems!)
    - Most-recent view of IG
    - Source for DWH
    - One big IG source system!
    - Batch-updated, not real-time
    - No use-case at present (Operational reporting... but late?)
    - Part of IG's Architecture Standards
    ODS Requirements
    - Persist the most recent copy of data
    - Integrate across source systems
    - Take a balanced approach to normalisation, with a focus on deduplication, ease of use and ease of maintenance
    - Align naming conventions and concepts with PPDM
    - Batch updated
    ODS Uses
    - One-stop IG source system for applications requiring data from multiple source systems (i.e. PPAC)
    - Operational reporting (TBC)
    DWH
    - Facts
    - Dimensions
    - Business Rules
    - Single Source of Truth
    - Strategic and Tactical reporting
    - Predictive Analytics (Machine Learning / Data Science)
    - Needed for ad hoc reporting and visualisation
    - Self Service
    BI Tools
    - Analyse DWH data
    - Visualise data
    - Data updated when the DWH updates
    - No business logic (already present in the DWH)
    - Many reports, same data
    - Drill-up, drill-down, drill-across, drill-through
    - Dimensional Modelling
    - Star Schema
    - Auditability
    - Traceability
    - EDW (Enterprise)
    - Decision Support System