Zurich North America is one of the largest providers of insurance solutions and services in the world, with customers representing a wide range of industries, from agriculture to construction, and including more than 90 percent of the Fortune 500.
Delta Lake brings reliability, performance, and security to data lakes. It provides ACID transactions, schema enforcement, and unified handling of batch and streaming data to make data lakes more reliable. Delta Lake also features lightning fast query performance through its optimized Delta Engine. It enables security and compliance at scale through access controls and versioning of data. Delta Lake further offers an open approach and avoids vendor lock-in by using open formats like Parquet that can integrate with various ecosystems.
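A minimal PySpark sketch of those behaviors, assuming a Spark session configured with the delta-spark package; the path, schema, and sample rows are hypothetical:

```python
# Minimal Delta Lake sketch; assumes a Spark session configured with the
# delta-spark package. Path, schema, and rows are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

# ACID write: the commit is atomic; it either fully succeeds or is never visible.
events = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "action"])
events.write.format("delta").mode("overwrite").save("/tmp/delta/events")

# Schema enforcement: an append with a mismatched schema is rejected
# instead of silently corrupting the table.
bad = spark.createDataFrame([("oops",)], ["wrong_column"])
try:
    bad.write.format("delta").mode("append").save("/tmp/delta/events")
except Exception as e:
    print("append rejected:", type(e).__name__)

# Versioning ("time travel"): read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/events")
v0.show()
```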
Operationalizing Machine Learning at Scale at Starbucks (Databricks)
As ML-driven innovations are propelled by the self-service capabilities of the enterprise data and analytics platform, teams face a significant entry barrier and productivity issues in moving from POCs to operating ML-powered apps at scale in production.
What’s New with Databricks Machine Learning (Databricks)
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De... (Databricks)
Columbia is a data-driven enterprise, integrating data from all line-of-business systems to manage its wholesale and retail businesses. This includes integrating real-time and batch data to better manage purchase orders and generate accurate consumer demand forecasts.
Improving Power Grid Reliability Using IoT Analytics (Databricks)
Society depends on reliable utility services to ensure the health and safety of our communities. Electrical grid failures have impacts and consequences that can range from daily inconveniences to catastrophic events. Ensuring grid reliability means that data is fully leveraged to understand and forecast demand, predict and mitigate unplanned interruptions to power supply, and efficiently restore power when needed. Neudesic, a systems integrator, and DTE Energy, a large electric and natural gas utility serving 2.2 million customers in southeast Michigan, partnered to use large IoT datasets to identify the sources and causes of reliability issues across DTE’s power distribution network. In this session, we will demonstrate how we ingest hundreds of millions of quality measures each day from DTE’s network of smart electric meters. This data is then further processed in Databricks to detect anomalies, apply graph analytics and spatially cluster these anomalies into “hot spots”. Engineers and work management experts use a dashboard to explore, plan and prioritize diverse actions to remediate the hot spots. This allows DTE to prioritize work orders and dispatch crews based on impact to grid reliability. Because of this and other efforts, DTE has improved reliability by 25% year over year. We will demonstrate our notebooks and machine learning models along with our dashboard. We will also discuss Spark Streaming, Pandas UDFs, anomaly detection and DBSCAN clustering. By the end of our presentation, you should understand our approach to inferring hidden insights from IoT data, and be able to apply similar techniques to your own data.
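As a rough illustration of the “hot spot” step named above (not DTE’s actual pipeline), scikit-learn’s DBSCAN can spatially cluster anomaly locations; the coordinates, eps, and min_samples below are invented for the sketch:

```python
# Sketch of clustering anomaly locations into "hot spots" with DBSCAN.
# Not DTE's actual code; coordinates, eps, and min_samples are invented.
import numpy as np
from sklearn.cluster import DBSCAN

# Each row: (latitude, longitude) of a meter that reported an anomaly.
anomalies = np.array([
    [42.33, -83.04], [42.34, -83.05], [42.33, -83.05],  # dense area
    [42.90, -82.50],                                    # isolated point
])

# eps is in raw degrees purely for illustration; a real pipeline would
# project to meters or use a haversine metric.
labels = DBSCAN(eps=0.02, min_samples=2).fit_predict(anomalies)
print(labels)  # [ 0  0  0 -1]; -1 marks noise, 0 a "hot spot" cluster
```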
Learn to Use Databricks for the Full ML Lifecycle (Databricks)
Machine learning development brings many new complexities beyond the traditional software development lifecycle. Unlike traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. In this talk, learn how to operationalize ML across the full lifecycle with Databricks Machine Learning.
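The tracking problem described here is what MLflow addresses; below is a minimal tracking sketch, with a placeholder dataset and model rather than anything from the talk:

```python
# Minimal MLflow tracking sketch: log the parameters tried, the result,
# and the model artifact so a run can be reproduced later.
# The toy dataset and model are placeholders, not the talk's example.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

with mlflow.start_run():
    C = 0.5
    model = LogisticRegression(C=C).fit(X, y)
    mlflow.log_param("C", C)                            # parameter tried
    mlflow.log_metric("train_acc", model.score(X, y))   # result obtained
    mlflow.sklearn.log_model(model, "model")            # artifact itself
```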
How R Developers Can Build and Share Data and AI Applications that Scale with... (Databricks)
This document discusses how R developers can build and share scalable data and AI applications using RStudio and Databricks. It outlines how RStudio and Databricks can be used together to overcome challenges of processing large amounts of data in R, including limited server memory and performance issues. Developers can use hosted RStudio servers on Databricks clusters, connect to Spark from RStudio using Databricks Connect, and share scalable Shiny apps deployed with RStudio Connect. The ODBC toolchain provides a performant way to connect R to Spark without issues encountered when using sparklyr directly.
Migrate and Modernize Hadoop-Based Security Policies for Databricks (Databricks)
Data teams are faced with a variety of tasks when migrating Hadoop-based platforms to Databricks. A common pitfall happens during the migration step, where often-overlooked access control policies can block adoption. This session will focus on the best practices to migrate and modernize Hadoop-based policies to govern data access (such as those in Apache Ranger or Apache Sentry). Data architects must consider new, fine-grained access control requirements when migrating from Hadoop architectures to Databricks in order to deliver secure access to as many data sets and data consumers as possible. This session will provide guidance across open source, AWS, Azure and partner tools, such as Immuta, on how to scale existing Hadoop-based policies to dynamically support more classes of users, implement fine-grained access control and leverage automation to protect sensitive data while maximizing utility — without manual effort.
Using Redash for SQL Analytics on Databricks (Databricks)
This talk gives a brief overview, with a demo, of performing SQL analytics with Redash on Databricks. We will introduce some of the new features coming as part of our integration with Databricks following the acquisition earlier this year, along with a demo of the other Redash features that enable a productive SQL experience on top of Delta Lake.
Analytics-Enabled Experiences: The New Secret Weapon (Databricks)
Tracking and analyzing how our individual products come together has always been an elusive problem for Steelcase. Our problem can be thought of in the following way: “we know how many Lego pieces we sell, yet we don’t know what Lego set our customers buy.” The Data Science team took over this initiative, which resulted in an evolution of our analytics journey. It is a story of innovation, resilience, agility and grit.
The effects of the COVID-19 pandemic on corporate America shone a spotlight on office furniture manufacturers to find ways to make the office safe again. The team would never have imagined how relevant our work on product application analytics would become. Product application analytics became an industry priority overnight.
The proposal presented this year is the story of how data science is helping corporations bring people back to the office and set the path to lead the reinvention of the office space.
After groundbreaking milestones to overcome technical challenges, the most important question is: What do we do with this? How do we scale this? How do we turn this opportunity into a true competitive advantage? The response: stop thinking about this work as a data science project and start to think about this as an analytics-enabled experience.
During our session we will cover the technical challenges that we overcame as a team to set up a pipeline that ingests semi-structured and unstructured data at scale, performs analytics and produces digital experiences for multiple users.
This presentation will be particularly insightful for data scientists, data engineers and analytics leaders who are seeking to better understand how to augment the value of data for their organization.
Modernizing to a Cloud Data Architecture (Databricks)
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
Privacy has become one of the most critical topics in data today. It is about more than how we ingest and consume data; it is about how you protect your customers’ rights while balancing the business need. In our session, Privacera CTO Don Bosco Durai joins Northwestern Mutual to detail an important privacy use case and then show how to scale privacy with a focus on business needs, making the ability to scale effortless.
Comcast is the largest cable and internet provider in the US, reaching more than 30 million customers, and continues to grow its presence in the EU with the acquisition of Sky. Over the last couple years, Comcast has shifted focus to the customer experience.
Eugene Polonichko, "Architecture of modern data warehouse" (Lviv Startup Club)
The document discusses the architecture of a modern data warehouse using Microsoft technologies. It describes traditional data warehousing approaches and outlines ten characteristics of a modern data warehouse. It then details Microsoft's approach using Azure Data Factory to ingest diverse data types into Azure Blob Storage, Azure Databricks for analytics and data transformation, and Azure SQL Data Warehouse for combined structured data. It also discusses technologies for storage and visualization, and provides links for further information.
Building a MLOps Platform Around MLflow to Enable Model Productionalization i... (Databricks)
Getting machine learning models to production is notoriously difficult: it involves multiple teams (data scientists, data and machine learning engineers, operations, …) who often do not communicate with each other very well; the model can be trained in one environment but then productionalized in a completely different environment; it is not just about the code, but also about the data (features) and the model itself… At DataSentics, as a machine learning and cloud engineering studio, we see this struggle firsthand – on our internal projects and clients’ projects as well.
Databricks: A Tool That Empowers You To Do More With Data (Databricks)
In this talk we will present how Databricks has enabled the author to achieve more with data, enabling one person to build a coherent data project with data engineering, analysis and science components, with better collaboration, better productionization methods, larger datasets and faster results.
The talk will include a demo illustrating how the multiple functionalities of Databricks help to build a coherent data project: Databricks Jobs, Delta Lake and Auto Loader for data engineering, SQL Analytics for data analysis, Spark ML and MLflow for data science, and Projects for collaboration.
Building Lakehouses on Delta Lake with SQL Analytics Primer (Databricks)
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
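As a rough sketch of putting those pieces together (the database and table names are hypothetical, and a Databricks/Spark session with Delta Lake is assumed):

```python
# Sketch of "putting the pieces together": define a Delta table in SQL,
# then run an exploratory aggregate over it. Database and table names are
# hypothetical; assumes a Databricks/Spark session with Delta Lake.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS lakehouse")
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.sales (
        sale_id BIGINT, region STRING, amount DOUBLE
    ) USING DELTA
""")

# The kind of exploratory query a SQL Analytics endpoint would serve.
spark.sql("""
    SELECT region, COUNT(*) AS n, SUM(amount) AS revenue
    FROM lakehouse.sales
    GROUP BY region
    ORDER BY revenue DESC
""").show()
```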
The document discusses data architecture solutions for solving real-time, high-volume data problems with low latency response times. It recommends a data platform capable of capturing, ingesting, streaming, and optionally storing data for batch analytics. The solution should provide fast data ingestion, real-time analytics, fast action, and quick time to value. Multiple data sources like logs, social media, and internal systems would be ingested using Apache Flume and Kafka and analyzed with Spark/Storm streaming. The processed data would be stored in HDFS, Cassandra, S3, or Hive. Kafka, Spark, and Cassandra are identified as key technologies for real-time data pipelines, stream analytics, and high availability persistent storage.
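A minimal sketch of the Kafka-to-Spark leg of such a pipeline, using Spark Structured Streaming rather than the older Storm/Flume components; the broker address, topic, and paths are hypothetical, and the spark-sql-kafka connector is assumed to be available:

```python
# Sketch of the Kafka -> Spark -> storage leg using Structured Streaming.
# Broker, topic, and paths are hypothetical; assumes the spark-sql-kafka
# connector is on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "logs")
       .load())

# Kafka values arrive as bytes; cast to string for downstream parsing.
events = raw.selectExpr("CAST(value AS STRING) AS body", "timestamp")

# Land the stream as Parquet for batch analytics (the "optionally store" leg).
(events.writeStream
 .format("parquet")
 .option("path", "/tmp/landing/logs")
 .option("checkpointLocation", "/tmp/checkpoints/logs")
 .start())
```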
Introducing MLflow for End-to-End Machine Learning on Databricks (Databricks)
Solving a data science problem is about more than making a model. It entails data cleaning, exploration, modeling and tuning, production deployment, and workflows governing each of these steps. In this simple example, we’ll take a look at how health data can be used to predict life expectancy. It will start with data engineering in Apache Spark, data exploration, model tuning and autologging with hyperopt and MLflow. It will continue with examples of how the model registry governs model promotion, and simple deployment to production with MLflow as a job or REST endpoint. This tutorial will cover the latest innovations from MLflow 1.12.
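A minimal sketch of the registry-governed promotion step mentioned above, using the MLflow client API; the model name and version number are hypothetical:

```python
# Sketch of registry-governed promotion: once a version is vetted, move it
# to Production so jobs and REST endpoints resolve it by stage.
# The model name and version number are hypothetical.
from mlflow.tracking import MlflowClient

client = MlflowClient()

# (One-time) register a run's model under a named registry entry:
# mlflow.register_model("runs:/<run_id>/model", "life-expectancy")

client.transition_model_version_stage(
    name="life-expectancy", version=1, stage="Production"
)
# Consumers can now load "models:/life-expectancy/Production".
```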
Big data ingest frameworks ship with an array of connectors for common data origins and destinations, such as flat files, S3, HDFS, Kafka, etc., but sometimes you need to send data to, or receive data from, a system that's not on the list. StreamSets includes template code for building your own connectors and processors; we'll walk through the process of building a simple destination that sends data to a REST web service, and show how it can be extended to target more sophisticated systems such as Salesforce Wave Analytics.
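The StreamSets SDK itself is Java; purely to illustrate the batching-to-REST pattern the talk describes, here is a generic Python sketch (this is not the StreamSets API, and the endpoint URL and batch size are hypothetical):

```python
# Generic sketch of a destination that sends records to a REST web service.
# This is NOT the StreamSets SDK (which is Java); the endpoint URL and
# batch size are hypothetical.
import json
import requests

ENDPOINT = "https://example.com/api/ingest"  # hypothetical service

def write_batch(records, batch_size=100):
    """POST records to the REST endpoint in fixed-size batches."""
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        resp = requests.post(
            ENDPOINT,
            data=json.dumps(batch),
            headers={"Content-Type": "application/json"},
            timeout=30,
        )
        resp.raise_for_status()  # surface failures so the caller can retry

write_batch([{"id": 1, "value": 3.2}, {"id": 2, "value": 4.7}])
```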
Building a Data Science as a Service Platform in Azure with Databricks (Databricks)
Machine learning in the enterprise is rarely delivered by a single team. In order to enable Machine Learning across an organisation you need to target a variety of different skills, processes, technologies, and maturities. To do this is incredibly hard and requires a composite of different techniques to deliver a single platform which empowers all users to build and deploy machine learning models.
In this session we discuss how Azure and Databricks enable a Data Science as a Service platform. We look at how a DSaaS platform empowers users of all abilities to build and deploy models, and enables organisations to realise a return on investment earlier.
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ... (Databricks)
In rapidly changing conditions, many companies build ETL pipelines using an ad hoc strategy. Such an approach makes automated testing for data reliability almost impossible and leads to ineffective and time-consuming manual ETL monitoring.
Automating Data Quality Processes at Reckitt (Databricks)
Reckitt is a fast-moving consumer goods company with a portfolio of famous brands and over 30k employees worldwide. At that scale, small projects can quickly grow into big datasets, and processing and cleaning all that data can become a challenge. To solve that challenge we have created a metadata-driven ETL framework for orchestrating data transformations through parametrised SQL scripts. It allows us to create various paths for our data as well as easily version control them. The approach of standardising incoming datasets and creating reusable SQL processes has proven to be a winning formula. It has helped simplify complicated landing/stage/merge processes and allowed them to be self-documenting.
But this is only half the battle; we also want to create data products: documented, quality-assured datasets that are intuitive to use. As we move to a CI/CD approach, increasing the frequency of deployments, the demand of keeping documentation and data quality assessments up to date becomes increasingly challenging. To solve this problem, we have expanded our ETL framework to include SQL processes that automate data quality activities. Using the Hive metastore as a starting point, we have leveraged this framework to automate the maintenance of a data dictionary and reduce documentation, model refinement, data quality testing and filtering out bad data to a box-filling exercise. In this talk we discuss our approach to maintaining high-quality data products and share examples of how we automate data quality processes.
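A toy illustration of the metadata-driven, parametrised-SQL idea (not Reckitt’s framework; the metadata shape and table names are invented):

```python
# Toy illustration of metadata-driven ETL through parametrised SQL.
# Not Reckitt's framework; metadata shape and table names are invented.
# Assumes a Databricks/Spark session where these tables exist.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each metadata row drives one landing -> stage hop via a SQL template.
steps = [
    {"template": ("INSERT INTO {target} "
                  "SELECT * FROM {source} WHERE load_date = '{dt}'"),
     "source": "landing.orders", "target": "stage.orders", "dt": "2021-06-01"},
]

for step in steps:
    sql = step["template"].format(**step)  # parametrise the script
    spark.sql(sql)                         # run the transformation
```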
Leveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch (Databricks)
The Bosch Center for Artificial Intelligence provides AI services to Bosch’s business units and manufacturing plants. We strive to generate value for our customers by deploying machine learning in their products, services, and processes across different domains such as Manufacturing, Engineering and Supply Chain Management, as well as Intelligent Services.
Translating Models to Medicine: An Example of Managing Visual Communications (Databricks)
The Neuron team at Seattle Children's Hospital aims to improve pediatric critical care through predictive models and decision support tools. They developed a framework to standardize model development and management, reduce knowledge silos, and enable broad participation. This involves tracking models, artifacts, and domain knowledge. They demonstrate managing visual communications by capturing specifications for ICU bed maps in MLflow and rendering versions with different encodings. This makes visualization models discoverable and deployable.
Data Quality in the Data Hub with RedPoint Global (Caserta)
At a Big Data Warehousing Meetup, George Corugedo, CTO of RedPoint Global, demonstrated how to use your big data platform for data integration, data quality and identity resolution to provide a true 360-degree view of your customer on Hadoop using the RedPoint product.
For more information or questions, please contact us at www.casertaconcepts.com.
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. In this information economy, the data professional sits squarely on the company’s performance and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
The document discusses optimizing a data warehouse by offloading some workloads and data to Hadoop. It identifies common challenges with data warehouses like slow transformations and queries. Hadoop can help by handling large-scale data processing, analytics, and long-term storage more cost effectively. The document provides examples of how customers benefited from offloading workloads to Hadoop. It then outlines a process for assessing an organization's data warehouse ecosystem, prioritizing workloads for migration, and developing an optimization plan.
This document discusses DataOps, an agile methodology for developing and deploying data-intensive applications. DataOps supports cross-functional collaboration and fast time to value, and expands DevOps practices to include data-related roles like data engineers and data scientists. Its key goals are to promote continuous model deployment, repeatability, productivity, agility, and self-service, and to make data central to applications. Through these principles, DataOps brings flexibility, focus, improved efficiency, and faster time to value to data-driven organizations.
This document discusses Oracle's value proposition for its big data solutions. Key points include:
- Oracle offers engineered systems that integrate hardware and software to securely manage both new and existing data types and formats for big data.
- The solutions allow customers to acquire, organize, analyze and make decisions from big data to develop predictive analytics and gain competitive advantages.
- Oracle partners with other companies through its Oracle Partner Network to increase sales and empower partners with Oracle resources and specializations.
- Oracle solutions serve many customer segments including telecommunications, energy, life sciences, healthcare, oil and gas, manufacturing, and retail.
Migrating Thousands of Workloads to AWS at Enterprise Scale – Chris Wegmann, ... (Amazon Web Services)
At the end of this session participants will learn how to assess their enterprise application portfolio and move thousands of instances to AWS in a quick and repeatable fashion. Migrating workloads to AWS in an enterprise environment is not easy, but with the right approach, an enterprise-sized organization can migrate thousands of instances to AWS quickly and cost-effectively to ensure a strong ROI.
Do you still know where your business data resides? Your data is (or soon will be) everywhere. In collaboration with Commvault, we show how your organisation can stay in control of, and add value to, your data, regardless of whether it resides on-premises, in the cloud or on an end-user device.
Presentation, June 9, 2016
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop (Precisely)
With so many new, evolving frameworks, tools, and languages, a new big data project can lead to confusion and unwarranted risk.
Many organizations have found Data Warehouse Optimization with Hadoop to be a good starting point on their Big Data journey. Offloading ETL workloads from the enterprise data warehouse (EDW) into Hadoop is a well-defined use case that produces tangible results for driving more insights while lowering costs. You gain significant business agility, avoid costly EDW upgrades, and free up EDW capacity for faster queries. This quick win builds credibility and generates savings to reinvest in more Big Data projects.
A proven reference architecture that includes everything you need in a turnkey solution – the Hadoop distribution, data integration software, servers, networking and services – makes it even easier to get started.
SphereEx provides enterprises with distributed data service infrastructures and products/solutions to address challenges from increasing database fragmentation. It was founded in 2021 by the team behind Apache ShardingSphere, an open-source project providing data sharding and distributed solutions. SphereEx's products include solutions for distributed databases, data security, online stress testing, and its commercial version provides enhanced capabilities over the open-source version.
CSC - Presentation at Hortonworks Booth - Strata 2014 (Hortonworks)
Come hear how companies are kick-starting their big data projects without having to find and hire good people or get IT to prioritize the work. Remove risk from your project, ensure scalability, and pay for just the nodes you use in a monthly utility pricing model. Worried about data governance or security? Want it in the cloud, or can’t have it in the cloud? Eliminate the hurdles with a fully managed service backed by CSC. Get your modern data architecture up and running in as little as 30 days with the Big Data Platform as a Service offering from CSC. Computer Sciences Corporation is a Certified Technology Partner of Hortonworks and a global systems integrator with over 80,000 employees worldwide.
Complement Your Existing Data Warehouse with Big Data & Hadoop (Datameer)
To view the full webinar, please go to: http://info.datameer.com/Slideshare-Complement-Your-Existing-EDW-with-Hadoop-OnDemand.html
With 40% yearly growth in data volumes, traditional data warehouses have become increasingly expensive and challenging.
Many of today’s new data sources are unstructured, making the structured data warehouse an unsuitable platform for analyses. As a result, organizations now look at Hadoop as a data platform to complement existing BI data warehouses, and as a scalable, flexible and cost-effective solution for data storage and analysis.
Join Datameer and Cloudera in this webinar to discuss how Hadoop and big data analytics can help to:
- Get all the data your business needs quickly into one environment
- Shorten the time to insight from months to days
- Extend the life of your existing data warehouse investments
- Enable your business analysts to ask and answer bigger questions
Gab Genai Cloudera - Going Beyond Traditional Analytic (IntelAPAC)
This document discusses Intel and Cloudera's partnership in helping organizations leverage big data analytics. It provides an overview of Cloudera's history and capabilities in supporting enterprises with Hadoop-based solutions. It then contrasts traditional analytics approaches that brought data to compute with Cloudera's approach of bringing compute to data using their Enterprise Data Hub. Several case studies are presented of organizations achieving new insights and business value through Cloudera's platform. The document emphasizes that Cloudera offers an open, scalable and cost-effective platform for various analytics workloads and enables a thriving ecosystem of partners.
(ENT206) Migrating Thousands of Workloads to AWS at Enterprise Scale | AWS re... (Amazon Web Services)
Migrating workloads to AWS in an enterprise environment is not easy, but with the right approach, an enterprise-sized organization can migrate thousands of instances to AWS quickly and cost effectively. You can leave this session with a good understanding of the migration framework used to assess an enterprise application portfolio and how to move thousands of instances to AWS in a quick and repeatable fashion.
In this session, we describe the components of Accenture's cloud migration framework, including tools and capabilities provided by Accenture, AWS, and third-party software solutions, and how enterprises can leverage these techniques to migrate efficiently and effectively. The migration framework covers:
- Defining an overall cloud strategy
- Assessing the business requirements, including application and data requirements
- Creating the right AWS architecture and environment
- Moving applications and data using automated migration tools
- Services to manage the migrated environment
Many large enterprises have begun using AWS to host development and test environments while also building greenfield applications in AWS. After realizing the benefits that AWS has to offer, many enterprises look for ways to accelerate their migration to the cloud. In beginning this journey they are often faced with a number of challenges, such as determining which applications should move, how they should move, and how they can be effectively managed in the cloud. Accenture, working with AWS Solution Architects and AWS Professional Services, has developed a framework, based on our experiences, to quickly, efficiently, and successfully move enterprise applications to AWS at scale. This session will review our approach, tools, and methods that can help enterprises evolve their cloud transformation programs.
Software engineering practices for the data science and machine learning life... (DataWorks Summit)
With the advent of newer frameworks and toolkits, data scientists are now more productive than ever and starting to prove indispensable to enterprises. Typical organizations have large teams of data scientists who build out key analytics assets that are used on a daily basis and are an integral part of live transactions. However, quite a lot of chaos and complexity also gets introduced because of the state of the industry. Many packages used by data scientists come from open source, and even if they are well curated, there is a growing tendency to pick out cutting-edge or unstable packages and frameworks to accelerate analytics. Different data scientists may use different versions of runtimes, different Python or R versions, or even different versions of the same packages. Data scientists predominantly work on their laptops, and it becomes difficult to reproduce their environments for use by others. Since data science is now a team sport across multiple personas, involving non-practitioners, traditional application developers, execs, and IT operators, how does an enterprise create a platform for productive cross-role collaboration?
Enterprises need a very reliable and repeatable process, especially when it results in something that affects their production environments. They also require a well-managed approach that enables the graduation of an asset from development through a testing and staging process to production. Given the pace of businesses nowadays, the process needs to be quite agile and flexible too—even enabling an easy path to reversing a change. Compliance and audit processes require clear lineage and history as well as approval chains.
In the traditional software engineering world, this lifecycle has been well understood and best practices have been followed for ages. But what does it mean when you have non-programmers, or users who are not really trained in software engineering philosophies, or who perceive all of this as "big process" roadblocks in their daily work? How do we engage them in a productive manner and yet support enterprise requirements for reliability, tracking, and a clear continuous integration and delivery practice? The presenters, in this session, will bring up interesting techniques based on their user research, real-life customer interviews, and productized best practices. The presenters also invite the audience to share their stories and best practices to make this a lively conversation.
Speaker
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ... (DataWorks Summit)
The Census Bureau is the U.S. government's largest statistical agency with a mission to provide current facts and figures about America's people, places and economy. The Bureau operates a large number of surveys to collect this data, the most well known being the decennial population census. Data is being collected in increasing volumes and the analytics solutions must be able to scale to meet ever-increasing needs while maintaining the confidentiality of the data. Past data analytics occurred in processing silos, inhibiting the sharing of information, and common reference data was replicated across multiple systems. The use of the Hortonworks Data Platform, Hortonworks DataFlow and other open-source technologies is enabling the creation of a cloud-based enterprise data lake and analytics platform. Cloud object stores are used to provide scalable data storage, and cloud compute supports permanent and transient clusters. Data governance tools are used to track data lineage and to provide access controls to sensitive data.
MSFT MAIW Data Mod - Session 1 Deck_Why Migrate your databases to Azure_Sept ... (ssuser01a66e)
Microsoft Azure Immersion Workshop focused on data modernization and migrating databases to Azure. Key reasons for migrating included enabling remote work during the pandemic, improving business resiliency, and adopting emerging technologies. Digital transformation is affecting all companies, which now need to operate like digital companies. When migrating databases to Azure, customers can choose between infrastructure as a service (IaaS) options like SQL Server VMs or platform as a service (PaaS) options like Azure SQL that are fully managed by Microsoft. Migrating databases to Azure PaaS options can significantly reduce costs compared to on-premises databases and provide benefits like automatic updates and built-in security and high availability.
High-Performance Analytics in the Cloud with Apache Impala (Cloudera, Inc.)
With more and more data being generated and stored in the cloud, you need a modern data platform that can extend to any environment so you can derive value from all your data. Cloudera Enterprise is the leading enterprise Hadoop platform for cloud deployments. It’s the easiest way to manage and secure Hadoop data across any cloud environment and includes component-level support for cloud-native object stores. This makes the platform uniquely suited to handle transient jobs like ETL and BI analytics, as well as persistent workloads like stream processing and advanced analytics.
With the recent release of Cloudera 5.8, Apache Impala (incubating) has added support for Amazon S3, enabling business analysts to get instant insights from all data through high-performance exploratory analytics and BI.
3 Things to learn:
Join David Tishgart, Director of Product Marketing, and James Curtis, Senior Analyst, Data Platforms & Analytics at 451 Research, as they discuss:
* Best practices for analytic workloads in the cloud
* A live demo and real-world use cases
* What’s next for Cloudera and the cloud
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products, from collecting data, transforming it, and storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company’s big data solution.
Delivering business insights and automation utilizing AWS data services (Bhuvaneshwaran R)
The document discusses building data lakes and data architectures on AWS. It begins with an introduction to why data lakes are needed and how to drive automation and insights with AWS data services. It then covers best practices for data architecture and implementation case studies. Specifically, it discusses building a data lake infrastructure on AWS using services like S3, Glue, Athena, and Redshift. It also covers streaming data solutions, data governance best practices, and the Lake Formation service. Real-world customer case studies are presented on using AWS for data lakes and analytics in industries like e-commerce, FMCG, and manufacturing.
Building a Modern Analytic Database with Cloudera 5.8 (Cloudera, Inc.)
This document discusses building a modern analytic database with Cloudera. It outlines Marketing Associates' evaluation of solutions to address challenges around managing massive and diverse data volumes. They selected Cloudera Enterprise to enable self-service BI and real-time analytics at lower costs than traditional databases. The solution has provided scalability, cost savings of over 90%, and improved security and compliance. Future roadmaps for Cloudera's analytic database include faster SQL, improved multitenancy, and deeper BI tool integration.
Similar to Cloud and Analytics - From Platforms to an Ecosystem
The document discusses migrating a data warehouse to the Databricks Lakehouse Platform. It outlines why legacy data warehouses are struggling, how the Databricks Platform addresses these issues, and key considerations for modern analytics and data warehousing. The document then provides an overview of the migration methodology, approach, strategies, and key takeaways for moving to a lakehouse on Databricks.
Data Lakehouse Symposium | Day 1 | Part 1 (Databricks)
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Data Lakehouse Symposium | Day 1 | Part 2 (Databricks)
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The document discusses the challenges of modern data, analytics, and AI workloads. Most enterprises struggle with siloed data systems that make integration and productivity difficult. The future of data lies with a data lakehouse platform that can unify data engineering, analytics, data warehousing, and machine learning workloads on a single open platform. The Databricks Lakehouse platform aims to address these challenges with its open data lake approach and capabilities for data engineering, SQL analytics, governance, and machine learning.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
Democratizing Data Quality Through a Centralized PlatformDatabricks
Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.
At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including:
Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal
Performing data quality validations using libraries built to work with Spark (a sketch of this kind of check follows the list)
Dynamically generating pipelines that can be abstracted away from users
Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time
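As a rough illustration of the kind of Spark-based validation described above, here is a minimal sketch; the dataset path, column names, and expectations are hypothetical, and Zillow's actual validation libraries are internal.
```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.parquet("/data/listings")  # hypothetical dataset

# Expectations a producer might define via a self-service onboarding portal
checks = {
    "price_non_negative": F.col("price") >= 0,
    "zip_code_present": F.col("zip_code").isNotNull(),
}

# Count violations for every expectation in a single pass over the data
row = df.agg(*[
    F.sum(F.when(~cond, 1).otherwise(0)).alias(name)
    for name, cond in checks.items()
]).collect()[0]

violations = {k: v for k, v in row.asDict().items() if v and v > 0}
if violations:
    # Flag at the earliest stage, before downstream consumers read the data
    print(f"Data quality violations: {violations}")
```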
Learn to Use Databricks for Data ScienceDatabricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Why APM Is Not the Same As ML MonitoringDatabricks
Application performance monitoring (APM) has become the cornerstone of software engineering, allowing engineering teams to quickly identify and remedy production issues. However, as the world moves to intelligent software applications built using machine learning, traditional APM quickly becomes insufficient to identify and remedy production issues encountered in these modern software applications.
As a lead software engineer at New Relic, my team built high-performance monitoring systems including Insights, Mobile, and SixthSense. As I transitioned to building ML Monitoring software, I found the architectural principles and design choices underlying APM to be a poor fit for this brand new world. In fact, blindly following APM designs led us down paths that would have been better left unexplored.
In this talk, I draw upon my (and my team’s) experience building an ML Monitoring system from the ground up and deploying it on customer workloads running large-scale ML training with Spark as well as real-time inference systems. I will highlight how the key principles and architectural choices of APM don’t apply to ML monitoring. You’ll learn why, understand what ML Monitoring can successfully borrow from APM, and hear what is required to build a scalable, robust ML Monitoring architecture.
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.
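To make the "write a simple function" idea concrete, here is a hypothetical sketch; the decorator name and registry are invented for illustration and are not Stitch Fix's actual platform API.
```python
from typing import Callable, Dict, List

MODEL_REGISTRY: Dict[str, Callable] = {}

def register_model(name: str):
    """Stand-in for a platform decorator that wires a plain function into
    online deployment, Spark batch execution, and metrics tracking."""
    def wrap(fn):
        MODEL_REGISTRY[name] = fn
        return fn
    return wrap

@register_model("style_recommender")
def predict(client_features: List[float]) -> float:
    # A data scientist writes only this function; the platform tooling
    # handles serving, batch runs, and metric visualization around it.
    return sum(client_features) / len(client_features)

print(MODEL_REGISTRY["style_recommender"]([1.0, 2.0, 3.0]))
```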
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
In this talk, I will dive into the stage level scheduling feature added to Apache Spark 3.1. Stage level scheduling extends upon Project Hydrogen by improving big data ETL and AI integration and also enables multiple other use cases. It is beneficial any time the user wants to change container resources between stages in a single Apache Spark application, whether those resources are CPU, memory or GPUs. One of the most popular use cases is enabling end-to-end scalable deep learning and AI to efficiently use GPU resources. In this type of use case, users read from a distributed file system, do data manipulation and filtering to get the data into a format that the deep learning algorithm needs for training or inference, and then send the data into a deep learning algorithm. Using stage level scheduling combined with accelerator-aware scheduling enables users to seamlessly go from ETL to deep learning running on the GPU by adjusting the container requirements for different stages in Spark within the same application. This makes writing these applications easier and can help with hardware utilization and costs.
There are other ETL use cases where users want to change CPU and memory resources between stages, for instance when there is data skew or when the data size is much larger in certain stages of the application. In this talk, I will go over the feature details, cluster requirements, the API and use cases. I will demo how the stage level scheduling API can be used by Horovod to seamlessly go from data preparation to training using the TensorFlow Keras API on GPUs.
The talk will also touch on other new Apache Spark 3.1 functionality, such as pluggable caching, which can be used to enable faster dataframe access when operating from GPUs.
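As a rough sketch of the stage level scheduling API discussed above, assuming Spark 3.1+ with dynamic allocation and GPU discovery configured on YARN or Kubernetes (the resource amounts here are arbitrary examples):
```python
from pyspark.sql import SparkSession
from pyspark.resource import (ExecutorResourceRequests, ResourceProfileBuilder,
                              TaskResourceRequests)

spark = SparkSession.builder.appName("stage-level-scheduling").getOrCreate()
sc = spark.sparkContext

# ETL stage: runs under the application's default, CPU-only resources
etl_rdd = sc.parallelize(range(100_000)).map(lambda x: (x % 10, float(x)))

# Build a profile requesting one GPU per executor and per task
ereqs = ExecutorResourceRequests().cores(4).memory("8g").resource("gpu", 1)
treqs = TaskResourceRequests().cpus(1).resource("gpu", 1)
gpu_profile = ResourceProfileBuilder().require(ereqs).require(treqs).build

# Stages computed from this RDD are scheduled with the GPU profile,
# so the same application flows from ETL into GPU-backed work
train_rdd = etl_rdd.withResources(gpu_profile)
print(train_rdd.count())
```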
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
In this talk, I would like to introduce an open-source tool built by our team that simplifies the data conversion from Apache Spark to deep learning frameworks.
Imagine you have a large dataset, say 20 GB, and you want to use it to train a TensorFlow model. Before feeding the data to the model, you need to clean and preprocess it using Spark. Now you have your dataset in a Spark DataFrame. When it comes to the training part, you may face the problem: how can I convert my Spark DataFrame to a format recognized by my TensorFlow model?
The existing data conversion process can be tedious. For example, to convert an Apache Spark DataFrame to a TensorFlow Dataset file format, you need to either save the Apache Spark DataFrame on a distributed filesystem in Parquet format and load the converted data with third-party tools such as Petastorm, or save it directly in TFRecord files with spark-tensorflow-connector and load it back using TFRecordDataset. Both approaches take more than 20 lines of code to manage the intermediate data files, rely on different parsing syntax, and require extra attention for handling vector columns in the Spark DataFrames. In short, all these engineering frictions greatly reduce data scientists’ productivity.
The Databricks Machine Learning team contributed a new Spark Dataset Converter API to Petastorm to simplify this tedious data conversion process. With the new API, it takes a few lines of code to convert a Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader with default parameters.
In the talk, I will use an example to show how to use the Spark Dataset Converter to train a TensorFlow model and how simple it is to go from single-node training to distributed training on Databricks.
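A minimal sketch of the converter workflow described above, assuming the petastorm package is installed; the cache path, data path, and batch size are illustrative:
```python
from pyspark.sql import SparkSession
from petastorm.spark import SparkDatasetConverter, make_spark_converter

spark = SparkSession.builder.appName("spark-to-tf").getOrCreate()

# Where the converter materializes intermediate files (path is illustrative)
spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
               "file:///tmp/petastorm_cache")

df = spark.read.parquet("/data/features")  # your preprocessed DataFrame

converter = make_spark_converter(df)  # materializes and caches the data

# TensorFlow side: a tf.data.Dataset ready for model.fit()
with converter.make_tf_dataset(batch_size=64) as dataset:
    for batch in dataset.take(1):
        print(batch)  # named tuples of feature tensors

converter.delete()  # remove the cached intermediate files when done
```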
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
There is no doubt Kubernetes has emerged as the next generation of cloud-native infrastructure to support a wide variety of distributed workloads. Apache Spark has evolved to run both machine learning and large-scale analytics workloads. There is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes with Apache Spark’s scalable data processing, you can run data and machine learning pipelines on this infrastructure while effectively utilizing the resources at your disposal.
In this talk, Rajesh Thallam and Sougata Biswas will share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, and orchestrate data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). The following topics will be covered:
– Understanding key traits of Apache Spark on Kubernetes
– Things to know when running Apache Spark on Kubernetes, such as autoscaling
– Demonstrating analytics pipelines on Apache Spark orchestrated with Apache Airflow on a Kubernetes cluster (a minimal configuration sketch follows)
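A minimal sketch of pointing a PySpark session at a Kubernetes cluster; the endpoint, image, namespace, and service account are placeholders, and managed GKE/Dataproc setups typically supply much of this for you.
```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://34.123.45.67:443")   # Kubernetes API server (example)
    .appName("spark-on-gke")
    .config("spark.kubernetes.container.image", "gcr.io/my-project/spark:3.1.1")
    .config("spark.kubernetes.namespace", "spark-jobs")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
    .config("spark.executor.instances", "4")
    .getOrCreate()
)

# Executors run as pods in the cluster; this job exercises them briefly
spark.range(1_000_000).selectExpr("sum(id)").show()
```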
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
Pipelines have become ubiquitous, as the need for stringing multiple functions to compose applications has gained adoption and popularity. Common pipeline abstractions such as “fit” and “transform” are even shared across divergent platforms such as Python Scikit-Learn and Apache Spark.
Scaling pipelines at the level of simple functions is desirable for many AI applications; however, it is not directly supported by Ray’s parallelism primitives. In this talk, Raghu will describe a pipeline abstraction that takes advantage of Ray’s compute model to efficiently scale arbitrarily complex pipeline workflows. He will demonstrate how this abstraction cleanly unifies pipeline workflows across multiple platforms such as Scikit-Learn and Spark, and achieves nearly optimal scale-out parallelism on pipelined computations.
Attendees will learn how pipelined workflows can be mapped to Ray’s compute model and how they can both unify and accelerate their pipelines with Ray.
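As a toy illustration of mapping pipeline stages onto Ray's compute model (this is only the underlying primitive, not the talk's actual abstraction):
```python
import ray

ray.init()

# Each pipeline stage becomes a Ray task; Ray resolves the dependency
# between "fit" and "transform" and schedules them across workers.
@ray.remote
def fit(data):
    return sum(data) / len(data)  # a stand-in "model": just the mean

@ray.remote
def transform(model, data):
    return [x - model for x in data]  # center the data with the fitted model

data = [1.0, 2.0, 3.0, 4.0]
model_ref = fit.remote(data)                     # stage 1
result_ref = transform.remote(model_ref, data)   # stage 2, awaits stage 1
print(ray.get(result_ref))                       # [-1.5, -0.5, 0.5, 1.5]
```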
Sawtooth Windows for Feature AggregationsDatabricks
In this talk about Zipline, we will introduce a new type of windowing construct called a sawtooth window. We will describe various properties of sawtooth windows that we utilize to achieve online-offline consistency, while still maintaining high throughput, low read latency, and tunable write latency for serving machine learning features. We will also talk about a simple deployment strategy for correcting feature drift due to operations that are not “abelian groups” and that operate over change data.
We want to present multiple anti-patterns utilizing Redis in unconventional ways to get the maximum out of Apache Spark. All examples presented are tried and tested in production at scale at Adobe. The most common integration is spark-redis, which interfaces with Redis as a DataFrame backing store or as an upstream for Structured Streaming. We deviate from the common use cases to explore where Redis can plug gaps while scaling out high-throughput applications in Spark.
Niche 1: Long-Running Spark Batch Job – dispatch new jobs by polling a Redis queue (see the sketch after this list)
· Why? Custom queries on top of a table; we load the data once and query N times
· Why not Structured Streaming
· Working solution using Redis
Niche 2: Distributed Counters
· Problems with Spark Accumulators
· Utilize Redis hashes as distributed counters
· Precautions for retries and speculative execution
· Pipelining to improve performance
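A minimal redis-py sketch of both niches; connection details, key names, and the Spark hook are illustrative, not Adobe's production code.
```python
import redis

# Niche 1: the long-running Spark driver keeps its table loaded and polls a
# Redis list for new query requests instead of restarting the batch job.
def poll_for_jobs(run_query):
    r = redis.Redis(host="localhost", port=6379)  # connection is illustrative
    while True:
        item = r.blpop("job_queue", timeout=30)   # blocking pop with timeout
        if item is None:
            continue                              # nothing queued; poll again
        _key, payload = item
        run_query(payload.decode("utf-8"))

# Niche 2: Redis hashes as distributed counters, updated from executors via
# mapPartitions; pipelining batches the HINCRBY round trips. Retries and
# speculative execution can double-count, so real code needs idempotency guards.
def count_partition(rows):
    conn = redis.Redis(host="localhost", port=6379)  # per-partition connection
    pipe = conn.pipeline()
    for row in rows:
        pipe.hincrby("metrics", row["status"], 1)
    pipe.execute()
    return iter([])

# Usage from Spark: df.rdd.mapPartitions(count_partition).count()
```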
Re-imagine Data Monitoring with whylogs and SparkDatabricks
In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly and cumbersome, while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language and platform agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.
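For a flavor of the logging API, here is a minimal whylogs sketch on a pandas batch; the columns are invented, and while the talk's Spark integration applies the same profiling at DataFrame scale, this sketch shows only the single-batch call.
```python
import pandas as pd
import whylogs as why

# Invented columns; the Spark integration profiles full DataFrames at scale
df = pd.DataFrame({
    "price": [120_000.0, 250_500.0, None],
    "zip": ["98101", "98102", "98103"],
})

results = why.log(df)            # lightweight statistical profile, not raw data
profile_view = results.view()
print(profile_view.to_pandas())  # per-column metrics: counts, types, nulls...
```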
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
Machine learning (ML) models are typically part of prediction queries that consist of a data processing part (e.g., for joining, filtering, cleaning, featurization) and an ML part invoking one or more trained models. In this presentation, we identify significant and unexplored opportunities for optimization. To the best of our knowledge, this is the first effort to look at prediction queries holistically, optimizing across both the ML and SQL components.
We will present Raven, an end-to-end optimizer for prediction queries. Raven relies on a unified intermediate representation that captures both data processing and ML operators in a single graph structure.
This allows us to introduce optimization rules that:
(i) reduce unnecessary computations by passing information between the data processing and ML operators,
(ii) leverage operator transformations (e.g., turning a decision tree into a SQL expression or an equivalent neural network) to map operators to the right execution engine, and
(iii) integrate compiler techniques to take advantage of the most efficient hardware backend (e.g., CPU, GPU) for each operator.
We have implemented Raven as an extension to Spark’s Catalyst optimizer to enable the optimization of SparkSQL prediction queries. Our implementation also allows the optimization of prediction queries in SQL Server. As we will show, Raven is capable of improving prediction query performance on Apache Spark and SQL Server by up to 13.1x and 330x, respectively. For complex models, where GPU acceleration is beneficial, Raven provides up to 8x speedup compared to state-of-the-art systems. As part of the presentation, we will also give a demo showcasing Raven in action.
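Raven itself is not publicly available, but the decision-tree-to-SQL transformation in rule (ii) can be sketched by hand; the training data and column names below are invented.
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, _tree

# Train a tiny tree on invented data
X = np.array([[20, 0], [45, 1], [30, 1], [60, 0]])
y = np.array([0, 1, 0, 1])
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
cols = ["age", "has_history"]

def tree_to_sql(node=0):
    """Compile the fitted tree into an equivalent SQL CASE expression."""
    t = clf.tree_
    if t.feature[node] == _tree.TREE_UNDEFINED:       # leaf: emit the class
        return str(int(np.argmax(t.value[node])))
    name = cols[t.feature[node]]
    left = tree_to_sql(t.children_left[node])
    right = tree_to_sql(t.children_right[node])
    return (f"CASE WHEN {name} <= {t.threshold[node]:.4f} "
            f"THEN {left} ELSE {right} END")

# The result can be inlined into a SELECT, pushing inference into the engine
print(tree_to_sql())
```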
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis.
Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them.
Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy.
This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.
Massive Data Processing in Adobe Using Delta LakeDatabricks
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile Offering. At the heart of this is complex ingestion of a mix of normalized and denormalized data, with various linkage scenarios powered by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels like email, advertisements, etc. We will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences.
What are we storing?
Multi-Source, Multi-Channel Problem
Data Representation and Nested Schema Evolution
Performance Trade-Offs with Various Formats
Anti-Patterns Used (String FTW)
Data Manipulation using UDFs
Writer Worries and How to Wipe Them Away
Staging Tables FTW (a merge sketch follows this list)
Data Lake Replication Lag Tracking
Performance Time!
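As a hedged illustration of the staging-table pattern named above (not Adobe's actual pipeline; the paths, join key, and delta-spark dependency are assumptions):
```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Requires the delta-spark package and a Delta-enabled SparkSession;
# paths and join keys below are invented for illustration.
spark = SparkSession.builder.appName("staging-merge").getOrCreate()

staging_df = spark.read.format("delta").load("/staging/profile_updates")
target = DeltaTable.forPath(spark, "/lake/profiles")

# Land new data in a staging table first, then MERGE atomically into the
# target, so slow or failed writers never leave the main table half-written.
(target.alias("t")
 .merge(staging_df.alias("s"), "t.profile_id = s.profile_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```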
Global Situational Awareness of A.I. and where it's headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Cloud and Analytics - From Platforms to an Ecosystem
1. Cloud and Analytics – from Platforms to an Ecosystem
Ming Yuan, Zurich North America
David Carlson, Databricks
2. Agenda
▪ Data and Analytics at ZNA
▪ Data and Metadata
▪ Data Exploration and ETL
▪ Containerization
▪ DevOps in Analytics
3. Zurich is a data-enabled innovative company
• Data is used in day-to-day decision making in key business domains
• A strong data science team delivers predictive models and business insights
• We are an early adopter of advanced analytics and cloud analytics
Data platforms: multiple databases, an on-premises data warehouse, a Hadoop data lake, and a cloud data lake
• Governance processes on data access and utilization are established
• Metadata is collected and stored in the repository system
4. Key capabilities support data analytics life cycle
• Data Discovery
• Data Integration
• Collaboration
• Business Impact (Operationalization)
• Scalability
• Multiple Personas
• Support multiple types of implementations
Life cycle stages: Ideation → Model Build → Model Deployment → Model Execution → Model Monitoring
5. Data foundation and processing power
▪ Support ML and advanced analysis to discover business insights and drive appropriate actions
▪ Enable cross-domain data sharing, aggregation, and integration
▪ Modernize the technical landscape to handle data sets that were previously unprocessable
▪ Optimize data processing and archiving strategies to reduce operation costs
▪ Apply data governance best practices to manage utilization
6. Data lake consists of ADLS and Databricks® clusters
[Slide diagram: the provisioning store moves data from Data Sources to Data Consumption services within an Azure subscription, through the following zones]
▪ Landing – receives Change Data Capture (CDC) records or full snapshots from source systems
▪ Staging – enriches landing-zone data with additional date-format fields and removes special characters
▪ Active – applies CDC records (I, U, D) to a copy of the previous day's data
▪ Archive – keeps rolling pointers to the previous day's Active data
▪ Curation Layer – a Universal Data Model plus curated data sets: enterprise-level curated datasets covering broad utilization, and datasets pertaining to the needs of a specific business domain
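A speculative PySpark sketch of the Landing-to-Staging step described above (date-format enrichment and special-character removal); the storage paths, column names, and formats are invented, not Zurich's code.
```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("landing-to-staging").getOrCreate()

# Invented ADLS paths and columns; the real zone layout lives in the lake
landing = spark.read.parquet(
    "abfss://landing@account.dfs.core.windows.net/claims")

staging = (
    landing
    # Enrich with an additional parsed date field
    .withColumn("event_date", F.to_date("event_ts", "yyyy-MM-dd"))
    # Remove special characters from free-text fields
    .withColumn("policy_holder",
                F.regexp_replace("policy_holder", r"[^A-Za-z0-9 ]", ""))
)

staging.write.mode("overwrite").parquet(
    "abfss://staging@account.dfs.core.windows.net/claims")
```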
7. Metadata management and data discovery
▪ For metadata administrators
▪ Maintain business glossary for data domains that are owned by function or business units
▪ Import technical metadata and catalog it as data assets
▪ Curate technical metadata relating them to logical business terms
▪ Maintain data-flow mappings of transformations
▪ For data consumers
▪ Search, explore and discover data assets and data lineage
▪ Interpret data with correct meaning and context
▪ Navigate data flows to analyze processes and assess change impact
▪ Evaluate data quality reports and drive improvement actions
8. Alation® Data Catalog manages metadata ingestions
Sources: databases, the data warehouse, the cloud data lake, and JSON streams
▪ Ingest and refresh schema, table, and column definitions
▪ Build data lineage, popularity, common queries, and more
▪ Profile and store sample data sets
▪ Collect user information and usage metrics
▪ Open APIs to programmatically import business glossaries (a hypothetical import sketch follows)
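To illustrate what a programmatic glossary import might look like, here is a hypothetical sketch; the endpoint path, payload shape, and auth header are placeholders rather than Alation's documented API, so consult the vendor docs for the real contract.
```python
import requests

BASE_URL = "https://alation.example.com"   # placeholder instance URL
HEADERS = {"TOKEN": "<api-token>"}         # placeholder auth header

# Hypothetical payload shape for a business glossary term
terms = [
    {"title": "Earned Premium",
     "description": "Portion of written premium for the elapsed policy term."},
]

for term in terms:
    resp = requests.post(f"{BASE_URL}/api/glossary_terms/",  # hypothetical path
                         json=term, headers=HEADERS)
    resp.raise_for_status()  # fail loudly if the import is rejected
```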
9. Intuitive user interfaces to access metadata
▪ Users and Stewards actively curate the pages
▪ Natural-language search to easily discover unknowns
▪ Everyone collaborates and communicates
▪ Query intelligently against source systems
10. Data exploration and ETL implementations
▪ Explore, validate, and analyze existing data sets
▪ Curate new data sets for model development
▪ Construct ETL flows with embedded AI/modeling components
▪ Release ETL flows to the production environment
▪ Provide runtime environments to trigger, manage, and monitor ETL flows in production
11. Leverage technical stack and skills across Personas
[Slide diagram: a centralized Linux server on Azure Cloud facilitates access to data and fosters collaboration, offering browser-based user interfaces with user/task-specific interaction modes. It draws on centralized or ad-hoc data sources and the data lake, uses available or spun-up processing resources (leveraging the best storage and compute resources), integrates with the metadata system, and promotes work to Dataiku deployment servers for enterprise-grade operationalization on production systems.]
12. Containerization in building model API services
▪ Standardize the runtime environment using commonly used ML libraries for development and production
▪ Elastically scale the system capacity for the development environment
▪ Easily migrate system stacks from the development environment to production
▪ Build CI/CD pipelines and deployment environments based on open standards
▪ Monitor and ensure the health of model implementations in production
13. Containerize models as cloud-native applications
[Slide diagram: client applications call containerized model services managed by an orchestration layer]
We observed improved agility in development, more portability in deployment, and better elasticity in production
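A minimal sketch of the kind of model API service that gets containerized here, using Flask for illustration; the model file, route, and port are assumptions, not Zurich's implementation.
```python
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

# Illustrative model file; in a container image this is baked in at build time
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/score", methods=["POST"])
def score():
    features = request.get_json()["features"]     # e.g. [1.0, 2.0, 3.0]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # the container exposes this port
```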
14. DevOps in data & analytics
▪ For platform administrators
▪ Codify the installation and configuration of key components in the ecosystem
▪ Streamline the process of testing and upgrading systems to newer versions
▪ Automate system backup and restoration
▪ For model services developers
▪ Standardize the deployment pipelines to reduce the effort per project
▪ Increase the agility of deploying applications from development to production
▪ Reduce the time to fix bugs after production releases
16. Analytical platforms fitting into different scenarios are integrated as an ecosystem
Ideation → Model Build → Model Deployment → Model Execution → Model Monitoring
18. Zurich Insurance Group (Zurich), headquartered and founded in Switzerland, is a leading multi-line insurance group with more than 140 years’ experience serving businesses worldwide, including over 100 years in North America. We are committed to delivering broad and flexible insurance solutions to our customers and helping them understand, manage and minimize risk.
Through member companies in North America, Zurich is a leading commercial property-casualty insurance provider serving small businesses, mid-sized and large companies, including multinational corporations.
▪ Approximately 55,000 employees
▪ Managing complex risks for 7,600 international programs through our global network
▪ Achieving USD 5.3 billion in business operating profit (BOP) in 2019
▪ Providing comprehensive solutions and insights for 25 industries
▪ Insuring more than 215,500 customers
▪ Insuring more than 90 percent of the Fortune 500
The Alation Data Catalog and its logo are used with kind permission of Alation, Inc.
The Dataiku DSS and its logo are used with kind permission of Dataiku, Inc.
The Domino Data Lab and its logo are used with kind permission of Domino Data Lab, Inc.
Use of these marks does not imply endorsement of the products.