Cloud and Analytics - From Platforms to an Ecosystem

Zurich North America is one of the largest providers of insurance solutions and services in the world with customers representing a wide range of industries from agriculture to construction and more than 90 percent of the Fortune 500.

  1. Cloud and Analytics – from Platforms to an Ecosystem
     Ming Yuan, Zurich North America
     David Carlson, Databricks
  2. Agenda
     ▪ Data and Analytics at ZNA
     ▪ Data and Metadata
     ▪ Data Exploration and ETL
     ▪ Containerization
     ▪ DevOps in Analytics
  3. Zurich is a data-enabled innovative company
     • Data is used in day-to-day decision-making in key business domains
     • A strong data science team delivers predictive models and business insights
     • We are an early adopter of advanced analytics and cloud analytics
     • Governance processes on data access and utilization are established
     • Metadata is collected and stored in the repository system
     Diagram labels: Multiple Databases, On-premises Data Warehouse, Hadoop Data Lake, Cloud Data Lake
  4. Key capabilities support the data analytics life cycle
     • Data Discovery
     • Data Integration
     • Collaboration
     • Business Impact (Operationalization)
     • Scalability
     • Multiple Personas
     • Support multiple types of implementations
     Life-cycle stages: Ideation, Model Build, Model Deployment, Model Execution, Model Monitoring
  5. Data foundation and processing power
     ▪ Support ML and advanced analysis to discover business insights and drive appropriate actions
     ▪ Enable cross-domain data sharing, aggregation, and integration
     ▪ Modernize the technical landscape to handle data sets that were previously unprocessable
     ▪ Optimize data processing and archiving strategies to reduce operation costs
     ▪ Apply data governance best practices to manage utilization
     Diagram label: Data
  6. Data lake consists of ADLS and Databricks® clusters
     Diagram flow: Data Sources, Provisioning Store, Curation Layer, Data Consumption (Azure Subscription Services)
     Provisioning Store zones: Landing, Staging, Active, Archive
     ▪ Change Data Capture (CDC) or full snapshots
     ▪ Enrich Landing zone data with additional date-format fields and remove special characters
     ▪ CDC records applied (I, U, D) to copy of previous day's data
     ▪ Rolling pointers to previous day's Active…
     Curation Layer: Universal Data Model, Curated Data Sets
     ▪ Enterprise-level curated datasets covering broad utilization
     ▪ Pertaining to the needs in a specific business domain
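
Slide 6 notes that CDC records flagged I, U, or D are applied to a copy of the previous day's data. A minimal sketch of that merge is shown below. The record layout (an `op` flag plus an `id` key) and the function name `apply_cdc` are assumptions made for illustration; the deck does not give a schema or an implementation.

```python
from typing import Dict, Iterable

Row = Dict[str, object]


def apply_cdc(previous: Dict[int, Row], changes: Iterable[Row]) -> Dict[int, Row]:
    """Return a new snapshot: the previous day's rows with CDC changes applied in order."""
    # Work on a copy so the previous day's snapshot stays intact.
    snapshot = {key: dict(row) for key, row in previous.items()}
    for change in changes:
        op, key = change["op"], change["id"]  # hypothetical field names
        payload = {k: v for k, v in change.items() if k != "op"}
        if op == "I":    # insert: add the new row
            snapshot[key] = payload
        elif op == "U":  # update: overwrite only the supplied columns
            snapshot[key] = {**snapshot.get(key, {}), **payload}
        elif op == "D":  # delete: drop the row if it exists
            snapshot.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return snapshot


yesterday = {1: {"id": 1, "status": "open"}, 2: {"id": 2, "status": "open"}}
changes = [
    {"op": "U", "id": 1, "status": "closed"},
    {"op": "D", "id": 2},
    {"op": "I", "id": 3, "status": "open"},
]
today = apply_cdc(yesterday, changes)
```

At data-lake scale the same logic would more likely run as a distributed merge on the cluster; the dictionary version only shows the I/U/D semantics.
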
  7. Metadata management and data discovery
     ▪ For metadata administrators
       ▪ Maintain business glossary for data domains that are owned by function or business units
       ▪ Import technical metadata and catalog it as data assets
       ▪ Curate technical metadata, relating it to logical business terms
       ▪ Maintain data-flows mapping transformations
     ▪ For data consumers
       ▪ Search, explore and discover data assets and data lineage
       ▪ Interpret data with correct meaning and context
       ▪ Navigate data flows to analyze processes and assess change impact
       ▪ Evaluate data quality reports and drive improvement actions
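
Slide 7 lists navigating data flows to assess change impact as a consumer task. One way to picture that is a walk over lineage edges, sketched below. The asset names and the `downstream_assets` helper are hypothetical, and this is plain Python rather than any catalog product's API.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_assets(lineage: Dict[str, List[str]], changed: str) -> Set[str]:
    """Breadth-first walk over data-flow edges, collecting every asset fed by `changed`."""
    impacted: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in impacted:  # visit each asset once
                impacted.add(consumer)
                queue.append(consumer)
    return impacted


# Hypothetical lineage: each key feeds the assets listed as its value.
lineage = {
    "landing.policy": ["staging.policy"],
    "staging.policy": ["active.policy"],
    "active.policy": ["curated.policy_summary", "curated.claims_by_policy"],
    "landing.claims": ["staging.claims"],
    "staging.claims": ["curated.claims_by_policy"],
}
impacted = downstream_assets(lineage, "staging.policy")
```

Here `impacted` holds the assets that would need review if `staging.policy` changed.
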
  8. Alation® Data Catalog manages metadata ingestion
     Sources: Database, Data Warehouse, Cloud Data Lake, JSON, Streams
     ▪ Ingest and refresh schema, table, and column definitions
     ▪ Build data lineage, popularity, common queries, and more
     ▪ Profile and store sample data sets
     ▪ Collect user information and usage metrics
     ▪ Open APIs to programmatically import business glossaries
     Number shown on slide: 2,053,632
  9. Intuitive user interfaces to access metadata
     ▪ Users and Stewards actively curate the pages
     ▪ Natural-language search to easily discover unknowns
     ▪ Everyone collaborates and communicates
     ▪ Query intelligently against source systems
  10. Data exploration and ETL implementations
      ▪ Explore, validate, and analyze existing data sets
      ▪ Curate new data sets for model development
      ▪ Construct ETL flows with embedded AI/modeling components
      ▪ Release ETL flows to the production environment
      ▪ Provide runtime environments to trigger, manage, and monitor ETL flows in production
  11. Leverage technical stack and skills across Personas
      Diagram labels:
      ▪ LINUX Server on Azure Cloud
      ▪ CENTRALIZED OR AD-HOC DATA SOURCES, DATA LAKE
      ▪ AVAILABLE OR SPUN-UP PROCESSING RESOURCES
      ▪ Leveraging best storage and compute resources
      ▪ Dataiku deployment servers for enterprise grade operationalization
      ▪ PRODUCTION SYSTEMS
      ▪ Centralized server to facilitate access to data, and foster collaboration
      ▪ Browser based user interfaces
      ▪ User/task specific interaction modes
      ▪ INTEGRATION WITH METADATA SYSTEM
  12. Containerization in building model API services
      ▪ Standardize the runtime environment using commonly used ML libraries for development and production
      ▪ Elastically scale the system capacity for the development environment
      ▪ Easily migrate system stacks from development environment to production
      ▪ Build CI/CD pipelines and deployment environments based on open standards
      ▪ Monitor and ensure the health of model implementations in production
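
The deck does not show how a model API service is built, so the following is only a sketch of the kind of process a container might run: a scoring endpoint plus a health endpoint that an orchestrator can probe. It uses the Python standard library only. The routes `/health` and `/predict`, the helpers `predict`, `handle`, and `serve`, and the fixed weights are all assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Tuple


def predict(features: List[float]) -> float:
    """Stand-in model: a fixed linear score. A real service would load a trained model."""
    weights = [0.5, 0.25]  # hypothetical weights
    return sum(w * x for w, x in zip(weights, features))


def handle(method: str, path: str, body: bytes) -> Tuple[int, Dict[str, object]]:
    """Map a request to (status code, JSON payload), independent of the HTTP layer."""
    if method == "GET" and path == "/health":
        return 200, {"status": "ok"}
    if method == "POST" and path == "/predict":
        try:
            features = json.loads(body)["features"]
            return 200, {"score": predict([float(x) for x in features])}
        except (ValueError, KeyError, TypeError):
            return 400, {"error": "expected JSON with a 'features' list of numbers"}
    return 404, {"error": "not found"}


class ModelHandler(BaseHTTPRequestHandler):
    def _respond(self, method: str) -> None:
        length = int(self.headers.get("Content-Length") or 0)
        status, payload = handle(method, self.path, self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        self._respond("GET")

    def do_POST(self) -> None:
        self._respond("POST")


def serve(port: int = 8080) -> None:
    """Blocking entry point; a container's start command would call this."""
    HTTPServer(("0.0.0.0", port), ModelHandler).serve_forever()
```

Keeping the routing in `handle` separate from the HTTP handler lets the same logic be exercised in development and in production images without starting a server.
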
  13. Containerize models as cloud-native applications
      Diagram labels: Client App, Client App, Orchestration
      We observed improved agility in development, more portability in deployment, and better elasticity in production
  14. DevOps in data & analytics
      ▪ For platform administrators
        ▪ Codify the installation and configuration of key components in the ecosystem
        ▪ Streamline the process of testing and upgrading systems to newer versions
        ▪ Automate system backup and restoration
      ▪ For model services developers
        ▪ Standardize the deployment pipelines to reduce the effort per project
        ▪ Increase the agility of deploying applications from development to production
        ▪ Reduce the time to fix bugs after production releases
  15. CI/CD processes accelerate app deployments
      Environments: Dev, QA, Prod
      Diagram components: Azure Code Repos, Azure Pipeline (Build), Azure Pipeline (Release), Azure Container Registry, Azure App Services
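
Slide 15 shows Dev, QA, and Prod environments connected by build and release pipelines. The gating idea behind such a flow can be sketched as promoting one image tag stage by stage and stopping at the first failed check. The stage order, the `promote` function, and the check callables below are assumptions for illustration, not the Azure Pipelines configuration used in the talk.

```python
from typing import Callable, Dict, List

STAGES: List[str] = ["dev", "qa", "prod"]  # assumed promotion order


def promote(image_tag: str, checks: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Deploy the same image tag stage by stage, stopping after the first failed check."""
    deployed: List[str] = []
    for stage in STAGES:
        deployed.append(stage)            # stand-in for a real deployment step
        if not checks[stage](image_tag):  # stand-in for smoke or acceptance tests
            break                         # do not promote past a failing stage
    return deployed


# Hypothetical checks: QA rejects this build, so it never reaches prod.
checks = {
    "dev": lambda tag: True,
    "qa": lambda tag: False,
    "prod": lambda tag: True,
}
reached = promote("model-api:1.4.2", checks)
```

Promoting the same tag everywhere, rather than rebuilding per environment, is what keeps the artifact tested in QA identical to the one released.
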
  16. Analytical platforms fitting into different scenarios are integrated as an ecosystem
      Life-cycle stages: Ideation, Model Build, Model Deployment, Model Execution, Model Monitoring
  17. Feedback
      Your feedback is important to us. Don’t forget to rate and review the sessions.
  18. Zurich Insurance Group (Zurich), headquartered and founded in Switzerland, is a leading multi-line insurance group with more than 140 years’ experience serving businesses worldwide, including over 100 years in North America. We are committed to delivering broad and flexible insurance solutions to our customers and helping them understand, manage and minimize risk. Through member companies in North America, Zurich is a leading commercial property-casualty insurance provider serving small businesses, mid-sized and large companies, including multinational corporations.
      § Approximately 55,000 employees
      § Managing complex risks for 7,600 international programs through our global network
      § Achieving USD 5.3 billion in business operating profit (BOP) in 2019
      § Providing comprehensive solutions and insights for 25 industries
      § Insuring more than 215,500 customers
      § Insuring more than 90 percent of the Fortune 500
      The Alation Data Catalog and its logo is used with kind permission of Alation, Inc. The Dataiku DSS and its logo is used with kind permission of Dataiku, Inc. The Domino Data Lab and its logo is used with kind permission of Domino Data Lab, Inc. Use of them does not endorse the products.
