Agile Enterprise Analytics on AWS
April 26, 2019
Copyright 2019 - Don Gillis, Zapwerx
Overview
On average, enterprise data is growing by more than 30% year over year, yet
traditional analytics approaches have proven expensive and inflexible.
As a result, a growing proportion of our data is unused “dark data”.
However, there is an analytics “perfect storm” happening right now, to the
benefit of enterprises that know how to harness its power:
● Open data formats
● Open source analytics
● Low-cost cloud storage
● Rapid cloud innovation
● Low-cost pay-as-you-go queries
● Easy-to-use serverless components
● Cheap and accessible machine learning and AI tools
Being a Data-Driven Organization
Enabling evidence-based decision making
● Data Agility
○ Widely trained data literacy
○ Quickly iterate on questions, queries & analysis
● Data Access
○ In-context integration
○ Wide availability
○ Lower-cost scalability
● Data Governance
○ Centralized
○ Single sourced
○ Attributed & Controlled
● Data Community
○ Analytics for everyone
○ Shareable stories
“...enabling numbers
people with
imagination and
story people with
discipline.”
A. Damodaran
Why a Data Lake?
A data lake brings organization-wide discipline to data use and governance
● Data sources are defined, captured and maintained
● Data is initiated and updated automatically
● Data alignment & enrichment processes are explicit
● Data access authorization is defined, asserted, and audited
● Data is accessible for ad-hoc enquiries
● Data is source-complete
● Data is portable and in well-known open formats
S3 Data Lake Strategy
Tier 1
Raw data as received from batch or streaming data sources.
Apply real-time analytics. Immediately process to Tier 2, then archive to low-cost archival storage.
Tier 2
Raw data optimized in structure and size. Ready for multiple tool access.
Apply a partition strategy. Optimize for file size.
Apply a highly compressible columnar data format such as Parquet or ORC, allowing casual queries.
Tier 3 (...n)
Purpose-built and/or tool-specific optimizations, views and applications.
● Redshift
● Elasticsearch
● Elastic MapReduce
Data Catalog
Single point of discovery, authorization and access control.
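One way to picture the tier and partition strategy is as a naming convention for S3 object keys, plus a lifecycle rule that archives Tier 1 data. The sketch below is only an illustration: the prefix names, the `orders` dataset, the file names, and the 30-day threshold are all assumptions, not part of the deck. The rule dictionary follows the shape used by S3 lifecycle configuration rules, and nothing here calls AWS.

```python
from datetime import date

# Hypothetical prefix names; the slides do not prescribe a key layout.
TIER_PREFIXES = {1: "tier1-raw", 2: "tier2-optimized", 3: "tier3-purpose-built"}


def partition_key(tier: int, dataset: str, day: date, filename: str) -> str:
    """Build an S3 object key that uses Hive-style date partitions."""
    return (
        f"{TIER_PREFIXES[tier]}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


def tier1_archive_rule(days_until_archive: int = 30) -> dict:
    """Describe a lifecycle rule that moves Tier 1 objects to archival storage.

    The 30-day default is an assumption chosen for illustration.
    """
    return {
        "ID": "archive-tier1-raw",
        "Filter": {"Prefix": f"{TIER_PREFIXES[1]}/"},
        "Status": "Enabled",
        "Transitions": [{"Days": days_until_archive, "StorageClass": "GLACIER"}],
    }


# The same day of data lands under matching partitions in Tier 1 and Tier 2.
raw_key = partition_key(1, "orders", date(2019, 4, 26), "batch-0001.json.gz")
columnar_key = partition_key(2, "orders", date(2019, 4, 26), "part-0000.parquet")
print(raw_key)
print(columnar_key)
```

Keeping the partition columns in the key (`year=`, `month=`, `day=`) is a common convention that lets query engines skip partitions a query does not need, which is one reason a partition strategy belongs in Tier 2.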
Key Solution Elements
● Data Lake - Centralize your data into a data lake flowing from raw to fully prepared,
with each “Tier” having its defined purpose.
● Data Catalog - Establish a single point of data registration, discovery, access, and audit.
● Tiered Data Retention - Keep interesting Tier 1 raw data long term for future uses.
● Open Data Formats - At Tier 2, apply open standard columnar data formats for
portability, discoverability, speed, and compression. Access using open source
technologies like Hadoop, Presto, and columnar in-memory database engines.
● Schema on Read - Separating the schema from the data allows better portability,
flexibility and agility.
● Serverless Components - Use serverless or managed components for data streaming,
storage, and processing, so the service is both quick to experiment with and easy to scale.
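Schema on read can be shown in a few lines: the stored data stays untyped text, and a schema kept apart from it is applied only when the data is read. This is a minimal sketch using the Python standard library; the sample rows, column names, and both schemas are made up for illustration.

```python
import csv
import io
from datetime import date

# Raw data is stored as untyped text and does not change when a schema does.
RAW = "order_id,placed_on,amount\n1001,2019-04-25,19.90\n1002,2019-04-26,5.00\n"

# Each schema lives apart from the data: column name -> parser applied on read.
SCHEMA_FULL = {"order_id": int, "placed_on": date.fromisoformat, "amount": float}
SCHEMA_AMOUNTS_ONLY = {"order_id": str, "amount": float}


def read_with_schema(raw_text, schema):
    """Yield typed rows, applying the schema only at read time."""
    for row in csv.DictReader(io.StringIO(raw_text)):
        yield {column: parse(row[column]) for column, parse in schema.items()}


rows = list(read_with_schema(RAW, SCHEMA_FULL))
amounts = list(read_with_schema(RAW, SCHEMA_AMOUNTS_ONLY))
total = sum(row["amount"] for row in rows)
print(rows[0])
print(amounts[0])
```

Two readers apply different schemas to the same stored bytes, so a new question needs a new schema rather than a data reload.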
Approach
● Identify starter use case and its success measures
● Open Access - provide secure but wide access to your data lake allowing the business to:
○ Gain insights
○ Enhance visibility
○ Discover new data applications
○ Make better data-driven decisions
○ Drive and measure business value
● Include a long-term AI strategy
○ Retain raw data
○ Build and enhance your data acquisition strategy
○ Experiment with easy-to-use AI tools
● Build a plan to encourage adoption
○ Develop a change management & communication plan
○ Provide training and workshops for data analysts, builders, and users
Security & Privacy
● Security - Be aware of the mounting liability of data privacy and security
○ Centralize the point of authorization, access control, monitoring, and audit
○ Use built-in encryption in transit and at rest
○ Build on cloud native security tools
○ Allow for GDPR-type subject access requests
○ Use cloud native tooling for data protection
○ Use cloud native tooling for identity & access control, anomaly detection and response
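As an illustration of built-in encryption in transit and at rest, the sketch below builds two plain dictionaries: an S3 bucket policy that denies any request not made over TLS, and a default server-side encryption configuration. The bucket name is hypothetical, and the choice of `AES256` over a KMS key is an assumption. The code only constructs and prints the documents and does not call AWS.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Bucket policy that rejects requests made without TLS (in transit)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def default_encryption_config() -> dict:
    """Default server-side encryption for new objects (at rest)."""
    return {
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    }


# "example-data-lake" is a placeholder bucket name, not one from the deck.
policy = deny_insecure_transport_policy("example-data-lake")
encryption = default_encryption_config()
print(json.dumps(policy, indent=2))
print(json.dumps(encryption, indent=2))
```

Keeping these documents in code means they can be reviewed and versioned like any other change, which supports the centralized audit point listed above.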
Technology
● Simplify Tooling
○ Your data is big and complex and growing
○ Your tooling should not add to the complexity
○ Use simple cloud native services and patterns
● Machine Learning - Experiment with easy-to-use cloud tools for:
○ Classifying unstructured data
○ Audio / Video recognition
○ Anomaly detection
○ Personalization
○ Recommendations
Workshop Roles
Participants
The people who bring an understanding of your
business, its data and goals.
The people who will continue to develop,
manage, and secure your enterprise analytics
service.
Facilitator
Brings an understanding of modern
enterprise analytics and how to implement it
on Amazon Web Services (AWS).
Brings a strategy for building your agile
enterprise analytics service. This will act as a
basis upon which we will refine your vision
and build your service start-up plan.
Uses techniques like Value Stream Analysis
to define and refine processes.
Workshop deliverables
Vision
A Vision for your Agile Enterprise Analytics service within the context of your business, its data,
and its goals.
Start-up Plan
A start-up plan for implementing an initial Agile Enterprise Analytics service.
Project Proposal
A proposal with pricing and terms and conditions.
Delivery Schedule
A recommended delivery schedule to meet your needs.
Beyond the Workshop
What’s next...
With your Data Lake taking form, it may be time to build your skills in the
application of Machine Learning and AI. You can learn to build and maintain
accurate models, deploy those models efficiently on AWS, and take full
advantage of AI and machine learning to make better predictions faster and
improve your bottom line.
Contact Us
Thank you.
info@zapwerx.com

Agile Enterprise Analytics on AWS

  • 1. Agile Enterprise Analytics on AWS April 26, 2019 Copyright 2019 - Don Gillis Zapwerx
  • 2. Overview
    Enterprise data on average is growing at over 30% year over year, yet traditional analytics approaches have proven to be expensive and unyielding. The result is that a growing proportion of our data is unused “dark data”.
    However, there is an analytics “perfect storm” happening right now, to the benefit of enterprises that know how to harness its power:
    ● Open data formats
    ● Open source analytics
    ● Low cost cloud storage
    ● Rapid cloud innovation
    ● Low cost pay-as-you-go queries
    ● Easy to use Serverless components
    ● Cheap and Accessible Machine Learning and AI tools
  • 3. Being a Data Driven Organization
    Enabling evidence based decision making
    ● Data Agility
      ○ Widely Trained data literacy
      ○ Quickly iterate on questions, queries & analysis
    ● Data Access
      ○ In context integration
      ○ Wide availability
      ○ Lower cost scalability
    ● Data Governance
      ○ Centralized
      ○ Single sourced
      ○ Attributed & Controlled
    ● Data Community
      ○ Analytics for everyone
      ○ Shareable stories
    “...enabling numbers people with imagination and story people with discipline.” A. Damodaran
  • 4. Why a Data Lake?
    A data lake brings organization-wide discipline to data use and governance
    ● Data sources are defined, captured and maintained
    ● Data is initiated and updated automatically
    ● Data alignment & enrichment processes are explicit
    ● Data access authorization is defined, asserted, and audited.
    ● Data is accessible for ad-hoc enquiries
    ● Data is source-complete
    ● Data is portable and in well-known open formats.
  • 5. S3 Data Lake Strategy
    Tier 1
    Raw data as received from batch or streaming data sources. Apply real-time analytics. Immediately process to Tier 2, then archive to low cost archival storage.
    Tier 2
    Raw data optimized in structure and size. Ready for multiple tool access. Apply partition strategy. Optimize for file size. Apply a highly compressible columnar data format such as Parquet or ORC, allowing casual queries.
    Tier 3 (...n)
    Purpose built and/or tool specific optimizations, views and applications.
    ● Redshift
    ● ElasticSearch
    ● Elastic MapReduce
    Data Catalog
    Single point of discovery, authorization and access control.
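    The Tier 2 "partition strategy" can be illustrated with a small sketch. It assumes a date-based, Hive-style `key=value` layout, which is one common convention and not something the slides prescribe. The `tier2` prefix, the `orders` dataset name, and the `tier2_key` helper are all hypothetical.

```python
from datetime import datetime, timezone


def tier2_key(dataset: str, event_time: datetime, part: int,
              prefix: str = "tier2") -> str:
    """Build a Hive-style partitioned object key for a Tier 2 Parquet file.

    Assumption: partitions are year/month/day in UTC. The layout and the
    names used here are illustrative, not taken from the original deck.
    """
    # Normalize to UTC so one event always maps to one partition.
    # Note: a naive datetime is interpreted as local time by astimezone().
    t = event_time.astimezone(timezone.utc)
    return (
        f"{prefix}/{dataset}/"
        f"year={t.year:04d}/month={t.month:02d}/day={t.day:02d}/"
        f"part-{part:05d}.parquet"
    )


if __name__ == "__main__":
    ts = datetime(2019, 4, 26, 13, 30, tzinfo=timezone.utc)
    print(tier2_key("orders", ts, 0))
```

    Query engines that understand this layout can skip whole prefixes when a query filters on the partition columns, which is part of what keeps pay-as-you-go queries cheap.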
  • 8. Key Solution Elements
    ● Data Lake - Centralize your data into a data lake flowing from raw to fully prepared, with each “Tier” having its defined purpose.
    ● Data Catalog - Establish a single point of data registration, discovery, access, and audit.
    ● Tiered Data Retention - Keep interesting Tier 1 raw data long-term for future uses.
    ● Open Data Formats - At Tier 2, apply open standard columnar data formats for portability, discoverability, speed, and compression. Access using open source technologies like Hadoop, Presto, and columnar in-memory database engines.
    ● Schema on Read - Separating the schema from the data allows better portability, flexibility and agility.
    ● Serverless Components - Use serverless or managed components for data streaming, storage, and processing, making it both quick to experiment with and easy to scale.
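    The "Schema on Read" element above can be sketched as follows. This is a minimal illustration, assuming raw records are stored as JSON lines and that a schema is simply a mapping from column name to a casting function. The `ORDERS_V1` and `ORDERS_V2` schemas, the field names, and the `read_with_schema` helper are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# A schema is just column name -> casting function (an assumption made
# for this sketch; real catalogs store richer type information).
Schema = Dict[str, Callable[[Any], Any]]

ORDERS_V1: Schema = {"order_id": str, "amount": float}
ORDERS_V2: Schema = {"order_id": str, "amount": float, "currency": str}


def read_with_schema(lines: Iterable[str],
                     schema: Schema) -> Iterator[Dict[str, Any]]:
    """Apply a schema while reading; the stored raw lines are never changed."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        raw = json.loads(line)
        # Fields missing from the raw record (or null) become None.
        yield {
            name: cast(raw[name]) if raw.get(name) is not None else None
            for name, cast in schema.items()
        }


if __name__ == "__main__":
    raw_lines = [
        '{"order_id": 1, "amount": "19.99", "currency": "CAD"}',
        '{"order_id": 2, "amount": 5}',
    ]
    for row in read_with_schema(raw_lines, ORDERS_V2):
        print(row)
```

    Because the schema lives outside the data, the same raw lines can be read under either schema, and adding a column later does not require rewriting what is already stored.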
  • 9. Approach
    ● Identify a starter use case and its success measures
    ● Open Access - Provide secure but wide access to your data lake, allowing the business to:
      ○ Gain insights
      ○ Enhance visibility
      ○ Discover new data applications
      ○ Make better data driven decisions
      ○ Drive and measure business value
    ● Include a long-term AI strategy
      ○ Retain raw data
      ○ Build and enhance your data acquisition strategy
      ○ Experiment with easy to use AI tools
    ● Build a plan to encourage adoption
      ○ Develop a change management & communication plan
      ○ Provide training and workshops for data analysts, builders, and users
  • 10. Security & Privacy
    ● Security - Be aware of the mounting liability of data privacy and security
      ○ Centralize point of authorization, access control, monitoring, and audit.
      ○ Use built-in encryption in transit and at rest
      ○ Build on cloud native security tools
      ○ Allow for GDPR type subject access requests
      ○ Use cloud native tooling for data protection
      ○ Use cloud native tooling for identity & access control, anomaly detection and response
  • 11. Technology
    ● Simplify Tooling
      ○ Your data is big and complex and growing
      ○ Your tooling should not add to the complexity
      ○ Use simple cloud native services and patterns
    ● Machine Learning - Experiment with the easy to use cloud tools for
      ○ Classifying unstructured data
      ○ Audio / Video recognition
      ○ Anomaly detection
      ○ Personalization
      ○ Recommendations
  • 12. Workshop Roles
    Participants
    The people who bring an understanding of your business, its data and goals. The people who will continue to develop, manage, and secure your enterprise analytics service.
    Facilitator
    Brings an understanding of modern enterprise analytics and how to implement it on Amazon Web Services (AWS). Brings a strategy for building your agile enterprise analytics service. This will act as a basis upon which we will refine your vision and build your service start-up plan. Uses techniques like Value Stream Analysis to define and refine processes.
  • 13. Workshop deliverables
    Vision
    A Vision for your Agile Enterprise Analytics service within the context of your business, its data, and its goals.
    Start-up Plan
    A start-up plan for implementing an initial Agile Enterprise Analytics service.
    Project Proposal
    A proposal with pricing and terms and conditions.
    Delivery Schedule
    A recommended delivery schedule to meet your needs.
  • 14. Beyond the Workshop
    What’s next...
    With your Data Lake taking form, it may be time to build your skills in the application of Machine Learning and AI. You can learn to build and maintain accurate models, deploy those models efficiently on AWS, and take full advantage of AI and machine learning to make better predictions faster and improve your bottom line.