Agile Enterprise Analytics on AWS

April 26, 2019
Copyright 2019 - Don Gillis Zapwerx

Overview
Enterprise data on average is growing at over 30% year over year, yet
traditional analytics approaches have proven to be expensive and
unyielding. The result is that a growing proportion of our data is unused
“dark data”.
However, there is an analytics “perfect storm” happening right now, to the
benefit of enterprises that know how to harness its power:
● Open data formats
● Open source analytics
● Low cost cloud storage
● Rapid cloud innovation
● Low cost pay-as-you-go queries
● Easy to use serverless components
● Cheap and accessible machine learning and AI tools

Being a Data Driven Organization
Enabling evidence based decision making
● Data Agility
○ Widely trained data literacy
○ Quickly iterate on questions, queries & analysis
● Data Access
○ In context integration
○ Wide availability
○ Lower cost scalability
● Data Governance
○ Centralized
○ Single sourced
○ Attributed & Controlled
● Data Community
○ Analytics for everyone
○ Shareable stories
“...enabling numbers people with imagination and story people with discipline.”
A. Damodaran

Why a Data Lake?
A data lake brings organization-wide discipline to data use and governance
● Data sources are defined, captured and maintained
● Data is initiated and updated automatically
● Data alignment & enrichment processes are explicit
● Data access authorization is defined, asserted, and audited
● Data is accessible for ad-hoc enquiries
● Data is source-complete
● Data is portable and in well-known open formats

S3 Data Lake Strategy
Tier 1
Raw data as received from batch or streaming data sources. Apply real-time analytics. Immediately process to Tier 2, then archive to low cost archival storage.
Tier 2
Raw data optimized in structure and size. Ready for multiple tool access. Apply partition strategy. Optimize for file size. Apply a highly compressible columnar data format such as Parquet or ORC, allowing casual queries.
Tier 3 (...n)
Purpose built and/or tool specific optimizations, views and applications.
● Redshift
● Elasticsearch
● Elastic MapReduce
Data Catalog
Single point of discovery, authorization and access control.
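
The Tier 2 partition strategy can be illustrated with a minimal sketch. It assumes a Hive-style `key=value` object key layout, which query engines such as Presto commonly recognize as partitions. The `tier2/` prefix, the dataset name, and the `tier2_key` helper are hypothetical and not part of the original deck.

```python
from datetime import date


def tier2_key(dataset: str, event_date: date, part: int) -> str:
    """Build a Hive-style partitioned object key for a Tier 2 Parquet file.

    The prefix and naming scheme are illustrative assumptions only.
    """
    return (
        f"tier2/{dataset}/"
        f"year={event_date.year:04d}/"
        f"month={event_date.month:02d}/"
        f"day={event_date.day:02d}/"
        f"part-{part:05d}.parquet"
    )


# Example: where the first file of one day's "orders" data would land.
key = tier2_key("orders", date(2019, 4, 26), 0)
print(key)
```

Partitioning by date lets a query that filters on a date range read only the matching prefixes rather than the whole dataset.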

Key Solution Elements
● Data Lake - Centralize your data into a data lake flowing from raw to fully prepared,
with each “Tier” having its defined purpose.
● Data Catalog - Establish a single point of data registration, discovery, access, and audit.
● Tiered Data Retention - Keep interesting Tier 1 raw data long term for future uses.
● Open Data Formats - At Tier 2, apply open standard columnar data formats for
portability, discoverability, speed, and compression. Access using open source
technologies like Hadoop, Presto, and columnar in-memory database engines.
● Schema on Read - separating the schema from the data allows better portability,
flexibility and agility
● Serverless Components - using serverless or managed components for data streaming,
storage, and processing, making it both quick to experiment with and easy to scale.
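
As a rough illustration of schema on read, the sketch below keeps raw records as untouched text and applies a separately defined schema only at read time. The sample records, the field names, and the `read_with_schema` helper are hypothetical; a real data lake would typically hold the schema in the data catalog rather than in code.

```python
import json
from datetime import datetime

# Raw records stay exactly as received (hypothetical sample data).
RAW_LINES = [
    '{"order_id": "1001", "amount": "19.99", "placed_at": "2019-04-26T10:15:00"}',
    '{"order_id": "1002", "amount": "5.00", "placed_at": "2019-04-26T11:30:00"}',
]

# The schema lives apart from the data and is applied only when reading.
ORDER_SCHEMA = {
    "order_id": int,
    "amount": float,
    "placed_at": datetime.fromisoformat,
}


def read_with_schema(lines, schema):
    """Parse each raw line and cast the fields named in the schema."""
    for line in lines:
        record = json.loads(line)
        yield {field: cast(record[field]) for field, cast in schema.items()}


orders = list(read_with_schema(RAW_LINES, ORDER_SCHEMA))
total = sum(order["amount"] for order in orders)
print(orders[0], total)
```

Because the stored data is never rewritten, a different schema (for example, one that reads `amount` as a string) can be applied to the same raw lines later.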

Approach
● Identify starter use case and its success measures
● Open Access - provide secure but wide access to your data lake allowing the business to:
○ Gain insights
○ Enhance visibility
○ Discover new data applications
○ Make better data driven decisions
○ Drive and measure business value
● Include a long-term AI strategy
○ Retain raw data
○ Build and enhance your data acquisition strategy
○ Experiment with easy to use AI tools
● Build a plan to encourage adoption
○ Develop a change management & communication plan
○ Provide training and workshops for data analysts, builders, and users

Security & Privacy
● Security - Be aware of the mounting liability of data privacy and security
○ Centralize the point of authorization, access control, monitoring, and audit
○ Use built-in encryption in transit and at rest
○ Build on cloud native security tools
○ Allow for GDPR type subject access requests
○ Use cloud native tooling for data protection
○ Use cloud native tooling for identity & access control, anomaly detection and response
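
One way to enforce encryption in transit with cloud native tooling is an S3 bucket policy that denies any request not made over TLS. The sketch below only builds such a policy document as a Python dictionary; the bucket name and the `tls_only_bucket_policy` helper are hypothetical, and the policy should be reviewed against your own account setup before use.

```python
import json


def tls_only_bucket_policy(bucket: str) -> dict:
    """Return an S3 bucket policy that denies requests not sent over TLS.

    The bucket name is supplied by the caller; nothing is applied to AWS here.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # aws:SecureTransport is "false" for plain HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Hypothetical bucket name, for illustration only.
policy = tls_only_bucket_policy("example-data-lake")
print(json.dumps(policy, indent=2))
```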

Technology
● Simplify Tooling
○ Your data is big and complex and growing
○ Your tooling should not add to the complexity
○ Use simple cloud native services and patterns
● Machine Learning - Experiment with the easy to use cloud tools for
○ Classifying unstructured data
○ Audio / Video recognition
○ Anomaly detection
○ Personalization
○ Recommendations

Workshop Roles
Participants
The people who bring an understanding of your
business, its data and goals.
The people who will continue to develop,
manage, and secure your enterprise analytics
service.
Facilitator
Brings an understanding of modern
enterprise analytics and how to implement it
on Amazon Web Services (AWS).
Brings a strategy for how to build your agile
enterprise analytics service. This will act as a
basis upon which we will refine your vision
and build your service start-up plan.
Uses techniques like Value Stream Analysis
to define and refine processes.

Workshop Deliverables
Vision
A vision for your Agile Enterprise Analytics service within the context of your business, its data, and its goals.
Start-up Plan
A start-up plan for implementing an initial Agile Enterprise Analytics service.
Project Proposal
A proposal with pricing and terms and conditions.
Delivery Schedule
A recommended delivery schedule to meet your needs.

Beyond the Workshop
What’s next...
With your Data Lake taking form, it may be time to build your skills in the
application of Machine Learning and AI. You can learn to build and maintain
accurate models, deploy those models efficiently on AWS, and take full
advantage of AI and machine learning to make better predictions faster and
improve your bottom line.
