In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon Machine Learning (Amazon ML) services work together to build a successful data lake for various roles, including data scientists and business users.
17. MPAC Valuation Engine Runs 5,000% Faster Using AWS
MPAC provides a property assessment system for Canada. It is based in Pickering, Ontario.
“AWS has had a transformational effect on our business, enabling us to serve our business clients better and faster than we ever have before.”
Nicole McNeill, Chief Financial Officer
• Public sector company evaluates properties across Canada for use in establishing taxes
• Migrated off traditional IT architecture to AWS for greater speed and agility
• Main valuation engine now runs 5,000 percent faster at one-tenth the cost of previous architecture
• Developers release new features every one to two weeks instead of three to six months in the past
Data lake infrastructure & management
23. Big data processing with Apache Spark & Hadoop
with Amazon EMR
Easy to use notebooks
Low cost vs on-premises
Elastic automatic scaling
Reliable 99.9% SLA
Secure with encryption and keys
Flexible, open source choice
Analytics
Enterprise-grade Easy Lowest cost
25. National Bank of Canada Uses AWS to Generate New Revenue
National Bank of Canada is a leading Canadian financial services organization.
“The speed and performance of AWS are impressive. Data-manipulation processes that took days are now down to one minute.”
Pascal Bergeron, Director of Algorithmic Trading
• Wanted to more easily scale its data analysis platform
• Runs data analysis using the TickVault platform on the AWS Cloud
• Scales to process and analyze hundreds of terabytes of financial data
• Conducts data manipulations in one minute instead of days
• Optimizes its trading operations and generates more revenue
27. Data warehouse for business reporting
with Amazon Redshift
Fast: Up to 10x faster than traditional data warehouses
Easy to set up, deploy, and manage
Cost-effective
Scale on-demand for large data volume and high query concurrency
Query data in open formats directly from the data lake
28. CHALLENGE
Needed to analyze data to find insights, identify opportunities, and evaluate business performance. The Oracle DW did not scale, was difficult to maintain, and was costly.
SOLUTION
Deployed a data lake with Amazon S3, and runs analytics with Amazon Redshift, Amazon Redshift Spectrum, and Amazon EMR.
Result: They doubled the data stored (100 PB), lowered costs, and were able to gain insights faster.
50 PB of data
600,000 analytics jobs/day
29. Real-time analytics for timely insights
with Amazon Kinesis
Make streaming data available to multiple real-time analytics applications
Run streaming applications without managing any infrastructure
Durable to reduce the probability of data loss
Scalable to process data from hundreds of thousands of sources with low latencies
30. Operational analytics for logs and search
with Amazon Elasticsearch Service
Fully managed; deploy a production-ready cluster in minutes
Direct access to Elasticsearch open-source APIs, Logstash, and Kibana
VPC support; at-rest and in-transit encryption
Scale up and down easily
31. Interactive analysis
with Amazon Athena
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage, and no data to load
Ability to run SQL queries on data archived in Amazon S3 Glacier (coming soon)
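As a rough illustration of "standard SQL over data in Amazon S3", the sketch below assembles a request for the Athena `StartQueryExecution` API as exposed by boto3. The table (`events`), database (`datalake_db`), and results bucket are hypothetical names chosen for the example, and the function that actually submits the query is defined but not called, since it needs boto3 and valid AWS credentials.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID.

    Needs the boto3 package and valid AWS credentials, so boto3 is
    imported lazily and this function is not called in the sketch.
    """
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical table, database, and bucket names (assumptions, not from the slides).
request = build_athena_request(
    sql="SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type",
    database="datalake_db",
    output_location="s3://example-bucket/athena-results/",
)
```

Splitting request construction from submission keeps the query definition easy to inspect without touching an AWS account.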
34. Visual insights for everyone
with Amazon QuickSight
Pay only for what you use
Scale to tens of thousands of users
Embedded analytics
Build end-to-end BI solutions
Visualization &
machine learning
35. Advanced insights for everyone
with Amazon ML & AI services
Frameworks and interfaces for machine learning practitioners
Platform services that make it easy for any developer to get started and get deep with ML
Application services that enable developers to plug pre-built AI functionality into their apps
Amazon S3: raw data
Initial training data is annotated by human labelers
Active learning model is trained from human labeled data
Training data the model understands is automatically labeled
Ambiguous data is sent to human labelers for annotation
Human labeled data is then sent back to retrain and improve the machine learning model
An accurate training dataset is ready for use in Amazon SageMaker
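The labeling loop described on this slide can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the service's actual implementation: `ToyModel` is a hypothetical word-memorizing classifier, the 0.5 confidence threshold is arbitrary, and `ask_human` stands in for the human labeling step.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyModel:
    """Hypothetical stand-in for a real classifier: memorizes word -> label."""

    def __init__(self) -> None:
        self.known: Dict[str, str] = {}

    def train(self, labeled: Dict[str, str]) -> None:
        # Every word of a labeled text is remembered with that text's label.
        for text, label in labeled.items():
            for word in text.lower().split():
                self.known[word] = label

    def predict(self, text: str) -> Tuple[Optional[str], float]:
        words = text.lower().split()
        votes = [self.known[w] for w in words if w in self.known]
        if not votes:
            return None, 0.0
        best = max(sorted(set(votes)), key=votes.count)
        # Confidence: share of the text's words that vote for the winning label.
        return best, votes.count(best) / len(words)


def active_learning_round(
    model: ToyModel,
    unlabeled: list,
    ask_human: Callable[[str], str],
    threshold: float = 0.5,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Auto-label confident items, send ambiguous ones to a human, then retrain."""
    auto: Dict[str, str] = {}
    human: Dict[str, str] = {}
    for text in unlabeled:
        label, confidence = model.predict(text)
        if label is not None and confidence >= threshold:
            auto[text] = label
        else:
            human[text] = ask_human(text)
    model.train(human)  # human labels flow back to improve the model
    return auto, human


# Made-up example data.
model = ToyModel()
model.train({"cat photo": "animal", "red car": "vehicle"})  # initial human-labeled seed
oracle = {"blue truck": "vehicle"}
auto, human = active_learning_round(
    model, ["cat picture", "blue truck", "red car"], oracle.__getitem__
)
```

In this run the two items sharing words with the seed data are labeled automatically, while "blue truck" goes to the human oracle and is then used for retraining.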
36. Central 1 & AWS
2019-Nov-05
Paul Save, Product Manager, Data Science
38. What is a Credit Union?
Full service financial institutions
Use a cooperative business model
Safe and stable – provincially regulated & insured
Owned by their members – people who bank with them
Profits have a purpose – to benefit the people they were built to serve
Boards are elected members from local community
44. Engage the right people at the right time
[Figure labels: Active Member, Precarious, Optimal Retainment Period, Asset loss, Churn; Feature (x3), Next best action, Probability * Cost; example member: Jane]
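The "Probability * Cost" label on this slide suggests ranking members by expected loss, that is, churn probability multiplied by what the member's departure would cost. A minimal sketch of that arithmetic, assuming a churn model already supplies the probabilities; the member names other than Jane, the probabilities, and the dollar figures are all made up for illustration and are not Central 1 data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    name: str
    churn_probability: float  # model output in [0, 1] (hypothetical values)
    cost_of_loss: float       # assets at risk if the member leaves (hypothetical)

    @property
    def expected_loss(self) -> float:
        # Probability * Cost
        return self.churn_probability * self.cost_of_loss


def rank_for_outreach(members: List[Member]) -> List[Member]:
    """Order members so the largest expected loss comes first."""
    return sorted(members, key=lambda m: m.expected_loss, reverse=True)


members = [
    Member("Jane", 0.30, 50_000.0),
    Member("Member B", 0.80, 5_000.0),
    Member("Member C", 0.05, 200_000.0),
]
ranked = rank_for_outreach(members)
```

Note that the member most likely to churn ("Member B") is not the first priority here: weighting by cost moves Jane and "Member C" ahead.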
45. Complexity & Value of the models
~600 features created
15 models explored to determine appropriateness for retention
10,000+ models trained for tuning
>90% accuracy on predictions
Continuous improvement of the model with marketing results data
51. Enhancing the fan experience
One week of NFL games now creates 3 TB of data. NFL uses Amazon SageMaker to analyze telemetry data to predict plays. Computations that could take months to refine now take only weeks or days.