SlideShare a Scribd company logo
1 of 43
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Greg Khairallah, Business Development Manager, AWS
Malini Saxena, Senior Consultant, AWS
Raj Chary, VP of Technology / Architecture, WagglePractice
Lige Hensley, Chief Technology Officer, Ivy Tech
June 20, 2016
Easy Analytics with AWS
What to expect from this session
• AWS toolkit for analytics
• Understand stakeholders
• Demo
• Case Study – WagglePractice
• Case Study – Ivy Tech
• Q&A
AnalyzeStore
Amazon
Glacier
Amazon
S3
Amazon
DynamoDB
Amazon RDS,
Amazon Aurora
Big data portfolio—but what do I recommend?
AWS Data Pipeline
Amazon
CloudSearch
Amazon
EMR
Amazon EC2
Amazon
Redshift
Amazon
Machine
Learning
Amazon
Elasticsearch
Service
AWS Database
Migration
Amazon
Kinesis
Analytics
Amazon Kinesis
Firehose
AWS Import/Export
AWS Direct
Connect
Collect
Amazon Kinesis
Streams
Amazon
QuickSight
Match toolset to right persona
• Business intelligence (BI) analyst
• Primary tool is SQL
• Historical data resides in data warehouse such as
Amazon Redshift
• Data scientist—Uses programmatic languages such as R or
Python
• Application developer—Requires API to integrate with AWS
services
BI analyst
BI analyst with existing BI tools
BI Analyst
BI tools
Amazon EC2
Amazon Redshift
QuickSight API
• Primary tool is SQL
• Data is largely structured with well known data sources
• Primary concern is fast, consistent performance
• Need to extend SQL with custom functions
BI tools
Amazon EC2
Amazon QuickSight
Amazon QuickSight
Amazon Redshift system architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore via
Amazon S3; load from
Amazon DynamoDB, Amazon EMR, or SSH
Two hardware platforms
• Optimized for data processing
• DS2: HDD; scale from 2 TB to 2 PB
• DC1: SSD; scale from 160 GB to 356 TB
10 GigE
(HPC)
JDBC/ODBC
New SQL functions
We add SQL functions regularly to expand Amazon Redshift’s query capabilities
Added 25+ window and aggregate functions since launch, including:
LISTAGG
[APPROXIMATE] COUNT
DROP IF EXISTS, CREATE IF NOT EXISTS
REGEXP_SUBSTR, _COUNT, _INSTR, _REPLACE
PERCENTILE_CONT, _DISC, MEDIAN
PERCENT_RANK, RATIO_TO_REPORT
We’ll continue iterating but also want to enable you to write your own
Window function examples: http://docs.aws.amazon.com/redshift/latest/dg/r_Window_function_examples.html
Scalar user defined functions
You can write UDFs using Python 2.7
• Syntax is largely identical to PostgreSQL UDF
• Python execution is performed in parallel
• System and network calls within UDFs are prohibited
Comes integrated with Pandas, NumPy, SciPy, DateUtil, and
Pytz analytic libraries
• Import your own libraries for even more flexibility
• Take advantage of thousands of functions available through Python
libraries to perform operations not easily expressed in SQL
A very fast, cloud-powered, business
intelligence service for 1/10 the cost of
traditional BI software
What is Amazon QuickSight?
Business
User
Business
User
QuickSight
APIQuickSight UI
Mobile Devices Web Browsers
Partner BI Products
MetadataData PrepConnectors SuggestionsSPICE
Amazon
S3
Amazon
Kinesis
Amazon
DynamoDB
Amazon EMRAmazon
Redshift
Amazon RDSFiles Third-party
Data scientist
Data scientist with existing toolsets
Data scientist Toolkits like SAS or
R Studio installed
with Amazon EC2
Unstructured data
Amazon S3
Structured data
Amazon Redshift
• Work with unstructured datasets
• Use existing toolsets to connect to Amazon Redshift
Querying Amazon Redshift with R packages
• RJDBC—Supports SQL queries
• dplyr—Uses R code for data
analysis
• RPostgreSQL—R compliant
driver or database Interface (DBI)R User
R Studio
Amazon
EC2
Unstructured data
Amazon S3
User profile
Amazon RDS
Amazon Redshift
Connecting R with Amazon Redshift blog post: https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-Redshift
Querying Amazon Redshift with R packages example
Application developer
Application developers can build smart
applications using Amazon Machine Learning
Structured data/predictions
Amazon Redshift
Generate/query
predictions
Amazon QuickSight
Application
Amazon Machine
Learning
Visualize
• All skill levels
• Amazon Machine Learning technology is accessed through APIs and SDKs
• Embed visualizations in applications
Demo
Raj Chary, WagglePractice
Vice President of Technology/Architecture
Smart, responsive practice
Math and ELA (Grades 2-8)
Provides students the right
challenge at the right time
What is Waggle?
Right Challenge, Right Time
Waggle looks for more than
correct answers. Waggle
continually analyzes each
student’s decisions and
progress. That way, students get
tougher material right when
they’re ready.
What is Waggle?
Productive Struggle
Waggle motivates students to
push themselves forward. How?
Through helpful hints,
supportive feedback, and
achievement badges that build
grit and confidence.
What is Waggle?
Constructive Grouping Waggle’s
insights means you can easily
group students together based
on learning needs. All without
sacrificing the quality of
individual instruction.
What is Waggle?
Waggle: Product Demo
• Data Creators
 Differentiated learning experience
 Fun and engaging
• Data Visualizers
 Seamless integration with application
 Analytics with a Story
 Actionable Data
Redshift: Data Warehouse Layout
Write Cluster
Compute – dw2.large
Redshift
Read Cluster
Compute – dw2.large
Redshift
History Cluster
Density – dw1.xlarge
Redshift
Initial and
Increment
al
{processed
} data
loads
Periodic Data
Snapshots for
historical analysis
Data
sources
For serving Jaspersoft
reports
APIs
OLTP
S3 COPY
S3 UnLoad
and Load
S3 UnLoad
and Load
Data mart
(aggregations)
NodesNodes
Staging
Datamart
(aggregations)
Nodes
S3 UnLoad and
Load
S3 UnLoad and
Load
+ UPSERTS
Results and Lessons Learned
• Performance Metrics
– Millions of records are processed in <1 minute
• LOAD/UNLOAD commands | UPSERTS | S3 COPY Command
– Report queries average < 1 to ~1.5 seconds
– {compression} – gained 20+% efficiencies in data retrieval
• Best Practices
– {sort keys} – lens-based data model: visualize data in variety of ways
– {commit stats} – Redshift is not a transactional system
– {nested loops} – no Cartesian products, ensure joins well managed
– {queries that queue} – tune the WLM configuration
– {query runtimes} – faster query means less queuing
– {stats missing} – analyze and vacuum when possible
– {alerts with tables} – monitor to ensure queries running optimally
Thank You
Ivy Tech & Amazon Redshift
May 25, 2016
• Transforming the culture of the College to be more data driven
• Moving from reporting silos to an Integrated Analytics system, we call
this a Data Democracy
• Collecting and analyzing a vast variety of data at a scale that no one
in Higher Ed is doing
• Using machine learning tools to identify students who may need
further assistance
• Starting this fall, we are implementing a one-on-one coaching
initiative for the students we identified with the machine learning tools
What We’re Doing
96% of organizations in the United States
use data in the same way.
…and it’s wrong.
But it’s not just education…
The “Standard” Approach
VIP
Relevant Data for Everyone
Data Regimes
Data Dictatorship: Data is controlled and its use is restricted. There
is asymmetric distribution of information based on your position
Data Aristocracy: Data analysts, scientists and PhDs are needed to
do anything meaningful. Power concentrates in the hands of these
employees and their supervisors
Data Anarchy: Business users feel underserved and take matters into
their own hands. They create “shadow IT” systems and work around
the “unresponsive” IT group
Data Democracy: Everybody gets timely and equitable access to
data. Line of business users are empowered and “own” the data.
Executives and IT get out of the way
1 Shash Hegde, Mariner, “The Rise of Data Regimes”, 9/12/13, http://www.mariner-usa.com/rise-data-regimes/ (image substitution for Mao Zedong)
Every organization moves through
increasingly complex stages of data
accessibility.
Data Maturity Model
… very few complete the transition to
Integrated Analytics
Stage 1: Report Silos
Request
Tracker
Banner Blackboard Luminis StarfishSCCM CAS
Authentication
This is what we have had for
decades at Ivy Tech…
Request
Tracker
Banner Blackboard Luminis StarfishSCCM CAS
Authentication
Stage 2: Data Warehousing
This is what
most
companies
do…
but we are
taking this a
step further…
Stage 3: Integrated Analytics
Request
Tracker
Banner Blackboard Luminis StarfishSCCM CAS
Authentication
Students by
Financial
Aid
Students
by
Award
Students
by
Term
Students
by
Class
Classes by
Class
Section
Students
These curated collections of
data are designed to enable
direct access to...
…the data you need, regardless of
where it came from. Quickly.
Easily.
GPA Graduation—Cumulative
Graduation Grade Point Average (Cumulative) is an indication of a student's academic progress for all
semester credit classes for all registered terms up to and including the selected term. Letter grades are
assigned points (A=4, B=3, C=2, D=1, F=0) and the GPA is calculated by taking the number of grade
points a student earned in a selected period of time divided by the total number of classes taken during
that same period.
GPA Graduation Cumulative = Sum of a student's total grade points earned in credit classes for all
classes for all registered terms up to and including the selected term / Sum of student's total classes
taken during that same period
NOTES ON USING THIS TERM: GPA Graduation - Cumulative does not include grades from remedial
classes.
Related Terms: [GPA Graduation - Term]
Questions?
Resources
Amazon Redshift Getting Started Guide:
http://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
Scalar UDF Documentation: http://docs.aws.amazon.com/redshift/latest/dg/user-defined-
functions.html
Introduction to Python UDFs in Amazon Redshift:
https://blogs.aws.amazon.com/bigdata/post/Tx1IHV1G67CY53T/Introduction-to-Python-UDFs-in-
Amazon-Redshift
Connecting R with Amazon Redshift:
https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-
Redshift
Databricks Apache Spark–Amazon Redshift Tutorial: https://github.com/databricks/spark-
redshift/tree/master/tutorial
Amazon ML Getting Started Guide: https://aws.amazon.com/machine-learning/getting-started/
Amazon QuickSight (Preview Registration): https://aws.amazon.com/quicksight/
Thank you!

More Related Content

What's hot

SID305 AWS Certificate Manager Private CA
SID305 AWS Certificate Manager Private CASID305 AWS Certificate Manager Private CA
SID305 AWS Certificate Manager Private CAAmazon Web Services
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Adrien Blind
 
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...Amazon Web Services
 
Best practices on building data lakes and lake formation
Best practices on building data lakes and lake formationBest practices on building data lakes and lake formation
Best practices on building data lakes and lake formationJohn Varghese
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
AWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAmazon Web Services
 
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)Amazon Web Services
 
Big Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudBig Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudAmazon Web Services
 
Advanced Security Best Practices Masterclass
Advanced Security Best Practices MasterclassAdvanced Security Best Practices Masterclass
Advanced Security Best Practices MasterclassAmazon Web Services
 
Data Science in Manufacturing and Automation
Data Science in Manufacturing and AutomationData Science in Manufacturing and Automation
Data Science in Manufacturing and AutomationRavishankar Rajagopalan
 
HPC on Azure for Reserach
HPC on Azure for ReserachHPC on Azure for Reserach
HPC on Azure for ReserachJürgen Ambrosi
 
Building Modern Streaming Analytics with Confluent on AWS
Building Modern Streaming Analytics with Confluent on AWSBuilding Modern Streaming Analytics with Confluent on AWS
Building Modern Streaming Analytics with Confluent on AWSconfluent
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
Getting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceGetting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceAmazon Web Services
 

What's hot (20)

ElastiCache & Redis
ElastiCache & RedisElastiCache & Redis
ElastiCache & Redis
 
SID305 AWS Certificate Manager Private CA
SID305 AWS Certificate Manager Private CASID305 AWS Certificate Manager Private CA
SID305 AWS Certificate Manager Private CA
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
 
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
 
Best practices on building data lakes and lake formation
Best practices on building data lakes and lake formationBest practices on building data lakes and lake formation
Best practices on building data lakes and lake formation
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
AWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices Webinar
 
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
 
Big Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudBig Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS Cloud
 
Advanced Security Best Practices Masterclass
Advanced Security Best Practices MasterclassAdvanced Security Best Practices Masterclass
Advanced Security Best Practices Masterclass
 
AWS Tagging Strategy
AWS Tagging StrategyAWS Tagging Strategy
AWS Tagging Strategy
 
Data Science in Manufacturing and Automation
Data Science in Manufacturing and AutomationData Science in Manufacturing and Automation
Data Science in Manufacturing and Automation
 
HPC on Azure for Reserach
HPC on Azure for ReserachHPC on Azure for Reserach
HPC on Azure for Reserach
 
Building Modern Streaming Analytics with Confluent on AWS
Building Modern Streaming Analytics with Confluent on AWSBuilding Modern Streaming Analytics with Confluent on AWS
Building Modern Streaming Analytics with Confluent on AWS
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
Getting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceGetting Started with AWS Database Migration Service
Getting Started with AWS Database Migration Service
 

Similar to Easy Analytics with AWS Tools and Case Studies

March 2016 PHXTUG Meeting
March 2016 PHXTUG MeetingMarch 2016 PHXTUG Meeting
March 2016 PHXTUG MeetingMichael Perillo
 
AWS 클라우드를 통한 교육 및 연구 혁신 - AWS Summit Seoul 2017
AWS 클라우드를 통한 교육 및 연구 혁신 - AWS Summit Seoul 2017AWS 클라우드를 통한 교육 및 연구 혁신 - AWS Summit Seoul 2017
AWS 클라우드를 통한 교육 및 연구 혁신 - AWS Summit Seoul 2017Amazon Web Services Korea
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists CCG
 
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...Blackboard APAC
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxGautamPopli1
 
AWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AIAWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AIAmazon Web Services
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAmazon Web Services
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIAmazon Web Services
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data ScienceAjay Ohri
 
교육의 진화, 클라우드는 어떤 역할을 하는가 :: Vincent Quah :: AWS Summit Seoul 2016
교육의 진화, 클라우드는 어떤 역할을 하는가 :: Vincent Quah :: AWS Summit Seoul 2016교육의 진화, 클라우드는 어떤 역할을 하는가 :: Vincent Quah :: AWS Summit Seoul 2016
교육의 진화, 클라우드는 어떤 역할을 하는가 :: Vincent Quah :: AWS Summit Seoul 2016Amazon Web Services Korea
 
Modern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftModern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftAmazon Web Services
 
Next Generation Education: Technology in the Classroom and Beyond
Next Generation Education: Technology in the Classroom and BeyondNext Generation Education: Technology in the Classroom and Beyond
Next Generation Education: Technology in the Classroom and BeyondAmazon Web Services
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big DataInfochimps, a CSC Big Data Business
 
Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Betacowork
 
Data and AI in education
Data and AI in educationData and AI in education
Data and AI in educationJisc
 
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...Ali Alkan
 

Similar to Easy Analytics with AWS Tools and Case Studies (20)

March 2016 PHXTUG Meeting
March 2016 PHXTUG MeetingMarch 2016 PHXTUG Meeting
March 2016 PHXTUG Meeting
 
AWS 클라우드를 통한 교육 및 연구 혁신 - AWS Summit Seoul 2017
AWS 클라우드를 통한 교육 및 연구 혁신 - AWS Summit Seoul 2017AWS 클라우드를 통한 교육 및 연구 혁신 - AWS Summit Seoul 2017
AWS 클라우드를 통한 교육 및 연구 혁신 - AWS Summit Seoul 2017
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists
 
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 
AWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AIAWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AI
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AI
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data Science
 
교육의 진화, 클라우드는 어떤 역할을 하는가 :: Vincent Quah :: AWS Summit Seoul 2016
교육의 진화, 클라우드는 어떤 역할을 하는가 :: Vincent Quah :: AWS Summit Seoul 2016교육의 진화, 클라우드는 어떤 역할을 하는가 :: Vincent Quah :: AWS Summit Seoul 2016
교육의 진화, 클라우드는 어떤 역할을 하는가 :: Vincent Quah :: AWS Summit Seoul 2016
 
Modern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftModern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon Redshift
 
Next Generation Education: Technology in the Classroom and Beyond
Next Generation Education: Technology in the Classroom and BeyondNext Generation Education: Technology in the Classroom and Beyond
Next Generation Education: Technology in the Classroom and Beyond
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
Data and AI in education
Data and AI in educationData and AI in education
Data and AI in education
 
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Easy Analytics with AWS Tools and Case Studies

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Greg Khairallah, Business Development Manager, AWS Malini Saxena, Senior Consultant, AWS Raj Chary, VP of Technology / Architecture, WagglePractice Lige Hensley, Chief Technology Officer, Ivy Tech June 20, 2016 Easy Analytics with AWS
  • 2. What to expect from this session • AWS toolkit for analytics • Understand stakeholders • Demo • Case Study – WagglePractice • Case Study – Ivy Tech • Q&A
  • 3. AnalyzeStore Amazon Glacier Amazon S3 Amazon DynamoDB Amazon RDS, Amazon Aurora Big data portfolio—but what do I recommend? AWS Data Pipeline Amazon CloudSearch Amazon EMR Amazon EC2 Amazon Redshift Amazon Machine Learning Amazon Elasticsearch Service AWS Database Migration Amazon Kinesis Analytics Amazon Kinesis Firehose AWS Import/Export AWS Direct Connect Collect Amazon Kinesis Streams Amazon QuickSight
  • 4. Match toolset to right persona • Business intelligence (BI) analyst • Primary tool is SQL • Historical data resides in data warehouse such as Amazon Redshift • Data scientist—Uses programmatic languages such as R or Python • Application developer—Requires API to integrate with AWS services
  • 6. BI analyst with existing BI tools BI Analyst BI tools Amazon EC2 Amazon Redshift QuickSight API • Primary tool is SQL • Data is largely structured with well known data sources • Primary concern is fast, consistent performance • Need to extend SQL with custom functions BI tools Amazon EC2 Amazon QuickSight Amazon QuickSight
  • 7. Amazon Redshift system architecture Leader node • SQL endpoint • Stores metadata • Coordinates query execution Compute nodes • Local, columnar storage • Execute queries in parallel • Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH Two hardware platforms • Optimized for data processing • DS2: HDD; scale from 2 TB to 2 PB • DC1: SSD; scale from 160 GB to 356 TB 10 GigE (HPC) JDBC/ODBC
  • 8. New SQL functions We add SQL functions regularly to expand Amazon Redshift’s query capabilities Added 25+ window and aggregate functions since launch, including: LISTAGG [APPROXIMATE] COUNT DROP IF EXISTS, CREATE IF NOT EXISTS REGEXP_SUBSTR, _COUNT, _INSTR, _REPLACE PERCENTILE_CONT, _DISC, MEDIAN PERCENT_RANK, RATIO_TO_REPORT We’ll continue iterating but also want to enable you to write your own Window function examples: http://docs.aws.amazon.com/redshift/latest/dg/r_Window_function_examples.html
  • 9. Scalar user defined functions You can write UDFs using Python 2.7 • Syntax is largely identical to PostgreSQL UDF • Python execution is performed in parallel • System and network calls within UDFs are prohibited Comes integrated with Pandas, NumPy, SciPy, DateUtil, and Pytz analytic libraries • Import your own libraries for even more flexibility • Take advantage of thousands of functions available through Python libraries to perform operations not easily expressed in SQL
  • 10. A very fast, cloud-powered, business intelligence service for 1/10 the cost of traditional BI software What is Amazon QuickSight?
  • 11. Business User Business User QuickSight APIQuickSight UI Mobile Devices Web Browsers Partner BI Products MetadataData PrepConnectors SuggestionsSPICE Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon EMRAmazon Redshift Amazon RDSFiles Third-party
  • 13. Data scientist with existing toolsets Data scientist Toolkits like SAS or R Studio installed with Amazon EC2 Unstructured data Amazon S3 Structured data Amazon Redshift • Work with unstructured datasets • Use existing toolsets to connect to Amazon Redshift
  • 14. Querying Amazon Redshift with R packages • RJDBC—Supports SQL queries • dplyr—Uses R code for data analysis • RPostgreSQL—R compliant driver or database Interface (DBI)R User R Studio Amazon EC2 Unstructured data Amazon S3 User profile Amazon RDS Amazon Redshift Connecting R with Amazon Redshift blog post: https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-Redshift
  • 15. Querying Amazon Redshift with R packages example
  • 17. Application developers can build smart applications using Amazon Machine Learning Structured data/predictions Amazon Redshift Generate/query predictions Amazon QuickSight Application Amazon Machine Learning Visualize • All skill levels • Amazon Machine Learning technology is accessed through APIs and SDKs • Embed visualizations in applications
  • 18. Demo
  • 19.
  • 20. Raj Chary, WagglePractice Vice President of Technology/Architecture
  • 21. Smart, responsive practice Math and ELA (Grades 2-8) Provides students the right challenge at the right time What is Waggle?
  • 22. Right Challenge, Right Time Waggle looks for more than correct answers. Waggle continually analyzes each student’s decisions and progress. That way, students get tougher material right when they’re ready. What is Waggle?
  • 23. Productive Struggle Waggle motivates students to push themselves forward. How? Through helpful hints, supportive feedback, and achievement badges that build grit and confidence. What is Waggle?
  • 24. Constructive Grouping Waggle’s insights means you can easily group students together based on learning needs. All without sacrificing the quality of individual instruction. What is Waggle?
  • 25. Waggle: Product Demo • Data Creators  Differentiated learning experience  Fun and engaging • Data Visualizers  Seamless integration with application  Analytics with a Story  Actionable Data
  • 26. Redshift: Data Warehouse Layout Write Cluster Compute – dw2.large Redshift Read Cluster Compute – dw2.large Redshift History Cluster Density – dw1.xlarge Redshift Initial and Increment al {processed } data loads Periodic Data Snapshots for historical analysis Data sources For serving Jaspersoft reports APIs OLTP S3 COPY S3 UnLoad and Load S3 UnLoad and Load Data mart (aggregations) NodesNodes Staging Datamart (aggregations) Nodes S3 UnLoad and Load S3 UnLoad and Load + UPSERTS
  • 27. Results and Lessons Learned • Performance Metrics – Millions of records are processed in <1 minute • LOAD/UNLOAD commands | UPSERTS | S3 COPY Command – Report queries average < 1 to ~1.5 seconds – {compression} – gained 20+% efficiencies in data retrieval • Best Practices – {sort keys} – lens-based data model: visualize data in variety of ways – {commit stats} – Redshift is not a transactional system – {nested loops} – no Cartesian products, ensure joins well managed – {queries that queue} – tune the WLM configuration – {query runtimes} – faster query means less queuing – {stats missing} – analyze and vacuum when possible – {alerts with tables} – monitor to ensure queries running optimally
  • 28.
  • 30. Ivy Tech & Amazon Redshift May 25, 2016
  • 31. • Transforming the culture of the College to be more data driven • Moving from reporting silos to an Integrated Analytics system, we call this a Data Democracy • Collecting and analyzing a vast variety of data at a scale that no one in Higher Ed is doing • Using machine learning tools to identify students who may need further assistance • Starting this fall, we are implementing a one-on-one coaching initiative for the students we identified with the machine learning tools What We’re Doing
  • 32. 96% of organizations in the United States use data in the same way. …and it’s wrong. But it’s not just education…
  • 34. Relevant Data for Everyone
  • 35. Data Regimes Data Dictatorship: Data is controlled and its use is restricted. There is asymmetric distribution of information based on your position Data Aristocracy: Data analysts, scientists and PhDs are needed to do anything meaningful. Power concentrates in the hands of these employees and their supervisors Data Anarchy: Business users feel underserved and take matters into their own hands. They create “shadow IT” systems and work around the “unresponsive” IT group Data Democracy: Everybody gets timely and equitable access to data. Line of business users are empowered and “own” the data. Executives and IT get out of the way 1 Shash Hegde, Mariner, “The Rise of Data Regimes”, 9/12/13, http://www.mariner-usa.com/rise-data-regimes/ (image substitution for Mao Zedong)
  • 36. Every organization moves through increasingly complex stages of data accessibility. Data Maturity Model … very few complete the transition to Integrated Analytics
  • 37. Stage 1: Report Silos Request Tracker Banner Blackboard Luminis StarfishSCCM CAS Authentication This is what we have had for decades at Ivy Tech…
  • 38. Request Tracker Banner Blackboard Luminis StarfishSCCM CAS Authentication Stage 2: Data Warehousing This is what most companies do… but we are taking this a step further…
  • 39. Stage 3: Integrated Analytics Request Tracker Banner Blackboard Luminis StarfishSCCM CAS Authentication Students by Financial Aid Students by Award Students by Term Students by Class Classes by Class Section Students These curated collections of data are designed to enable direct access to... …the data you need, regardless of where it came from. Quickly. Easily.
  • 40. GPA Graduation—Cumulative Graduation Grade Point Average (Cumulative) is an indication of a student's academic progress for all semester credit classes for all registered terms up to and including the selected term. Letter grades are assigned points (A=4, B=3, C=2, D=1, F=0) and the GPA is calculated by taking the number of grade points a student earned in a selected period of time divided by the total number of classes taken during that same period. GPA Graduation Cumulative = Sum of a student's total grade points earned in credit classes for all classes for all registered terms up to and including the selected term / Sum of student's total classes taken during that same period NOTES ON USING THIS TERM: GPA Graduation - Cumulative does not include grades from remedial classes. Related Terms: [GPA Graduation - Term]
  • 42. Resources Amazon Redshift Getting Started Guide: http://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html Scalar UDF Documentation: http://docs.aws.amazon.com/redshift/latest/dg/user-defined- functions.html Introduction to Python UDFs in Amazon Redshift: https://blogs.aws.amazon.com/bigdata/post/Tx1IHV1G67CY53T/Introduction-to-Python-UDFs-in- Amazon-Redshift Connecting R with Amazon Redshift: https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon- Redshift Databricks Apache Spark–Amazon Redshift Tutorial: https://github.com/databricks/spark- redshift/tree/master/tutorial Amazon ML Getting Started Guide: https://aws.amazon.com/machine-learning/getting-started/ Amazon QuickSight (Preview Registration): https://aws.amazon.com/quicksight/

Editor's Notes

  1. In this session, we will start by looking at current trends in data-driven development, then look at how Redshift is evolving from a traditional data warehouse and expanding its capabilities to handle more sophisticated data analytics use cases. We will then discuss how to build advanced data driven applications with Amazon Machine Learning and use Amazon QuickSight for visualization. Finally, Radhika will take over and do a brief demo and then we will open up the session for Q&A. Just to set expectations, this is not a deep dive in any of the services. There are SKO sessions on service deep dives and roadmaps.
  2. This is the AWS Big Data portfolio. We have tools like Direct Connect and Import Export that can bring in a lot of data. We can persist that data into a number of storage services from S3 to DynamoDB to EMR and RedShift for further analysis. Amazon Redshift provides a fast, fully managed, petabyte-scale data warehouse for less than $1000 per terabyte per year. Amazon Elastic MapReduce provides a managed, easy to use analytics platform built around the powerful Hadoop framework. Recently we announced Amazon Kinesis, a managed service for real-time processing of streaming big data. Amazon Glacier allows you to backup and archive an unlimited amount of data at just 1 cent per GB per month. Automate and schedule big data processing workloads with Data Pipeline. The tools to support big data collection, computation along with collaboration and sharing are all available in a couple of clicks, with AWS.
  3. These personas are not hard and fast rules but a good grouping of categories of our users. We can expect a single individual could wear multiple hats. We also want to emphasize the flexibility that AWS provides to mat h the use case to a broad, rich set of analytic tools. Contrast this to existing solutions that require a large capital expense that locks in future questions to previous platform decisions
  4. Thank you Greg. Let’s start by looking at the BI Analyst Again, just keep in mind these personas are general characterizations and may have significant overlap—depending on whatever project you are working on.
  5. What do BI Analysts do? Just to level set, they spend most of their time analyzing data to figure out market and business trends for companies with a goal to increase profits and efficiency. For example, they analyze past trends, and current conditions, and then communicate those trends using reports. A typical use case here is analyzing sales data for a retail company to optimize profits. For these types of use cases like historical analysis, Amazon Redshift is a great platform for such workloads. A BI Analysts can connect to Redshift using three options: They can install their existing BI Tools on EC2 and connect to Redshift We recently announced Amazon QuickSight, a low cost BI tool that connects to all your data in the cloud including Redshift We are working with our BI partners like Tableau, TIBCO Jaspersoft to integrate Amazon Quicksight and Greg will discuss QuickSight in more detail. BI Analysts primarily use SQL to manipulate data. With Redshift, they can extend SQL with new Redshift SQL functions and add custom functions via user defined functions or (UDFs). Lets look at how Redshift is evolving from a traditional data warehouse and expanding its capabilities to handle more sophisticated data analytics use cases.
  6. Redshift has a massively parallel processing (MPP) data warehouse architecture. Redshift automatically distributes data and query load across all the compute nodes to take advantage of all available resources. This means that all queries are executed in parallel across all the physical resources. You can extend Native SQL with Python UDFs or custom libraries. Python’s compiled byte code also runs in parallel across all your compute nodes. This horizontal scale out architecture makes it it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows. Redshift is a MPP—horizontally scalable Allows paralleization Native SQL or extension with python When the leader node receives a SQL query, it generates compiled C++ code. This compiled code is then broadcast to the compute nodes and executed in parallel. Python and UDFs also get executed at the compute nodes for parallelization. Python is an interpreted language. The generated python byte code is also distributed across the compute nodes and executed in parallel just as a normal SQL query. Keep in mind that compiled code executes faster than interpreted code and uses less compute capacity. When benchmarking your queries, you should always compare the times for the second execution of a query, because the first execution time includes the overhead of compiling the code. Python is a great language for data manipulation and analysis, but the programs are often a bottleneck when consuming data from large data warehouses. Python is an interpreted language, so for large data sets you may find yourself playing “tricks” with the language to scale across multiple processes in order to distribute the workload. In Amazon Redshift, the Python logic is pushed across the MPP system and all the scaling is handled by AWS. The Python execution in Amazon Redshift is done in parallel just as a normal SQL query, so Amazon Redshift will take advantage of all of the CPU cores in your cluster to execute your UDFs.  They may not be as fast as native SQL, which is compiled to machine code, but the automated management of parallelism will alleviate  many of the struggles associated with scaling the performance of your own Python code. Python and UDFs gets executed at the compute nodes for parallelization C++ code—does the same the same thing Interpreted and has the ability—scalability to overcome interpretation but otherwise the full python interpreter is available to you and it runs as compiled byte code in parallel on all the compute nodes. When benchmarking your queries, you should always compare the times for the second execution of a query, because the first execution time includes the overhead of compiling the code. Redshift is a columnar, multi-node, shared nothing cluster, database service There’s a leader node that you connect with SQL clients Data resides in the compute nodes and that’s where all the work happens All the work happens in parallel directly through the compute nodes Since this is advanced Redshift session, we will not focus on the details. There’s a Getting Started guide for Redshift and we will provide a link at the end of the webinar.
  7. We’ve been iterating quickly and we add SQL functions regularly This is because we want to expand the range of analytics you can do in SQL We have added over 25 since launch - new aggregate and window functions List Agg is quite popular with our customers - The idea here is that you can aggregate multiple values into a single string which is quite powerful We also added the Approximate count function for very large datasets REGEX capabilities and so on… We are going to continue iterating here but we also want to enable you to add your own functionality to Redshift, and so we’ve added … ----
  8. Scalar User Defined Functions. UDFs provide more flexible querying and analysis, enabling a broader range of analytic and data science use cases. You can write your custom functions in python 2.7 The syntax is very similar to what you can do with PostgreSQL UDF We do sandbox the UDFs so you can’t make filesystem or network calls, but otherwise the full python interpreter is available to you and it runs as compiled byte code in parallel on all the compute nodes. The python library in Redshift comes with Pandas, NumPy, Scipy for analytics, so those functions are available to you for use, but you can also bring in your own libraries. You also have the flexibility to import your own libraries into Redshift ---
  9. Time to first insight from months to minutes Goes from GBs to Terabytes Very fast response to queries Cost down to 1/10th Fast to get started—sign up and go
  10. The title Data Scientiest is realatively and has been around for only a few years. It was actually coined in 2008 by data and analytics leads at LinkedIn and Facebook. This title has come to replace data analyst.
  11. A Data Scientist can have formal training in a quantitavie field but they can also emerge from any field that has a strong data and computational focus. A data scientists’ most basic, universal skill is the ability to write code, but they also have a solid foundation in math, statistics, probability, and computer science. They work with complex datasets and they are good at converting masses of unstructured data into structured data, but most importantly, getting it into a form that can be analyzed. Often they are creative in displaying information visually and making the patterns they find clear and compelling. Data Scientists can use their existing toolsets to access Redshift. Let’s look at some toolsets that data scientists use to analyze data.
  12. Similarly, you can install packages in R to query Redshift. For example, you can install the RJDBC package to load Redshift’s JDBC driver and then send SQL queries to Amazon Redshift. The difference here between python UDFs and R in this scenario is that you have to take data out of the database to do your analysis. You can also write R code using the dplyr package to analyze your data in Redshift. First, you have to create a connection to Redshift via the RPostgreSQL package. Once connect R to Redshift, you have the flexibility to add predictive modeling using Rs packages. There’s a big data blog that shows you how to Connect R with Redshift in detail.
  13. Now that we better understand the role of a Data Engineer, lets switch gears and discuss the Application Developer persona.
  14. Yesterday, there was an article on The Wall Street Journal, and Hilary Mason, CEO and founder of Fast Forward Labs, which advises companies on data science and machine learning, observed that many many companies are now built on data and cannot exist without it. “Companies doing well … are building products using unique, proprietary data. Data and analytics has become a new competency for developers. All developers are doing something with data in addition to their web, mobile or backend development efforts. Developers now treat instrumenting and collecting events as one of the most important parts of the development process. And because of this, applications are logging tremendous amounts of data and so developers are picking up the skills to analyze this data to derive some value from it and to build useful applications on top of the data the application generates. Beyond historical analysis and knowing what is happening here, organizations are using the data they already have to make accurate, actionable predictions about what will happen in the future. They are building a new breed of smart applications using these predictions. As you can see, Machine learning is becoming an increasingly important tool to build advanced data-driven applications. The primary emphasis is that these application developers are not experts in machine learning but want to extend their application with ML constructs through an API.
  15. As airline passengers we always want to book flights when the likelihood of flight delays is minimum and with carriers that historically have the least amount of delays . In this demo we analyze those two use case by using historical flight and weather data to develop predictive model for flight delays using Amazon Analytic services . Before we take a look at a demo , let’s take a quick look at the architecture behind the solution. We begin the exercise by downloading publicly accessible flight and weather data and staging it in a s3 bucket . The datasets are then copied into a redshift cluster .We then use ML to create and develop models using redshift as a Datasource. We deploy a sample training model and store predictions in s3 bucket . Then we run batch predictions on the full data set based on that model we just created and finally use Quicksight for visualization. First step , We create a redshift cluster. This is the redshift cluster creation screen , once you create the cluster you would connect to it and load the raw data from a s3 bucket into it using the copy command. Here is the schema that I used. Here we have created a UDF in Redshift that calculates the numbers of days from a public holiday , This is important because we want to know the impact of holiday on the flight delays . We are using that in close_to_holiday column You will also see an example of copy command that can be used to copy data from s3 into redshift. Now let’s start with Machine learning . We start with creating a new Data source object and machine learning model using Redshift as a datasource. We need to fill in cluster detail ,as you see it identifies my demo cluster ,we then specify a IAM role , specify a staging location for ML model ,and issue a query against the redshift cluster and the table that contains my training data . After validation we are presented the schema , you can allow Amazon Machine learning to infer the data types or, you can modify these if needed ,the next step is to select a target which is the attribute that ML is learning to predict which is dep_delay attribute for us and you can optionally select a row identifier to include with your prediction.. we review the setting and create a DS object. Next we create a machine learning Model , here you see ML automatically detects this to be a binary classification model because the target attribute (flight_delay) is binary . It is attached to the right data source .We now review the settings and create a machine learning model. you can also control the training process by setting parameters as model size , number of passes you want over your training data . In this case we take the default values and review settings and create the model. When the model is ready , first thing you want to do is look at how it performed . For binary classification model , the prediction accuracy is defined using an industry standard area under the curve score . As you can see we received a score of 0.76 which is considered good . The actual output of a ML algorithm is a prediction score ,lets take a deeper look into model performance .so a model such as this is bound to have errors , false positives and false negatives . We want to reduce the number of false positives because that means people are going to miss their flights .We can do that by simply moving the slider to a threshold where it makes sense for our business case. for example if you look at target distribution report you will notice we have 13K rows corresponding to no delays and 8 k Reports corresponding to delay . Similarly you would also notice how the different attributes correlate t o your target, for example departure hour correlates to delay more than the day of the week. For numerical attributes , Machine learning provides information like mean median values , and neat histograms to understand how the data is distributed . Once you are done with that , we are ready to create the batch predictions.To create batch predictions we first need the model we want ti deploy. We will use the model we just created. In the second step we need test data , I have already specified a test data here , and finally you want a place where your predictions are stored in this case S3 bucket . You review your setting and you create your batch predictions . When batch predictions are ready ,their output is in s3 bucket that you can download , uncompress and take a peek at it . As you can see the results consists of a small number of columns including the prediction score given by ML and a best answer which is calculated by comparing the prediction score based on the cutoff value . Once we have the predictions ready to go , we can visualize them using Quicksight . I copied the predictions back to redshift cluster , connected the quick sight to redshift cluster and created couple of visualizations. steps are fairly straightforward .I am simply entering my cluster details here and I am on my way to create a connection were I have a graph where time Vs number of flights delayed and the We can see 6 pm is what you want to avoid . With that we encourage you to create your own analysis and applications with Machine learning , Redshift and Quicksight and I turn it over to Greg to wrap things up. Thank you !