© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or i...
Overview
Designing BI & big data solutions in the cloud
Not the only way to do it (but one that we have seen)
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Data, App, App
http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
Data has gravity ...
Data, App, App
http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
Latency, throughput ...
Data
http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
…easier to move applications ...
Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
S3 as a “single source of truth”
S3
Getting your Data into AWS
Amazon S3
Corporate Data Center
•  Console Upload
•  FTP
•  AWS Import Export
•  S3 ...
Write directly to a data source
Your application
Amazon S3
DynamoDB
Any other data store
Amazon S3...
Queue, pre-process and then write
Amazon Simple Queue Service (SQS)
Amazon S3
DynamoDB
Any other ...
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL Store
Log Aggregation tools
Choos...
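The "queue, pre-process and then write" pattern from the slides above can be sketched without touching AWS at all. In this minimal illustration, `queue.Queue` stands in for an SQS queue and a plain dict stands in for an S3 bucket; the event fields (`id`, `action`), the key layout `events/<id>.json`, and the function names are all hypothetical and chosen only for the example.

```python
import json
import queue


def preprocess(raw: str) -> dict:
    """Normalise one raw event: parse the JSON and tidy the action name."""
    event = json.loads(raw)
    event["action"] = event["action"].strip().lower()
    return event


def drain(q: queue.Queue, store: dict) -> int:
    """Pull every queued message, pre-process it, and write it to the store."""
    written = 0
    while True:
        try:
            raw = q.get_nowait()
        except queue.Empty:
            break  # queue is empty, nothing left to write
        event = preprocess(raw)
        store[f"events/{event['id']}.json"] = json.dumps(event)
        written += 1
    return written


events = queue.Queue()  # stand-in for an SQS queue (assumption)
bucket = {}             # stand-in for an S3 bucket: key -> body (assumption)
events.put('{"id": 1, "action": " Join "}')
events.put('{"id": 2, "action": "INVITE"}')
count = drain(events, bucket)
print(count, sorted(bucket))
```

The point of the queue is decoupling: producers only enqueue, and the writer can be scaled or restarted independently. Writing directly to the data store, as on the previous slide, skips the `drain` step entirely.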
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Hadoop based Analysis
Amazon S3
Amazon EMR
Amazon SQS
DynamoDB
Any SQL or NoSQL Store
Log A...
EMR is Hadoop in the Cloud	

Amazon Elastic MapReduce (EMR)?
EMR Cluster
S3
Put the data into S3
Choose: Hadoop distribution, # of nodes, types ...
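The EMR workflow on this slide (put the data into S3, choose the cluster shape, launch it through the console, CLI, SDK, or API, and read the output back from S3) can be illustrated by assembling the launch request as plain data. The sketch below only builds a dictionary and sends nothing to AWS. Its key names follow the shape of the EMR RunJobFlow request as best understood here and should be checked against the API reference; the release label, instance type, bucket names, and script path are placeholders, not values from the talk.

```python
def build_cluster_request(name, script_uri, input_uri, output_uri,
                          core_nodes=2, instance_type="m5.xlarge"):
    """Assemble a RunJobFlow-style request; nothing is sent to AWS here."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.36.0",  # placeholder for the Hadoop distribution choice
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_nodes},
            ],
            # Terminate once the step finishes so the cluster stops costing money.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "hive-report",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script", "--args",
                         "-f", script_uri,
                         "-d", f"INPUT={input_uri}",    # data already in S3
                         "-d", f"OUTPUT={output_uri}"], # results land in S3
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_cluster_request(
    name="bi-demo",
    script_uri="s3://example-bucket/scripts/report.q",  # hypothetical paths
    input_uri="s3://example-bucket/raw/",
    output_uri="s3://example-bucket/aggregated/",
    core_nodes=4,
)
print(request["Instances"]["InstanceGroups"][1]["InstanceCount"])
```

With boto3 installed and credentials configured, a dictionary of this shape could be passed as `boto3.client("emr").run_job_flow(**request)`. Keeping input and output in S3 rather than HDFS is what lets the cluster be thrown away after the job.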
Resize Nodes
EMR Cluster
You can easily add and remove nodes
1 instance for 100 hours = 100 instances for 1 hour
Small instance = $5.50 (including EMR – without: $4.40)
1 instance for 1000 hours = 1000 instances for 1 hour
Small instance = $55 (including EMR – without: $44)
When you turn off your cloud resources, you actually stop paying for them
SQL based processing
Amazon S3
Amazon EMR
Amazon Redshift
Pre-processing framework
Petabyte scale
Columnar Data-warehous...
Amazon Redshift is a fast and powerful, fully
managed, petabyte-scale data warehouse service
in the AWS cloud
What is Amaz...
Demo: Amazon Redshift
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Your choice of BI Tools
Amazon S3
Amazon EMR
Amazon Redshift
Pre-processing framework
Amazon SQS
DynamoDB
Any ...
Demo
Jaspersoft as a BI Frontend
Sharing results and visualizations
Amazon S3
Amazon EMR
Amazon Redshift
Web App Server
Visualization tools
Amazon SQS ...
Sharing results and visualizations
Amazon S3
Amazon EMR
Amazon Redshift
Business Intelligence Tools
Business Intelligence ...
Geospatial Visualizations
Amazon S3
Amazon EMR
Amazon Redshift
Business Intelligence Tools
Business Intelligence Tools
GIS...
Rinse and Repeat
Amazon S3
Amazon EMR
Amazon Redshift
Visualization tools
Business Intelligence Tools
Business Intelligenc...
The complete architecture
Amazon S3
Amazon EMR
Amazon Redshift
Visualization tools
Business Intelligence Tools
Business In...
Real Time
Amazon Kinesis
•  Real-time processing
•  Massive scale
•  Integrated
•  Use cases:
•  Real-time log analysis
•  Real-time...
Amazon Kinesis Data Flow
Data Sources
App.4 [Machine Learning]
AWS Endpoint
App.1 [Aggreg...
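The Kinesis data flow diagram shows records from many data sources landing in shards 1 to N. Kinesis places a record by taking the MD5 hash of its partition key and finding the shard whose hash-key range contains that 128-bit value. The sketch below illustrates the idea under one simplifying assumption: that the N shards split the hash space into equal ranges. Real streams can have uneven ranges after splits and merges, and the function name and keys here are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit integer


def shard_for(partition_key: str, shard_count: int) -> int:
    """Zero-based index of the shard whose range contains MD5(partition_key).

    Assumes the shards divide the hash space into equal, contiguous ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hashed = int(digest, 16)
    return hashed * shard_count // HASH_SPACE


# Records with the same partition key always land on the same shard,
# which is what lets a consumer see one user's events in order.
placements = {key: shard_for(key, shard_count=4)
              for key in ("user-1", "user-2", "user-3", "user-1")}
print(placements)
```

Choosing a partition key with many distinct values (a user or session ID, for instance) spreads load across shards; a constant key would send everything to one shard.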
Use cases
SkillPages
Customer Use Case
Everyone Needs
Skilled People
At Home
At Work
In Life
Repeatedly
Who they are
What they can do
Your real life connections to them
Examples of what they can do
Data Architecture
Data Analyst
Raw Data
Get
Data
Join via Facebook
Add a Skill Page
Invite Friends
Web Servers Amazon S3
U...
We found that Amazon Redshift offers the performance we needed while freeing us from...
Stack – analysis and sharing
Application Stack
Scala/Liftweb API Machines WWW Machines Batch Jobs
Scala Application code
Mo...
Everything that was a limited
resource
is now a programmable resource
•  Hadoop Technology and Use Cases:
http://www.powerof60.com/
•  http://aws.amazon.com/de
•  Start with the Free Tier:
htt...
B3 - Business intelligence apps on aws
Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges IT teams struggle with. This session will show how to build business intelligence applications leveraging AWS, from the raw data import, consumption and storage down to the information production. We will also cover best practices for services such as Amazon Redshift or Amazon RDS, and how to use applications such as SAP Hana, Jaspersoft and others.

Published in: Technology, Business
Transcript of "B3 - Business intelligence apps on aws"

  1. 1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Business Intelligence Applications on AWS Steffen Krause, Amazon Web Services @sk_bln
  2. 2. Overview Designing BI & big data solutions in the cloud Not the only way to do it (but one that we have seen)
  3. 3. Generation Collection & storage Analytics & computation Collaboration & sharing
  4. 4. Generation Collection & storage Analytics & computation Collaboration & sharing
  5. 5. Data App App http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/ Data has gravity Compute Storage Big Data
  6. 6. Data App App http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/ Latency Throughput …and inertia at volume… Compute Storage Big Data
  7. 7. Data http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/ …easier to move applications to the data Compute Storage Big Data
  8. 8. Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html S3 as a “single source of truth” S3
  9. 9. Getting your Data into AWS Amazon S3 Corporate Data Center •  Console Upload •  FTP •  AWS Import Export •  S3 API •  Direct Connect •  Storage Gateway •  3rd Party Commercial Apps •  Tsunami UDP
  10. 10. Write directly to a data source Your application Amazon S3 DynamoDB Any other data store Amazon S3 Amazon EC2
  11. 11. Queue, pre-process and then write Amazon Simple Queue Service (SQS) Amazon S3 DynamoDB Any other data store
  12. 12. Amazon SQS Amazon S3 DynamoDB Any SQL or NoSQL Store Log Aggregation tools Choose depending upon design
  13. 13. Generation Collection & storage Analytics & computation Collaboration & sharing
  14. 14. Hadoop based Analysis Amazon S3 Amazon EMR Amazon SQS DynamoDB Any SQL or NoSQL Store Log Aggregation tools
  15. 15. EMR is Hadoop in the Cloud Amazon Elastic MapReduce (EMR)?
  16. 16. How does EMR work? EMR Cluster, S3. Put the data into S3. Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc. Launch the cluster using the EMR console, CLI, SDK, or APIs. Get the output from S3. You can also store everything in HDFS.
  17. 17. Resize Nodes EMR Cluster You can easily add and remove nodes
  18. 18. 1 instance for 100 hours = 100 instances for 1 hour
  19. 19. Small instance = $5.50 (including EMR – without: $4.40)
  20. 20. 1 instance for 1000 hours = 1000 instances for 1 hour
  21. 21. Small instance = $55 (including EMR – without: $44)
  22. 22. When you turn off your cloud resources, you actually stop paying for them
  23. 23. SQL based processing Amazon S3 Amazon EMR Amazon Redshift Pre-processing framework Petabyte scale Columnar Data-warehouse Amazon SQS DynamoDB Any SQL or NoSQL Store Log Aggregation tools
  24. 24. Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud What is Amazon Redshift ? Easy to provision and scale No upfront costs, pay as you go High performance at a low price Open and flexible with support for popular BI tools
  25. 25. Demo: Amazon Redshift
  26. 26. Generation Collection & storage Analytics & computation Collaboration & sharing
  27. 27. Your choice of BI Tools Amazon S3 Amazon EMR Amazon Redshift Pre-processing framework Amazon SQS DynamoDB Any SQL or NoSQL Store Log Aggregation tools
  28. 28. Demo Jaspersoft as a BI Frontend
  29. 29. Sharing results and visualizations Amazon S3 Amazon EMR Amazon Redshift Web App Server Visualization tools Amazon SQS DynamoDB Any SQL or NoSQL Store Log Aggregation tools
  30. 30. Sharing results and visualizations Amazon S3 Amazon EMR Amazon Redshift Business Intelligence Tools Business Intelligence Tools Amazon SQS DynamoDB Any SQL or NoSQL Store Log Aggregation tools
  31. 31. Geospatial Visualizations Amazon S3 Amazon EMR Amazon Redshift Business Intelligence Tools Business Intelligence Tools GIS tools on Hadoop GIS tools Visualization tools Amazon SQS DynamoDB Any SQL or NoSQL Store Log Aggregation tools
  32. 32. Rinse and Repeat Amazon S3 Amazon EMR Amazon Redshift Visualization tools Business Intelligence Tools Business Intelligence Tools GIS tools on Hadoop GIS tools Amazon Data Pipeline Amazon SQS DynamoDB Any SQL or NoSQL Store Log Aggregation tools
  33. 33. The complete architecture Amazon S3 Amazon EMR Amazon Redshift Visualization tools Business Intelligence Tools Business Intelligence Tools GIS tools on Hadoop GIS tools Amazon Data Pipeline Amazon SQS DynamoDB Any SQL or NoSQL Store Log Aggregation tools
  34. 34. Real Time
  35. 35. Amazon Kinesis •  Real-time processing •  Massive scale •  Integrated •  Use cases: •  Real-time log analysis •  Real-time data analytics •  Social media monitoring •  Financial transactions •  Online machine learning
  36. 36. Amazon Kinesis Data Flow Data Sources App.4 [Machine Learning] AWS Endpoint App.1 [Aggregate & De-Duplicate] Data Sources Data Sources Data Sources App.2 [Metric Extraction] S3 DynamoDB Redshift App.3 [Sliding Window Analysis] Data Sources Availability Zone Shard 1 Shard 2 Shard N Availability Zone Availability Zone
  37. 37. Use cases
  38. 38. SkillPages Customer Use Case Everyone Needs Skilled People At Home At Work In Life Repeatedly
  39. 39. Who they are What they can do Your real life connections to them Examples of what they can do
  40. 40. Data Architecture Data Analyst Raw Data Get Data Join via Facebook Add a Skill Page Invite Friends Web Servers Amazon S3 User Action Trace Events EMR Hive Scripts Process Content •  Process log files with regular expressions to parse out the info we need. •  Processes cookies into useful searchable data such as Session, UserId, API Security token. •  Filters surplus info like internal varnish logging. Amazon S3 Aggregated Data Raw Events Internal Web Excel Tableau Amazon Redshift
  41. 41. "We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution. With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible." Jon Hoffman, Software Engineer, Foursquare. [Charts: Gender (Female, Male); Age; When do people go to a place? (Foursquare, Gorilla Coffee, Gray's Papaya, Amorino)]
  42. 42. Stack – analysis and sharing Application Stack Scala/Liftweb API Machines WWW Machines Batch Jobs Scala Application code Mongo/Postgres/Flat Files Databases Logs Data Stack Amazon S3 Database Dumps Log Files Hadoop Elastic Map Reduce Hive/Ruby/Mahout Analytics Dashboard Map Reduce Jobs mongoexport postgres dump Flume
  43. 43. Everything that was a limited resource is now a programmable resource
  44. 44. •  Hadoop Technology and Use Cases: http://www.powerof60.com/ •  http://aws.amazon.com/de •  Start with the Free Tier: http://aws.amazon.com/de/free/ •  25 US$ credits for new German customers: http://aws.amazon.com/de/campaigns/account/ •  Twitter: @AWS_Aktuell •  Facebook: http://www.facebook.com/awsaktuell •  Webinars: http://aws.amazon.com/de/about-aws/events/ Resources
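
The log processing step described on slide 40 (parse log files with regular expressions, turn cookies into searchable fields such as Session, UserId, and API security token, and filter out internal Varnish logging) can be sketched in a few lines. SkillPages ran this as Hive scripts on EMR; the Python below is only an illustration of the same three steps. The log line format, the `varnish:` prefix used to mark internal lines, and the cookie names are all assumptions made for the example.

```python
import re

# Hypothetical line format: ip [timestamp] "METHOD path" status "cookie header"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) "(?P<cookie>[^"]*)"$'
)


def parse_line(line: str):
    """Return a dict of searchable fields, or None for lines to discard."""
    if line.startswith("varnish:"):  # assumed marker for internal logging
        return None
    match = LINE_RE.match(line)
    if match is None:                # malformed line, skip it
        return None
    # Split "a=1; b=2" into {"a": "1", "b": "2"}.
    cookies = dict(
        pair.split("=", 1)
        for pair in match.group("cookie").split("; ")
        if "=" in pair
    )
    return {
        "path": match.group("path"),
        "status": int(match.group("status")),
        "session": cookies.get("Session"),
        "user_id": cookies.get("UserId"),
        "api_token": cookies.get("ApiToken"),
    }


sample = [
    '203.0.113.7 [12/Mar/2014:10:00:00 +0000] "GET /skills/add" 200 '
    '"Session=abc123; UserId=42; ApiToken=t0k3n"',
    "varnish: backend health check ok",
]
rows = [row for row in map(parse_line, sample) if row is not None]
print(rows)
```

Rows of this shape are what would be written back to S3 as aggregated data and then loaded into Amazon Redshift for analysis.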