AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS

Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges IT teams struggle with. This session will show how to build business intelligence applications leveraging AWS, from the raw data import, consumption and storage down to the information production. We will also cover best practices for services such as Amazon Redshift or Amazon RDS, and how to use applications such as SAP Hana, Jaspersoft and others.

Published in: Technology

  • EMR supports multiple instance types including the latest HS1 instance types
    EMR now supports High Storage Instances (hs1.8xlarge) in US East. These new instances offer 48 TB of storage across 24 hard disk drives, 35 EC2 Compute Units (ECUs) of compute capacity, 117 GB of RAM, 10 Gbps networking, and 2.4+ GB per second of sequential I/O performance. High Storage Instances are ideally suited for Hadoop and they significantly reduce the cost of processing very large data sets on EMR. We look forward to adding support for High Storage Instances in additional regions early next year.
  • Adding nodes works well with Hadoop, especially in the cloud, since 10 nodes running for 10 hours cost the same as 100 nodes running for 1 hour.
  • Vertical scaling on commodity hardware. Perfect for Hadoop.

    1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Business Intelligence Applications on AWS. Steffen Krause, Amazon Web Services, @sk_bln
    2. Overview: Designing BI & big data solutions in the cloud. Not the only way to do it (but one that we have seen).
    3. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
    4. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
    5. Data has gravity. Diagram labels: Data, App, App, Compute, Storage, Big Data. Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
    6. …and inertia at volume… Diagram labels: Data, Compute, Storage, Big Data. Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
    7. …easier to move applications to the data. Diagram labels: Data, Compute, Storage, Big Data. Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
    8. S3 as a “single source of truth”. Diagram label: S3. Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
    9. Getting your Data into AWS. Diagram labels: Corporate Data Center, Amazon S3. Options: Console Upload, FTP, AWS Import/Export, S3 API, Direct Connect, Storage Gateway, 3rd Party Commercial Apps, Tsunami UDP.
    10. Write directly to a data source. Diagram labels: Your application, Amazon S3, DynamoDB, Any other data store, Amazon S3, Amazon EC2.
    11. Queue, pre-process and then write. Diagram labels: Amazon Simple Queue Service (SQS), Amazon S3, DynamoDB, Any other data store.
    12. Choose depending upon design. Diagram labels: Amazon SQS, Amazon S3, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    13. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
    14. Hadoop based Analysis. Diagram labels: Amazon S3, Amazon EMR, Amazon SQS, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    15. Amazon Elastic MapReduce (EMR)? EMR is Hadoop in the Cloud.
    16. How does EMR work? Put the data into S3. Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc. Launch the cluster using the EMR console, CLI, SDK, or APIs. Get the output from S3. You can also store everything in HDFS. Diagram labels: EMR Cluster, S3.
    17. Resize Nodes. You can easily add and remove nodes. Diagram label: EMR Cluster.
    18. 1 instance for 100 hours = 100 instances for 1 hour
    19. Small instance = $5.50 (including EMR; without EMR: $4.40)
    20. 1 instance for 1000 hours = 1000 instances for 1 hour
    21. Small instance = $55 (including EMR; without EMR: $44)
    22. When you turn off your cloud resources, you actually stop paying for them
    23. SQL based processing. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, Pre-processing framework, Petabyte-scale columnar data warehouse, Amazon SQS, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    24. What is Amazon Redshift? Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud. Easy to provision and scale. No upfront costs, pay as you go. High performance at a low price. Open and flexible with support for popular BI tools.
    25. Demo: Amazon Redshift
    26. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
    27. Your choice of BI Tools. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, Pre-processing framework, Amazon SQS, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    28. Demo: Jaspersoft as a BI Frontend
    29. Sharing results and visualizations. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, Web App Server, Visualization tools, Amazon SQS, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    30. Sharing results and visualizations. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, Business Intelligence Tools, Business Intelligence Tools, Amazon SQS, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    31. Geospatial Visualizations. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, Business Intelligence Tools, Business Intelligence Tools, GIS tools on Hadoop, GIS tools, Visualization tools, Amazon SQS, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    32. Rinse and Repeat. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, Visualization tools, Business Intelligence Tools, Business Intelligence Tools, GIS tools on Hadoop, GIS tools, AWS Data Pipeline, Amazon SQS, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    33. The complete architecture. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, Visualization tools, Business Intelligence Tools, Business Intelligence Tools, GIS tools on Hadoop, GIS tools, AWS Data Pipeline, Amazon SQS, DynamoDB, Any SQL or NoSQL Store, Log Aggregation tools.
    34. Real Time
    35. Amazon Kinesis: • Real-time processing • Massive scale • Integrated • Use cases: real-time log analysis, real-time data analytics, social media monitoring, financial transactions, online machine learning
    36. Amazon Kinesis Data Flow. Diagram labels: Data Sources, AWS Endpoint, Availability Zone (three), Shard 1, Shard 2, Shard N, App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], App.4 [Machine Learning], S3, DynamoDB, Redshift.
    37. Use cases
    38. SkillPages Customer Use Case. Everyone Needs Skilled People: At Home, At Work, In Life, Repeatedly.
    39. Data Architecture. Diagram labels: Join via Facebook, Add a Skill Page, Invite Friends, Web Servers, Amazon S3, User Action Trace Events, EMR, Hive Scripts, Process Content, Amazon S3, Aggregated Data, Raw Events, Raw Data, Get Data, Data Analyst, Internal Web, Excel, Tableau, Amazon Redshift. Process Content: • Process log files with regular expressions to parse out the info we need. • Process cookies into useful searchable data such as Session, UserId, API Security token. • Filter surplus info like internal Varnish logging.
    40. “We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution.” “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible.” Jon Hoffman, Software Engineer, Foursquare. [Charts: Gender (Female, Male); Age; “When do people go to a place?” (Gorilla Coffee, Gray's Papaya, Amorino)]
    41. Stack: analysis and sharing. Application Stack: Scala/Liftweb; API Machines, WWW Machines, Batch Jobs; Scala Application code; Mongo/Postgres/Flat Files; Databases, Logs. Data Stack: Amazon S3; Database Dumps, Log Files; Hadoop, Elastic Map Reduce; Hive/Ruby/Mahout; Analytics Dashboard, Map Reduce Jobs; mongoexport, postgres dump, Flume.
    42. Everything that was a limited resource is now a programmable resource
    43. Resources: • Hadoop Technology and Use Cases: http://www.powerof60.com/ • http://aws.amazon.com/de • Start with the Free Tier: http://aws.amazon.com/de/free/ • 25 US$ credits for new German customers: http://aws.amazon.com/de/campaigns/account/ • Twitter: @AWS_Aktuell • Facebook: http://www.facebook.com/awsaktuell • Webinars: http://aws.amazon.com/de/about-aws/events/
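
Slide 11 describes a "queue, pre-process and then write" pattern. The following is a minimal local sketch of that flow, not AWS code: a standard-library `queue.Queue` stands in for Amazon SQS, a plain dictionary stands in for S3 or DynamoDB, and the `preprocess` rule, the event fields, and the key naming are all hypothetical.

```python
import json
import queue


def preprocess(raw_event):
    """Normalize one raw event; drop events that have no user (hypothetical rule)."""
    if "user" not in raw_event:
        return None
    return {
        "user": raw_event["user"].strip().lower(),
        "action": raw_event.get("action", "unknown"),
    }


def drain(events, store):
    """Pull every queued event, pre-process it, and write the result to the store."""
    written = 0
    while True:
        try:
            raw = events.get_nowait()
        except queue.Empty:
            break
        record = preprocess(raw)
        if record is not None:
            key = f"events/{written:06d}.json"  # hypothetical key layout
            store[key] = json.dumps(record, sort_keys=True)
            written += 1
    return written


events = queue.Queue()  # stand-in for an SQS queue
store = {}              # stand-in for S3 / DynamoDB

for raw in [{"user": " Alice ", "action": "join"}, {"action": "noise"}, {"user": "BOB"}]:
    events.put(raw)

written = drain(events, store)
```

The point of the queue is that producers and the pre-processing workers are decoupled, so either side can be scaled or restarted without losing events.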
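
The arithmetic on slides 18 to 21 can be checked with a short calculation. The hourly rates below are not quoted on the slides; they are derived from the slide totals ($4.40 and $5.50 for 100 instance-hours) and should be read as 2014 illustration values, not current prices.

```python
from decimal import Decimal

# Derived from the slide totals, not from a price list (assumption):
EC2_RATE = Decimal("0.044")  # $4.40 / 100 instance-hours
EMR_RATE = Decimal("0.011")  # ($5.50 - $4.40) / 100 instance-hours


def cluster_cost(instances, hours, include_emr=True):
    """Cost in US$ of running `instances` small instances for `hours` hours."""
    rate = EC2_RATE + (EMR_RATE if include_emr else Decimal("0"))
    return instances * hours * rate


# Cost depends only on instance-hours, so a wide, short-lived cluster
# costs the same as a single long-running instance.
one_for_hundred = cluster_cost(1, 100)
hundred_for_one = cluster_cost(100, 1)
```

Because the bill scales with instance-hours, the practical choice is about how quickly the answer is needed rather than about cost.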
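
Slide 23 labels Amazon Redshift a columnar data warehouse. As a rough illustration of what "columnar" means (a toy in-memory model, not how Redshift is implemented), the sketch below stores the same hypothetical records by row and by column; an aggregate over one field only has to touch that field's column.

```python
# Hypothetical sample records in a row-oriented layout.
rows = [
    {"user_id": 1, "country": "SE", "amount": 10.0},
    {"user_id": 2, "country": "DE", "amount": 20.5},
    {"user_id": 3, "country": "SE", "amount": 5.0},
]


def to_columns(records):
    """Pivot a list of row dictionaries into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited to read one field.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])
```

Analytic queries typically read a few columns across many rows, which is the access pattern a column layout favors.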
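
Slide 36 shows data sources feeding a set of shards. Kinesis routes each record by taking the MD5 hash of its partition key as a 128-bit integer and matching it against each shard's hash key range. The sketch below mimics that routing locally; the even split of the hash space and the example partition key are assumptions for illustration, and no AWS API is called.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_ranges(shard_count):
    """Split the hash space into contiguous, evenly sized ranges (assumed layout)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for index in range(shard_count):
        start = index * size
        end = HASH_SPACE - 1 if index == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = shard_ranges(4)
shard = shard_for("user-42", ranges)  # hypothetical partition key
```

Records with the same partition key always land on the same shard, which is what lets a consumer such as the de-duplication or sliding-window application see related records in order.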
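
Slide 39 lists three processing steps: parse log files with regular expressions, turn cookies into searchable fields such as Session and UserId, and filter out internal Varnish logging. The SkillPages log format is not shown in the deck, so the line layout, the `varnish-internal` prefix, and the field names below are hypothetical; the sketch only illustrates the technique.

```python
import re

# Hypothetical log layout: ip [timestamp] "METHOD path" status "cookie string"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) "(?P<cookie>[^"]*)"$'
)
COOKIE_PAIR = re.compile(r"(\w+)=([^;]*)")


def parse_line(line):
    """Return a searchable record, or None for surplus or unparseable lines."""
    if line.startswith("varnish-internal"):  # hypothetical marker for internal logging
        return None
    match = LOG_LINE.match(line)
    if match is None:
        return None
    cookies = dict(COOKIE_PAIR.findall(match.group("cookie")))
    return {
        "path": match.group("path"),
        "status": int(match.group("status")),
        "session": cookies.get("Session"),
        "user_id": cookies.get("UserId"),
    }


sample = (
    '10.0.0.1 [12/May/2014:10:00:00 +0000] "GET /skills/plumber" 200 '
    '"Session=abc123; UserId=42"'
)
record = parse_line(sample)
skipped = parse_line("varnish-internal health check")
```

In the architecture on the slide this kind of parsing runs inside Hive scripts on EMR; the Python version here is only a stand-in to show the parsing and filtering logic.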
