B3 - Business Intelligence Apps on AWS

Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. This simple description hides many technical challenges that IT teams struggle with. This session shows how to build business intelligence applications on AWS, from raw data import, consumption, and storage through to the production of information. It also covers best practices for services such as Amazon Redshift and Amazon RDS, and how to use applications such as SAP HANA, Jaspersoft, and others.

Presentation Transcript

  • Business Intelligence Applications on AWS. Steffen Krause, Amazon Web Services, @sk_bln. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • Overview: Designing BI & big data solutions in the cloud. Not the only way to do it (but one that we have seen).
  • Generation Collection & storage Analytics & computation Collaboration & sharing
  • Data has gravity. Diagram labels: Data, App, App, Compute, Storage, Big Data. Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
  • …and inertia at volume… Diagram labels: Data, App, App, latency, throughput, Compute, Storage, Big Data. Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
  • …easier to move applications to the data. Diagram labels: Data, Compute, Storage, Big Data. Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
  • S3 as a “single source of truth”. Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
  • Getting your data into AWS (from the corporate data center to Amazon S3):
    • Console upload
    • FTP
    • AWS Import/Export
    • S3 API
    • Direct Connect
    • Storage Gateway
    • 3rd-party commercial apps
    • Tsunami UDP
  • Write directly to a data source. Diagram labels: your application, Amazon EC2, Amazon S3, DynamoDB, any other data store.
  • Queue, pre-process, and then write. Diagram labels: Amazon Simple Queue Service (SQS), Amazon S3, DynamoDB, any other data store.
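The queue, pre-process, write pattern on this slide can be sketched locally as follows. This is a minimal illustration, not the presenter's implementation: Python's in-process `queue.Queue` stands in for Amazon SQS, a plain dictionary stands in for the target data store, and the event fields (`user`, `action`) and the `preprocess` and `drain` helpers are hypothetical.

```python
import json
import queue


def preprocess(raw: str) -> dict:
    """Parse one raw JSON event and normalise the user name."""
    event = json.loads(raw)
    return {"user": event["user"].strip().lower(), "action": event["action"]}


def drain(source: queue.Queue, store: dict) -> int:
    """Pre-process every queued event and write it to the store."""
    written = 0
    while True:
        try:
            raw = source.get_nowait()
        except queue.Empty:
            return written
        record = preprocess(raw)
        store.setdefault(record["user"], []).append(record["action"])
        written += 1


events = queue.Queue()  # local stand-in for an SQS queue (assumption)
data_store = {}         # local stand-in for S3, DynamoDB, or another store

events.put('{"user": " Alice ", "action": "login"}')
events.put('{"user": "alice", "action": "search"}')
written = drain(events, data_store)
```

Decoupling producers from the store this way lets the pre-processing step clean or reject events before anything is persisted.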
  • Choose depending upon design. Diagram labels: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools.
  • Generation Collection & storage Analytics & computation Collaboration & sharing
  • Hadoop-based analysis. Diagram labels: Amazon S3, Amazon EMR, Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools.
  • Amazon Elastic MapReduce (EMR)? EMR is Hadoop in the cloud.
  • How does EMR work? Put the data into S3. Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc. Launch the cluster using the EMR console, CLI, SDK, or APIs. Get the output from S3. You can also store everything in HDFS.
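The choices listed on this slide (number of nodes, node types, Hive/Pig) map onto the arguments of an SDK launch call. The sketch below only builds the keyword arguments in the shape accepted by boto3's EMR `run_job_flow` call and does not contact AWS. The cluster name, bucket, instance type, and release label are assumptions chosen for illustration, the role names are the AWS default EMR roles, and `build_cluster_request` is a hypothetical helper.

```python
def build_cluster_request(name, log_uri, node_count, instance_type, applications):
    """Return keyword arguments shaped for boto3's EMR run_job_flow call."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one that suits you
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": node_count,
            # Terminate when the steps finish, so billing stops with the work.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_cluster_request(
    name="bi-preprocessing",                # hypothetical cluster name
    log_uri="s3://example-bucket/emr-logs/",  # hypothetical bucket
    node_count=4,
    instance_type="m5.xlarge",              # assumed instance type
    applications=["Hive", "Pig"],
)
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("emr").run_job_flow(**request)
```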
  • Resize nodes (EMR cluster): you can easily add and remove nodes.
  • 1 instance for 100 hours = 100 instances for 1 hour
  • Small instance = $5.50 (including EMR; without: $4.40)
  • 1 instance for 1000 hours = 1000 instances for 1 hour
  • Small instance = $55 (including EMR; without: $44)
  • When you turn off your cloud resources, you actually stop paying for them
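The arithmetic behind these cost slides can be checked with a short sketch. The hourly rates are not stated on the slides; they are derived here from the slide figures ($4.40 and $5.50 per 100 instance-hours) and reflect the 2014 deck, not current pricing. The sketch also assumes plain per-instance-hour billing, and `cluster_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Rates implied by the slides (assumption: derived, not quoted from a price list).
EC2_RATE = Decimal("0.044")  # USD per instance-hour: $4.40 / 100 hours
EMR_RATE = Decimal("0.011")  # USD per instance-hour: ($5.50 - $4.40) / 100 hours


def cluster_cost(instances: int, hours: int, include_emr: bool = True) -> Decimal:
    """Cost in USD of running `instances` small instances for `hours` hours each."""
    rate = EC2_RATE + (EMR_RATE if include_emr else Decimal("0"))
    return rate * instances * hours


one_for_hundred = cluster_cost(instances=1, hours=100)
hundred_for_one = cluster_cost(instances=100, hours=1)
thousand_without_emr = cluster_cost(instances=1000, hours=1, include_emr=False)
```

Cost depends only on the product of instances and hours, which is why a job that parallelises well can finish sooner at the same price.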
  • SQL-based processing. Diagram labels: Amazon S3; Amazon EMR (pre-processing framework); Amazon Redshift (petabyte-scale columnar data warehouse); Amazon SQS; DynamoDB; any SQL or NoSQL store; log aggregation tools.
  • What is Amazon Redshift? Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud. Easy to provision and scale. No upfront costs, pay as you go. High performance at a low price. Open and flexible with support for popular BI tools.
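Since the architecture keeps pre-processed data in Amazon S3, loading it into Redshift typically means issuing a `COPY` statement. The sketch below only assembles such a statement as a string and does not connect to a cluster. The table name, bucket prefix, and role ARN are hypothetical, the input is assumed to be gzip-compressed CSV, and `build_copy_statement` is an illustrative helper rather than part of any AWS library.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Assemble a Redshift COPY statement for gzip-compressed CSV files in S3."""
    for value in (table, s3_prefix, iam_role_arn):
        # Reject quotes and semicolons rather than trying to escape them.
        if "'" in value or ";" in value:
            raise ValueError(f"unsafe value: {value!r}")
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV GZIP;"
    )


statement = build_copy_statement(
    table="page_events",                                  # hypothetical table
    s3_prefix="s3://example-bucket/aggregated/",          # hypothetical prefix
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",  # hypothetical
)
```

The resulting string could be executed through any PostgreSQL-compatible client connected to the cluster.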
  • Demo: Amazon Redshift
  • Generation Collection & storage Analytics & computation Collaboration & sharing
  • Your choice of BI tools. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, pre-processing framework, Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools.
  • Demo: Jaspersoft as a BI frontend
  • Sharing results and visualizations. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, web app server, visualization tools, Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools.
  • Sharing results and visualizations. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools, Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools.
  • Geospatial visualizations. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools, GIS tools on Hadoop, GIS tools, visualization tools, Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools.
  • Rinse and repeat. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, visualization tools, business intelligence tools, GIS tools on Hadoop, GIS tools, Amazon Data Pipeline, Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools.
  • The complete architecture. Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, visualization tools, business intelligence tools, GIS tools on Hadoop, GIS tools, Amazon Data Pipeline, Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools.
  • Real Time
  • Amazon Kinesis
    • Real-time processing
    • Massive scale
    • Integrated
    • Use cases: real-time log analysis, real-time data analytics, social media monitoring, financial transactions, online machine learning
  • Amazon Kinesis data flow. Diagram labels: data sources; AWS endpoint; Shard 1, Shard 2, ..., Shard N across Availability Zones; App.1 [Aggregate & De-Duplicate]; App.2 [Metric Extraction]; App.3 [Sliding Window Analysis]; App.4 [Machine Learning]; S3, DynamoDB, Redshift.
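The shards in the data flow diagram each own a range of hash keys, and Kinesis maps a record's partition key to a 128-bit integer with MD5 to pick the shard. The sketch below illustrates that routing locally. It assumes the hash space is split evenly across shards, which need not hold for a real stream after resharding, and `shard_for` and the `user-N` keys are hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # an MD5 digest read as an integer lies in [0, 2**128)


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash-key range contains the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Assumption: shard i owns the i-th equal slice of the hash space.
    return hash_key * shard_count // HASH_SPACE


# Route 1000 hypothetical partition keys across 4 shards and count per shard.
distribution = Counter(shard_for(f"user-{i}", 4) for i in range(1000))
```

Records with the same partition key always land on the same shard, which is what lets a consumer such as a de-duplication or sliding-window application see related records together.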
  • Use cases
  • Customer use case: SkillPages. Everyone needs skilled people: at home, at work, in life, repeatedly.
  • Who they are. What they can do. Your real-life connections to them. Examples of what they can do.
  • Data architecture. Diagram labels: data analyst, raw data, get data; join via Facebook, add a skill page, invite friends; web servers; Amazon S3 (user action trace events, raw events); EMR Hive scripts (process content); Amazon S3 (aggregated data); Amazon Redshift; internal web, Excel, Tableau. The EMR Hive scripts:
    • Process log files with regular expressions to parse out the info we need.
    • Process cookies into useful searchable data such as Session, UserId, API security token.
    • Filter surplus info like internal Varnish logging.
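The three processing steps above (regex parsing, cookie extraction, filtering Varnish noise) can be sketched as follows. The slide describes Hive scripts; this is a plain Python illustration of the same idea, not SkillPages' code. The log line layout, the `varnish:` prefix, and the `parse_cookie` and `parse_line` helpers are all assumptions made for the example.

```python
import re

# Hypothetical log layout: timestamp, method, path, status, quoted cookie string.
LINE_RE = re.compile(
    r'^(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) '
    r'cookie="(?P<cookie>[^"]*)"$'
)


def parse_cookie(cookie: str) -> dict:
    """Split 'Key=value; Key=value' into a dictionary."""
    pairs = (part.split("=", 1) for part in cookie.split(";") if "=" in part)
    return {key.strip(): value.strip() for key, value in pairs}


def parse_line(line: str):
    """Return a searchable record, or None for surplus or malformed lines."""
    if line.startswith("varnish:"):  # assumed marker for internal Varnish logging
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = parse_cookie(match.group("cookie"))
    return {
        "ts": match.group("ts"),
        "path": match.group("path"),
        "status": int(match.group("status")),
        "session": fields.get("Session"),
        "user_id": fields.get("UserId"),
    }


lines = [
    '2014-03-01T12:00:00Z GET /skills/add 200 cookie="Session=abc123; UserId=42"',
    "varnish: backend health check ok",
]
parsed = [record for record in map(parse_line, lines) if record is not None]
```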
  • “We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution.” “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible.” Jon Hoffman, Software Engineer, Foursquare. [Charts (Foursquare): Gender (Female, Male); Age; “When do people go to a place?” for Gorilla Coffee, Gray's Papaya, Amorino]
  • Stack: analysis and sharing. Application stack: Scala/Liftweb (API machines, WWW machines, batch jobs); Scala application code; Mongo/Postgres/flat files (databases, logs). Data stack: Amazon S3 (database dumps, log files); Hadoop Elastic MapReduce; Hive/Ruby/Mahout (analytics dashboard, MapReduce jobs). Data transfer: mongoexport, postgres dump, Flume.
  • Everything that was a limited resource is now a programmable resource
  • Resources
    • Hadoop Technology and Use Cases: http://www.powerof60.com/
    • http://aws.amazon.com/de
    • Start with the Free Tier: http://aws.amazon.com/de/free/
    • 25 US$ credits for new German customers: http://aws.amazon.com/de/campaigns/account/
    • Twitter: @AWS_Aktuell
    • Facebook: http://www.facebook.com/awsaktuell
    • Webinars: http://aws.amazon.com/de/about-aws/events/