AWS Activate Webinar - Growing on AWS

Growth Hacking on AWS
AWS Activate Webinar - Growing on AWS: Presentation Transcript

• Growing on Amazon Web Services. Abhishek Sinha, Amazon Web Services (@abysinha)
• Our Journey Today
• Growth Hacking: Growth hacking is a marketing technique developed by technology startups which uses creativity, analytical thinking, and social metrics to sell products and gain exposure. "At Airbnb, we look into all possible ways to improve our product and user experience. Often times this involves lots of analytics behind the scenes."
• Hypothesis → MVP → Learn and Iterate
• Hypothesis: "Hosts with professional photography will get more business, and hosts will sign up for professional photography as a service." Airbnb built an MVP with 20 photographers and saw the proverbial "hockey stick."
• Airbnb then scaled the idea:
  • Professional photography services
  • Increased the requirements for photo quality
  • Watermarked photos for authenticity
  • Key metric tracked: "shoots per month"
  • April 2012: 5,000 shoots per month
  • Growth can sometimes come from unexpected areas
• Our Journey Today
• Growth hacking is a marketing technique developed by technology startups which uses creativity, analytical thinking, and social metrics to sell products and gain exposure. BUILD-MEASURE-LEARN: "The fundamental activity of a startup is to turn ideas into products, measure how customers respond, and then learn whether to pivot or persevere. All successful startup processes should be geared to accelerate that feedback loop."
• "In a startup, the purpose of analytics is to iterate to product/market fit before the money runs out." (Lean Analytics, Alistair Croll and Ben Yoskovitz)
• Our Journey Today: Lean, Metrics
• What do these metrics look like? It depends on what stage your startup is at, and on your favorite analytics framework.
• Dave McClure's Pirate Metrics. Source: http://www.slideshare.net/dmc500hats/startup-metrics-for-pirates-long-version
• Lean Analytics stages. Credits: Alistair Croll and Ben Yoskovitz
• One Metric That Matters: f(stage, business) = metric that matters. See Bit.ly/BigLeanTable. Credits: Alistair Croll and Ben Yoskovitz
• Example: E-commerce. Metrics by stage (a worked sketch of one of these follows):
  • Empathy: How do buyers become aware of the need? How do they try to find the solution? What pain do they encounter as a result? What are their demographics and tech profiles?
  • Stickiness: Conversion, shopping cart size. Acquisition: cost of finding new buyers. Loyalty: percent of buyers who return in 90 days.
  • Virality: Acquisition mode: customer acquisition cost, volume of sharing. Loyalty mode: ability to reactivate, volume of buyers who return.
  • Revenue: Transaction value, revenue per customer, ratio of acquisition cost to LTV, direct sales metrics.
  • Scale: Affiliates, channels, white-label, product ratings, reviews, support costs, returns (RMA) and refunds, channel conflict.
  Source: Bit.ly/BigLeanTable
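One of these metrics makes a compact worked example. Here is a minimal Python sketch of the loyalty metric above, "percent of buyers who return in 90 days," assuming a hypothetical orders input of (buyer_id, order_date) pairs; any orders table can be reduced to that shape.

    from datetime import date, timedelta

    def ninety_day_return_rate(orders):
        """Fraction of buyers whose second order lands within 90 days of their first."""
        first, returned = {}, set()
        for buyer, when in sorted(orders, key=lambda o: o[1]):
            if buyer not in first:
                first[buyer] = when          # first purchase date per buyer
            elif when - first[buyer] <= timedelta(days=90):
                returned.add(buyer)          # came back inside the window
        return len(returned) / len(first) if first else 0.0

    # Two buyers, one of whom returns within 90 days -> 0.5
    print(ninety_day_return_rate([("a", date(2014, 1, 1)), ("a", date(2014, 2, 1)),
                                  ("b", date(2014, 1, 5))]))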
• Our Journey Today:
  1. Metrics: What do they look like? Depends upon stage and type of startup.
  2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
• Where do I get these metrics from?
• Logs: what they are used for, and the types. Used for: operational metrics, application/business-related metrics. Types: operating system logs, web server logs, database logs, CDN logs, application logs.
    • User  Engagement  in  Online  Video [Source: Conviva Viewer Experience Report – 2013]
• Requirements for a gaming company:
  • Cost analysis: data transfer by date/time; by edge location; by date/time within an edge location; by top X URLs; by HTTP vs. HTTPS.
  • Marketing: top URLs (as-is count, by content type, by edge location, by edge location and content type); requests served by edge location; revenue by edge location; top games by age, by income, by gender.
  • Operations: error rates by top X URLs; by edge location; by edge location and content type.
  • Revenue: top games by revenue, by edge location and revenue; top ads that lead to a game purchase.
• Requirements for a gaming company, mapped to sources: the cost-analysis metrics (data transfer by date/time, by edge location, by date/time within an edge location, by top X URLs, by HTTP vs. HTTPS) come from CloudFront logs and web server logs.
• Available data sources (gaming). Metric: Sources
  Data transfer by date/time: CloudFront logs
  Data transfer by edge location: CloudFront logs
  Data transfer by date/time within an edge location: CloudFront logs
  Data transfer by top X URLs: CloudFront logs, web server logs
  Data transfer by HTTP vs. HTTPS: CloudFront logs
  Top URLs: CloudFront logs, web server logs
  Top URLs by content type: CloudFront logs
  Top URLs by edge location: CloudFront logs
  Top URLs by edge location and content type: CloudFront logs
  Error rates by top X URLs: CloudFront logs, web server logs
  Error rate by edge location: CloudFront logs
  Error rate by edge location and content type: CloudFront logs
  Requests served by edge location: CloudFront logs
  Revenue by edge location: CloudFront logs, OrdersDB, app server logs
  Top games segmented by age: CloudFront logs, user profile
  Top games segmented by income: CloudFront logs, user profile
  Top games segmented by gender: CloudFront logs, user profile
  Top games by revenue: CloudFront logs, OrdersDB
  Top games by edge location and revenue: CloudFront logs, OrdersDB
  Top game revenue segmented by age: CloudFront logs, OrdersDB, user profile
• Our Journey Today:
  1. Metrics: What do they look like? Depends upon stage and type of startup.
  2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
  3. Where do I find them? They are all hidden in your logs (so don't throw away logs to create disk space!).
• How to process logs on AWS
• CloudFront access log format (tab-delimited; a parsing sketch follows):
  #Version: 1.0
  #Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query
  2012-05-25 22:01:30 AMS1 4448 94.212.249.78 GET d1234567890213.cloudfront.net /YT0KthT/F5SOWdDPqNqQF07tiTOXqJMpfDdlb3LMwv3/jP3/CINm/yDSy0MsRcWJN/Simutrans.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20MSIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625181
  2012-05-25 22:01:30 AMS1 4952 94.212.249.78 GET d1234567890213.cloudfront.net /66IG584/CPCxY0P44BGb5ZOd3qSUrauL050LOvFwaMj/eH/caw/Blob Wars-Blob And Conquer.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20MSIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625184
  2012-05-25 22:01:30 AMS1 4556 78.8.5.135 GET d1234567890213.cloudfront.net /SwlufjC/xEjH3BRbXMXwmFWqzKt7od6tlWR3e13LhmH/V3eF/lo6g/AstroMenace.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;%20pl)%20Presto/2.10.229%20Version/11.60 uid=100&oid=108625189
  2012-05-25 22:01:30 AMS1 47172 78.8.5.135 GET d1234567890213.cloudfront.net /Di1cXoN/TskldkSHcgkvZXQEmv5vOVR25X5UTisFkRq/pQa/wCjUXZb/Z1HRuGlo/Kroz.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;%20pl)%20Presto/2.10.229%20Version/11.60 uid=100&oid=108625206
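The format is simple enough to process with a few lines of code. A minimal Python sketch, assuming only what the #Fields header above states (tab-delimited records after two '#' comment lines); the file name reuses the sample file from the R example on the next slide:

    import gzip
    from collections import Counter

    # Field names from the "#Fields:" header of the CloudFront access log format.
    FIELDS = ["date", "time", "x-edge-location", "sc-bytes", "c-ip", "cs-method",
              "cs(Host)", "cs-uri-stem", "sc-status", "cs(Referer)",
              "cs(User-Agent)", "cs-uri-query"]

    def parse_cloudfront_log(path):
        """Yield one dict per request record, skipping the two '#' header lines."""
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as f:
            for line in f:
                if line.startswith("#"):
                    continue
                yield dict(zip(FIELDS, line.rstrip("\n").split("\t")))

    # Example: histogram of HTTP status codes, like the R example that follows.
    status_counts = Counter(
        r["sc-status"]
        for r in parse_cloudfront_log("SampleFiles/E123ABCDEF.2012-05-25-22.NEfbhLN3"))
    print(status_counts.most_common())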
• Sample Your Data with R:
  > library(ggplot2)
  > sample_data <- read.delim("SampleFiles/E123ABCDEF.2012-05-25-22.NEfbhLN3", header = F)
  > sample_data <- sample_data[-1:-2, ]   # drop the two '#' header lines
  > View(sample_data)
  > m <- ggplot(sample_data, aes(x = factor(V9)))   # V9 is the sc-status column
  > m + geom_histogram() + scale_y_log10() + xlab('Error Codes') + ylab('log(Frequency)')
• Complete RStudio interface. Memory-optimized R3 instance options:
  Model      | vCPU | Mem (GiB) | SSD Storage (GB)
  r3.large   | 2    | 15        | 1 x 32
  r3.xlarge  | 4    | 30.5      | 1 x 80
  r3.2xlarge | 8    | 61        | 1 x 160
  r3.4xlarge | 16   | 122       | 1 x 320
  r3.8xlarge | 32   | 244       | 2 x 320
• Our Journey Today:
  1. Metrics: What do they look like? Depends upon stage and type of startup.
  2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
  3. Where do I find them? They are all hidden in your logs (so don't throw away logs to create disk space!).
  4. How do I process these logs? Simple tools like awk/sed, SQL, R.
• Two approaches to scale your log processing: 1. DIY. 2. Use prepackaged third-party software.
• Third-party tools:
  • Sumo Logic
  • Loggly
  • Snowplow Analytics
  • Papertrail
  • Logstash + Kibana + Elasticsearch
  • Log.io
  • Treasure Data
  ...and many more solutions in the market, with varied levels of depth.
• Our Journey Today:
  1. Metrics: What do they look like? Depends upon stage and type of startup.
  2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
  3. Where do I find them? They are all hidden in your logs (so don't throw away logs to create disk space!).
  4. How do I process these logs? Simple tools like awk/sed, SQL, R.
  5. What if I have too many logs? How do I scale processing? Get a third-party tool or build it yourself.
• DIY Scalable Log Processing Platform
• Data analytics platform: log shipping and aggregation → storage → transformation → analysis → visualization.
• Collection of data: sources (web servers, application servers, connected devices, mobile phones, etc.), an aggregation and shipping tool (a scalable method to collect and aggregate: Flume, Kafka, Kinesis, or a queue), and a data sink (a reliable and durable destination, or destinations).
• Option 1: Run your own log collector. Your application ships logs to a collector running on Amazon EC2, which writes them to Amazon S3, DynamoDB, or any other data store. (A sketch follows.)
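A minimal sketch of option 1 using boto3, the AWS SDK for Python; the bucket name and date-partitioned key layout are illustrative assumptions, not something the deck prescribes:

    import boto3
    from datetime import date

    s3 = boto3.client("s3")

    def ship_log(local_path, bucket="my-log-bucket", prefix="raw/webserver"):
        # Date-partitioned keys (e.g. raw/webserver/2014/05/25/access.log) keep
        # later EMR processing, Redshift COPY prefixes, and lifecycle rules simple.
        key = "{}/{}/{}".format(prefix, date.today().strftime("%Y/%m/%d"),
                                local_path.rsplit("/", 1)[-1])
        s3.upload_file(local_path, bucket, key)

    ship_log("/var/log/nginx/access.log")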
• Option 2: Use a queue. Producers push log events to Amazon Simple Queue Service (SQS); consumers drain the queue into Amazon S3, DynamoDB, or any other data store. (A sketch follows.)
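A minimal boto3 sketch of option 2; the queue URL and JSON event shape are illustrative assumptions:

    import json
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/log-events"

    def enqueue(event):
        # Producers call this from the application hosts.
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(event))

    def drain_batch():
        # A consumer pulls up to 10 events and forwards them to a durable store.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)  # long polling
        events = []
        for msg in resp.get("Messages", []):
            events.append(json.loads(msg["Body"]))
            # Delete only after the body has been captured for storage.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        return events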
• Option 4: Use a tool like Flume, Fluentd, Kafka, Honu, etc. Flume or Fluentd running on EC2 can deliver logs to Amazon S3, HDFS, or any other data store.
• Introducing Amazon Kinesis: managed service for real-time processing of big data. Data sources put records to an AWS endpoint; the stream is split into shards (Shard 1, Shard 2, ..., Shard N) spread across Availability Zones; consuming applications (App 1: aggregate and de-duplicate; App 2: metric extraction; App 3: sliding-window analysis; App 4: machine learning) read the stream and deliver results to S3, DynamoDB, Redshift, and EMR.
• Amazon Kinesis: key developer benefits (a producer sketch follows).
  • Easy administration: managed service for real-time streaming data collection, processing, and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
  • Real-time performance: perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
  • High throughput, elastic: seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
  • S3, EMR, Storm, Redshift, and DynamoDB integration: reliably collect, process, and transform all of your data in real time and deliver it to the AWS data stores of your choice, with connectors for S3, Redshift, and DynamoDB.
  • Build real-time applications: client libraries enable developers to design and operate real-time streaming data processing applications.
  • Low cost: cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
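A hedged producer-side sketch using boto3's Kinesis client; the stream name and event fields are invented for illustration. Consumer applications like those in the architecture slide would typically be built with the client libraries mentioned above:

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    def put_event(event, stream="log-events"):
        # The partition key determines which shard receives the record,
        # so records for the same user stay ordered within one shard.
        kinesis.put_record(StreamName=stream,
                           Data=json.dumps(event).encode("utf-8"),
                           PartitionKey=event["user_id"])

    put_event({"user_id": "100", "action": "download", "object": "Simutrans.exe"})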
• Data analytics platform, next stage: storage.
• Choice of storage systems, by structure and volume: a grid of structure (low to high) against size (small to large), placing S3, RDS, DynamoDB/NoSQL, and EBS.
• S3 as a "single source of truth." Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
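One reason S3 works as the single source of truth is that old logs can be tiered to cheaper storage instead of deleted. A hedged boto3 sketch; the bucket name, prefix, and 90-day window are illustrative assumptions:

    import boto3

    s3 = boto3.client("s3")

    # Transition raw logs to Glacier after 90 days instead of deleting them,
    # so the "single source of truth" stays complete at low cost.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-log-bucket",
        LifecycleConfiguration={"Rules": [{
            "ID": "archive-raw-logs",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]},
    )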
• Data analytics platform, next stage: transformation.
• Hadoop-based analysis: log aggregation tools and Amazon SQS feed raw data into Amazon S3; Amazon EMR runs the processing; results land in DynamoDB or any SQL or NoSQL store.
• Your choice of tools on Hadoop/EMR: the same pipeline (SQS and log aggregation tools into S3, EMR for processing, DynamoDB or any SQL or NoSQL store for results), with whichever Hadoop tools you prefer.
• Pig for access log analysis (the comments map each step to its Unix analogue):
  -- Load and filter (cat / grep)
  RAW_LOG = LOAD 's3://myoutputbucket/aggregate/' AS (ts:chararray, url:chararray, ...);
  LOGS_BASE_F = FILTER RAW_LOG BY url MATCHES '^GET /__track.*$';
  -- Parse (awk)
  LOGS_BASE_F_W_PARAM = FOREACH LOGS_BASE_F GENERATE
      url,
      DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z') AS dt,
      SUBSTRING(DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z'), 0, 10) AS day,
      ...
      status,
      REGEX_EXTRACT(url, '^GET /([^?]+)', 1) AS action:chararray,
      REGEX_EXTRACT(url, 'idt=([^&]+)', 1) AS idt:chararray,
      REGEX_EXTRACT(url, 'idc=([^&]+)', 1) AS idc:chararray;
  I1 = FILTER LOGS_BASE_F_W_PARAM BY action == 'clic' OR action == 'display';
  LOGS_SHORT = FOREACH I1 GENERATE uuid, action, dt, day, ida, idas, act, idp, idcmp, idc;
  G1 = GROUP LOGS_SHORT BY (uuid, idc);
  -- Store (>)
  STORE G1 INTO 's3://mybucket/sessions/';
• Data analytics platform, next stage: analysis.
• Hadoop is good for:
  1. Ad hoc query analysis
  2. Large unstructured data sets
  3. Machine learning and advanced analytics
  4. Schema-less data
• SQL-based processing for unstructured data: Amazon EMR becomes a pre-processing framework in front of Amazon Redshift, a petabyte-scale columnar data warehouse.
• You might not need pre-processing at all (e.g. for JSON or CSV): load directly into Amazon Redshift, the petabyte-scale columnar data warehouse.
• COPY into Amazon Redshift (CloudFront logs are tab-delimited, hence DELIMITER '\t'; a query sketch follows):
  create table cf_logs (
    d date,
    t char(8),
    edge char(4),
    bytes int,
    cip varchar(15),
    verb char(3),
    distro varchar(MAX),
    object varchar(MAX),
    status int,
    referer varchar(MAX),
    agent varchar(MAX),
    qs varchar(MAX)
  );

  copy cf_logs from 's3://big-data/logs/E123ABCDEF/'
  credentials 'aws_access_key_id=<key_id>;aws_secret_access_key=<secret_key>'
  IGNOREHEADER 2 GZIP DELIMITER '\t' DATEFORMAT 'YYYY-MM-DD';
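Once loaded, cf_logs can be queried from any PostgreSQL-compatible client, since Redshift speaks the PostgreSQL wire protocol. A minimal psycopg2 sketch; the cluster endpoint, database name, and credentials are placeholders:

    import psycopg2

    conn = psycopg2.connect(
        host="mycluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
        port=5439, dbname="logs", user="admin", password="...")

    with conn.cursor() as cur:
        # "Requests served by edge location" from the gaming requirements earlier.
        cur.execute("""
            SELECT edge, COUNT(*) AS requests, SUM(bytes) AS bytes_out
            FROM cf_logs
            GROUP BY edge
            ORDER BY requests DESC
            LIMIT 10;
        """)
        for edge, requests, bytes_out in cur.fetchall():
            print(edge, requests, bytes_out)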
• But aren't data warehouses just for enterprises?
• Amazon Redshift: relational data warehouse; massively parallel; petabyte scale; fully managed, zero admin; low cost; open interfaces. Redshift is the data warehouse done the AWS way.
• Your choice of BI tools on the cloud, sitting on top of the same pipeline (SQS and log aggregation tools, S3, EMR as the pre-processing framework, Amazon Redshift, DynamoDB or any SQL or NoSQL store).
• Choose your favorite visualization tool: Tableau (on a Windows instance), R, Jaspersoft, QlikView, MicroStrategy, SiSense, ...
• Our Journey Today:
  1. Metrics: What do they look like? Depends upon stage and type of startup.
  2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
  3. Where do I find them? They are all hidden in your logs (so don't throw away logs to create disk space!).
  4. How do I process these logs? Simple tools like awk/sed, SQL, R.
  5. What if I have too many logs? How do I scale processing? Get a third-party tool or build it yourself.
  6. How do I build a log analytics platform myself?
     1. Ship and aggregate your logs using Flume, Kinesis, or Fluentd, and store them in S3.
     2. Process them using Hadoop (EMR) or Redshift.
     3. Run your own visualization tool on it.
• Standing on the shoulders of giants:
  "With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It's very flexible."
  "Using Amazon Elastic MapReduce, Yelp was able to save $55,000 in upfront hardware costs and get up and running in a matter of days, not months. However, most important to Yelp is the opportunity cost. 'With AWS, our developers can now do things they couldn't before,' says Marin. 'Our systems team can focus their energies on other challenges.'"
  "Initially we used Amazon Redshift as a data mart for the data science team. Now, it is increasingly used for production data mart tasks such as providing our marketing department with fresh data to make informed decisions and automatically optimize our advertising," said Cooper McGuire, Managing Director at Zalora. "Additionally, Amazon Redshift is simple to use and reliable. With one click, we can rapidly scale up or down in real time in alignment with business requirements. We have been able to eliminate significant maintenance costs and overhead associated with traditional solutions and external consultants."
• Finally, a small warning: Abraham Wald (1902-1950)
• (Diagram slide: panels A, B, and C)
• In summary:
  • Growth hacking = understanding your business in order to optimize it.
  • You can't optimize what you don't measure.
  • Logs are your goldmine: they contain everything you want to measure.
  • S3 is a good place to store all your logs because of its durability and cost.
  • Build an analytics platform that enables developers and analysts to gain interesting insights with the tools of their choice.
  • Most important: innovation and growth will come from areas you least thought they could!
• Thank You! sinhaar@amazon.com, @abysinha