Growing on Amazon Web Services
Abhishek Sinha
Amazon Web Services
@abysinha
Our  Journey  Today
Growth  Hacking
Growth hacking is a marketing technique developed by technology startups which uses creativity, analytical thinking, and social metrics to sell products and gain exposure.
“At Airbnb, we look into all possible ways to improve our product and user experience. Often times this involves lots of analytics behind the scene.”
Hypothesis → MVP → Learn and Iterate
“Hosts with professional photography will get more business. And hosts will sign up for professional photography as a service.”
Build a MVP – 20 Photographers
Saw the proverbial “Hockey Stick”
Airbnb  then  scaled  the  Idea
•  Professional Photography Services
•  Increased the requirements of Photo Quality
•  Watermarked Photos for authenticity
•  Key Metrics Tracked – “Shoots per month”
•  April 2012 – 5000 shoots per month
•  Growth can sometimes come from unexpected areas
BUILD-MEASURE-LEARN
The fundamental activity of a startup is to turn ideas into products, measure how customers respond, and then learn whether to pivot or persevere. All successful startup processes should be geared to accelerate that feedback loop.
In a startup, the purpose of analytics is to iterate to product/market fit before the money runs out
- Lean Analytics by Alistair Croll and Ben Yoskovitz
Our  Journey  Today
Metrics
Lean
What  do  these  metrics  look  like  ?
Depends upon what stage your startup is at
And what is your favorite analytics framework?
Dave McClure Pirate Metrics
Source: http://www.slideshare.net/dmc500hats/startup-metrics-for-pirates-long-version
Lean Analytics Stages
Credits – Alistair Croll and Ben Yoskovitz
One Metric that matters
f(stage, business) = metric that matters
Bit.ly/BigLeanTable
Credits – Alistair Croll and Ben Yoskovitz
Example – E-commerce
Stage | Metrics
Empathy | How do buyers become aware of the need? How do they try to find the solution? What pain do they encounter as a result? What are their demographics and tech profiles?
Stickiness | Conversion, shopping cart size. Acquisition: cost of finding new buyers. Loyalty: percent of buyers who return in 90 days
Virality | Acquisition mode: customer acquisition cost, volume of sharing. Loyalty mode: ability to reactivate, volume of buyers who return
Revenue | Transaction value, revenue per customer, ratio of acquisition cost to LTV, direct sales metrics
Scale | Affiliates, channels, white-label product ratings, reviews, support costs, return RMA and refunds, channel conflict
Source: Bit.ly/BigLeanTable
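Several of the stickiness and revenue metrics in the table reduce to simple ratios. The sketch below shows one way they might be computed; the field names, the sample numbers and the deliberately naive lifetime-value estimate are all assumptions made for illustration and do not come from the Lean Analytics table.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical inputs for one reporting period (all names are assumptions)."""
    visitors: int            # unique visitors to the store
    orders: int              # completed purchases
    revenue: float           # total revenue from those orders
    buyers: int              # distinct buyers in the period
    returning_buyers: int    # buyers who purchased again within 90 days
    marketing_spend: float   # spend attributed to finding new buyers
    new_buyers: int          # buyers acquired in the period


def ecommerce_metrics(s: PeriodStats, expected_orders_per_buyer: float) -> dict:
    """Compute a handful of the e-commerce ratios named in the table."""
    cart_size = s.revenue / s.orders
    acquisition_cost = s.marketing_spend / s.new_buyers
    # Naive lifetime value: average order value times an assumed order count.
    lifetime_value = cart_size * expected_orders_per_buyer
    return {
        "conversion_rate": s.orders / s.visitors,
        "shopping_cart_size": cart_size,
        "acquisition_cost": acquisition_cost,
        "loyalty_90d": s.returning_buyers / s.buyers,
        "acquisition_cost_to_ltv": acquisition_cost / lifetime_value,
    }


stats = PeriodStats(visitors=10000, orders=200, revenue=9000.0, buyers=180,
                    returning_buyers=45, marketing_spend=1500.0, new_buyers=150)
metrics = ecommerce_metrics(stats, expected_orders_per_buyer=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Which of these ratios deserves attention depends on the stage, which is the point of the "one metric that matters" idea.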
  
Our Journey Today
1. Metrics: What do they look like? Depends upon stage and type of startup
2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
Where do I get these metrics from?
Logs – Used for and Types…
Used for:
•  Operational metrics
•  Application/business related metrics
Types:
•  Operating system logs
•  Web server logs
•  Database logs
•  CDN logs
•  Application logs
User  Engagement  in  Online  Video
[Source: Conviva Viewer Experience Report – 2013]
Requirements for Gaming company
Cost Analysis
Data transfer
•  By date/time
•  By edge location
•  By date/time within an edge location
•  By top X URLs
•  By HTTP vs. HTTPS
Marketing
Top URLs
•  As-is count
•  By content type
•  By edge location
•  By edge location and content type
Requests served
•  By edge location
Revenue
•  By edge location
Top games
•  By age
•  By income
•  By gender
Operations
Error rates
•  By top X URLs
•  By edge location
•  By edge location and content type
Revenue
Top games
•  By revenue
•  By edge location and revenue
Top ads
•  That lead to a game purchase
Requirements for Gaming company
Cost Analysis
Data transfer
•  By date/time
•  By edge location
•  By date/time within an edge location
•  By top X URLs
•  By HTTP vs. HTTPS
Sources: CloudFront logs, web server logs
Available Data Sources (Gaming)
Metric | Sources
Data transfer by date/time | CloudFront logs
Data transfer by edge location | CloudFront logs
Data transfer by date/time within an edge location | CloudFront logs
Data transfer by top X URLs | CloudFront logs, web server logs
Data transfer by HTTP vs. HTTPS | CloudFront logs
Top URLs | CloudFront logs, web server logs
Top URLs by content type | CloudFront logs
Top URLs by edge location | CloudFront logs
Top URLs by edge location and content type | CloudFront logs
Error rates by top X URLs | CloudFront logs, web server logs
Error rate by edge location | CloudFront logs
Error rate by edge location and content type | CloudFront logs
Requests served by edge location | CloudFront logs
Revenue by edge location | CloudFront logs, OrdersDB, app server logs
Top games segmented by age | CloudFront logs, user profile
Top games segmented by income | CloudFront logs, user profile
Top games segmented by gender | CloudFront logs, user profile
Top games by revenue | CloudFront logs, OrdersDB
Top games by edge location and revenue | CloudFront logs, OrdersDB
Top game revenue segmented by age | CloudFront logs, OrdersDB, user profile
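Most rows in this table are group-by aggregations over log records. As a rough illustration, the sketch below computes data transfer, requests served and error rate by edge location. The record layout and the sample values (including the edge name FRA2) are hypothetical; real CloudFront records would first need to be parsed into this shape.

```python
from collections import defaultdict

# Hypothetical, already-parsed records: edge location, bytes sent, HTTP status.
records = [
    {"edge": "AMS1", "bytes": 4448, "status": 200},
    {"edge": "AMS1", "bytes": 4952, "status": 404},
    {"edge": "FRA2", "bytes": 47172, "status": 200},
    {"edge": "FRA2", "bytes": 512, "status": 503},
    {"edge": "FRA2", "bytes": 4556, "status": 200},
]


def by_edge_location(rows):
    """Aggregate bytes, request count and error rate per edge location."""
    totals = defaultdict(lambda: {"bytes": 0, "requests": 0, "errors": 0})
    for row in rows:
        entry = totals[row["edge"]]
        entry["bytes"] += row["bytes"]
        entry["requests"] += 1
        # Treat any 4xx or 5xx status as an error.
        if row["status"] >= 400:
            entry["errors"] += 1
    return {
        edge: {**entry, "error_rate": entry["errors"] / entry["requests"]}
        for edge, entry in totals.items()
    }


report = by_edge_location(records)
for edge, entry in sorted(report.items()):
    print(edge, entry)
```

Metrics that need a second source, such as revenue by edge location, would add a join against order data on a shared key.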
Our Journey Today
1. Metrics: What do they look like? Depends upon stage and type of startup
2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
3. Where do I find them? They are all hidden in your logs (so don’t throw away logs to create disk space!)
How to process logs on AWS
CloudFront  Access  Log  Format
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query
2012-05-25 22:01:30 AMS1 4448 94.212.249.78 GET d1234567890213.cloudfront.net /YT0KthT/F5SOWdDPqNqQF07tiTOXqJMpfDdlb3LMwv3/jP3/CINm/yDSy0MsRcWJN/Simutrans.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20MSIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625181
2012-05-25 22:01:30 AMS1 4952 94.212.249.78 GET d1234567890213.cloudfront.net /66IG584/CPCxY0P44BGb5ZOd3qSUrauL050LOvFwaMj/eH/caw/Blob Wars-Blob And Conquer.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20MSIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625184
2012-05-25 22:01:30 AMS1 4556 78.8.5.135 GET d1234567890213.cloudfront.net /SwlufjC/xEjH3BRbXMXwmFWqzKt7od6tlWR3e13LhmH/V3eF/lo6g/AstroMenace.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;%20pl)%20Presto/2.10.229%20Version/11.60 uid=100&oid=108625189
2012-05-25 22:01:30 AMS1 47172 78.8.5.135 GET d1234567890213.cloudfront.net /Di1cXoN/TskldkSHcgkvZXQEmv5vOVR25X5UTisFkRq/pQa/wCjUXZb/Z1HRuGlo/Kroz.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;%20pl)%20Presto/2.10.229%20Version/11.60 uid=100&oid=108625206
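A log in this format can be turned into named fields by reading the `#Fields:` header and splitting each data line. The sketch below is one minimal way to do that; it assumes the fields in each data line are tab-separated (the slide renders them with spaces), and the shortened URI path in the sample is hypothetical.

```python
from urllib.parse import parse_qs, unquote


def parse_cloudfront_log(lines):
    """Turn access log lines into dicts keyed by the names in the #Fields header."""
    fields = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Field names are space-separated in the header line.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            # Data lines are assumed to be tab-separated.
            records.append(dict(zip(fields, line.split("\t"))))
    return records


sample_lines = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) "
    "cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query",
    "\t".join([
        "2012-05-25", "22:01:30", "AMS1", "4448", "94.212.249.78", "GET",
        "d1234567890213.cloudfront.net",
        "/games/Simutrans.exe",  # hypothetical, shortened path
        "200",
        "http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2",
        "Mozilla/5.0%20(compatible;%20MSIE%209.0;%20Windows%20NT%206.1;"
        "%20WOW64;%20Trident/5.0)",
        "uid=100&oid=108625181",
    ]),
]

records = parse_cloudfront_log(sample_lines)
first = records[0]
agent = unquote(first["cs(User-Agent)"])   # decode the %20 escapes
query = parse_qs(first["cs-uri-query"])    # split uid / oid into a dict
print(first["x-edge-location"], first["sc-status"], agent, query)
```

Reading the field names from the header rather than hard-coding positions keeps the parser working if the field list changes.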
Sample  Your  Data  with  R
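> library(ggplot2)
> # Read one CloudFront log file; the two '#' header lines come in as data rows
> sample_data <- read.delim("SampleFiles/E123ABCDEF.2012-05-25-22.NEfbhLN3", header=F)
> # Drop the #Version and #Fields rows
> sample_data <- sample_data[-1:-2,]
> View(sample_data)
> # V9 is sc-status; geom_bar() counts discrete values (geom_histogram() needs a continuous x)
> m <- ggplot(sample_data, aes(x = factor(V9)))
> m + geom_bar() + scale_y_log10() + xlab('Error Codes') +
ylab('log(Frequency)')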
Complete RStudio Interface
Model | vCPU | Mem (GiB) | SSD Storage (GB)
r3.large | 2 | 15 | 1 x 32
r3.xlarge | 4 | 30.5 | 1 x 80
r3.2xlarge | 8 | 61 | 1 x 160
r3.4xlarge | 16 | 122 | 1 x 320
r3.8xlarge | 32 | 244 | 2 x 320
Our Journey Today
1. Metrics: What do they look like? Depends upon stage and type of startup
2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
3. Where do I find them? They are all hidden in your logs (so don’t throw away logs to create disk space!)
4. How do I process these logs? Simple tools like awk/sed, SQL, R
Two approaches to scale your log processing
1.  DIY
2.  Use prepackaged 3rd party software
3rd Party Tools
•  Sumo Logic
•  Loggly
•  Snowplow Analytics
•  Papertrail
•  Logstash + Kibana + Elasticsearch
•  Log.io
•  Treasure Data
and many more solutions in the market with varied levels of depth
Our Journey Today
1. Metrics: What do they look like? Depends upon stage and type of startup
2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
3. Where do I find them? They are all hidden in your logs (so don’t throw away logs to create disk space!)
4. How do I process these logs? Simple tools like awk/sed, SQL, R
5. What if I have too many logs? How do I scale processing? Get a 3rd party tool or build it yourself
DIY Scalable Log Processing Platform
Data Analytics Platform
Log shipping and aggregation → Storage → Transformation → Analysis → Visualization
Collection of Data
Sources: web servers, application servers, connected devices, mobile phones, etc.
Aggregation and shipping tool: scalable method to collect and aggregate (Flume, Kafka, Kinesis, queue)
Data sink: reliable and durable destination or destinations
1. Run your own log collector
Your application → log collector on Amazon EC2 → Amazon S3, DynamoDB, or any other data store
2. Use a Queue
Amazon Simple Queue Service (SQS) → Amazon S3, DynamoDB, or any other data store
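The queue approach decouples the application that emits log lines from the worker that writes them to storage. The sketch below illustrates that shape only: an in-process `queue.Queue` stands in for SQS and a plain list stands in for the data store, so none of it is AWS API code. The function name, batch size and sample log lines are assumptions.

```python
import queue


def drain_in_batches(log_queue, sink, batch_size=3):
    """Pull log lines off the queue and hand them to `sink` in batches."""
    batch = []
    while True:
        try:
            batch.append(log_queue.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            sink(batch)
            batch = []  # start a fresh list; the sink keeps the old one
    if batch:
        # Flush whatever is left over as a final, smaller batch.
        sink(batch)


# Producer side: the application only has to enqueue its log lines.
log_queue = queue.Queue()
for i in range(7):
    log_queue.put(f"GET /game/{i} 200")

# Consumer side: a list stands in for S3, DynamoDB or another store.
stored_batches = []
drain_in_batches(log_queue, stored_batches.append)
print([len(batch) for batch in stored_batches])
```

Writing in batches rather than line by line tends to suit object stores, where many tiny objects are awkward to process later.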
4. Use a tool like Flume, Fluentd, Kafka, Honu etc.
Flume or Fluentd running on EC2 → Amazon S3, HDFS, or any other data store
Introducing Amazon Kinesis
Managed Service for Real-Time Processing of Big Data
Data sources → AWS endpoint → stream (Shard 1, Shard 2, … Shard N, across three Availability Zones) → consuming applications → S3, DynamoDB, Redshift, EMR
Consuming applications:
•  App.1 [Aggregate & De-Duplicate]
•  App.2 [Metric Extraction]
•  App.3 [Sliding Window Analysis]
•  App.4 [Machine Learning]
Amazon Kinesis: Key Developer Benefits
Easy Administration
Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
Real-time Performance
Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
High Throughput. Elastic
Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
S3, EMR, Storm, Redshift, & DynamoDB Integration
Reliably collect, process, and transform all of your data in real-time & deliver to AWS data stores of choice, with Connectors for S3, Redshift, and DynamoDB.
Build Real-time Applications
Client libraries that enable developers to design and operate real-time streaming data processing applications.
Low Cost
Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
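Throughput in a sharded stream scales because each record is routed to one shard by its partition key. Kinesis hashes the partition key with MD5 into a 128-bit value and delivers the record to the shard whose hash-key range contains it. The sketch below illustrates that routing under a simplifying assumption: the hash space is split evenly across shards, which a real stream need not do. The function name and the sample keys are hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash-key range holds this key.

    Assumes the 128-bit hash space is divided evenly across the shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs any remainder left by the integer division.
    return min(hash_key // range_size, shard_count - 1)


# Route some hypothetical user ids across four shards and count the spread.
spread = Counter(shard_for(f"uid={n}", 4) for n in range(1000))
print(sorted(spread.items()))
```

Because the mapping depends only on the key, all records for one user land on the same shard, which is what makes per-key ordering and sliding-window analysis practical.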
Choice of storage systems (Structure and Volume)
Axes: Structure (low to high) and Size (small to large)
Options plotted: S3, RDS, DynamoDB, NoSQL, EBS
S3 as a “single source of truth”
Courtesy: http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
Hadoop based Analysis
Log aggregation tools → Amazon SQS, Amazon S3, DynamoDB, or any SQL or NoSQL store → Amazon EMR
Your choice of tools on Hadoop/EMR
Pig  for  Access  Logs  Analysis
-- Load and filter (the cat / grep step)
RAW_LOG = LOAD 's3://myoutputbucket/aggregate/' AS (ts:chararray, url:chararray…);
LOGS_BASE_F = FILTER RAW_LOG BY url MATCHES '^GET /__track.*$';
-- Parse (the awk step)
LOGS_BASE_F_W_PARAM = FOREACH LOGS_BASE_F GENERATE
url,
DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z') as dt,
SUBSTRING(DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z') ,0, 10 ) as day,
…
status,
REGEX_EXTRACT(url, '^GET /([^?]+)', 1) AS action: chararray,
REGEX_EXTRACT(url, 'idt=([^&]+)', 1) AS idt: chararray,
REGEX_EXTRACT(url, 'idc=([^&]+)', 1) AS idc: chararray;
I1 = FILTER LOGS_BASE_F_W_PARAM by action == 'clic' or action == 'display';
LOGS_SHORT = FOREACH I1 GENERATE uuid, action, dt, day, ida, idas, act, idp,
idcmp ,idc;
G1 = GROUP LOGS_SHORT BY (uuid,idc);
-- Store (the > step)
store G1 into 's3://mybucket/sessions/';
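The same load, filter, parse and group steps can be tried on a small sample before running them on a cluster. The sketch below is a single-machine analogue in Python, not a translation of the Pig script: the slide elides several fields, so the URL shape (`GET /clic?uuid=…&idc=…`), the sample lines and the helper names here are assumptions.

```python
import re
from collections import defaultdict

ACTION_RE = re.compile(r"^GET /([^?]+)")

# Hypothetical (timestamp, request) pairs standing in for the aggregated logs.
raw_log = [
    ("2012-05-25 22:01:30", "GET /display?uuid=u1&idc=c9"),
    ("2012-05-25 22:01:31", "GET /clic?uuid=u1&idc=c9"),
    ("2012-05-25 22:01:32", "GET /display?uuid=u2&idc=c9"),
    ("2012-05-25 22:01:33", "GET /favicon.ico"),
    ("2012-05-25 22:01:34", "POST /clic?uuid=u3&idc=c9"),
]


def param(url, name):
    """Extract one query-string parameter, or None if it is absent."""
    match = re.search(rf"(?:\?|&){re.escape(name)}=([^&]+)", url)
    return match.group(1) if match else None


def sessions(lines):
    """Filter to clic/display requests and group them by (uuid, idc)."""
    groups = defaultdict(list)
    for ts, url in lines:
        match = ACTION_RE.match(url)  # filter step: GET requests only
        if not match or match.group(1) not in ("clic", "display"):
            continue
        key = (param(url, "uuid"), param(url, "idc"))  # parse step
        groups[key].append((match.group(1), ts))       # group step
    return dict(groups)


result = sessions(raw_log)
for key, events in result.items():
    print(key, events)
```

On EMR the grouping is distributed across the cluster, but the per-record logic stays this simple.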
  
Hadoop is good for
1.  Ad Hoc Query analysis
2.  Large Unstructured Data Sets
3.  Machine Learning and Advanced Analytics
4.  Schema less
SQL based processing for unstructured data
Log aggregation tools → Amazon SQS, Amazon S3, DynamoDB, or any SQL or NoSQL store → Amazon EMR (pre-processing framework) → Amazon Redshift (petabyte scale columnar data warehouse)
You might not need pre-processing (e.g. JSON, CSV)
Log aggregation tools → Amazon SQS, Amazon S3, DynamoDB, or any SQL or NoSQL store → Amazon Redshift (petabyte scale columnar data warehouse)
COPY into Amazon Redshift
create table cf_logs
( d date, t char(8), edge char(4), bytes int, cip varchar(15),
verb char(3), distro varchar(MAX), object varchar(MAX), status int,
Referer varchar(MAX), agent varchar(MAX), qs varchar(MAX) );
copy cf_logs from 's3://big-data/logs/E123ABCDEF/'
credentials 'aws_access_key_id=<key_id>;aws_secret_access_key=<secret_key>'
IGNOREHEADER 2
GZIP
DELIMITER '\t'
DATEFORMAT 'YYYY-MM-DD';
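Once the logs sit in a table like `cf_logs`, the metrics from the requirements list become short SQL queries. The sketch below runs one such query, data transfer and error count by edge location, against an in-memory SQLite database. SQLite is only a local stand-in for Redshift here, the column types are simplified, and the inserted rows (including the edge name FRA2 and the object paths) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same column names as cf_logs above, with SQLite-friendly types.
conn.execute(
    "create table cf_logs (d text, t text, edge text, bytes integer, cip text, "
    "verb text, distro text, object text, status integer, referer text, "
    "agent text, qs text)"
)

host = "d1234567890213.cloudfront.net"
rows = [
    ("2012-05-25", "22:01:30", "AMS1", 4448, "94.212.249.78", "GET", host,
     "/a.exe", 200, "-", "-", "uid=100"),
    ("2012-05-25", "22:01:30", "AMS1", 4952, "94.212.249.78", "GET", host,
     "/b.exe", 404, "-", "-", "uid=100"),
    ("2012-05-25", "22:01:31", "FRA2", 47172, "78.8.5.135", "GET", host,
     "/c.exe", 200, "-", "-", "uid=100"),
]
conn.executemany("insert into cf_logs values (?,?,?,?,?,?,?,?,?,?,?,?)", rows)

# Data transfer, requests served and errors by edge location.
query = """
select edge,
       sum(bytes) as total_bytes,
       count(*) as requests,
       sum(case when status >= 400 then 1 else 0 end) as errors
from cf_logs
group by edge
order by total_bytes desc
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The query text itself uses only standard SQL, so the same statement should carry over to a warehouse with little change.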
But aren’t data warehouses for enterprises?
Amazon Redshift
•  Relational data warehouse
•  Massively parallel
•  Petabyte scale
•  Fully managed; zero admin
•  Low cost point
•  Open interface
Redshift is data warehousing done the AWS way
Your choice of BI Tools on the cloud
Log aggregation tools → Amazon SQS, Amazon S3, DynamoDB, or any SQL or NoSQL store → Amazon EMR (pre-processing framework) → Amazon Redshift
Choose Your Favorite Visualization Tool
•  Tableau (Windows instance)
•  R
•  Jaspersoft
•  QlikView
•  MicroStrategy
•  SiSense
•  …
Our Journey Today
1. Metrics: What do they look like? Depends upon stage and type of startup
2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
3. Where do I find them? They are all hidden in your logs (so don’t throw away logs to create disk space!)
4. How do I process these logs? Simple tools like awk/sed, SQL, R
5. What if I have too many logs? How do I scale processing? Get a 3rd party tool or build it yourself
6. How do I build a log analytics platform myself?
   1.  Ship and aggregate your logs using Flume, Kinesis, or Fluentd and store them in S3
   2.  Process them using Hadoop (EMR) or Redshift
   3.  Run your own visualization tool on it
Standing on the shoulders of giants
“With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible.”
Using Amazon Elastic MapReduce, Yelp was able to save $55,000 in upfront hardware costs and get up and running in a matter of days, not months. However, most important to Yelp is the opportunity cost. “With AWS, our developers can now do things they couldn’t before,” says Marin. “Our systems team can focus their energies on other challenges.”
“Initially we used Amazon Redshift as a data mart for the data science team. Now, it is increasingly used for production data mart tasks such as providing our marketing department with fresh data to make informed decisions and automatically optimize our advertising,” said Cooper McGuire, Managing Director at Zalora. “Additionally, Amazon Redshift is simple to use and reliable. With one click, we can rapidly scale up or down in real time in alignment with business requirements. We have been able to eliminate significant maintenance costs and overhead associated with traditional solutions and external consultants.”
Finally,  a  Small  Warning
Abraham Wald (1902-1950)
[Figure labels: A, B, C]
In  Summary
•  Growth Hacking = Understanding your business to optimize it
•  You can’t optimize what you don’t measure
•  Logs are your goldmine – they contain everything you want to measure
•  S3 is a good place to store all your logs because of durability and cost
•  Build an analytics platform that enables developers and analysts to gain interesting insights with the choice of tool they want
•  Most important – innovation and growth will come from areas you least thought they could!
Thank  You  !  
sinhaar@amazon.com  
@abysinha

More Related Content

Similar to AWS Activate Webinar - Growing on AWS

Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...Amazon Web Services
 
Enterprise Marketplace Powered by Sitecore Experience Cloud
Enterprise Marketplace Powered by Sitecore Experience CloudEnterprise Marketplace Powered by Sitecore Experience Cloud
Enterprise Marketplace Powered by Sitecore Experience CloudVarunNehra
 
5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWSChristian Beedgen
 
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Spark Summit
 
Track 3 Session 2_從傳統 legacy 邁向數位化與現代化架構
Track 3 Session 2_從傳統  legacy  邁向數位化與現代化架構Track 3 Session 2_從傳統  legacy  邁向數位化與現代化架構
Track 3 Session 2_從傳統 legacy 邁向數位化與現代化架構Amazon Web Services
 
AWS Stripe Meetup - Powering UK Startup Economy
AWS Stripe Meetup - Powering UK Startup EconomyAWS Stripe Meetup - Powering UK Startup Economy
AWS Stripe Meetup - Powering UK Startup EconomyAmazon Web Services
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAmazon Web Services
 
Launch Your Game in the Cloud in Record Time
Launch Your Game in the Cloud in Record TimeLaunch Your Game in the Cloud in Record Time
Launch Your Game in the Cloud in Record TimeRightScale
 
Charting New Waters: Data Integration Excellence for Port & Marine Operations
Charting New Waters: Data Integration Excellence for Port & Marine OperationsCharting New Waters: Data Integration Excellence for Port & Marine Operations
Charting New Waters: Data Integration Excellence for Port & Marine Operationsmarketing932765
 
PayPal Real Time Analytics
PayPal  Real Time AnalyticsPayPal  Real Time Analytics
PayPal Real Time AnalyticsAnil Madan
 
Creating an Omnichannel Banking Experience with Machine Learning on Azure Dat...
Creating an Omnichannel Banking Experience with Machine Learning on Azure Dat...Creating an Omnichannel Banking Experience with Machine Learning on Azure Dat...
Creating an Omnichannel Banking Experience with Machine Learning on Azure Dat...Databricks
 
AWS re:Invent 2016: Media Delivery from the Cloud: Integrated AWS Solutions f...
AWS re:Invent 2016: Media Delivery from the Cloud: Integrated AWS Solutions f...AWS re:Invent 2016: Media Delivery from the Cloud: Integrated AWS Solutions f...
AWS re:Invent 2016: Media Delivery from the Cloud: Integrated AWS Solutions f...Amazon Web Services
 
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2Amazon Web Services
 
Sales Acceleration with Marketing Automation
Sales Acceleration with Marketing AutomationSales Acceleration with Marketing Automation
Sales Acceleration with Marketing AutomationBMA Carolinas
 
Digital Servicing Using Artificial Intelligence
Digital Servicing Using Artificial IntelligenceDigital Servicing Using Artificial Intelligence
Digital Servicing Using Artificial IntelligenceRené Werner
 
apidays LIVE JAKARTA - Event Driven APIs by Phil Scanlon
apidays LIVE JAKARTA - Event Driven APIs by Phil Scanlonapidays LIVE JAKARTA - Event Driven APIs by Phil Scanlon
apidays LIVE JAKARTA - Event Driven APIs by Phil Scanlonapidays
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 

Similar to AWS Activate Webinar - Growing on AWS (20)

Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
 
Tweak Geeks #FOS15
Tweak Geeks #FOS15Tweak Geeks #FOS15
Tweak Geeks #FOS15
 
Enterprise Marketplace Powered by Sitecore Experience Cloud
Enterprise Marketplace Powered by Sitecore Experience CloudEnterprise Marketplace Powered by Sitecore Experience Cloud
Enterprise Marketplace Powered by Sitecore Experience Cloud
 
5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS
 
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
 
AI & AWS DeepComposer
AI & AWS DeepComposerAI & AWS DeepComposer
AI & AWS DeepComposer
 
Track 3 Session 2_從傳統 legacy 邁向數位化與現代化架構
Track 3 Session 2_從傳統  legacy  邁向數位化與現代化架構Track 3 Session 2_從傳統  legacy  邁向數位化與現代化架構
Track 3 Session 2_從傳統 legacy 邁向數位化與現代化架構
 
presentation slides
presentation slidespresentation slides
presentation slides
 
AWS Stripe Meetup - Powering UK Startup Economy
AWS Stripe Meetup - Powering UK Startup EconomyAWS Stripe Meetup - Powering UK Startup Economy
AWS Stripe Meetup - Powering UK Startup Economy
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
Launch Your Game in the Cloud in Record Time
Launch Your Game in the Cloud in Record TimeLaunch Your Game in the Cloud in Record Time
Launch Your Game in the Cloud in Record Time
 
Charting New Waters: Data Integration Excellence for Port & Marine Operations
Charting New Waters: Data Integration Excellence for Port & Marine OperationsCharting New Waters: Data Integration Excellence for Port & Marine Operations
Charting New Waters: Data Integration Excellence for Port & Marine Operations
 
PayPal Real Time Analytics
PayPal  Real Time AnalyticsPayPal  Real Time Analytics
PayPal Real Time Analytics
 
Creating an Omnichannel Banking Experience with Machine Learning on Azure Dat...
Creating an Omnichannel Banking Experience with Machine Learning on Azure Dat...Creating an Omnichannel Banking Experience with Machine Learning on Azure Dat...
Creating an Omnichannel Banking Experience with Machine Learning on Azure Dat...
 
AWS re:Invent 2016: Media Delivery from the Cloud: Integrated AWS Solutions f...
AWS re:Invent 2016: Media Delivery from the Cloud: Integrated AWS Solutions f...AWS re:Invent 2016: Media Delivery from the Cloud: Integrated AWS Solutions f...
AWS re:Invent 2016: Media Delivery from the Cloud: Integrated AWS Solutions f...
 
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2
AWS Data-Driven Insights Learning Series_ANZ Sep 2019 Part 2
 
Sales Acceleration with Marketing Automation
Sales Acceleration with Marketing AutomationSales Acceleration with Marketing Automation
Sales Acceleration with Marketing Automation
 
Digital Servicing Using Artificial Intelligence
Digital Servicing Using Artificial IntelligenceDigital Servicing Using Artificial Intelligence
Digital Servicing Using Artificial Intelligence
 
apidays LIVE JAKARTA - Event Driven APIs by Phil Scanlon
apidays LIVE JAKARTA - Event Driven APIs by Phil Scanlonapidays LIVE JAKARTA - Event Driven APIs by Phil Scanlon
apidays LIVE JAKARTA - Event Driven APIs by Phil Scanlon
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
AWS Activate Webinar - Growing on AWS

  • 1. Growing on Amazon Web Services Abhishek Sinha Amazon Web Services @abysinha
  • 3. Growth Hacking: Growth hacking is a marketing technique developed by technology startups which uses creativity, analytical thinking, and social metrics to sell products and gain exposure. “At Airbnb, we look into all possible ways to improve our product and user experience. Often times this involves lots of analytics behind the scene.”
  • 4.
  • 5. Learn and Iterate; MVP; Hypothesis
  • 6. Learn and Iterate; MVP; Hypothesis. “Hosts with professional photography will get more business. And hosts will sign up for professional photography as a service.” Build an MVP: 20 photographers. Saw the proverbial “hockey stick”.
  • 7. Airbnb then scaled the idea
      • Professional photography services
      • Increased the requirements for photo quality
      • Watermarked photos for authenticity
      • Key metric tracked: “shoots per month”
      • April 2012: 5,000 shoots per month
      • Growth can sometimes come from unexpected areas
  • 9.
  • 10. Growth hacking is a marketing technique developed by technology startups which uses creativity, analytical thinking, and social metrics to sell products and gain exposure. BUILD-MEASURE-LEARN: The fundamental activity of a startup is to turn ideas into products, measure how customers respond, and then learn whether to pivot or persevere. All successful startup processes should be geared to accelerate that feedback loop.
  • 11.
  • 12. “In a startup, the purpose of analytics is to iterate to product/market fit before the money runs out.” - Lean Analytics by Alistair Croll and Ben Yoskovitz
  • 13.
  • 15. What do these metrics look like? It depends on what stage your startup is at. And what is your favorite analytics framework?
  • 16. Dave McClure's Pirate Metrics. Source: http://www.slideshare.net/dmc500hats/startup-metrics-for-pirates-long-version
  • 17. Lean Analytics Stages. Credits: Alistair Croll and Ben Yoskovitz
  • 18. One Metric That Matters: f(stage, business) = metric that matters. bit.ly/BigLeanTable. Credits: Alistair Croll and Ben Yoskovitz
  • 19. Example: E-commerce (Stage: Metrics)
      Empathy: How do buyers become aware of the need? How do they try to find the solution? What pain do they encounter as a result? What are their demographics and tech profiles?
      Stickiness: Conversion, shopping cart size. Acquisition: cost of finding new buyers. Loyalty: percent of buyers who return in 90 days.
      Virality: Acquisition mode: customer acquisition cost, volume of sharing. Loyalty model: ability to reactivate, volume of buyers who return.
      Revenue: Transaction value, revenue per customer, ratio of acquisition cost to LTV, direct sales metrics.
      Scale: Affiliates, channels, white-label product ratings, reviews, support costs, return RMA and refunds, channel conflict.
      Source: bit.ly/BigLeanTable
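As an illustration of one metric from the table, the Stickiness-stage loyalty figure ("percent of buyers who return in 90 days") could be computed roughly as follows. This is a minimal sketch, not something from the deck: the function name `returning_buyer_rate`, the `(buyer, date)` order tuples, and the sample buyers are all hypothetical, and "return" is assumed to mean a second order within 90 days of the buyer's first order.

```python
from datetime import date, timedelta


def returning_buyer_rate(orders, window_days=90):
    """Fraction of buyers who order again within `window_days` of their first order.

    `orders` is an iterable of (buyer_id, order_date) tuples (hypothetical shape).
    """
    first_order = {}
    returned = set()
    # Walk orders chronologically so the first order per buyer is seen first.
    for buyer, day in sorted(orders, key=lambda o: o[1]):
        if buyer not in first_order:
            first_order[buyer] = day
        elif day - first_order[buyer] <= timedelta(days=window_days):
            returned.add(buyer)
    if not first_order:
        return 0.0
    return len(returned) / len(first_order)


# Hypothetical sample data, for illustration only.
orders = [
    ("alice", date(2014, 1, 5)),
    ("alice", date(2014, 2, 20)),  # 46 days after first order: counts as a return
    ("bob", date(2014, 1, 10)),
    ("bob", date(2014, 6, 1)),     # 142 days after first order: outside the window
    ("carol", date(2014, 3, 3)),   # never ordered again
]
rate = returning_buyer_rate(orders)
print(f"{rate:.0%} of buyers returned within 90 days")
```

Other definitions of "return" (for example, any order in a rolling 90-day period) are equally plausible; the point is only that the metric needs an explicit definition before it can be tracked.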
  • 20.
  • 21. Our Journey Today
      1. Metrics: What do they look like? Depends upon stage and type of startup.
      2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
  • 22. Where do I get these metrics from?
  • 23.
  • 24. Logs: Used for and Types
      Used for:
      • Operational metrics
      • Application/business-related metrics
      Types:
      • Operating system logs
      • Web server logs
      • Database logs
      • CDN logs
      • Application logs
  • 25. User Engagement in Online Video [Source: Conviva Viewer Experience Report – 2013]
  • 26. Requirements for a Gaming Company
      Cost analysis
      • Data transfer: by date/time; by edge location; by date/time within an edge location; by top X URLs; by HTTP vs. HTTPS
      Marketing
      • Top URLs: as-is count; by content type; by edge location; by edge location and content type
      • Requests served: by edge location
      • Revenue: by edge location
      • Top games: by age; by income; by gender
      Operations
      • Error rates: by top X URLs; by edge location; by edge location and content type
      Revenue
      • Top games: by revenue; by edge location and revenue
      • Top ads: that lead to a game purchase
  • 27. Requirements for a Gaming Company
      Cost analysis
      • Data transfer: by date/time; by edge location; by date/time within an edge location; by top X URLs; by HTTP vs. HTTPS
      Sources: CloudFront logs, web server logs
  • 28. Available Data Sources (Gaming) (Metric: Sources)
      Data transfer by date/time: CloudFront logs
      Data transfer by edge location: CloudFront logs
      Data transfer by date/time within an edge location: CloudFront logs
      Data transfer by top X URLs: CloudFront logs, web server logs
      Data transfer by HTTP vs. HTTPS: CloudFront logs
      Top URLs: CloudFront logs, web server logs
      Top URLs by content type: CloudFront logs
      Top URLs by edge location: CloudFront logs
      Top URLs by edge location and content type: CloudFront logs
      Error rates by top X URLs: CloudFront logs, web server logs
      Error rate by edge location: CloudFront logs
      Error rate by edge location and content type: CloudFront logs
      Requests served by edge location: CloudFront logs
      Revenue by edge location: CloudFront logs, OrdersDB, app server logs
      Top games segmented by age: CloudFront logs, user profile
      Top games segmented by income: CloudFront logs, user profile
      Top games segmented by gender: CloudFront logs, user profile
      Top games by revenue: CloudFront logs, OrdersDB
      Top games by edge location and revenue: CloudFront logs, OrdersDB
      Top game revenue segmented by age: CloudFront logs, OrdersDB, user profile
  • 29. Our Journey Today
      1. Metrics: What do they look like? Depends upon stage and type of startup.
      2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
      3. Where do I find them? They are all hidden in your logs (so don't throw away logs to create disk space!).
  • 30. How to process logs on AWS
  • 31. CloudFront Access Log Format
      #Version: 1.0
      #Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query
      2012-05-25 22:01:30 AMS1 4448 94.212.249.78 GET d1234567890213.cloudfront.net /YT0KthT/F5SOWdDPqNqQF07tiTOXqJMpfD dlb3LMwv3/jP3/CINm/yDSy0MsRcWJN/Simutrans.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20MSIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625181
      2012-05-25 22:01:30 AMS1 4952 94.212.249.78 GET d1234567890213.cloudfront.net /66IG584/ CPCxY0P44BGb5ZOd3qSUrauL05 0LOvFwaMj/eH/caw/Blob Wars-Blob And Conquer.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20MSIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625184
      2012-05-25 22:01:30 AMS1 4556 78.8.5.135 GET d1234567890213.cloudfront.net /SwlufjC/ xEjH3BRbXMXwmFWqzKt7od6tlW R3e13LhmH/V3eF/lo6g/AstroMenace.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;%20pl)%20Presto/2.10.229%20Version/11.60 uid=100&oid=108625189
      2012-05-25 22:01:30 AMS1 47172 78.8.5.135 GET d1234567890213.cloudfront.net /Di1cXoN/ TskldkSHcgkvZXQEmv5vOVR25X 5UTisFkRq/pQa/wCjUXZb/Z1HRuGlo/Kroz.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;%20pl)%20Presto/2.10.229%20Version/11.60 uid=100&oid=108625206
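A log in this format can be turned into records with a few lines of code. The sketch below is an assumption-laden illustration rather than anything shown in the deck: it relies on CloudFront standard logs being tab-separated with `#`-prefixed header lines, the function name `parse_cloudfront_log` is made up, and the sample record uses a shortened, hypothetical URI path and user agent.

```python
def parse_cloudfront_log(lines):
    """Yield one dict per log record, keyed by the names in the #Fields header."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Field names in the header are separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip "#Version" and blank lines
        # Record values are tab-separated.
        yield dict(zip(fields, line.split("\t")))


# Hypothetical sample: header from the slide, one shortened record.
sample = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) "
    "cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query",
    "\t".join([
        "2012-05-25", "22:01:30", "AMS1", "4448", "94.212.249.78", "GET",
        "d1234567890213.cloudfront.net", "/example/Simutrans.exe", "200",
        "-", "Mozilla/5.0", "uid=100&oid=108625181",
    ]),
]
records = list(parse_cloudfront_log(sample))
print(records[0]["x-edge-location"], records[0]["sc-bytes"], records[0]["sc-status"])
```

Keying each record by the header names means the same function keeps working if the field list changes between log versions.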
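  • 32. Sample Your Data with R
      library(ggplot2)
      # Read one tab-delimited CloudFront log file; no row is treated as a header
      sample_data <- read.delim("SampleFiles/E123ABCDEF.2012-05-25-22.NEfbhLN3", header = FALSE)
      # Drop the two leading "#Version" and "#Fields" rows
      sample_data <- sample_data[-1:-2, ]
      View(sample_data)
      # Column V9 holds sc-status; plot how often each status code occurs, on a log scale
      m <- ggplot(sample_data, aes(x = factor(V9)))
      m + geom_bar() + scale_y_log10() + xlab('Error Codes') + ylab('log(Frequency)')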
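The same kind of quick sampling can be done without R. The following sketch, offered only as an illustration, counts status codes and sums data transfer by edge location, two of the metrics listed earlier for the gaming example. The `summarize` function is hypothetical, records are assumed to be dicts keyed by CloudFront field names, and the `FRA2` / `404` record is invented sample data.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Return (status code frequencies, bytes transferred per edge location)."""
    status_counts = Counter(r["sc-status"] for r in records)
    bytes_by_edge = defaultdict(int)
    for r in records:
        bytes_by_edge[r["x-edge-location"]] += int(r["sc-bytes"])
    return status_counts, dict(bytes_by_edge)


# Hypothetical records; the first two reuse byte counts from the sample log.
records = [
    {"x-edge-location": "AMS1", "sc-bytes": "4448", "sc-status": "200"},
    {"x-edge-location": "AMS1", "sc-bytes": "4952", "sc-status": "200"},
    {"x-edge-location": "FRA2", "sc-bytes": "512", "sc-status": "404"},
]
status_counts, bytes_by_edge = summarize(records)
print(status_counts.most_common())
print(bytes_by_edge)
```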
  • 33. Complete RStudio Interface
      Model        vCPU   Mem (GiB)   SSD Storage (GB)
      r3.large     2      15          1 x 32
      r3.xlarge    4      30.5        1 x 80
      r3.2xlarge   8      61          1 x 160
      r3.4xlarge   16     122         1 x 320
      r3.8xlarge   32     244         2 x 320
  • 34. Our Journey Today
      1. Metrics: What do they look like? Depends upon stage and type of startup.
      2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
      3. Where do I find them? They are all hidden in your logs (so don't throw away logs to create disk space!).
      4. How do I process these logs? Simple tools like awk/sed, SQL, R.
  • 35.
  • 36. Two approaches to scale your log processing: 1. DIY 2. Use prepackaged third-party software
  • 37. Third-Party Tools
      • Sumo Logic
      • Loggly
      • Snowplow Analytics
      • Papertrail
      • Logstash + Kibana + Elasticsearch
      • Log.io
      • Treasure Data
      and many more solutions in the market with varied levels of depth
  • 38. Our Journey Today
      1. Metrics: What do they look like? Depends upon stage and type of startup.
      2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
      3. Where do I find them? They are all hidden in your logs (so don't throw away logs to create disk space!).
      4. How do I process these logs? Simple tools like awk/sed, SQL, R.
      5. What if I have too many logs? How do I scale processing? Get a third-party tool or build it yourself.
  • 39. DIY Scalable Log Processing Platform
  • 40. Data Analytics Platform: log shipping and aggregation, storage, transformation, analysis, visualization
  • 41. Data Analytics Platform: log shipping and aggregation, storage, transformation, analysis, visualization
  • 42. Collection of Data
      Sources: web servers, application servers, connected devices, mobile phones, etc.
      Aggregation and shipping tool: a scalable method to collect and aggregate (Flume, Kafka, Kinesis, a queue)
      Data sink: a reliable and durable destination or destinations
  • 43. Option 1: Run your own log collector. Diagram components: your application, Amazon EC2, Amazon S3, DynamoDB, any other data store
  • 44. Option 2: Use a queue. Diagram components: Amazon Simple Queue Service (SQS), Amazon S3, DynamoDB, any other data store
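The queue option can be sketched as a small shipper that pushes each log line to SQS. This is an illustrative sketch under stated assumptions, not the deck's implementation: `ship_log_lines`, `FakeSQS`, and the queue URL are hypothetical. The only real API assumed is the boto3 SQS client's `send_message(QueueUrl=..., MessageBody=...)`; the fake client stands in for it so the example runs without AWS credentials.

```python
def ship_log_lines(sqs_client, queue_url, lines):
    """Send each non-empty log line as one SQS message; return how many were sent."""
    sent = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        sqs_client.send_message(QueueUrl=queue_url, MessageBody=line)
        sent += 1
    return sent


class FakeSQS:
    """In-memory stand-in exposing the same send_message keyword arguments."""

    def __init__(self):
        self.messages = []

    def send_message(self, QueueUrl, MessageBody):
        self.messages.append((QueueUrl, MessageBody))
        return {"MessageId": str(len(self.messages))}


# Hypothetical queue URL and log lines, for illustration only.
queue_url = "https://sqs.example.invalid/123456789012/app-logs"
client = FakeSQS()
sent = ship_log_lines(client, queue_url, ["GET /a 200", "", "GET /b 404\n"])
print(sent, client.messages)
```

In a real deployment the client would presumably come from `boto3.client("sqs")`, and a separate consumer would drain the queue into S3, DynamoDB, or another store as the slide suggests.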
  • 45. Option 4: Use a tool like Flume, Fluentd, Kafka, Honu, etc. Diagram components: Flume or Fluentd running on EC2, Amazon S3, HDFS, any other data store
  • 46. Introducing Amazon Kinesis: Managed Service for Real-Time Processing of Big Data. Diagram components: data sources; AWS endpoint; shards 1 to N across Availability Zones; App 1 [Aggregate & De-Duplicate]; App 2 [Metric Extraction]; App 3 [Sliding Window Analysis]; App 4 [Machine Learning]; S3, DynamoDB, Redshift, EMR
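How records spread across shards can be sketched as follows. Kinesis hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch assumes, for simplicity, that the shards split the hash space evenly; the function name `shard_for` and the sample keys are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for(partition_key, shard_count):
    """Return the index (0-based) of the shard that would receive this key,
    assuming `shard_count` shards that divide the hash key space evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    return hash_key * shard_count // HASH_SPACE


# Hypothetical partition keys, e.g. the uid taken from a log line's query string.
for key in ["uid=100", "uid=101", "uid=102"]:
    print(key, "-> shard", shard_for(key, shard_count=4))
```

Because the mapping depends only on the partition key, all records for one user land on the same shard, which is what lets a consumer application do per-user de-duplication or sliding-window analysis.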
  • 47. Amazon Kinesis: Key Developer Benefits
      • Easy administration: Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
      • Real-time performance: Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
      • High throughput, elastic: Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
      • S3, EMR, Storm, Redshift, & DynamoDB integration: Reliably collect, process, and transform all of your data in real time and deliver to AWS data stores of choice, with connectors for S3, Redshift, and DynamoDB.
      • Build real-time applications: Client libraries that enable developers to design and operate real-time streaming data processing applications.
      • Low cost: Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
  • 48. Data Analytics Platform: log shipping and aggregation, storage, transformation, analysis, visualization
  • 49. Choice of storage systems (structure and volume). Chart axes: structure (low to high) and size (small to large). Services plotted: S3, RDS, DynamoDB, NoSQL, EBS
  • 50. Choice of storage systems (structure and volume). Chart axes: structure (low to high) and size (small to large). Services plotted: S3, RDS, DynamoDB, NoSQL, EBS
  • 52. Data Analytics Platform: log shipping and aggregation, storage, transformation, analysis, visualization
  • 53. Hadoop-based analysis. Diagram components: log aggregation tools, Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, Amazon EMR
  • 54. Your choice of tools on Hadoop/EMR. Diagram components: log aggregation tools, Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, Amazon EMR
  • 55. Pig for Access Logs Analysis
      -- Load and filter (cat / grep)
      RAW_LOG = LOAD 's3://myoutputbucket/aggregate/' AS (ts:chararray, url:chararray…);
      LOGS_BASE_F = FILTER RAW_LOG BY url MATCHES '^GET /__track.*$';
      -- Parse (awk)
      LOGS_BASE_F_W_PARAM = FOREACH LOGS_BASE_F GENERATE
          url,
          DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z') as dt,
          SUBSTRING(DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z'), 0, 10) as day,
          …
          status,
          REGEX_EXTRACT(url, '^GET /([^?]+)', 1) AS action: chararray,
          REGEX_EXTRACT(url, 'idt=([^&]+)', 1) AS idt: chararray,
          REGEX_EXTRACT(url, 'idc=([^&]+)', 1) AS idc: chararray;
      I1 = FILTER LOGS_BASE_F_W_PARAM by action == 'clic' or action == 'display';
      LOGS_SHORT = FOREACH I1 GENERATE uuid, action, dt, day, ida, idas, act, idp, idcmp, idc;
      G1 = GROUP LOGS_SHORT BY (uuid, idc);
      -- Store (>)
      store G1 into 's3://mybucket/sessions/';
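For readers less familiar with Pig, the filter, parse, and group steps can be approximated on a single machine as follows. This is a simplified sketch and not a translation of the full script: it leaves out the `__track` pre-filter, the date parsing, and the elided fields, it skips rows that have no `idc` parameter, and the `(uuid, url)` input shape and sample rows are hypothetical.

```python
import re
from collections import defaultdict

# Same patterns the Pig script passes to REGEX_EXTRACT.
ACTION = re.compile(r"^GET /([^?]+)")
IDC = re.compile(r"idc=([^&]+)")


def group_sessions(rows):
    """Group 'clic' and 'display' actions by (uuid, idc).

    `rows` is an iterable of (uuid, request_string) tuples.
    """
    groups = defaultdict(list)
    for uuid, url in rows:
        action_match = ACTION.search(url)
        idc_match = IDC.search(url)
        if not action_match or not idc_match:
            continue  # simplification: rows without an action or idc are dropped
        action = action_match.group(1)
        if action not in ("clic", "display"):
            continue
        groups[(uuid, idc_match.group(1))].append(action)
    return dict(groups)


# Hypothetical request strings, for illustration only.
rows = [
    ("u1", "GET /display?idt=7&idc=42"),
    ("u1", "GET /clic?idt=7&idc=42"),
    ("u2", "GET /display?idc=42"),
    ("u2", "GET /other?idc=42"),
]
sessions = group_sessions(rows)
print(sessions)
```

The Pig version earns its keep when the input no longer fits on one machine; the logic itself stays the same.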
  • 56. Data Analytics Platform: log shipping and aggregation, storage, transformation, analysis, visualization
  • 57. Hadoop is good for: 1. Ad hoc query analysis 2. Large unstructured data sets 3. Machine learning and advanced analytics 4. Schema-less data
  • 58. SQL-based processing for unstructured data. Diagram components: log aggregation tools, Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, Amazon EMR (pre-processing framework), Amazon Redshift (petabyte-scale columnar data warehouse)
  • 59. You might not need pre-processing (e.g. JSON, CSV). Diagram components: log aggregation tools, Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, Amazon Redshift (petabyte-scale columnar data warehouse)
  • 60. COPY into Amazon Redshift
      create table cf_logs (
        d date,
        t char(8),
        edge char(4),
        bytes int,
        cip varchar(15),
        verb char(3),
        distro varchar(MAX),
        object varchar(MAX),
        status int,
        Referer varchar(MAX),
        agent varchar(MAX),
        qs varchar(MAX)
      );
      copy cf_logs from 's3://big-data/logs/E123ABCDEF/'
      credentials 'aws_access_key_id=<key_id>;aws_secret_access_key=<secret_key>'
      IGNOREHEADER 2 GZIP DELIMITER '\t' DATEFORMAT 'YYYY-MM-DD';
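Once the logs sit in a table like `cf_logs`, the metrics listed earlier become plain SQL. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in for Redshift, so the query shape can be tried without a cluster. The column types are simplified, the `FRA2` row and shortened object paths are invented, and nothing here reflects Redshift-specific behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same column names as the cf_logs table on the slide; types simplified for SQLite.
conn.execute(
    "CREATE TABLE cf_logs (d TEXT, t TEXT, edge TEXT, bytes INTEGER, cip TEXT, "
    "verb TEXT, distro TEXT, object TEXT, status INTEGER, referer TEXT, "
    "agent TEXT, qs TEXT)"
)

# Hypothetical rows; byte counts for AMS1 are taken from the sample log.
rows = [
    ("2012-05-25", "22:01:30", "AMS1", 4448, "94.212.249.78", "GET",
     "d1234567890213.cloudfront.net", "/a/Simutrans.exe", 200, "-",
     "Mozilla/5.0", "uid=100&oid=108625181"),
    ("2012-05-25", "22:01:30", "AMS1", 4952, "94.212.249.78", "GET",
     "d1234567890213.cloudfront.net", "/b/Blob.exe", 200, "-",
     "Mozilla/5.0", "uid=100&oid=108625184"),
    ("2012-05-25", "22:01:31", "FRA2", 512, "78.8.5.135", "GET",
     "d1234567890213.cloudfront.net", "/c/missing.exe", 404, "-",
     "Opera/9.80", "uid=101&oid=1"),
]
conn.executemany("INSERT INTO cf_logs VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", rows)

# "Data transfer by edge location" and "requests served by edge location".
result = conn.execute(
    "SELECT edge, SUM(bytes) AS total_bytes, COUNT(*) AS requests "
    "FROM cf_logs GROUP BY edge ORDER BY total_bytes DESC"
).fetchall()
print(result)
```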
  • 61. But data warehouses are for enterprises?
  • 62. Redshift is data warehousing done the AWS way: relational data warehouse; massively parallel; petabyte scale; fully managed, zero admin; low cost point; open interface
  • 63. Your choice of BI tools on the cloud. Diagram components: log aggregation tools, Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, Amazon EMR (pre-processing framework), Amazon Redshift
  • 64.
  • 65. Choose Your Favorite Visualization Tool: Tableau (Windows instance), R, Jaspersoft, QlikView, MicroStrategy, SiSense, …
  • 66. Our Journey Today
      1. Metrics: What do they look like? Depends upon stage and type of startup.
      2. Lean: Which one should I focus on? Preferably one (bit.ly/BigLeanTable).
      3. Where do I find them? They are all hidden in your logs (so don't throw away logs to create disk space!).
      4. How do I process these logs? Simple tools like awk/sed, SQL, R.
      5. What if I have too many logs? How do I scale processing? Get a third-party tool or build it yourself.
      6. How do I build a log analytics platform myself? (1) Ship and aggregate your logs using Flume, Kinesis, or Fluentd and store them in S3. (2) Process them using Hadoop (EMR) or Redshift. (3) Run your own visualization tool on it.
  • 67. Standing on the Shoulders of Giants. “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like—from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible,” “Using Amazon Elastic MapReduce Yelp was able to save $55,000 in upfront hardware costs and get up and running in a matter of days not months. However, most important to Yelp is the opportunity cost. “With AWS, our developers can now do things they couldn’t before,” says Marin. “Our systems team can focus their energies on other challenges” “Initially we used Amazon Redshift as a data mart for the data science team. Now, it is increasingly used for production data mart tasks such as providing our marketing department with fresh data to make informed decisions and automatically optimize our advertising," said Cooper McGuire, Managing Director, at Zalora. "Additionally, Amazon Redshift is simple to use and reliable. With one click, we can rapidly scale up or down in real time in alignment with business requirements. We have been able to eliminate significant maintenance costs and overhead associated with traditional solutions and external consultants."
  • 68. Finally, a Small Warning: Abraham Wald (1902-1950)
  • 70. In Summary • Growth Hacking = Understanding your business to optimize it • You can’t optimize what you don’t measure • Logs are your goldmine: they contain everything you want to measure • S3 is a good place to store all your logs because of durability and cost • Build an analytics platform that enables developers and analysts to gain interesting insights with the tool of their choice • Most important: innovation and growth will come from areas you least thought they could!
  • 71. Thank You! sinhaar@amazon.com @abysinha