3. Growth Hacking
Growth hacking is a marketing technique developed by technology
startups which uses creativity, analytical thinking, and social metrics to
sell products and gain exposure.
"At Airbnb, we look into all possible ways to
improve our product and user experience.
Oftentimes this involves lots of analytics
behind the scenes."
6. Learn and Iterate
MVP hypothesis: "Hosts with professional photography will get more business. And hosts will sign up for professional photography as a service."
Build an MVP – 20 photographers
Saw the proverbial "Hockey Stick"
7. Airbnb Then Scaled the Idea
• Professional photography services
• Increased the requirements for photo quality
• Watermarked photos for authenticity
• Key metric tracked – "shoots per month"
• April 2012 – 5,000 shoots per month
• Growth can sometimes come from unexpected areas
10. Growth hacking is a marketing
technique developed by technology
startups which uses creativity,
analytical thinking, and social metrics
to sell products and gain exposure
BUILD-MEASURE-LEARN
The fundamental activity of a
startup is to turn ideas into
products, measure how customers
respond, and then learn whether
to pivot or persevere. All successful
startup processes should be
geared to accelerate that feedback
loop.
12. "In a startup, the purpose of analytics is to iterate to product/market fit before the money runs out."
– Lean Analytics, by Alistair Croll and Ben Yoskovitz
18. One Metric That Matters
f(stage, business) = metric that matters
bit.ly/BigLeanTable
Credits – Alistair Croll and Ben Yoskovitz
19. Example – E-commerce
Stage | Metrics
Empathy | How do buyers become aware of the need? How do they try to find the solution? What pain do they encounter as a result? What are their demographics and tech profiles?
Stickiness | Conversion, shopping cart size. Acquisition: cost of finding new buyers. Loyalty: percent of buyers who return in 90 days
Virality | Acquisition mode: customer acquisition cost, volume of sharing. Loyalty mode: ability to reactivate, volume of buyers who return
Revenue | Transaction value, revenue per customer, ratio of acquisition cost to LTV, direct sales metrics
Scale | Affiliates, channels, white-label product ratings, reviews, support costs, returns (RMA) and refunds, channel conflict
Source: bit.ly/BigLeanTable
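The "Loyalty: percent of buyers who return in 90 days" metric in the Stickiness row is straightforward to compute once orders are in a queryable form. A minimal Python sketch, assuming an illustrative in-memory list of (buyer_id, order_date) pairs (all names and data here are made up for the example):

```python
from datetime import date

def ninety_day_loyalty(orders):
    """Percent of buyers whose first order was followed by
    another order within 90 days."""
    first, returned = {}, set()
    for buyer, day in sorted(orders, key=lambda o: o[1]):
        if buyer not in first:
            first[buyer] = day          # remember each buyer's first order date
        elif (day - first[buyer]).days <= 90:
            returned.add(buyer)         # a repeat order inside the window
    return 100.0 * len(returned) / len(first) if first else 0.0

orders = [
    ("alice", date(2014, 1, 5)), ("alice", date(2014, 2, 1)),  # returned in 27 days
    ("bob",   date(2014, 1, 10)),                              # never returned
    ("carol", date(2014, 1, 1)), ("carol", date(2014, 6, 1)),  # returned after 151 days
]
print(round(ninety_day_loyalty(orders), 1))  # → 33.3  (1 of 3 buyers)
```

In practice the same calculation is one GROUP BY query over an orders table; the point is only that the metric is a simple aggregate once the raw events are captured.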
21. Our Journey Today
1. Metrics – What do they look like? Depends upon the stage and type of startup
2. Lean – Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
24. Logs – What They Are Used For, and Types
Used for:
• Operational metrics
• Application/business-related metrics
Types:
• Operating system logs
• Web server logs
• Database logs
• CDN logs
• Application logs
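To make "metrics hidden in logs" concrete: a minimal Python sketch that parses web server log lines in the Apache/NGINX common log format with a regular expression and tallies HTTP status codes. The sample lines are fabricated for illustration:

```python
import re
from collections import Counter

# Common Log Format: client IP, identity, user, [timestamp], "request", status, bytes
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)

def status_counts(lines):
    """Count responses per HTTP status code, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[m.group("status")] += 1
    return counts

sample = [
    '10.0.0.1 - - [25/May/2012:22:00:01 +0000] "GET /index.html HTTP/1.1" 200 5123',
    '10.0.0.2 - - [25/May/2012:22:00:02 +0000] "GET /missing.png HTTP/1.1" 404 209',
    '10.0.0.1 - - [25/May/2012:22:00:03 +0000] "POST /login HTTP/1.1" 200 312',
]
print(status_counts(sample))  # Counter({'200': 2, '404': 1})
```

Every log type on this slide yields the same pattern: a line format, a parser, and a handful of aggregates.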
25. User Engagement in Online Video
[Source: Conviva Viewer Experience Report – 2013]
26. Requirements for a Gaming Company
Cost Analysis
Data transfer
• By date/time
• By edge location
• By date/time within an edge location
• By top X URLs
• By HTTP vs. HTTPS
Marketing
Top URLs
• As-is count
• By content type
• By edge location
• By edge location and content type
Requests served
• By edge location
Revenue
• By edge location
Top games
• By age
• By income
• By gender
Operations
Error rates
• By top X URLs
• By edge location
• By edge location and content type
Revenue
Top games
• By revenue
• By edge location and revenue
Top ads
• That lead to a game purchase
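Several of the marketing metrics above ("top URLs by edge location", "top URLs by content type") are plain group-by counts over CDN log records. A Python sketch of one of them, assuming the (edge_location, url) pairs have already been extracted from the logs (the edge codes and paths are illustrative):

```python
from collections import Counter, defaultdict

def top_urls_by_edge(records, n=2):
    """For each edge location, return the n most requested URLs."""
    per_edge = defaultdict(Counter)
    for edge, url in records:
        per_edge[edge][url] += 1          # count hits per URL within each edge
    return {edge: counts.most_common(n) for edge, counts in per_edge.items()}

records = [
    ("IAD2", "/games/chess.js"), ("IAD2", "/games/chess.js"),
    ("IAD2", "/index.html"),
    ("SIN1", "/games/go.js"), ("SIN1", "/games/chess.js"),
    ("SIN1", "/games/go.js"),
]
print(top_urls_by_edge(records))
# {'IAD2': [('/games/chess.js', 2), ('/index.html', 1)],
#  'SIN1': [('/games/go.js', 2), ('/games/chess.js', 1)]}
```

Swapping the grouping key (content type, date/time, HTTP vs. HTTPS) gives most of the other requirements on this slide.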
27. Requirements for a Gaming Company
Cost Analysis – Data transfer
• By date/time
• By edge location
• By date/time within an edge location
• By top X URLs
• By HTTP vs. HTTPS
Sources: CloudFront logs, web server logs
28. Available Data Sources (Gaming)
Metric | Sources
Data transfer by date/time | CloudFront logs
Data transfer by edge location | CloudFront logs
Data transfer by date/time within an edge location | CloudFront logs
Data transfer by top X URLs | CloudFront logs, web server logs
Data transfer by HTTP vs. HTTPS | CloudFront logs
Top URLs | CloudFront logs, web server logs
Top URLs by content type | CloudFront logs
Top URLs by edge location | CloudFront logs
Top URLs by edge location and content type | CloudFront logs
Error rates by top X URLs | CloudFront logs, web server logs
Error rate by edge location | CloudFront logs
Error rate by edge location and content type | CloudFront logs
Requests served by edge location | CloudFront logs
Revenue by edge location | CloudFront logs, OrdersDB, app server logs
Top games segmented by age | CloudFront logs, user profile
Top games segmented by income | CloudFront logs, user profile
Top games segmented by gender | CloudFront logs, user profile
Top games by revenue | CloudFront logs, OrdersDB
Top games by edge location and revenue | CloudFront logs, OrdersDB
Top game revenue segmented by age | CloudFront logs, OrdersDB, user profile
29. Our Journey Today
1. Metrics – What do they look like? Depends upon the stage and type of startup
2. Lean – Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
3. Where do I find them? They are all hidden in your logs (so don't throw away logs to free up disk space!)
32. Sample Your Data with R
> library(ggplot2)   # needed for ggplot()
> sample_data <- read.delim("SampleFiles/E123ABCDEF.2012-05-25-22.NEfbhLN3", header=F)
> sample_data <- sample_data[-1:-2,]   # drop the two CloudFront header rows
> View(sample_data)
> m <- ggplot(sample_data, aes(x = factor(V9)))   # V9 holds the HTTP status code
> m + geom_bar() + scale_y_log10() + xlab('Error Codes') +   # geom_bar, not geom_histogram, for a discrete factor
    ylab('log(Frequency)')
33. Complete RStudio Interface
Model | vCPU | Mem (GiB) | SSD Storage (GB)
r3.large | 2 | 15 | 1 x 32
r3.xlarge | 4 | 30.5 | 1 x 80
r3.2xlarge | 8 | 61 | 1 x 160
r3.4xlarge | 16 | 122 | 1 x 320
r3.8xlarge | 32 | 244 | 2 x 320
34. Our Journey Today
1. Metrics – What do they look like? Depends upon the stage and type of startup
2. Lean – Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
3. Where do I find them? They are all hidden in your logs (so don't throw away logs to free up disk space!)
4. How do I process these logs? Simple tools like awk/sed, SQL, R
36. Two Approaches to Scale Your Log Processing
1. DIY
2. Use prepackaged third-party software
37. Third-Party Tools
• Sumo Logic
• Loggly
• Snowplow Analytics
• Papertrail
• Logstash + Kibana + Elasticsearch
• Log.io
• Treasure Data
…and many more solutions in the market, with varied levels of depth
38. Our Journey Today
1. Metrics – What do they look like? Depends upon the stage and type of startup
2. Lean – Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
3. Where do I find them? They are all hidden in your logs (so don't throw away logs to free up disk space!)
4. How do I process these logs? Simple tools like awk/sed, SQL, R
5. What if I have too many logs? How do I scale processing? Get a third-party tool or build it yourself
41. Data Analytics Platform
Log shipping and aggregation → Storage → Transformation → Analysis → Visualization
42. Collection of Data
Sources: web servers, application servers, connected devices, mobile phones, etc.
Aggregation and shipping tool: a scalable method to collect and aggregate – Flume, Kafka, Kinesis, or a queue
Data sink: a reliable and durable destination (or destinations)
43. Run Your Own Log Collector (Option 1)
Your application → log collector on Amazon EC2 → Amazon S3, DynamoDB, or any other data store
44. Use a Queue (Option 2)
Amazon Simple Queue Service (SQS) → Amazon S3, DynamoDB, or any other data store
45. Use a Tool like Flume, Fluentd, Kafka, Honu, etc. (Option 4)
Flume or Fluentd running on EC2 → Amazon S3, HDFS, or any other data store
46. Introducing Amazon Kinesis
Managed service for real-time processing of big data
Data sources → AWS endpoint → Kinesis stream (Shard 1, Shard 2, … Shard N, replicated across Availability Zones) → consuming applications:
• App.1 [Aggregate & De-Duplicate]
• App.2 [Metric Extraction]
• App.3 [Sliding Window Analysis]
• App.4 [Machine Learning]
Downstream stores: S3, DynamoDB, Redshift, EMR
47. Amazon Kinesis: Key Developer Benefits
Easy administration – Managed service for real-time streaming data collection, processing, and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
Real-time performance – Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
High throughput, elastic – Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
S3, EMR, Storm, Redshift, & DynamoDB integration – Reliably collect, process, and transform all of your data in real time and deliver it to the AWS data stores of your choice, with connectors for S3, Redshift, and DynamoDB.
Build real-time applications – Client libraries enable developers to design and operate real-time streaming data processing applications.
Low cost – Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
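The per-key ordering that makes Kinesis useful for log aggregation comes from its routing rule: each record's partition key is MD5-hashed into a 128-bit integer, and the record goes to the shard whose hash-key range contains that integer. A minimal sketch of that routing logic in Python, assuming evenly split ranges (real shards carry their explicit ranges in stream metadata):

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Map a partition key to a shard index the way Kinesis routes records:
    MD5-hash the key into a 128-bit integer, then locate the shard whose
    hash-key range contains it (ranges split [0, 2^128) evenly here)."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(h // range_size, num_shards - 1)  # clamp the last sliver of the range

# The same partition key always lands on the same shard, which is
# what preserves ordering for that key's records.
assert shard_for_key("user-42", 4) == shard_for_key("user-42", 4)
for key in ["user-1", "user-2", "session-9"]:
    print(key, "-> shard", shard_for_key(key, 4))
```

This is also why the choice of partition key matters: too few distinct keys and some shards sit idle while others are hot.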
57. Hadoop Is Good For
1. Ad hoc query analysis
2. Large unstructured data sets
3. Machine learning and advanced analytics
4. Schema-less data
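A common way to run this kind of ad hoc analysis on EMR is Hadoop Streaming, where the mapper and reducer are plain scripts reading from stdin. A minimal Python sketch of a mapper/reducer pair that counts log lines per HTTP status code, wired together locally for illustration (on EMR each function would be its own script, and the log format here is a made-up example with the status in the ninth field):

```python
from itertools import groupby

def mapper(lines):
    """Emit (status_code, 1) for each log line; the status code is
    field 9 in this hypothetical space-separated log format."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 9:
            yield fields[8], 1

def reducer(pairs):
    """Sum counts per key; Hadoop delivers the pairs sorted by key,
    which the local sorted() call simulates here."""
    for status, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield status, sum(count for _, count in group)

logs = [
    '10.0.0.1 - - [25/May/2012:22:00:01 +0000] "GET /a HTTP/1.1" 200 512',
    '10.0.0.2 - - [25/May/2012:22:00:02 +0000] "GET /b HTTP/1.1" 404 209',
    '10.0.0.3 - - [25/May/2012:22:00:03 +0000] "GET /c HTTP/1.1" 200 77',
]
print(dict(reducer(mapper(logs))))  # {'200': 2, '404': 1}
```

Because neither function assumes a schema, the same pair works on any line-oriented log, which is exactly the "schema-less" strength listed above.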
58. SQL-Based Processing for Unstructured Data
Log aggregation tools / Amazon SQS → Amazon S3, DynamoDB, or any SQL or NoSQL store → Amazon EMR (pre-processing framework) → Amazon Redshift (petabyte-scale columnar data warehouse)
59. You Might Not Need Pre-Processing (e.g., JSON, CSV)
Log aggregation tools / Amazon SQS → Amazon S3, DynamoDB, or any SQL or NoSQL store → Amazon Redshift (petabyte-scale columnar data warehouse)
60. COPY into Amazon Redshift
create table cf_logs
( d date, t char(8), edge char(4), bytes int, cip varchar(15),
verb char(3), distro varchar(MAX), object varchar(MAX), status int,
Referer varchar(MAX), agent varchar(MAX), qs varchar(MAX) );

copy cf_logs from 's3://big-data/logs/E123ABCDEF/'
credentials 'aws_access_key_id=<key_id>;aws_secret_access_key=<secret_key>'
IGNOREHEADER 2
GZIP
DELIMITER '\t'   -- CloudFront logs are tab-delimited
DATEFORMAT 'YYYY-MM-DD';
62. Redshift Is a Data Warehouse Done the AWS Way
• Relational data warehouse
• Massively parallel
• Petabyte scale
• Fully managed; zero admin
• Low cost point
• Open interfaces
63. Your Choice of BI Tools on the Cloud
Log aggregation tools / Amazon SQS → Amazon S3, DynamoDB, or any SQL or NoSQL store → Amazon EMR (pre-processing framework) → Amazon Redshift → BI tool of your choice
65. Choose Your Favorite Visualization Tool
• Tableau (Windows instance)
• R
• Jaspersoft
• QlikView
• MicroStrategy
• SiSense
• …
66. Our Journey Today
1. Metrics – What do they look like? Depends upon the stage and type of startup
2. Lean – Which one should I focus on? Preferably one (bit.ly/BigLeanTable)
3. Where do I find them? They are all hidden in your logs (so don't throw away logs to free up disk space!)
4. How do I process these logs? Simple tools like awk/sed, SQL, R
5. What if I have too many logs? How do I scale processing? Get a third-party tool or build it yourself
6. How do I build a log analytics platform myself? 1) Ship and aggregate your logs using Flume, Kinesis, or Fluentd and store them in S3; 2) process them using Hadoop (EMR) or Redshift; 3) run your own visualization tool on it
67. Standing on the Shoulders of Giants
"With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from
how users are reacting to a feature, to growth by demographic or geography, to the impact sales
efforts have had in different areas. It's very flexible."
"Using Amazon Elastic MapReduce, Yelp was able to save $55,000 in upfront hardware costs and get
up and running in a matter of days, not months. However, most important to Yelp is the opportunity
cost. 'With AWS, our developers can now do things they couldn't before,' says Marin. 'Our systems
team can focus their energies on other challenges.'"
"Initially we used Amazon Redshift as a data mart for the data science team. Now, it is increasingly
used for production data mart tasks such as providing our marketing department with fresh data to
make informed decisions and automatically optimize our advertising," said Cooper McGuire,
Managing Director at Zalora. "Additionally, Amazon Redshift is simple to use and reliable. With one
click, we can rapidly scale up or down in real time in alignment with business requirements. We have
been able to eliminate significant maintenance costs and overhead associated with traditional
solutions and external consultants."
70. In Summary
• Growth hacking = understanding your business in order to optimize it
• You can't optimize what you don't measure
• Logs are your goldmine – they contain everything you want to measure
• S3 is a good place to store all your logs because of its durability and low cost
• Build an analytics platform that enables developers and analysts to gain interesting insights with the tools of their choice
• Most important – innovation and growth will come from areas you least thought they could!