How Signpost uses MongoDB for Tracking and Analytics

Published in: Technology

    1. MongoNYC - June 7, 2011
       Behind the Curtain
    2. Signpost - Who are we?
       • AdWords meets Groupon
         - Location Based
         - Merchant and User Contributed Deals
         - Decentralized Sales Platform
    3. Who am I?
       • Utility Knife Engineer
       • @mattinsler
       • http://github.com/mattinsler
       • http://www.mattinsler.com
    4. Numbers
       • > 11M Tracked Requests
       • > 11M Tracked Events
       • 4 Developers
       • 2 Coffee runs / Day
    5. What Makes it Work
       • Websites
         - Java (Tomcat/Spring/MyBatis), MySQL, Memcache
       • Analytics
         - Java (Tomcat/Cement/GuiceyMongo), MongoDB, Node.js
    6. GuiceyMongo? (A little shameless self-promotion)
       • Enables MongoDB Database/Collection config in Guice

           Injector injector = Guice.createInjector(
               GuiceyMongo.configure("TEST")
                   .mapDatabase("MAIN").to("test")
                   .mapCollection("USER").to("user").inDatabase("MAIN"),
               GuiceyMongo.configure("PROD")
                   .mapDatabase("MAIN").to("prod")
                   .mapCollection("USER").to("user").inDatabase("MAIN"),
               GuiceyMongo.chooseConfiguration("TEST"));

           @Inject
           void loadPerson(@GuiceyMongoCollection("person") GuiceyCollection<Person> personCollection) {
               Person p = personCollection.findOne();
               // ...
           }
    7. GuiceyMongo?
       • Generates Wrapper/Builder code from a simple DDL

           data Person {
               string name;
               set<string> alias;
               blob picture;
           }

           // Reading through the generated wrapper
           Person p = personCollection.findOne();
           System.out.println(p.getName());
           System.out.println(StringUtil.join(p.getAliasSet(), ", "));
           if (p.hasPicture()) {
               Image picture = ImageIO.read(p.getPictureInputStream());
               // do something with the picture
           }

           // Writing through the generated builder
           Person.Builder p = Person.newBuilder()
               .setName("Matt Insler")
               .addAlias("Matt")
               .addAlias("Guice & Mongo Guru")
               .setPictureBucket("pictures");
           ImageIO.write(picture, format, p.getPictureOutputStream());
           personCollection.save(p.build());
    8. GuiceyMongo?
       • Supports embedded complex objects, lists, maps, sets, enums

           data Contact {
               data Address {
                   string street_1;
                   string street_2;
                   string city;
                   string state;
                   int zip_code;
               }
               [identity]
               string identity;

               string first_name;
               string last_name;
               map<string, Address> address;
               map<string, string> phone_number;
               map<string, string> email_address;
               map<string, InstantMessenger> instant_messenger;
               set<string> tag;
               blob picture;
           }

           data InstantMessenger {
               enum Application {
                   AIM,
                   ICQ,
                   Jabber,
                   MSN,
                   Yahoo
               }
               string screen_name;
               string alias;
               IMApplication application;
           }
    9. GuiceyMongo?
       • Proxies Server-Side Methods (Stored Procedures)

           public interface ContactQuery {
               Contact findContactByName(String name);
               List<Contact> findContactsByIMAlias(String alias);
           }

           Injector injector = Guice.createInjector(
               GuiceyMongo.configure("Test")
                   .mapDatabase("Main").to("test_db"),
               GuiceyMongo.javascriptProxy(ContactQuery.class, "Main"),
               GuiceyMongo.chooseConfiguration("Test"));

           @Inject
           void exercise(ContactQuery query) {
               query.findContactByName("Matt Insler");
           }
    10. Cement?
       • MVC Framework written for Guice/MongoDB
       • Efficient routing and built-in logging and exception handling
       • Wrote it because I was bored and I could
       • Definitely not production-ready whatsoever
    11. Enough of That
       • On to the meat and potatoes
    12. We Track Everything
       • Request Tracking
       • Event Tracking
       • Exception Tracking / Triage
    13. We Track Everything
       • Request Tracking
       • Event Tracking
       • Exception Tracking / Triage
       Every Fat-Fingered Form Submission
    14. Request Tracking
       • Queued and written from a low-priority thread from Java
         - Written to MySQL first
         - Written to MongoDB with MySQL Primary Key for differentiation
       • Visitor Key - Cookied browser/computer
       • Session Key - Per User session
       • Stripe sessions with extra information
         - acquisition/referrer information
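The write path on the Request Tracking slide can be sketched roughly as follows. This is a minimal illustration in JavaScript, not the talk's Java implementation: `makeMysqlStore`, `makeMongoStore`, `RequestTracker` and the `mysql_id` field are hypothetical names, and both stores are in-memory stand-ins.

```javascript
// Hypothetical in-memory stand-ins for the two stores.
function makeMysqlStore() {
  let nextId = 1;
  const rows = [];
  return {
    rows,
    // Insert a row and return its auto-increment primary key.
    insert(row) {
      const id = nextId++;
      rows.push({ id, ...row });
      return id;
    },
  };
}

function makeMongoStore() {
  const docs = [];
  return { docs, insert(doc) { docs.push(doc); } };
}

class RequestTracker {
  constructor(mysqlStore, mongoStore) {
    this.mysql = mysqlStore;
    this.mongo = mongoStore;
    this.queue = [];
  }

  // Called on the request path: only enqueue, never touch a database.
  track(request) {
    this.queue.push(request);
  }

  // Called from a background worker: MySQL first, then MongoDB,
  // carrying the MySQL primary key so the two copies can be matched up.
  flush() {
    while (this.queue.length > 0) {
      const request = this.queue.shift();
      const mysqlId = this.mysql.insert(request);
      this.mongo.insert({ ...request, mysql_id: mysqlId });
    }
  }
}

const mysql = makeMysqlStore();
const mongo = makeMongoStore();
const tracker = new RequestTracker(mysql, mongo);
tracker.track({ uri: '/api/recent-activity', visitor_key: 'v1', session_key: 's1' });
tracker.track({ uri: '/deals', visitor_key: 'v1', session_key: 's1' });
tracker.flush();
```

Keeping `track` down to a single array push is the point of the queue: the request thread never waits on either database.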
    15. Analytics Architecture - Request Tracking Flow
       [Diagram: Web, MySQL, MongoDB; flow label: request]
    16. Request Tracking - What we use this for
       • Spying on you
       • Customer Service
       • UX Analysis
       • Funnel Analysis
       • Exception tracing and debugging
    17. User Activity - Visitor Keys (we’re watching you)
    18. User Activity - Sessions (we’re watching you)
    19. User Activity - Requests (we’re watching you)
    20. Request objects

           {
               "_id" : ObjectId("4de53548b09a6541874eb641"),
               "method" : "GET",
               "response_code" : 200,
               "parameters" : {
                   "raw" : "false",
                   "feedId" : "dealActivity",
                   "user-opts" : "username,avatar",
                   "count" : "100",
                   "no-cache" : "1306867013431",
                   "feedType" : "comment",
                   "userId" : "",
                   "targetUserId" : "",
                   "deal-opts" : "id,location-name,title,category",
                   "activity-opts" : "id,user,type,rawtime,commentcontent",
                   "types" : "commenter",
                   "dealId" : "243844"
               },
               "user_agent" : "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.17) Gecko/20110420 Firefox/3.6.17 (.NET CLR 3.5.30729)",
               "referer" : "http://www.signpost.com/deals/Boston-MA/Cafe-Gigu/-5-for-10-Worth-of-Food-at-Cafe-Gigu-/243844",
               "visitor_key" : "6f6d7e229f2a06e6a0dcdac42b09b26d17dbfac2318be53bea72a996204e3731",
               "uri" : "/api/recent-activity",
               "user_id" : null,
               "session_key" : "008F75D5B7515B202DC1A610F9D86539",
               "ip" : "123.28.156.46",
               "response_time" : 4,
               "timestamp" : ISODate("2011-05-31T18:36:56.177Z")
           }
    21. Request Tracking - Queries
       • Visitor Keys by User

           db.request.mapReduce(function() {
               emit(this.visitor_key, {start: this.timestamp, end: this.timestamp});
           }, function(key, values) {
               return {start: values[0].start, end: values[values.length - 1].end};
           }, {
               query: {user_id: 12345},
               sort: {timestamp: 1}
           });

       • Requests by Visitor Key

           db.request.find({visitor_key: 'a3896b987c98d798e791432fff332'}).sort({create_time: -1});
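To make the "Visitor Keys by User" map-reduce easier to follow, here is a rough in-memory equivalent in plain JavaScript. The sample documents and the function name `visitorKeysByUser` are made up for illustration; only the field names come from the request object in the deck.

```javascript
// Sample documents; field names follow the request object shown in the deck.
const requests = [
  { user_id: 12345, visitor_key: 'aaa', timestamp: new Date('2011-05-31T18:00:00Z') },
  { user_id: 12345, visitor_key: 'bbb', timestamp: new Date('2011-05-31T18:05:00Z') },
  { user_id: 12345, visitor_key: 'aaa', timestamp: new Date('2011-05-31T18:10:00Z') },
  { user_id: 99999, visitor_key: 'ccc', timestamp: new Date('2011-05-31T18:15:00Z') },
];

// For one user, return a Map of visitor_key -> {start, end} timestamps.
function visitorKeysByUser(docs, userId) {
  const spans = new Map();
  docs
    .filter((d) => d.user_id === userId)        // query: {user_id: ...}
    .sort((a, b) => a.timestamp - b.timestamp)  // sort: {timestamp: 1}
    .forEach((d) => {
      const span = spans.get(d.visitor_key);
      if (!span) {
        spans.set(d.visitor_key, { start: d.timestamp, end: d.timestamp });
      } else {
        // Take min/max explicitly instead of relying on input order.
        if (d.timestamp < span.start) span.start = d.timestamp;
        if (d.timestamp > span.end) span.end = d.timestamp;
      }
    });
  return spans;
}

const spans = visitorKeysByUser(requests, 12345);
```

The slide's reduce function takes the first and last values, which relies on them arriving in timestamp order. Since MongoDB may call reduce more than once for the same key, computing an explicit minimum and maximum, as this sketch does, may be the safer variant.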
    22. We Track Everything
       • Request Tracking
       • Event Tracking
       • Exception Tracking / Triage
    23. Event Tracking
       • Queued and written from a low-priority thread from Java
         - Written to MySQL first
         - Written to MongoDB with MySQL Primary Key for differentiation
       • Events are written into a “raw” collection
       • Event Fixer process normalizes events into “fixed” collection
         - Drops bot traffic
         - Accounts for historical naming changes (user || userId || u -> user_id)
         - Converts Strings to Numbers if necessary for later indexing
       • Event Indexer calculates and updates “rollup” collection
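The three Event Fixer rules listed above could look something like this for a single event. It is a hedged sketch: `fixEvent` is a hypothetical name, and the rule for what counts as a numeric string is an assumption, not something the talk specifies.

```javascript
// Hypothetical normalizer for one raw event; returns null for dropped events.
function fixEvent(raw) {
  const props = { ...raw.event_properties };

  // Drop bot traffic (the sample documents store the flag as a string).
  if (props.bot === true || props.bot === 'true') return null;

  // Historical naming changes: user || userId || u -> user_id
  const legacyUser = props.user ?? props.userId ?? props.u;
  if (props.user_id === undefined && legacyUser !== undefined) {
    props.user_id = legacyUser;
  }
  delete props.user;
  delete props.userId;
  delete props.u;

  // Convert numeric strings to numbers so they index and sort as numbers.
  // The pattern below (optional sign, digits, optional fraction) is an assumption.
  for (const [key, value] of Object.entries(props)) {
    if (typeof value === 'string' && /^-?\d+(\.\d+)?$/.test(value)) {
      props[key] = Number(value);
    }
  }
  return { ...raw, event_properties: props };
}

const fixed = fixEvent({
  event_name: 'deal-click',
  event_properties: { userId: '163181', deal_at_location_id: '98303', bot: 'false' },
});
const dropped = fixEvent({ event_name: 'deal-click', event_properties: { bot: 'true' } });
```

Returning a new object rather than editing the raw event in place matches the raw/fixed split on the slide: the raw collection stays untouched and can be re-fixed later.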
    24. Analytics Architecture - Event Flow
       [Diagram: Web, MySQL, MongoDB (event), BI, Event Fixer, event.fixed, Event Indexer, event.rollup]
    25. Event Tracking - What we use this for
       • Email Funnel Analysis
       • Purchase/Signup Funnels
       • Counts or Sums of event permutations over time
         - We aggregate seconds, minutes, days, weeks, months, years
       • Top Count/Sum over a timeframe
         - Such as top 5 deals by purchase in the past month
    26. Event Tracking - Funnels

           var email_funnel = {
               steps: [{
                   type: 'event',
                   name: 'Unique Emails Sent',
                   output: 'length',
                   algorithm: 'distinct',
                   algorithm_data: 'event_properties.email_tracking_id',
                   query: function() {
                       var query = {
                           event_name: 'email-sent',
                           'event_properties.type': 'Daily Deal',
                           create_time: {}
                       };
                       query.create_time['$gte'] = startDate;
                       query.create_time['$lt'] = endDate;
                       return query;
                   }
               }, {
                   type: 'event',
                   name: 'Unique Sessions with an Email Click',
                   output: 'length',
                   algorithm: 'distinct',
                   algorithm_data: 'session_key',
                   query: {
                       event_name: 'email-click',
                       'event_properties.email_tracking_id': {$in: '${data[0].data}'}
                   }
               }, {
                   ...
               }]
           };
           execute_funnel(email_funnel);
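The deck does not show `execute_funnel` itself. The sketch below is one guess at the idea, run against an in-memory array instead of MongoDB: each step filters events, takes the distinct values of one field, and exposes its result to later steps, which is what `${data[0].data}` appears to reference. `executeFunnel`, `getPath` and the `match` predicates are hypothetical and replace the slide's Mongo query objects.

```javascript
// Read a dotted path such as 'event_properties.email_tracking_id'.
function getPath(doc, path) {
  return path.split('.').reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Run each step against an in-memory event list. Each step sees the
// results of the steps before it through the `data` array.
function executeFunnel(funnel, events) {
  const data = [];
  for (const step of funnel.steps) {
    const matched = events.filter((event) => step.match(event, data));
    const distinct = [...new Set(matched.map((e) => getPath(e, step.algorithm_data)))];
    data.push({ name: step.name, data: distinct, length: distinct.length });
  }
  return data;
}

const events = [
  { event_name: 'email-sent', session_key: 's0', event_properties: { email_tracking_id: 'e1' } },
  { event_name: 'email-sent', session_key: 's0', event_properties: { email_tracking_id: 'e2' } },
  { event_name: 'email-click', session_key: 's1', event_properties: { email_tracking_id: 'e1' } },
  { event_name: 'email-click', session_key: 's1', event_properties: { email_tracking_id: 'e1' } },
  { event_name: 'email-click', session_key: 's2', event_properties: { email_tracking_id: 'zz' } },
];

const funnel = {
  steps: [{
    name: 'Unique Emails Sent',
    algorithm_data: 'event_properties.email_tracking_id',
    match: (e) => e.event_name === 'email-sent',
  }, {
    name: 'Unique Sessions with an Email Click',
    algorithm_data: 'session_key',
    // Only clicks on emails counted by the first step.
    match: (e, data) => e.event_name === 'email-click' &&
      data[0].data.includes(e.event_properties.email_tracking_id),
  }],
};

const funnelResult = executeFunnel(funnel, events);
```

With this sample data the first step counts two distinct emails and the second counts one distinct session, because the click in session `s2` belongs to an email the first step never saw.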
    27. Event Tracking - What we use this for
       • Email Funnel Analysis
       • Purchase/Signup Funnels
       • Counts or Sums of event permutations over time
         - We aggregate seconds, minutes, days, weeks, months, years
       • Top Count/Sum over a timeframe
         - Such as top 5 deals by purchase in the past month
    28. Event Tracking - What have we sold lots of?
       * Sorry, we can’t show you what these numbers actually are.
    29. Code?

           function sum_top_n_by_range(from, to, event, property, count) {
               // sum the daily rollups per property value over the range
               var r = db.event.rollup.mapReduce(function () {
                   emit(this.k[0].v, this.c);
               }, function (k, v) {
                   var s = 0;
                   for (var i in v) {
                       s += v[i];
                   }
                   return s;
               }, {
                   query: {e: event, s: 'day', w: {$gte: from, $lte: to}, k: {$size: 1}, 'k.k': property}
               });
               // take the top count property_values
               var keys = r.find().sort({value: -1}).limit(count).map(function (o) {return o._id;});
               r.drop();
               // fetch the hourly rollups for just those values
               var cursor = db.event.rollup.find({
                   e: event, s: 'hour', w: {$gte: from, $lte: to}, k: {$size: 1},
                   'k.k': property, 'k.v': {$in: keys}
               }).sort({w: 1});
               var result = {};
               cursor.forEach(function(dataPoint) {
                   // organize data points in the result object as needed by your output
               });
               return result;
           }
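The first half of `sum_top_n_by_range` (sum the daily rollups per property value, then keep the largest `count`) can be mimicked in memory as below. The function name `topNByRange`, the `purchase` event and the `deal_id` property are sample values invented for this sketch; the document shape follows the rollup objects in the deck.

```javascript
// Rollup documents in the deck's shape: c = count, e = event name,
// k = [{k, v}] key/value pairs, s = scale, w = window start.
const rollups = [
  { c: 5, e: 'purchase', s: 'day', w: new Date('2011-05-01T00:00:00Z'), k: [{ k: 'deal_id', v: 1 }] },
  { c: 4, e: 'purchase', s: 'day', w: new Date('2011-05-02T00:00:00Z'), k: [{ k: 'deal_id', v: 1 }] },
  { c: 7, e: 'purchase', s: 'day', w: new Date('2011-05-02T00:00:00Z'), k: [{ k: 'deal_id', v: 2 }] },
  { c: 1, e: 'purchase', s: 'day', w: new Date('2011-05-03T00:00:00Z'), k: [{ k: 'deal_id', v: 3 }] },
  { c: 100, e: 'purchase', s: 'day', w: new Date('2011-06-15T00:00:00Z'), k: [{ k: 'deal_id', v: 3 }] },
];

function topNByRange(docs, from, to, event, property, count) {
  const sums = new Map();
  for (const d of docs) {
    // Same filter as the slide's mapReduce query option.
    const inScope = d.e === event && d.s === 'day' && d.w >= from && d.w <= to &&
      d.k.length === 1 && d.k[0].k === property;
    if (!inScope) continue;
    // Equivalent of emit(this.k[0].v, this.c) followed by the summing reduce.
    sums.set(d.k[0].v, (sums.get(d.k[0].v) || 0) + d.c);
  }
  return [...sums.entries()]
    .sort((a, b) => b[1] - a[1])   // sort({value: -1})
    .slice(0, count)               // limit(count)
    .map(([value, total]) => ({ value, total }));
}

const top = topNByRange(
  rollups,
  new Date('2011-05-01T00:00:00Z'),
  new Date('2011-05-31T00:00:00Z'),
  'purchase', 'deal_id', 2);
```

Ranking on the coarse daily rollups and only then fetching the hourly rows for the winners, as the slide does, keeps the expensive scan small.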
    30. Event Tracking - MySQL
       • Event Table

           +-------------------+-------------------+---------------+------------+
           | event_tracking_id | visitor_key       | session_key   | event_name |
           +-------------------+-------------------+---------------+------------+
           |                 1 | 2928712...8447ac2 | E2AEB...6E78C | dealClick  |
           |                 2 | 30f3afc...72a504f | D16E7...C10DD | dealClick  |

       • Event Property Table

           +----------------------------+-------------------+-----------+-------------+
           | event_tracking_property_id | event_tracking_id | event_key | event_value |
           +----------------------------+-------------------+-----------+-------------+
           |                          1 |                 5 | 'deal_id' |    '123887' |
           |                          2 |                 5 |    'node' |      'web1' |
           |                          3 |                 5 |    'algo' |   'default' |

       • Querying is hard and can be slow
       • SlowQL
    31. Events - New Hotness

           {
               "_id" : ObjectId("4de53c85c7d8327f56284f3a"),
               "visitor_key" : "50f46d60e00efe0a17c42c7fffdc68bda530572d1f83972a83d6c61d684ba3ce",
               "event_name" : "deal-click",
               "event_properties" : {
                   "ca" : "dl",
                   "deal_at_location_id" : 98303,
                   "session_ca" : "dl",
                   "ct" : "?",
                   "cr" : "?",
                   "node" : "web2",
                   "bot" : "false",
                   "user_id" : 163181,
                   "guest" : "true",
                   "session_cr" : "?",
                   "session_ct" : "?"
               },
               "session_key" : "1C9FA142A59465B745ECE2C1ACB15E62",
               "timestamp" : ISODate("2011-05-31T19:07:48.533Z"),
               "window" : {
                   "minute" : ISODate("2011-05-31T19:07:00Z"),
                   "hour" : ISODate("2011-05-31T19:00:00Z"),
                   "day" : ISODate("2011-05-31T00:00:00Z"),
                   "week" : ISODate("2011-05-30T00:00:00Z"),
                   "month" : ISODate("2011-05-01T00:00:00Z"),
                   "year" : ISODate("2011-01-01T00:00:00Z")
               }
           }
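The `window` sub-document holds the event timestamp truncated to the start of each time scale. A small sketch of how those values could be derived follows; `computeWindows` is a hypothetical name, and the Monday week start is inferred from the sample document (2011-05-30 was a Monday), not stated in the talk.

```javascript
// Truncate a timestamp to the start of each window, in UTC.
function computeWindows(timestamp) {
  const y = timestamp.getUTCFullYear();
  const mo = timestamp.getUTCMonth();
  const d = timestamp.getUTCDate();
  const h = timestamp.getUTCHours();
  const mi = timestamp.getUTCMinutes();
  // Days elapsed since Monday: getUTCDay() is 0 for Sunday, 1 for Monday, ...
  const sinceMonday = (timestamp.getUTCDay() + 6) % 7;
  return {
    minute: new Date(Date.UTC(y, mo, d, h, mi)),
    hour: new Date(Date.UTC(y, mo, d, h)),
    day: new Date(Date.UTC(y, mo, d)),
    // Date.UTC normalizes day values below 1 into the previous month.
    week: new Date(Date.UTC(y, mo, d - sinceMonday)),
    month: new Date(Date.UTC(y, mo, 1)),
    year: new Date(Date.UTC(y, 0, 1)),
  };
}

const windows = computeWindows(new Date('2011-05-31T19:07:48.533Z'));
```

Storing the truncated values on each event means a rollup or range query can match on equality (`window.day` equals some date) instead of computing date ranges at query time.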
    32. Event Rollups - Can you say finance background?

           {
               "_id" : ObjectId("4de54041c7d8327f56285563"),
               "c" : 1,
               "e" : "deal-click",
               "k" : [{
                   "k" : "deal_at_location_id",
                   "v" : 245931
               }],
               "s" : "month",
               "w" : ISODate("2011-05-01T00:00:00Z")
           }

       • Allows for many permutations and drill-downs
       • Easy to add new tracking attributes, just by adding key/value pairs
       • Easy to index (queries like lightning)
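As an illustration of how an indexer might turn one fixed event into rollup increments, the sketch below builds one rollup key per tracked property per time scale and counts them in a `Map` standing in for the `event.rollup` collection. `rollupKeys` and `applyRollups` are hypothetical names, and only single-property keys are generated; the multi-property permutations the talk mentions are left out.

```javascript
// For one fixed event, produce the rollup keys to increment:
// one per tracked property per time scale.
function rollupKeys(event, trackedProperties) {
  const keys = [];
  for (const [scale, windowStart] of Object.entries(event.window)) {
    for (const name of trackedProperties) {
      const value = event.event_properties[name];
      if (value === undefined) continue;
      keys.push({ e: event.event_name, s: scale, w: windowStart, k: [{ k: name, v: value }] });
    }
  }
  return keys;
}

// Apply the increments to an in-memory map standing in for event.rollup.
function applyRollups(store, keys) {
  for (const key of keys) {
    const id = JSON.stringify(key);
    store.set(id, (store.get(id) || 0) + 1);
  }
}

const store = new Map();
const sampleEvent = {
  event_name: 'deal-click',
  event_properties: { deal_at_location_id: 245931, node: 'web2' },
  window: {
    day: new Date('2011-05-31T00:00:00Z'),
    month: new Date('2011-05-01T00:00:00Z'),
  },
};
// Two identical events arrive; each bumps the same two rollup counters.
applyRollups(store, rollupKeys(sampleEvent, ['deal_at_location_id']));
applyRollups(store, rollupKeys(sampleEvent, ['deal_at_location_id']));
```

Against MongoDB the `Map` update would most likely become an upsert that increments `c` with `$inc`. The sketch also shows why rollups grow quickly: every extra tracked property multiplies the number of counters by the number of time scales.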
    33. Tracking Events - Problems
       • Rollups taking up too much space
         - Need to move to document format from Kyle Banker’s “The MongoDB Gamut: Four Application Designs”
         - Tell business to track fewer permutations? (yea right)
       • Fixer skipping events
         - Moved from {_id: {$gt: ObjectId(...)}} to {version: {$lt: 5}}
       • Fixer running slow
         - Moved from findAndModify to find(...).limit(...) and then update in bulk
         - {version: {$lt: 5}} is slower than {version: {$lte: 4}} !!!
       • KEEP INDEXES IN MEMORY!!!
         - We had > 12G indexes on a 7.5G box
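The version-based batching described under "Fixer skipping events" and "Fixer running slow" might be sketched like this, with an in-memory array standing in for the raw collection. `CURRENT_VERSION`, `nextBatch` and `runFixer` are hypothetical names; the `<=` comparison mirrors the slide's `{version: {$lte: 4}}`.

```javascript
const CURRENT_VERSION = 5; // hypothetical fixer version

// Pick up to `limit` events last processed by an older fixer version.
// Events without a numeric version are skipped in this sketch.
function nextBatch(events, limit) {
  return events
    .filter((e) => e.version <= CURRENT_VERSION - 1)
    .slice(0, limit);
}

// Work through the collection in batches and stamp each event with the
// current version, so a rerun skips finished work and selection does not
// depend on insertion order.
function runFixer(events, limit, fix) {
  let processed = 0;
  for (;;) {
    const batch = nextBatch(events, limit);
    if (batch.length === 0) return processed;
    for (const event of batch) {
      fix(event);
      event.version = CURRENT_VERSION;
    }
    processed += batch.length;
  }
}

const rawEvents = [{ version: 4 }, { version: 4 }, { version: 5 }, { version: 3 }];
const firstRun = runFixer(rawEvents, 2, () => {});
const secondRun = runFixer(rawEvents, 2, () => {});
```

Selecting by version rather than by "everything after the last `_id` seen" means an event that shows up out of order is still picked up on the next pass, which is presumably why the switch fixed the skipped events.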
    34. We Track Everything
       • Request Tracking
       • Event Tracking
       • Exception Tracking / Triage
    35. Exception Tracking
       • Special events in the event tracking collection
    36. Full Stack Traces, Permalinks, Request Context
    37. Triage
       • Group exceptions over the past day, week, month
    38. Make Users Happy!
