How Signpost uses MongoDB for Tracking and Analytics

Transcript

  • 1. MongoNYC - June 7, 2011
      Behind the Curtain
  • 2. Signpost - Who are we?
      • AdWords meets Groupon
          ― Location Based
          ― Merchant and User Contributed Deals
          ― Decentralized Sales Platform
  • 3. Who am I?
      • Utility Knife Engineer
      • @mattinsler
      • http://github.com/mattinsler
      • http://www.mattinsler.com
  • 4. Numbers
      • > 11M Tracked Requests
      • > 11M Tracked Events
      • 4 Developers
      • 2 Coffee runs / Day
  • 5. What Makes it Work
      • Websites
          ― Java (Tomcat/Spring/MyBatis), MySQL, Memcache
      • Analytics
          ― Java (Tomcat/Cement/GuiceyMongo), MongoDB, Node.js
  • 6. GuiceyMongo? (A little shameless self-promotion)
      • Enables MongoDB Database/Collection config in Guice

      Injector injector = Guice.createInjector(
          GuiceyMongo.configure("TEST")
              .mapDatabase("MAIN").to("test")
              .mapCollection("USER").to("user").inDatabase("MAIN"),
          GuiceyMongo.configure("PROD")
              .mapDatabase("MAIN").to("prod")
              .mapCollection("USER").to("user").inDatabase("MAIN"),
          GuiceyMongo.chooseConfiguration("TEST"));

      @Inject
      void loadPerson(@GuiceyMongoCollection("person") GuiceyCollection<Person> personCollection) {
          Person p = personCollection.findOne();
          // ...
      }
  • 7. GuiceyMongo?
      • Generates Wrapper/Builder code from a simple DDL

      The DDL:

      data Person {
          string name;
          set<string> alias;
          blob picture;
      }

      Reading with the generated wrapper:

      Person p = personCollection.findOne();
      System.out.println(p.getName());
      System.out.println(StringUtil.join(p.getAliasSet(), ", "));
      if (p.hasPicture()) {
          Image picture = ImageIO.read(p.getPictureInputStream());
          // do something with the picture
      }

      Writing with the generated builder:

      Person.Builder p = Person.newBuilder()
          .setName("Matt Insler")
          .addAlias("Matt")
          .addAlias("Guice & Mongo Guru")
          .setPictureBucket("pictures");
      ImageIO.write(picture, format, p.getPictureOutputStream());
      personCollection.save(p.build());
  • 8. GuiceyMongo?
      • Supports embedded complex objects, lists, maps, sets, enums

      data Address {
          string street_1;
          string street_2;
          string city;
          string state;
          int zip_code;
      }

      data InstantMessenger {
          enum Application {
              AIM,
              ICQ,
              Jabber,
              MSN,
              Yahoo
          }
          [identity] string screen_name;
          string alias;
          IMApplication application;
      }

      data Contact {
          string identity;
          string first_name;
          string last_name;
          map<string, Address> address;
          map<string, string> phone_number;
          map<string, string> email_address;
          map<string, InstantMessenger> instant_messenger;
          set<string> tag;
          blob picture;
      }
  • 9. GuiceyMongo?
      • Proxies Server-Side Methods (Stored Procedures)

      public interface ContactQuery {
          Contact findContactByName(String name);
          List<Contact> findContactsByIMAlias(String alias);
      }

      Injector injector = Guice.createInjector(
          GuiceyMongo.configure("Test")
              .mapDatabase("Main").to("test_db"),
          GuiceyMongo.javascriptProxy(ContactQuery.class, "Main"),
          GuiceyMongo.chooseConfiguration("Test"));

      @Inject
      void exercise(ContactQuery query) {
          query.findContactByName("Matt Insler");
      }
  • 10. Cement?
      • MVC Framework written for Guice/MongoDB
      • Efficient routing and built-in logging and exception handling
      • Wrote it because I was bored and I could
      • Definitely not production-ready whatsoever
  • 11. Enough of That
      • On to the meat and potatoes
  • 12. We Track Everything
      • Request Tracking
      • Event Tracking
      • Exception Tracking / Triage
  • 13. We Track Everything
      • Request Tracking
      • Event Tracking
      • Exception Tracking / Triage

      Every Fat-Fingered Form Submission
  • 14. Request Tracking
      • Queued and written from a low-priority thread from Java
          ― Written to MySQL first
          ― Written to MongoDB with MySQL Primary Key for differentiation
      • Visitor Key - Cookied browser/computer
      • Session Key - Per User session
      • Stripe sessions with extra information
          ― acquisition/referrer information
  • 15. Analytics Architecture - Request Tracking Flow
      [Diagram labels: Web, MySQL, MongoDB, request]
  • 16. Request Tracking - What we use this for
      • Spying on you
      • Customer Service
      • UX Analysis
      • Funnel Analysis
      • Exception tracing and debugging
  • 17. User Activity - Visitor Keys (we’re watching you)
  • 18. User Activity - Sessions (we’re watching you)
  • 19. User Activity - Requests (we’re watching you)
  • 20. Request objects

      {
        "_id" : ObjectId("4de53548b09a6541874eb641"),
        "method" : "GET",
        "response_code" : 200,
        "parameters" : {
          "raw" : "false",
          "feedId" : "dealActivity",
          "user-opts" : "username,avatar",
          "count" : "100",
          "no-cache" : "1306867013431",
          "feedType" : "comment",
          "userId" : "",
          "targetUserId" : "",
          "deal-opts" : "id,location-name,title,category",
          "activity-opts" : "id,user,type,rawtime,commentcontent",
          "types" : "commenter",
          "dealId" : "243844"
        },
        "user_agent" : "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.17) Gecko/20110420 Firefox/3.6.17 (.NET CLR 3.5.30729)",
        "referer" : "http://www.signpost.com/deals/Boston-MA/Cafe-Gigu/-5-for-10-Worth-of-Food-at-Cafe-Gigu-/243844",
        "visitor_key" : "6f6d7e229f2a06e6a0dcdac42b09b26d17dbfac2318be53bea72a996204e3731",
        "uri" : "/api/recent-activity",
        "user_id" : null,
        "session_key" : "008F75D5B7515B202DC1A610F9D86539",
        "ip" : "123.28.156.46",
        "response_time" : 4,
        "timestamp" : ISODate("2011-05-31T18:36:56.177Z")
      }
  • 21. Request Tracking - Queries
      • Visitor Keys by User

      db.request.mapReduce(function() {
        emit(this.visitor_key, {start: this.timestamp, end: this.timestamp});
      }, function(key, values) {
        return {start: values[0].start, end: values[values.length - 1].end};
      }, {
        query: {user_id: 12345},
        sort: {timestamp: 1}
      });

      • Requests by Visitor Key

      db.request.find({visitor_key: 'a3896b987c98d798e791432fff332'}).sort({create_time: -1});
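
The reduce function in the "Visitor Keys by User" query takes the first and last values, so it depends on values reaching it in timestamp order. As a rough sketch only, the same first-seen/last-seen window can be computed without that assumption by taking the minimum start and maximum end. The code below simulates the map and reduce steps in plain JavaScript over an in-memory array; `mapRequest`, `reduceWindow`, `runMapReduce`, and the sample documents are hypothetical names and data, not part of the talk.

```javascript
// Hypothetical in-memory stand-in for db.request.mapReduce(...), for illustration only.
function mapRequest(doc, emit) {
  emit(doc.visitor_key, { start: doc.timestamp, end: doc.timestamp });
}

// Order-independent reduce: keep the earliest start and the latest end.
function reduceWindow(key, values) {
  return values.reduce(function (acc, v) {
    return {
      start: v.start < acc.start ? v.start : acc.start,
      end: v.end > acc.end ? v.end : acc.end
    };
  });
}

// Filter, map, group by key, then reduce each group (single values pass through).
function runMapReduce(docs, query) {
  const groups = new Map();
  docs.filter(query).forEach(function (doc) {
    mapRequest(doc, function (key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  });
  const out = {};
  groups.forEach(function (values, key) {
    out[key] = values.length === 1 ? values[0] : reduceWindow(key, values);
  });
  return out;
}

// Sample data, deliberately out of timestamp order.
const requests = [
  { visitor_key: "a", user_id: 12345, timestamp: new Date("2011-05-31T18:40:00Z") },
  { visitor_key: "a", user_id: 12345, timestamp: new Date("2011-05-31T18:36:56Z") },
  { visitor_key: "b", user_id: 12345, timestamp: new Date("2011-05-31T19:00:00Z") },
  { visitor_key: "c", user_id: 999, timestamp: new Date("2011-05-31T20:00:00Z") }
];

const windows = runMapReduce(requests, function (doc) {
  return doc.user_id === 12345;
});
```

Because the reduce only compares values, it gives the same answer if it is applied again to partial results.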
  • 22. We Track Everything
      • Request Tracking
      • Event Tracking
      • Exception Tracking / Triage
  • 23. Event Tracking
      • Queued and written from a low-priority thread from Java
          ― Written to MySQL first
          ― Written to MongoDB with MySQL Primary Key for differentiation
      • Events are written into a “raw” collection
      • Event Fixer process normalizes events into “fixed” collection
          ― Drops bot traffic
          ― Accounts for historical naming changes (user || userId || u -> user_id)
          ― Converts Strings to Numbers if necessary for later indexing
      • Event Indexer calculates and updates “rollup” collection
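
The three Event Fixer rules above can be sketched as a single pure function. This is only an illustration under stated assumptions: `fixEvent` and `USER_ID_ALIASES` are hypothetical names, the bot flag is assumed to live in `event_properties.bot`, and "numeric string" is assumed to mean a plain decimal number.

```javascript
// Historical names that carried the user id (from the slide: user || userId || u -> user_id).
const USER_ID_ALIASES = ["user", "userId", "u"];

// Returns a normalized copy of a raw event, or null when the event should be dropped.
function fixEvent(raw) {
  const props = Object.assign({}, raw.event_properties);

  // Drop bot traffic. Assumes the flag is a boolean or the string "true".
  if (props.bot === true || props.bot === "true") return null;

  // Fold historical names into user_id, keeping an existing user_id if present.
  USER_ID_ALIASES.forEach(function (alias) {
    if (alias in props) {
      if (props.user_id === undefined) props.user_id = props[alias];
      delete props[alias];
    }
  });

  // Convert purely numeric strings to numbers so they index and sort as numbers.
  Object.keys(props).forEach(function (key) {
    const value = props[key];
    if (typeof value === "string" && /^-?\d+(\.\d+)?$/.test(value)) {
      props[key] = Number(value);
    }
  });

  return Object.assign({}, raw, { event_properties: props });
}
```

The function never mutates its input, so the raw document can stay untouched in the “raw” collection while the result goes to the “fixed” one.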
  • 24. Analytics Architecture - Event Flow
      [Diagram labels: Web, MySQL, MongoDB, event, BI, Event Fixer, event.fixed, Event Indexer, event.rollup]
  • 25. Event Tracking - What we use this for
      • Email Funnel Analysis
      • Purchase/Signup Funnels
      • Counts or Sums of event permutations over time
          ― We aggregate seconds, minutes, days, weeks, months, years
      • Top Count/Sum over a timeframe
          ― Such as top 5 deals by purchase in the past month
  • 26. Event Tracking - Funnels

      var email_funnel = {
        steps: [{
          type: 'event',
          name: 'Unique Emails Sent',
          output: 'length',
          algorithm: 'distinct',
          algorithm_data: 'event_properties.email_tracking_id',
          query: function() {
            var query = {
              event_name: 'email-sent',
              'event_properties.type': 'Daily Deal',
              create_time: {}
            };
            query.create_time['$gte'] = startDate;
            query.create_time['$lt'] = endDate;
            return query;
          }
        }, {
          type: 'event',
          name: 'Unique Sessions with an Email Click',
          output: 'length',
          algorithm: 'distinct',
          algorithm_data: 'session_key',
          query: {
            event_name: 'email-click',
            'event_properties.email_tracking_id': {$in: '${data[0].data}'}
          }
        }, {
          ...
        }]
      };
      execute_funnel(email_funnel);
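
The slide does not show `execute_funnel` itself. As a loose sketch of the idea only, the version below runs "distinct" steps against an in-memory array of events and lets each step see the results of earlier steps, which is the role `${data[0].data}` plays in the funnel definition. `executeFunnel`, `getPath`, and the `filter` callback (standing in for the MongoDB query) are hypothetical and not Signpost's implementation.

```javascript
// Read a dotted path such as "event_properties.email_tracking_id" from a document.
function getPath(doc, path) {
  return path.split(".").reduce(function (value, part) {
    return value == null ? undefined : value[part];
  }, doc);
}

// Run each step over the events and collect the distinct values of step.algorithm_data.
// step.filter(event, data) receives the results of all earlier steps in `data`.
function executeFunnel(funnel, events) {
  const data = [];
  funnel.steps.forEach(function (step) {
    const seen = new Set();
    events.forEach(function (event) {
      if (!step.filter(event, data)) return;
      const value = getPath(event, step.algorithm_data);
      if (value !== undefined) seen.add(value);
    });
    const distinct = Array.from(seen);
    data.push({ name: step.name, data: distinct, output: distinct.length });
  });
  return data;
}

// A two-step funnel mirroring the slide: emails sent, then sessions that clicked one of them.
const emailFunnel = {
  steps: [{
    name: "Unique Emails Sent",
    algorithm_data: "event_properties.email_tracking_id",
    filter: function (event) {
      return event.event_name === "email-sent";
    }
  }, {
    name: "Unique Sessions with an Email Click",
    algorithm_data: "session_key",
    filter: function (event, data) {
      return event.event_name === "email-click" &&
        data[0].data.indexOf(event.event_properties.email_tracking_id) !== -1;
    }
  }]
};
```

Calling `executeFunnel(emailFunnel, events)` returns one entry per step, and the `output` fields give the funnel counts in order.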
  • 27. Event Tracking - What we use this for
      • Email Funnel Analysis
      • Purchase/Signup Funnels
      • Counts or Sums of event permutations over time
          ― We aggregate seconds, minutes, days, weeks, months, years
      • Top Count/Sum over a timeframe
          ― Such as top 5 deals by purchase in the past month
  • 28. Event Tracking - What have we sold lots of?
      * Sorry, we can’t show you what these numbers actually are.
  • 29. Code?

      function sum_top_n_by_range(from, to, event, property, count) {
        var r = db.event.rollup.mapReduce(function () {
          emit(this.k[0].v, this.c);
        }, function (k, v) {
          var s = 0;
          for (var i in v) {
            s += v[i];
          }
          return s;
        }, {
          query: {e: event, s: 'day', w: {$gte: from, $lte: to}, k: {$size: 1}, 'k.k': property}
        });
        // take the top count property_values
        var keys = r.find().sort({value: -1}).limit(count).map(function (o) {return o._id;});
        r.drop();
        var cursor = db.event.rollup.find({
          e: event, s: 'hour', w: {$gte: from, $lte: to}, k: {$size: 1},
          'k.k': property, 'k.v': {$in: keys}
        }).sort({w: 1});
        var result = {};
        cursor.forEach(function(dataPoint) {
          // organize data points in the result object as needed by your output
        });
        return result;
      }
  • 30. Event Tracking - MySQL
      • Event Table

      +-------------------+-------------------+---------------+------------+
      | event_tracking_id | visitor_key       | session_key   | event_name |
      +-------------------+-------------------+---------------+------------+
      |                 1 | 2928712...8447ac2 | E2AEB...6E78C | dealClick  |
      |                 2 | 30f3afc...72a504f | D16E7...C10DD | dealClick  |
      +-------------------+-------------------+---------------+------------+

      • Event Property Table

      +----------------------------+-------------------+-----------+-------------+
      | event_tracking_property_id | event_tracking_id | event_key | event_value |
      +----------------------------+-------------------+-----------+-------------+
      |                          1 |                 5 | 'deal_id' |    '123887' |
      |                          2 |                 5 |    'node' |      'web1' |
      |                          3 |                 5 |    'algo' |   'default' |
      +----------------------------+-------------------+-----------+-------------+

      • Querying is hard and can be slow
      • SlowQL
  • 31. Events - New Hotness

      {
        "_id" : ObjectId("4de53c85c7d8327f56284f3a"),
        "visitor_key" : "50f46d60e00efe0a17c42c7fffdc68bda530572d1f83972a83d6c61d684ba3ce",
        "event_name" : "deal-click",
        "event_properties" : {
          "ca" : "dl",
          "deal_at_location_id" : 98303,
          "session_ca" : "dl",
          "ct" : "?",
          "cr" : "?",
          "node" : "web2",
          "bot" : "false",
          "user_id" : 163181,
          "guest" : "true",
          "session_cr" : "?",
          "session_ct" : "?"
        },
        "session_key" : "1C9FA142A59465B745ECE2C1ACB15E62",
        "timestamp" : ISODate("2011-05-31T19:07:48.533Z"),
        "window" : {
          "minute" : ISODate("2011-05-31T19:07:00Z"),
          "hour" : ISODate("2011-05-31T19:00:00Z"),
          "day" : ISODate("2011-05-31T00:00:00Z"),
          "week" : ISODate("2011-05-30T00:00:00Z"),
          "month" : ISODate("2011-05-01T00:00:00Z"),
          "year" : ISODate("2011-01-01T00:00:00Z")
        }
      }
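
The "window" sub-document holds the event timestamp truncated to each aggregation scale. A minimal sketch of how such values could be derived is below. `computeWindows` is a hypothetical name, all arithmetic is in UTC, and weeks are assumed to start on Monday, which is consistent with the sample (2011-05-31 is a Tuesday and its "week" is 2011-05-30).

```javascript
// Truncate a timestamp to the start of each window, in UTC.
function computeWindows(timestamp) {
  const y = timestamp.getUTCFullYear();
  const m = timestamp.getUTCMonth();
  const d = timestamp.getUTCDate();
  const h = timestamp.getUTCHours();
  const min = timestamp.getUTCMinutes();
  // getUTCDay(): 0 = Sunday ... 6 = Saturday; shift so that Monday = 0.
  const daysSinceMonday = (timestamp.getUTCDay() + 6) % 7;
  return {
    minute: new Date(Date.UTC(y, m, d, h, min)),
    hour: new Date(Date.UTC(y, m, d, h)),
    day: new Date(Date.UTC(y, m, d)),
    // Date.UTC normalizes day values below 1 into the previous month.
    week: new Date(Date.UTC(y, m, d - daysSinceMonday)),
    month: new Date(Date.UTC(y, m, 1)),
    year: new Date(Date.UTC(y, 0, 1))
  };
}

const sampleWindows = computeWindows(new Date("2011-05-31T19:07:48.533Z"));
```

Storing these values on every event means a rollup query can match on an exact date per scale instead of computing ranges at query time.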
  • 32. Event Rollups - Can you say finance background?

      {
        "_id" : ObjectId("4de54041c7d8327f56285563"),
        "c" : 1,
        "e" : "deal-click",
        "k" : [{
          "k" : "deal_at_location_id",
          "v" : 245931
        }],
        "s" : "month",
        "w" : ISODate("2011-05-01T00:00:00Z")
      }

      • Allows for many permutations and drill-downs
      • Easy to add new tracking attributes, just by adding key/value pairs
      • Easy to index (queries like lightning)
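
To make the rollup shape concrete, here is a hedged sketch of how one fixed event could fan out into rollup counters, one per scale and property. It is not the Event Indexer from the talk: `rollupIncrements` and `applyIncrements` are hypothetical, only single-property keys are generated (the deck's queries filter on `k: {$size: 1}`, which suggests larger combinations also exist), and a `Map` stands in for a MongoDB upsert that increments `c`.

```javascript
// Build the rollup keys touched by one fixed event: one per (scale, property) pair.
function rollupIncrements(event, scales, properties) {
  const increments = [];
  scales.forEach(function (scale) {
    properties.forEach(function (name) {
      const value = event.event_properties[name];
      if (value === undefined) return;
      increments.push({
        e: event.event_name,
        s: scale,
        w: event.window[scale],
        k: [{ k: name, v: value }]
      });
    });
  });
  return increments;
}

// Apply increments to an in-memory store; each distinct key gets a counter "c".
function applyIncrements(store, increments) {
  increments.forEach(function (inc) {
    const id = JSON.stringify(inc); // field order is fixed above, so this is a stable key
    if (!store.has(id)) store.set(id, Object.assign({ c: 0 }, inc));
    store.get(id).c += 1;
  });
  return store;
}
```

Each tracked property multiplies the number of rollup documents by the number of scales, which is the growth in storage that the permutations imply.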
  • 33. Tracking Events - Problems
      • Rollups taking up too much space
          ― Need to move to document format from Kyle Banker’s “The MongoDB Gamut: Four Application Designs”
          ― Tell business to track fewer permutations? (yea right)
      • Fixer skipping events
          ― Moved from {_id: {$gt: ObjectId(...)}} to {version: {$lt: 5}}
      • Fixer running slow
          ― Moved from findAndModify to find(...).limit(...) and then update in bulk
          ― {version: {$lt: 5}} is slower than {version: {$lte: 4}} !!!
      • KEEP INDEXES IN MEMORY!!!
          ― We had > 12G indexes on a 7.5G box
  • 34. We Track Everything
      • Request Tracking
      • Event Tracking
      • Exception Tracking / Triage
  • 35. Exception Tracking
      • Special events in the event tracking collection
  • 36. Full Stack Traces, Permalinks, Request Context
  • 37. Triage
      • Group exceptions over the past day, week, month
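
Grouping exceptions over a look-back period can be sketched as a simple count per exception type. The field names here are assumptions, since the deck does not show an exception event: `event_name === "exception"` and `event_properties.exception` are hypothetical, as is the `triage` function itself.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Count exception events per exception type within the last `periodMs` milliseconds.
function triage(events, now, periodMs) {
  const counts = new Map();
  events.forEach(function (event) {
    if (event.event_name !== "exception") return;
    const age = now - event.timestamp; // Date subtraction yields milliseconds
    if (age < 0 || age > periodMs) return;
    const type = event.event_properties.exception;
    counts.set(type, (counts.get(type) || 0) + 1);
  });
  // Most frequent first.
  return Array.from(counts.entries())
    .map(function (entry) { return { exception: entry[0], count: entry[1] }; })
    .sort(function (a, b) { return b.count - a.count; });
}
```

Calling it with `DAY_MS`, `7 * DAY_MS`, and `30 * DAY_MS` would give day, week, and (approximate) month views.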
  • 38. Make Users Happy!