This document discusses various techniques for monitoring applications without interfering with core engineering work. It recommends using open source tools like the Wordnik profiler, Swagger, and MongoDB's oplog to provide business metrics monitoring and allow product teams to define their own metrics. The oplog can be used to access real-time data changes and power use cases like alerts, analytics, and pushing data to external systems without interrupting application code.
3. Monitoring?
• Disk Space
• Host Checks
• System Load
• Network
IT Ops 101
4. Monitoring?
• Disk Space
• Host Checks
• System Load
• Network
Necessary (but insufficient)
5. Why Insufficient?
• What about Services?
• Database running?
• HTTP traffic?
• Install Munin Node!
• Some (good) service-level insight
7. Your boss LOVES charts
• “OH pretty colors!”
• “up and to the right!”
• “it MUST be important!”
8. Good vs. Bad?
• Database calls avg 1ms?
• Great! DB working well
• But called 1M times per page load/user?
• Most tools are for system, not your app
• By the time you know, it’s too late
Need business metrics monitoring!
9. Enter APM
• Application Performance Monitoring
• Many flavors, degrees of integration
• Heavy: transaction monitoring, code performance,
heap, memory analysis
• Medium: home-grown profiling
• Light: digest your logs (failure forensics)
• What you need depends on architecture,
business + technology stage
10. APM @ Wordnik
• Micro Services make the System
(Diagram label: Monolithic application)
11. APM @ Wordnik
• Micro Services make the System
• API Calls are the unit of work!
(Diagram label: Monolithic application)
12. Monitoring API Calls
• Every API must be profiled
• Other logic as needed
  • Database calls
  • Connection manager
  • etc...
• Anything that might matter!
13. How?
• Wordnik-OSS Profiler for Scala
• Apache 2.0 License, available in Maven Central
• Profiling Arbitrary code block:
import com.wordnik.util.perf.Profile
Profile("create a cat", {/* do something */})
• Profiling an API call:
Profile("/store/purchase", {/* do something */})
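To make the idea concrete, here is a minimal sketch of what a block profiler of this kind might do internally: time a block of code and aggregate the results by name. `MiniProfile`, `Counter`, and `snapshot` are hypothetical names invented for this illustration, not the Wordnik-OSS API, and the sketch uses a curried call style rather than the two-argument form shown on the slide.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a profiler; NOT the Wordnik-OSS implementation.
object MiniProfile {
  // Aggregate per name: number of calls and total elapsed nanoseconds.
  final case class Counter(name: String, var count: Long = 0L, var totalNanos: Long = 0L) {
    def avgMillis: Double = if (count == 0) 0.0 else totalNanos / 1e6 / count
  }

  private val counters = mutable.Map.empty[String, Counter]

  // Runs `block`, records its duration under `name`, and returns the block's result.
  // The duration is recorded even when the block throws.
  def apply[T](name: String)(block: => T): T = {
    val start = System.nanoTime()
    try block
    finally {
      val elapsed = System.nanoTime() - start
      counters.synchronized {
        val c = counters.getOrElseUpdate(name, Counter(name))
        c.count += 1
        c.totalNanos += elapsed
      }
    }
  }

  // Copy of the current aggregates, safe to read while profiling continues.
  def snapshot: Map[String, Counter] = counters.synchronized {
    counters.map { case (k, c) => k -> c.copy() }.toMap
  }
}
```

Under these assumptions, `MiniProfile("/store/purchase") { /* do something */ }` wraps an API call, and `MiniProfile.snapshot("/store/purchase").avgMillis` reads back the average time spent in it.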
14. Profiler gives you…
• Nearly free*** tracking
• Simple aggregation
• Trigger mechanism
• Actions on time spent “doing things”:
Profile.triggers += new Function1[ProfileCounter, Unit] {
  // Called with each counter; the result type is Unit, so nothing is returned
  def apply(counter: ProfileCounter): Unit = {
    if (counter.name == "getDb" && counter.duration > 5000)
      wakeUpSysAdminAndEveryoneWhoCanFixShit(Urgency.NOW)
  }
}
15. Profiler gives you…
• Nearly free*** tracking
• Simple aggregation
• Trigger mechanism
• Actions on time spent “doing things”:
Profile.triggers += new Function1[ProfileCounter, Unit] {
  // Called with each counter; the result type is Unit, so nothing is returned
  def apply(counter: ProfileCounter): Unit = {
    if (counter.name == "getDb" && counter.duration > 5000)
      wakeUpSysAdminAndEveryoneWhoCanFixShit(Urgency.NOW)
  }
}
• This is intrusive on your codebase
16. Accessing Profile Data
• Easy to get in code
ProfileScreenPrinter.dump
• Output where you want
logger.info(ProfileScreenPrinter.toString)
• Send to logs, email, etc.
17. Accessing Profile Data
• Easier to get via API with Swagger-JAXRS
import com.wordnik.resource.util
@Path("/activity.json")
@Api("/activity")
@Produces(Array("application/json"))
class ProfileResource extends ProfileTrait
24. That’s not Actionable!
• But it’s pretty
• What’s missing?
  • Custom Time window
  • APIs to track?
  • Too much custom Engineering
  • Low + High Watermarks
25. That’s not Actionable!
• Custom Time window
• APIs to track?
• Too much custom Engineering
• Low + High Watermarks
Call to Action!
26. Make it Actionable
• Swagger + a tiny bit of engineering
• Let your *product* people create monitors, set
goals
• A Check: specific API call mapped to a
service function
{
  "name": "word-page-view",
  "path": "/word/*/wordView (post)",
  "checkInterval": 60,
  "healthSpan": 300,
  "minCount": 300,
  "maxCount": 100000
}
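A check like this could be evaluated roughly as sketched below. The field names mirror the JSON above, but their semantics are an assumption: the slides do not define them, so this sketch treats `healthSpan` as a look-back window in seconds and `minCount`/`maxCount` as low and high watermarks on the number of calls seen in that window. `HealthChecks.evaluate` is a hypothetical helper.

```scala
// Field names mirror the JSON on the slide; the meanings in the comments are assumptions.
final case class Check(
  name: String,
  path: String,
  checkInterval: Int, // assumed: seconds between evaluations
  healthSpan: Int,    // assumed: size of the look-back window, in seconds
  minCount: Long,     // assumed: low watermark for calls in the window
  maxCount: Long      // assumed: high watermark for calls in the window
)

sealed trait Health
case object Healthy extends Health
final case class Unhealthy(reason: String) extends Health

object HealthChecks {
  // Hypothetical evaluator: compares the call count seen in the window to both watermarks.
  def evaluate(check: Check, callsInWindow: Long): Health =
    if (callsInWindow < check.minCount)
      Unhealthy(s"${check.name}: $callsInWindow calls, below minCount ${check.minCount}")
    else if (callsInWindow > check.maxCount)
      Unhealthy(s"${check.name}: $callsInWindow calls, above maxCount ${check.maxCount}")
    else Healthy
}
```

Both bounds matter under this reading: too few page views suggests something is broken, while far too many suggests abuse or a runaway client.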
27. Make it Actionable
• A Service Type: a collection of checks
which make a functional unit
{
  "name": "www-api",
  "checks": [
    "word-of-the-day",
    "word-page-view",
    "word-definitions",
    "user-login",
    "api-account-signup",
    "api-account-activated"
  ]
}
28. Make it Actionable
• A Host: “directions” to get to the checks
{
  "host": "ip-10-132-43-114",
  "path": "/v4/health.json/profile?api_key=XYZ",
  "serviceType": "www-api"
},
{
  "host": "ip-10-130-134-82",
  "path": "/v4/health.json/profile?api_key=XYZ",
  "serviceType": "www-api"
}
31. Make it Actionable
• Point Nagios at this!
serviceHealth.json/status/www-api?explodeOnFailure=true
• Get a 500, get an alert
• Metrics from Product
• Based on YOUR app
• Treat like system failure
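The status endpoint's behavior might look like the following sketch. It assumes that `explodeOnFailure=true` turns any failed check into an HTTP 500 so that a plain HTTP probe raises the alert; the slides only say "Get a 500, get an alert", so the non-exploding branch is a guess. `ServiceStatus`, `CheckResult`, and `httpStatus` are hypothetical names.

```scala
object ServiceStatus {
  // Hypothetical result of one check: its name and whether it passed.
  final case class CheckResult(name: String, healthy: Boolean)

  // Returns the HTTP status code a status endpoint might send.
  // Assumption: with explodeOnFailure, any failed check yields a 500;
  // without it, the endpoint answers 200 and reports failures in the body.
  def httpStatus(results: Seq[CheckResult], explodeOnFailure: Boolean): Int =
    if (explodeOnFailure && results.exists(r => !r.healthy)) 500 else 200
}
```

Mapping a business-level failure onto a status code means the monitoring system needs no knowledge of the checks themselves; it only has to notice a non-200 response.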
33. Is this Enough?
System monitoring
Aggregate monitoring
Windowed monitoring
Object monitoring?
• Action on a specific event/object
Why!?
34. Object-level Actions
• Any back-end engineer can build this
• But shouldn’t
• ETL to a cube?
• Run BI queries against production?
• Best way to “siphon” data from production
w/o intrusive engineering?
35. Avoiding Code Invasion
• We use MongoDB everywhere
• We use > 1 server wherever we use
MongoDB
• We have an opLog record against
everything we do
36. What is the OpLog
• All participating members have one
• Capped collection of all write ops
(Diagram: a time axis t0 to t3 across primary, replica, replica)
37. So What?
• It’s a “pseudo-durable global topic
message bus” (PDGTMB)
• WTF?
• All DB transactions in there
• It’s persistent (cyclic collection)
• It’s fast (as fast as your writes)
• It’s non-blocking
• It’s easily accessible
39. Tapping into the Oplog
• Made easy for you!
https://github.com/wordnik/wordnik-oss
40. Tapping into the Oplog
• Made easy for you!
https://github.com/wordnik/wordnik-oss
• Same technique powers:
  • Incremental Backup
  • Snapshots
  • Replication
41. Tapping into the Oplog
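• Create an OplogRecordProcessor
import java.io.IOException
import scala.collection.mutable.HashSet
import com.mongodb.BasicDBObject

class OpLogReader extends OplogRecordProcessor {
  // Observer functions invoked for every oplog record
  val recordTriggers =
    new HashSet[Function1[BasicDBObject, Unit]]

  @throws(classOf[Exception])
  def processRecord(dbo: BasicDBObject) = {
    recordTriggers.foreach(t => t(dbo))
  }

  @throws(classOf[IOException])
  def close(string: String) = {}
}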
42. Tapping into the Oplog
• Attach it to an OplogTailThread
import com.mongodb.DBCollection

val util = new OpLogReader
val coll: DBCollection =
  (MongoDBConnectionManager.getOplog("oplog",
    "localhost", None, None)).get
val tailThread = new OplogTailThread(util, coll)
tailThread.start
43. Tapping into the Oplog
• Add some observer functions
util.recordTriggers +=
  new Function1[BasicDBObject, Unit] {
    def apply(e: BasicDBObject): Unit =
      Profile("inspectObject", {
        totalExamined += 1
        /* do something here */
      })
  }
44. /* do something here */
• Like?
• Convert to business objects and act!
• OpLog to domain object is EASY
• Just process the ns that you care about
"ns" : "test.animals"
• How?
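Filtering on the namespace can be sketched as follows. Oplog entries carry `ns` (the `database.collection` namespace), `op` (`"i"` for an insert), and `o` (the document). To stay self-contained, this sketch uses a plain `Map[String, Any]` as a stand-in for `BasicDBObject`; `OplogFilter` and `insertedDocument` are hypothetical names.

```scala
object OplogFilter {
  // Stand-in for an oplog record; a real one would be a BasicDBObject.
  type Record = Map[String, Any]

  // Returns the "o" document only for inserts ("op" == "i") into the given namespace.
  // Every other record (updates, deletes, other collections) is skipped.
  def insertedDocument(record: Record, namespace: String): Option[Map[String, Any]] =
    (record.get("ns"), record.get("op"), record.get("o")) match {
      case (Some(`namespace`), Some("i"), Some(doc: Map[_, _])) =>
        Some(doc.asInstanceOf[Map[String, Any]])
      case _ => None
    }
}
```

An observer function would call something like this first and hand only the surviving documents to the conversion step.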
45. Converting OpLog to Object
• Jackson makes this trivial
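import java.util.Date
import com.mongodb.DBObject

case class User(username: String, email: String,
  createdAt: Date)
// "o" holds the document written by the operation
val user = jacksonMapper.convertValue(
  dbo.get("o").asInstanceOf[DBObject],
  classOf[User])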
• Reuse your DAOs? Bonus points!
• Got your objects!
46. Converting OpLog to Object
• Jackson makes this trivial
case class User(username: String, email: String,
  createdAt: Date)
val user = jacksonMapper.convertValue(
  dbo.get("o").asInstanceOf[DBObject],
  classOf[User])
• “o” is for “Object”
• Reuse your DAOs? Bonus points!
• Got your objects! Now What?
47. Use Case 1: Alert on Action
• New account!
obj match {
  case newAccount: UserAccount => {
    /* ring the bell! */
  }
  case _ => {
    /* ignore it */
  }
}
48. Use case 2: What’s Trending?
• Real-time activity
case o: VisitLog =>
  Profile("ActivityMonitor:processVisit", {
    wordTracker.add(o.word)
  })
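The slide does not show how `wordTracker` is implemented. One possible shape, offered only as a sketch, is an in-memory counter with a top-N query; the `WordTracker` class and its `top` method are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical tracker; the real wordTracker is not shown on the slides.
class WordTracker {
  private val counts = mutable.Map.empty[String, Long].withDefaultValue(0L)

  // Records one visit to `word`.
  def add(word: String): Unit = synchronized { counts(word) += 1 }

  // The n most frequently seen words, most frequent first; ties broken alphabetically.
  def top(n: Int): Seq[(String, Long)] = synchronized {
    counts.toSeq.sortBy { case (w, c) => (-c, w) }.take(n)
  }
}
```

A real trending feature would probably also age out old visits, for example by keeping counts per time bucket; this sketch counts over the whole process lifetime.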
49. Use case 3: External Analytics
case o: UserProfile => {
  getSqlDatabase().executeSql(
    "insert into user_profile values(?,?,?)",
    o.username, o.email, o.createdAt)
}
50. Use case 3: External Analytics
case o: UserProfile => {
  getSqlDatabase().executeSql(
    "insert into user_profile values(?,?,?)",
    o.username, o.email, o.createdAt)
}
• Your Data pushes to Relational!
• Don’t mix runtime & OLAP!
51. Use case 4: Cloud analysis
case o: NewUserAccount => {
  getSalesforceConnector().create(
    Lead(Account.ID, o.firstName, o.lastName,
      o.company, o.email, o.phone))
}
52. Use case 4: Cloud analysis
case o: NewUserAccount => {
  getSalesforceConnector().create(
    Lead(Account.ID, o.firstName, o.lastName,
      o.company, o.email, o.phone))
}
• Pushed directly to Salesforce!
• We didn’t interrupt core engineering!
57. Summary
• Don’t mix up monitoring servers & your
application
• Leave core engineering alone
• Make a tiny engineering investment now
• Let your product folks set metrics
• FOSS tools are available (and well tested!)
• The opLog is incredibly powerful
• Hack it!
58. Find out more
• Wordnik: developer.wordnik.com
• Swagger: swagger.wordnik.com
• Wordnik OSS: github.com/wordnik/wordnik-oss
• Atmosphere: github.com/Atmosphere/atmosphere
• MongoDB: www.mongodb.org