System insight without Interference

Talk at Wordnik HQ about how to monitor application performance and business goals without intrusive engineering work on your core product.

Transcript of "System insight without Interference"

  1. Insight without Interference: Monitoring with Scala, Swagger, MongoDB and Wordnik OSS
      Tony Tam, @fehguy
  2. Nagios Dashboard
  3. Monitoring? IT Ops 101: disk space, host checks, system load, network
  4. Monitoring? Disk space, host checks, system load, network: necessary (but insufficient)
  5. Why Insufficient?
      • What about services?
          • Database running?
          • HTTP traffic?
      • Install a Munin node!
          • Some (good) service-level insight
  6. Your boss LOVES charts: “OH, pretty colors!” “Up and to the right!” “It MUST be important!”
  7. Good vs. Bad?
      • Database calls average 1 ms?
          • Great! The DB is working well
          • But what if it is called 1M times per page load or per user? (1M × 1 ms is over 16 minutes)
      • Most tools are for the system, not your app
      • By the time you know, it’s too late
      Need business-metrics monitoring!
  8. Enter APM
      • Application Performance Monitoring
      • Many flavors and degrees of integration
          • Heavy: transaction monitoring, code performance, heap and memory analysis
          • Medium: home-grown profiling
          • Light: digest your logs (failure forensics)
      • What you need depends on your architecture and your business and technology stage
  9. APM @ Wordnik
      • Micro services make the system (diagram: monolithic application)
  10. APM @ Wordnik
      • Micro services make the system: API calls are the unit of work! (diagram: monolithic application)
  11. Monitoring API Calls
      • Every API must be profiled
      • Other logic as needed
          • Database calls
          • Connection manager
          • etc.
      • Anything that might matter!
  12. How?
      • Wordnik-OSS Profiler for Scala
          • Apache 2.0 License, available in Maven Central
      • Profiling an arbitrary code block:
            import com.wordnik.util.perf.Profile
            Profile("create a cat", { /* do something */ })
      • Profiling an API call:
            Profile("/store/purchase", { /* do something */ })
  13. Profiler gives you…
      • Nearly free*** tracking
      • Simple aggregation
      • Trigger mechanism
          • Actions on time spent “doing things”:
            Profile.triggers += new Function1[ProfileCounter, Unit] {
              def apply(counter: ProfileCounter): Unit = {
                // Alert when a "getDb" call exceeds the 5000 threshold
                if (counter.name == "getDb" && counter.duration > 5000)
                  wakeUpSysAdminAndEveryoneWhoCanFixShit(Urgency.NOW)
              }
            }
  14. Profiler gives you… (same trigger example as slide 13, with the callout: “This is intrusive on your codebase”)
  15. Accessing Profile Data
      • Easy to get in code:
            ProfileScreenPrinter.dump
      • Output where you want:
            logger.info(ProfileScreenPrinter.toString)
      • Send to logs, email, etc.
  16. Accessing Profile Data
      • Easier to get via an API with Swagger-JAXRS:
            import javax.ws.rs.{Path, Produces}
            import com.wordnik.swagger.annotations.Api
            import com.wordnik.resource.util
            @Path("/activity.json")
            @Api("/activity")
            @Produces(Array("application/json"))
            class ProfileResource extends ProfileTrait
  17. Accessing Profile Data
  18. Accessing Profile Data: inspect without bugging devs!
  19. Is Aggregate Data Enough?
      • Probably not
      • Not actionable
          • Have calls increased? Decreased?
          • Faster response? Slower?
  20. Make it Actionable
      • “In a 3 hour window, I expect 300,000 views per server”
          • Poll & persist the counters
          • Example: log page views every minute
            {
              "_id" : "web1-word-page-view-20120625151812",
              "host" : "web1",
              "count" : 627172,
              "timestamp" : NumberLong("1340637492247")
            },
            {
              "_id" : "web1-word-page-view-20120625151912",
              "host" : "web1",
              "count" : 627372,
              "timestamp" : NumberLong("1340637552778")
            }
  21. Make it Actionable
  22. Make it Actionable: your boss LOVES charts
  23. That’s not Actionable! (But it’s pretty.)
      • Custom time window
      • APIs to track?
      • What’s missing? Low + high watermarks
      • Too much custom engineering
  24. That’s not Actionable!
      • Custom time window
      • APIs to track?
      • Low + high watermarks
      • Too much custom engineering
      Call to action!
  25. Make it Actionable
      • Swagger + a tiny bit of engineering
          • Let your *product* people create monitors, set goals
      • A Check: a specific API call mapped to a service function
            {
              "name": "word-page-view",
              "path": "/word/*/wordView (post)",
              "checkInterval": 60,
              "healthSpan": 300,
              "minCount": 300,
              "maxCount": 100000
            }
  26. Make it Actionable
      • A Service Type: a collection of checks which make a functional unit
            {
              "name": "www-api",
              "checks": [
                "word-of-the-day",
                "word-page-view",
                "word-definitions",
                "user-login",
                "api-account-signup",
                "api-account-activated"
              ]
            }
  27. Make it Actionable
      • A Host: “directions” to get to the checks
            {
              "host": "ip-10-132-43-114",
              "path": "/v4/health.json/profile?api_key=XYZ",
              "serviceType": "www-api"
            },
            {
              "host": "ip-10-130-134-82",
              "path": "/v4/health.json/profile?api_key=XYZ",
              "serviceType": "www-api"
            }
  28. Make it Actionable
      • And finally, a simple GUI
  29. Make it Actionable
      • And finally, a simple GUI
  30. Make it Actionable
      • Point Nagios at this!
            serviceHealth.json/status/www-api?explodeOnFailure=true
      • Get a 500, get an alert
      Callouts: metrics from product; based on YOUR app; treat like a system failure
  31. Make it Actionable
  32. Is this Enough?
      • System monitoring
      • Aggregate monitoring
      • Windowed monitoring
      • Object monitoring?
          • Action on a specific event/object
      Why!?
  33. Object-level Actions
      • Any back-end engineer can build this
          • But shouldn’t
      • ETL to a cube?
      • Run BI queries against production?
      • Best way to “siphon” data from production without intrusive engineering?
  34. Avoiding Code Invasion
      • We use MongoDB everywhere
      • We use more than one server wherever we use MongoDB (replication is what maintains the oplog)
      • So we have an opLog record of everything we write
  35. What is the OpLog?
      • All participating members have one
      • Capped collection of all write ops
      (Diagram: a timeline of writes t0, t1, t2, t3 across a primary and two replicas)
  36. So What?
      • It’s a “pseudo-durable global topic message bus” (PDGTMB)
          • WTF?
      • All DB write operations are in there
      • It’s persistent (cyclic collection)
      • It’s fast (as fast as your writes)
      • It’s non-blocking
      • It’s easily accessible
  37. More about this
            {
              "ts" : { "t" : 1340948921000, "i" : 1 },
              "h" : NumberLong("5674919573577531409"),
              "op" : "i",
              "ns" : "test.animals",
              "o" : { "_id" : "fred", "type" : "cat" }
            },
            {
              "ts" : { "t" : 1340948935000, "i" : 1 },
              "h" : NumberLong("7701120461899338740"),
              "op" : "i",
              "ns" : "test.animals",
              "o" : { "_id" : "bill", "type" : "rat" }
            }
  38. Tapping into the Oplog
      • Made easy for you! https://github.com/wordnik/wordnik-oss
  39. Tapping into the Oplog
      • Made easy for you! https://github.com/wordnik/wordnik-oss
      • Incremental backup, snapshots, replication: same technique!
  40. Tapping into the Oplog
      • Create an OpLogProcessor:
            import java.io.IOException
            import scala.collection.mutable.HashSet
            import com.mongodb.BasicDBObject
            // OplogRecordProcessor comes from wordnik-oss
            class OpLogReader extends OplogRecordProcessor {
              // Observer functions, each called with every oplog record
              val recordTriggers = new HashSet[Function1[BasicDBObject, Unit]]
              @throws(classOf[Exception])
              def processRecord(dbo: BasicDBObject) = {
                recordTriggers.foreach(t => t(dbo))
              }
              @throws(classOf[IOException])
              def close(string: String) = {}
            }
  41. Tapping into the Oplog
      • Attach it to an OpLogTailThread:
            import com.mongodb.DBCollection
            val util = new OpLogReader
            val coll: DBCollection = MongoDBConnectionManager.getOplog("oplog", "localhost", None, None).get
            // Tail the oplog and hand each record to util.processRecord
            val tailThread = new OplogTailThread(util, coll)
            tailThread.start()
  42. Tapping into the Oplog
      • Add some observer functions:
            util.recordTriggers += new Function1[BasicDBObject, Unit] {
              def apply(e: BasicDBObject): Unit = Profile("inspectObject", {
                totalExamined += 1 // a counter var defined elsewhere
                /* do something here */
              })
            }
  43. /* do something here */
      • Like?
      • Convert to business objects and act!
          • OpLog to domain object is EASY
          • Just process the ns that you care about: "ns" : "test.animals"
      • How?
  44. Converting OpLog to Object
      • Jackson makes this trivial:
            import java.util.Date
            import com.mongodb.DBObject
            case class User(username: String, email: String, createdAt: Date)
            // jacksonMapper: an ObjectMapper able to build Scala case classes (e.g. with jackson-module-scala registered)
            val user = jacksonMapper.convertValue(
              dbo.get("o").asInstanceOf[DBObject], classOf[User])
      • Reuse your DAOs? Bonus points!
      • Got your objects!
  45. Converting OpLog to Object (same as slide 44, with callouts: “o” is for “Object”; Now what?)
  46. Use Case 1: Alert on Action
      • New account!
            obj match {
              case newAccount: UserAccount => { /* ring the bell! */ }
              case _ => { /* ignore it */ }
            }
  47. Use Case 2: What’s Trending?
      • Real-time activity:
            case o: VisitLog => Profile("ActivityMonitor:processVisit", {
              wordTracker.add(o.word)
            })
  48. Use Case 3: External Analytics
            case o: UserProfile => {
              getSqlDatabase().executeSql(
                "insert into user_profile values(?,?,?)",
                o.username, o.email, o.createdAt)
            }
  49. Use Case 3: External Analytics (same as slide 48, with callouts: “Your data pushes to relational!” “Don’t mix runtime & OLAP!”)
  50. Use Case 4: Cloud Analysis
            case o: NewUserAccount => {
              getSalesforceConnector().create(
                Lead(Account.ID, o.firstName, o.lastName, o.company, o.email, o.phone))
            }
  51. Use Case 4: Cloud Analysis (same as slide 50, with callouts: “We didn’t interrupt core engineering!” “Pushed directly to Salesforce!”)
  52. Examples: polling profile APIs across the cluster
  53. Examples: siphoning hashtags from the opLog
  54. Examples: page view activity from the opLog
  55. Examples: health check without engineering
  56. Summary
      • Don’t mix up monitoring servers & your application
      • Leave core engineering alone
      • Make a tiny engineering investment now
      • Let your product folks set metrics
      • FOSS tools are available (and well tested!)
      • The opLog is incredibly powerful
          • Hack it!
  57. Find out more
      • Wordnik: developer.wordnik.com
      • Swagger: swagger.wordnik.com
      • Wordnik OSS: github.com/wordnik/wordnik-oss
      • Atmosphere: github.com/Atmosphere/atmosphere
      • MongoDB: www.mongodb.org
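
Slides 12 and 15 wrap a block in `Profile(name, { ... })` and later read the aggregated counters back. The wordnik-oss implementation is not shown in the talk, so this is only a minimal stand-alone sketch of the idea, not that library's API. The hypothetical `SimpleProfile` object times a by-name block with `System.nanoTime`, folds the duration into a per-name count/total/max kept in a `TrieMap`, and returns the block's own result so it can wrap existing calls transparently. It uses a curried `SimpleProfile(name) { ... }` call instead of the two-argument form on the slide.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical per-name aggregate (not the wordnik-oss ProfileCounter).
final case class Stats(count: Long, totalNanos: Long, maxNanos: Long) {
  def avgMillis: Double = if (count == 0) 0.0 else totalNanos.toDouble / count / 1e6
}

// Hypothetical stand-in for the Profile wrapper on slide 12.
object SimpleProfile {
  private val counters = TrieMap.empty[String, Stats]

  // Times `block`, records the duration under `name`, and returns the block's result.
  def apply[T](name: String)(block: => T): T = {
    val start = System.nanoTime()
    try block
    finally record(name, System.nanoTime() - start)
  }

  // Compare-and-set retry loop so concurrent callers never lose an update.
  private def record(name: String, nanos: Long): Unit = {
    var done = false
    while (!done) {
      done = counters.get(name) match {
        case Some(old) =>
          counters.replace(name, old,
            Stats(old.count + 1, old.totalNanos + nanos, math.max(old.maxNanos, nanos)))
        case None =>
          counters.putIfAbsent(name, Stats(1, nanos, nanos)).isEmpty
      }
    }
  }

  def snapshot: Map[String, Stats] = counters.readOnlySnapshot().toMap

  // Plain-text dump, in the spirit of ProfileScreenPrinter.toString on slide 15.
  def dump: String =
    snapshot.toSeq.sortBy(_._1).map { case (n, s) =>
      f"$n%-30s count=${s.count}%d avg=${s.avgMillis}%.3f ms max=${s.maxNanos / 1e6}%.3f ms"
    }.mkString("\n")
}
```

Because the duration is recorded in `finally`, a block that throws is still counted; whether the real profiler behaves the same way is not stated in the slides.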
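
Slide 13's trigger reacts when a named call (`getDb`) runs past a threshold. Here is a sketch of that observer pattern, assuming a hypothetical `TriggeredProfile` object and `Timing` event in place of the real `Profile.triggers` and `ProfileCounter` types: every registered function sees each completed call and decides for itself whether to act.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical event handed to triggers after each profiled call.
final case class Timing(name: String, durationMillis: Long)

// Hypothetical profiler with a trigger list; not the wordnik-oss API.
object TriggeredProfile {
  // Registered observers (a production version would need a thread-safe collection).
  val triggers = ArrayBuffer.empty[Timing => Unit]

  def apply[T](name: String)(block: => T): T = {
    val start = System.nanoTime()
    try block
    finally {
      // nanoTime is monotonic, so the measured duration is never negative.
      val t = Timing(name, (System.nanoTime() - start) / 1000000L)
      triggers.foreach(f => f(t))
    }
  }
}
```

Keeping the threshold inside the trigger means the profiled code does not change when an alert rule does, although, as slide 14 points out, the triggers themselves still live in your codebase.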
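
Slide 20 persists a cumulative page-view counter about once a minute. To compare those samples with “300,000 views per server in a 3 hour window”, subtract consecutive samples. This sketch assumes the counter only grows between server restarts; `CounterSnapshot` and `Window` are hypothetical names, and the fields are copied from the slide's documents.

```scala
// Mirrors the documents on slide 20: a cumulative counter sampled periodically.
final case class CounterSnapshot(id: String, host: String, count: Long, timestampMillis: Long)

object Window {
  // Events between two samples of a monotonically increasing counter.
  // A negative value usually means the server restarted and the counter reset.
  def delta(earlier: CounterSnapshot, later: CounterSnapshot): Long =
    later.count - earlier.count

  // Events per hour implied by two samples.
  def ratePerHour(earlier: CounterSnapshot, later: CounterSnapshot): Double = {
    val elapsedMillis = later.timestampMillis - earlier.timestampMillis
    require(elapsedMillis > 0, "samples must be in time order")
    delta(earlier, later) * 3600000.0 / elapsedMillis
  }

  // Total events inside [now - windowMillis, now], summing per-interval deltas
  // and treating a counter reset as zero for that interval.
  def countInWindow(samples: Seq[CounterSnapshot], now: Long, windowMillis: Long): Long = {
    val inWindow = samples
      .filter(s => s.timestampMillis >= now - windowMillis && s.timestampMillis <= now)
      .sortBy(_.timestampMillis)
    inWindow.zip(inWindow.drop(1)).map { case (a, b) => math.max(0L, delta(a, b)) }.sum
  }
}
```

With the two documents on slide 20, the delta is 627,372 − 627,172 = 200 views over 60,531 ms, or roughly 11,900 views per hour, while the 3-hour expectation works out to 100,000 views per hour.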
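
Slides 23 to 25 call for low and high watermarks over a time window, and the check definition on slide 25 supplies them through `healthSpan`, `minCount` and `maxCount`. Below is a sketch of evaluating one check against counter samples. It assumes `checkInterval` and `healthSpan` are in seconds (the slides give no units) and that samples are cumulative counts like those on slide 20. All type and object names are hypothetical.

```scala
// Field names copied from the check JSON on slide 25; seconds assumed for time fields.
final case class Check(name: String, path: String, checkInterval: Int,
                       healthSpan: Int, minCount: Long, maxCount: Long)

sealed trait CheckResult
case object Healthy extends CheckResult
final case class BelowLowWatermark(count: Long) extends CheckResult
final case class AboveHighWatermark(count: Long) extends CheckResult

object CheckEvaluator {
  // `samples` are (epochMillis, cumulativeCount) pairs for one host and one API.
  def evaluate(check: Check, samples: Seq[(Long, Long)], nowMillis: Long): CheckResult = {
    val windowStart = nowMillis - check.healthSpan * 1000L
    val inSpan = samples
      .filter { case (ts, _) => ts >= windowStart && ts <= nowMillis }
      .sortBy(_._1)
    // Calls seen during the health span: newest cumulative count minus the oldest.
    val count = if (inSpan.size < 2) 0L else inSpan.last._2 - inSpan.head._2
    if (count < check.minCount) BelowLowWatermark(count)
    else if (count > check.maxCount) AboveHighWatermark(count)
    else Healthy
  }
}
```

Treating "fewer than two samples" as a count of zero makes a silent host fail the low watermark, which is usually what you want from a health check.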
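
Slide 30 points Nagios at `serviceHealth.json/status/www-api?explodeOnFailure=true` and relies on a 500 response to raise the alert. The roll-up such an endpoint might perform can be sketched as follows, assuming each (host, check) pair has already been evaluated. `HostCheckStatus` and `ServiceStatus` are hypothetical, and only the status-code decision is shown, not the HTTP layer.

```scala
// Hypothetical result of one check on one host of a service type (slides 26 and 27).
final case class HostCheckStatus(host: String, check: String, healthy: Boolean)

object ServiceStatus {
  // 500 when explodeOnFailure is set and any check on any host is unhealthy, else 200.
  def httpStatus(results: Seq[HostCheckStatus], explodeOnFailure: Boolean): Int =
    if (explodeOnFailure && results.exists(!_.healthy)) 500 else 200

  // Human-readable failures for the response body.
  def failures(results: Seq[HostCheckStatus]): Seq[String] =
    results.filterNot(_.healthy).map(r => s"${r.host}: ${r.check}")
}
```

Returning 200 when `explodeOnFailure` is off would let a dashboard read the failure list without tripping Nagios; that split is a design choice of this sketch.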
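
Slide 37 shows raw oplog entries. In MongoDB's oplog, `op` is the operation type (`i` insert, `u` update, `d` delete, `c` command, `n` no-op), `ns` is the `database.collection` namespace, and `o` holds the document. The sketch below models an entry as a plain Scala case class instead of a driver `BasicDBObject`, so it runs without MongoDB, and applies the namespace filter that slide 43 recommends. `OplogEntry` and `OplogFilter` are hypothetical names.

```scala
// Simplified, driver-independent view of the oplog entries on slide 37.
final case class OplogEntry(tsMillis: Long, inc: Int, h: Long, op: String, ns: String, o: Map[String, Any])

object OplogFilter {
  // MongoDB oplog operation codes.
  def describe(op: String): String = op match {
    case "i"   => "insert"
    case "u"   => "update"
    case "d"   => "delete"
    case "c"   => "command"
    case "n"   => "no-op"
    case other => s"unknown($other)"
  }

  // Keep only inserts into the namespaces we care about.
  def insertsFor(entries: Seq[OplogEntry], namespaces: Set[String]): Seq[OplogEntry] =
    entries.filter(e => e.op == "i" && namespaces.contains(e.ns))
}
```

Only inserts are kept because for `u` entries `o` holds the update operation rather than necessarily the full document (the selector is in a separate `o2` field), so converting it straight into a domain object would not work.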
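
Slides 44 to 46 turn the `o` document into a domain object with Jackson and then pattern-match on its type. Jackson's `ObjectMapper.convertValue` needs jackson-module-scala to build case classes, so this dependency-free sketch swaps in a hand-written conversion from a `Map[String, Any]`, choosing a converter by namespace. `User`, `Animal`, `OplogToDomain` and the `test.users` namespace are hypothetical.

```scala
import java.util.Date

// Hypothetical domain types; User's fields follow slide 44.
sealed trait DomainEvent
final case class User(username: String, email: String, createdAt: Date) extends DomainEvent
final case class Animal(id: String, kind: String) extends DomainEvent

object OplogToDomain {
  // Pick a converter by namespace; unknown namespaces or missing fields yield None.
  def convert(ns: String, o: Map[String, Any]): Option[DomainEvent] = ns match {
    case "test.users" =>
      for {
        u <- o.get("username").collect { case s: String => s }
        e <- o.get("email").collect { case s: String => s }
        c <- o.get("createdAt").collect { case d: Date => d }
      } yield User(u, e, c)
    case "test.animals" =>
      for {
        id <- o.get("_id").collect { case s: String => s }
        t  <- o.get("type").collect { case s: String => s }
      } yield Animal(id, t)
    case _ => None
  }

  // Use case 1 from slide 46: react to new accounts, ignore everything else.
  def handle(event: DomainEvent, onNewAccount: User => Unit): Unit = event match {
    case u: User => onNewAccount(u)
    case _       => () // ignore it
  }
}
```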
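
Slide 47 feeds visits into a `wordTracker` to see what is trending, but the talk does not show that tracker. One simple possibility, sketched here under that assumption, is a sliding-window counter: keep timestamped events in a queue, evict those older than the window, and report the most frequent words. `TrendingTracker` is a hypothetical name, and timestamps are passed in explicitly so the behaviour is deterministic.

```scala
import scala.collection.mutable

// Hypothetical stand-in for slide 47's wordTracker: counts words seen in the last windowMillis.
final class TrendingTracker(windowMillis: Long) {
  private val events = mutable.Queue.empty[(Long, String)]
  private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)

  def add(word: String, nowMillis: Long): Unit = {
    events.enqueue((nowMillis, word))
    counts(word) += 1
    evict(nowMillis)
  }

  // Drop events that fell out of the window and decrement their counts.
  private def evict(nowMillis: Long): Unit =
    while (events.nonEmpty && events.head._1 <= nowMillis - windowMillis) {
      val (_, old) = events.dequeue()
      counts(old) -= 1
      if (counts(old) == 0) counts -= old
    }

  // The n most frequent words in the current window, ties broken alphabetically.
  def top(n: Int, nowMillis: Long): Seq[(String, Int)] = {
    evict(nowMillis)
    counts.toSeq.sortBy { case (w, c) => (-c, w) }.take(n)
  }
}
```

Memory grows with the number of events in the window; at high volume, bucketing counts per minute would bound it at the cost of coarser eviction.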