
JavaOne 2011: Monitoring a Large-Scale Infrastructure with Clojure



Published in: Technology, News & Politics


  1. Monitoring a Large-Scale Infrastructure with Clojure
  2. Who am I? Dennis Rowe, Senior Software Developer, Dell MessageOne - DevOps (Oracle OpenWorld 2011)
  3. MessageOne: E-Mail Continuity, E-Mail Archive, E-Mail Search
  4. The Basics: 2646 servers, 3 countries, 3 billion e-mails, 5 million users, 12 tired people
  5. We've got to have a way to monitor all that stuff… maybe not the people
  6. So, we came up with a solution…
  7. “Kneel before Zod” -- General Zod
  8. A Bit of History: Initially written in Python. Utilized the Twisted framework. Historical data stored in a relational database. It worked, but it did not perform.
  9. Why? The Global Interpreter Lock (GIL) caused performance problems. A relational database is not efficient for time-series data.
  10. So… why switch to Clojure? It is hip. It was designed with multi-threading in mind. It is a functional language. It uses the JVM. We can use all the Java libraries lying around. It is homoiconic.
  11. “And there was much rejoicing” -- Monty Python and the Holy Grail
  12. So, this is how we did it
  13. Loader: Takes XML and dumps it on a message bus (RabbitMQ). Nothing much to see here, but…
  14. Data is Code: So, how do we store the configurations we want for the various datacenters? As code … data … code … ["dc1" "url1" "type1" "dc2" "url2" "type2"] The configs are just Clojure code, and they make sense.
  15. RabbitMQ: That is easy. We will just use the RabbitMQ Java API. We will create Clojure-centric data structures. This whole Java interoperability is kind of nice … things just kind of work.
  16. Also! If code is data … then we can just send the code over RabbitMQ.
  17. Wait – What? We don’t need any funky configurations? We don’t need to use XML? We don’t need to use JSON? If it is Clojure talking to Clojure, we can just use data (or is it code? I am confused).
  18. Persister: Takes the data off the bus and writes it to disk. The Java ecosystem has tools for that, too: JRobin. We now have our own little time-series database, and we didn’t really have to work for it.
  19. Consumer: Takes metrics and does stuff with them: Checks, Computes, Aggregates, Historical Aggregates.
  20. Examples. Check: (check "mta-delay" :degraded (above (* 3600 72)) :fmt "%,.1f secs") Compute: (compute "mem-swap-used" :using [swap_total swap_free] :as (- swap_total swap_free))
  21. Aggregate: (aggregate "cfg-anomalies") Historical Aggregate: (hist-aggregate "index-percent-failed" "index-percent-failed#hist-1h" 3600 :agg-fn avg)
  22. Threading: All those metrics are Clojure agents, so I don’t have to worry about it. All 16 of my processors get used. Life is easy.
  23. Look
  24. WWW: We are not web developer types, which is fine; Clojure (plus some libraries) makes that easy, too: Compojure, Hiccup. So, no HTML. Just code: [:a {:href "/"}]
  25. Query: We need a way to query the data in real time. Clojure is homoiconic. So… we will just create our own DSL.
  26. The DSL: It is just code. We can use existing Clojure functions plus new ones like: where, select, pivot, filter, sort, format, sum-by, and agg-by.
  27. Query Example: where :metric ["qsize" "qsize-2h-old" "rate"] | pivot | filter (> :qsize 50000) | select :host :qsize [(* 100 (- 1 (/ :qsize-2h-old :qsize))) :pct-recent] :rate | sort :pct-recent
  28. Explanation: Looks a lot like Linux pipes, which is a good way to think about it. The Clojure way of reading it is: (sort (select (filter (pivot (where)))))
  29. Output
  30. DevOps: What we needed (and what we got): Reports, Ad-hoc Queries. Corrective actions? Make the app smarter?
  31. Corrective Actions: Write little Python scripts that pull data and take actions. This was so easy that we had to do it. Simple, repetitive actions are now fully automated. Life is better.
  32. App Smarter: The app now uses the monitoring to feed intelligently. Less operator interaction needed. More time spent solving real problems.
  33. Q and A
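
The pipe-style query from the Query Example and Explanation slides can be read as ordinary nested function calls. The sketch below is not Zod's implementation; it only illustrates that reading with plain Clojure functions. The helper names (`where-metric`, `pivot`, `pct-recent`, `select-cols`, `big-queue?`) and the sample hosts and values are all hypothetical, invented for illustration. Clojure's built-in `->>` threading macro gives the left-to-right "pipe" form, and the second function spells out the equivalent nesting.

```clojure
;; Hypothetical sample rows: one row per (host, metric). Values are invented.
(def samples
  [{:host "mta1" :metric "qsize"        :value 80000}
   {:host "mta1" :metric "qsize-2h-old" :value 20000}
   {:host "mta1" :metric "rate"         :value 120}
   {:host "mta2" :metric "qsize"        :value 60000}
   {:host "mta2" :metric "qsize-2h-old" :value 45000}
   {:host "mta2" :metric "rate"         :value 95}
   {:host "mta3" :metric "qsize"        :value 1000}
   {:host "mta3" :metric "qsize-2h-old" :value 900}
   {:host "mta3" :metric "rate"         :value 10}
   {:host "mta3" :metric "load"         :value 3}])

(defn where-metric
  "Keep only rows whose :metric is one of the given names."
  [names rows]
  (let [wanted (set names)]
    (filter #(contains? wanted (:metric %)) rows)))

(defn pivot
  "Turn one row per (host, metric) into one map per host, keyed by metric."
  [rows]
  (for [[host host-rows] (group-by :host rows)]
    (into {:host host}
          (map (fn [r] [(keyword (:metric r)) (:value r)]) host-rows))))

(defn pct-recent
  "Same arithmetic as the slide: 100 * (1 - qsize-2h-old / qsize)."
  [{:keys [qsize qsize-2h-old]}]
  (* 100.0 (- 1 (/ qsize-2h-old qsize))))

(defn big-queue?
  "True when the host's queue size exceeds the slide's 50000 threshold."
  [m]
  (> (:qsize m) 50000))

(defn select-cols
  "Project the columns named in the slide's select stage."
  [m]
  {:host (:host m) :qsize (:qsize m) :pct-recent (pct-recent m) :rate (:rate m)})

;; Pipe-style reading: each stage feeds the next, left to right.
(defn run-query [rows]
  (->> rows
       (where-metric ["qsize" "qsize-2h-old" "rate"])
       pivot
       (filter big-queue?)
       (map select-cols)
       (sort-by :pct-recent)))

;; Nested reading: (sort (select (filter (pivot (where))))).
(defn run-query-nested [rows]
  (sort-by :pct-recent
           (map select-cols
                (filter big-queue?
                        (pivot
                         (where-metric ["qsize" "qsize-2h-old" "rate"] rows))))))
```

With the sample rows above, `(run-query samples)` and `(run-query-nested samples)` return the same two hosts, ordered by ascending `:pct-recent`. The real DSL presumably hides the row argument and the `map`/`filter` plumbing behind its own `select` and `filter` forms; how it does so is not shown in the slides.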