Zabbix: Beyond Thunderdome

  1. What’s going on? @ablythe
  2. Huh?
  3. Huh? • Does anyone know what movie that was?
  4. (no slide text)
  5. World Record • Highest Profit to Cost Ratio Ever • But before that…
  6. (no slide text)
  7. Zabbix: Beyond Thunderdome • Aaron Blythe
  8. This presentation is about…
  9. This presentation is about…
  10. This presentation is about…
  11. This presentation is about…
  12. Past Now Future
  13. Past Now Future
  14. What is Zabbix?
  15. What is Mad Max?
  16. Why Zabbix?
  17. Why Zabbix? Necessity
  18. Why Zabbix?
  19. Why Zabbix? • Open Source • Linus’s Law: given enough eyeballs, all bugs are shallow • Community Based
  20. Why Zabbix?
  21. Why Zabbix?
  22. Why Zabbix? Mission Statement: To contribute to the systemic improvement of health care delivery and the health of communities.
  23. (no slide text)
  24. Zabbix Linux Template - Cost • Connect Host as Agent to Zabbix Server (Via Chef) • Download Template from Zabbix • Upload Template to Zabbix Server • Apply Template to Host • Cost = 4 steps → 2 steps → 1 step
  25. Zabbix Linux Template - Return • ~ 11 applications • ~ 90 items • ~ 120 triggers • ~ 20 graphs
  26. Profit to Cost Ratio • Mad Max – $100 million worldwide/A$400,000 • Zabbix Linux Template – 120 Triggers/2 Steps
  27. Benefit • 80% full alerts – Disk space/inodes – RAM • Make better decisions on size needed • Decision: Find file or process, or Extend LVM
  28. Chase Scenes and Crashes
  29. Creators • Zabbix (Latvia): Alexei Vladishev • Mad Max (Australia): Byron Kennedy, George Miller
  30. Past Now Future
  31. Mad Max 2 – The Road Warrior
  32. (no slide text)
  33. Scale
  34. Highly Available Deployments • Proxy Layer • Service Layer
  35. Highly Available Deployments • Proxy Layer • Service Layer
  36. Highly Available Deployments • Proxy Layer • Service Layer
  37. Highly Available Deployments
  38. Email Alerts to uCern Discussions
  39. Screens/Graphs – ack rates
  40. Screens/Graphs
  41. Brahe Hubble • { "{INDEX_MACRO}"=>"name]}", "{VERSION_MACRO}"=>" version", "{ERROR_MACRO}"=>"#{error}" }
  42. Zabbix Low Level Discovery • Zabbix Host • Zabbix Agent • UserParameter • Shell Script or RubyGem • Zabbix Server • JSON Document • Template w/ Macro
  43. Zabbix Low Level Discovery
  44. Zabbix Low Level Discovery
  45. (no slide text)
  46. Who? • Kalin Hicks – Set up original GCL VM, countless explanations and whiteboard sessions • Brian Cook – Set up original Sepsis Zabbix VMs • John Breese – Set up 2.0 templates spanning hosts • Brad Beam – Many dashboards, alerts and triggers • Chris Rooney – Brahe-hubble gem • Nidhi Bhargava – Low level discovery on 2.0 • Dev – White • Ops – Yellow
  47. (no slide text)
  48. It’s not all dogs…
  49. …and Gyrocopters
  50. Sometimes my email inbox…
  51. Has me feeling like
  52. Bus Factor
  53. Bus Factor • Dystopian Future Where The Survival of Many is in the Hands of One Man
  54. The Information Model
  55. Host Group • Host Group • Host • Template • Template (0..n) • Item • Trigger • Graph • Applications 0..n • Action: email, command • Items 1..n • … has a learning curve
  56. Mad Max 2: The Road Warrior
  57. Past Now Future
  58. We Want Tina Turner!
  59. Beyond Thunderdome
  60. Virtualization thru Skybox Labs
  61. Dashboards • chapters divided by types of data rather than types of display • chapters on multi-variables, correlation and proportions • Honestly a little too textbook-ish for me • from more than two dozen experts, real world case studies, beautiful layers, how-tos
  62. Pull Data External?
  63. Zabbix Maps
  64. Alert Exhaustion • Ain’t Nobody Got
  65. Two Men Enter, One Man Leaves
  66. Correlation of Alerts • Proxy Layer • Service Layer
  67. Trigger Dependencies • Sometimes the availability of one host depends on another. A server that is behind some router will become unreachable if the router goes down. With triggers configured for both, you might get notifications about two hosts down - while only the router was the guilty party.
  68. “Flap Detection” and a Grace Period • Nagios uses "flap detection" to prevent many ERRORs and OKs being sent right after each other. Zabbix calls this "hysteresis".
  69. Hysteresis • Hysteresis is the dependence of a system not only on its current environment but also on its past environment
  70. Delaying Notifications
  71. Correlation of Alerts • We need to get to the point where: 100’s of Related Alerts Enter, One Causal Alert Leaves
  72. What if someone misses something? • With 100+ alert emails per day, they are almost guaranteed to miss something. • “Why on earth was I not notified?!” On
  73. Trends of Flakiness • These should not be dealt with by alerts/alarms, but rather by daily/weekly reports. Unfortunately Zabbix is not strong in this area yet. There is a thread: =18901
  74. False Alarms Due to Chef Restarts • Current – Manual Maintenance Periods • Potentially – Automated: Automate the Maintenance Periods • Delaying Notifications • Hysteresis • Promise Theory
  75. Highly Available Deployments • Delayed Notifications/Hysteresis • Proxy Layer • Service Layer • Delay Alert 120 seconds • Works!!
  76. Highly Available Deployments • Delayed Notifications/Hysteresis • Proxy Layer • Service Layer • Delay Alert 120 seconds • Delay Alert 120 seconds • Delay Alert 120 seconds • No Delay • Doesn’t Work
  77. Beyond Thunderdome
  78. Promise Theory
  79. Deconstructing Promises
  80. Promise Theory • +data • a1 • a2 • My Service • Zabbix
  81. Leveraging Init.d to Manage State • … case "$1" in start) touch /var/<service>/start … rm -f /var/<service>/start ;; stop) touch /var/<service>/stop … rm -f /var/<service>/stop ;; restart) touch /var/<service>/restart; $0 stop; $0 start; rm -f /var/<service>/restart ;; … • This of course is messy if the service ever hangs during a restart. More discussion needs to be had in this area.
  82. Mark Burgess – Book of Promises • ses.pdf • Draft published on January 21st 2013
  83. For the Project Managers • Nobody PLANS TO FAIL • Some just FAIL TO PLAN
  84. For the Project Managers • Everybody should PLAN TO FAIL • PRACTICE LOCALIZED FAILURE • And MINIMIZE RECOVERY TIME
  85. The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win
  86. The Brent Effect • Brent is the one person who understands how the entire system fits together. Brent is the one person who fixes most of the issues. Being spread so thin, Brent is also the one person who causes most of the issues.
  87. Dystopian Future Where The Survival of Many is in the Hands of One Man • The system or crucial parts of the system • Man or Woman
  88. What is OpsInfra? A team built on enablement of DevOps. • Build an Ecosystem • Tool • Virtualization • Repeatable Deployment • Documentation • Discussion • Auxiliary Tooling • Education • Other tools As needed • The Success of: Population Health, Millennium+, Project Go
  89. Incubator • 4 steps – Log a Jira with the intent to research a tool – Write a wiki article on how to use it – Write a blog on how it is awesome – Record a demo of the tool
  90. For the Architects • Monitoring is only “technical debt” if you choose to carry it that way. Depending on when you invest, it can easily be “technical capital”.
  91. Beyond Thunderdome
  92. Past – Hackers – Craft • Now – SysAdmin – Trade • Future – DevOps – Science
  93. The Tell • The years travel fast / And time after time, I've done the tell / But this ain't one body’s tell / It's the tell of us all / And you gotta listen it and 'member / Cuz what you hears today / You gotta tell the newborn tomorrow
  94. What’d ya think?
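
The Low Level Discovery flow on slides 41–44 (a script or gem run through a UserParameter emits a JSON document that a template consumes via macros) can be sketched as below. This is a minimal illustration and not the Brahe Hubble gem: the function name, the input shape, and the macro names `{#INDEX_NAME}` and `{#INDEX_VERSION}` are assumptions made for the example. The one thing taken from Zabbix 2.0 itself is the envelope, a top-level `"data"` array of objects keyed by `{#MACRO}` names.

```python
import json


def build_discovery_document(indexes):
    """Build a Zabbix low-level-discovery JSON string.

    `indexes` is a hypothetical list of dicts with "name" and "version"
    keys; the macro names below are illustrative, not Brahe Hubble's own.
    """
    entries = []
    for index in indexes:
        entries.append({
            "{#INDEX_NAME}": index["name"],
            "{#INDEX_VERSION}": str(index["version"]),
        })
    # Zabbix 2.0 discovery rules expect the entries under a "data" key.
    return json.dumps({"data": entries}, sort_keys=True)


if __name__ == "__main__":
    sample = [
        {"name": "patients", "version": 3},
        {"name": "orders", "version": 7},
    ]
    print(build_discovery_document(sample))
```

On the agent side, a line of the form `UserParameter=<key>,<command>` in the agent configuration would expose such a script to the server; the key and the script path are yours to choose.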

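The goal on slides 66–76 ("100’s of Related Alerts Enter, One Causal Alert Leaves") can be illustrated with a small sketch. This is not a Zabbix feature or API; it only models the desired outcome for the proxy/service layer example, where a single node down is a "high" alert and every node down collapses into one "disaster" alert. The function name and the input shape are assumptions for illustration.

```python
def correlate_alerts(node_states):
    """Collapse per-node alerts for one layer into the alerts worth sending.

    `node_states` maps a host name to True (up) or False (down).
    Returns a list of (severity, message) tuples.
    """
    down = sorted(host for host, up in node_states.items() if not up)
    if not down:
        return []
    if len(down) == len(node_states):
        # Every node is down: send one causal alert instead of N + 1 emails.
        message = "all %d nodes down: %s" % (len(down), ", ".join(down))
        return [("disaster", message)]
    # The layer as a whole still works: report each failed node on its own.
    return [("high", "node down: %s" % host) for host in down]
```

With three service nodes, one failure yields a single "high" alert, and a full outage yields exactly one "disaster" alert rather than four emails.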
Editor's Notes

  • That was The Blair Witch Project.
  • Blair Witch at one point held the record for the highest profit to cost ratio ever. But before that…
  • Mad Max held that record for a couple decades.
  • My name is Aaron Blythe, and this presentation is called Zabbix: Beyond Thunderdome.
  • Mad Max
  • Zabbix. By show of hands, who has logged into a Zabbix instance? And who has received email alerts from Zabbix?
  • We will go through where we have been, where we are, and where we can go with Zabbix. I will try not to give too many spoilers on the Mad Max series of films, and merely lay down the story line.
  • First I want to go through how we got here with Zabbix so far, using the original Mad Max as a guide.
  • Zabbix is an open source monitoring tool. The website claims: up to 100,000 monitored devices and up to 1,000,000 metrics.
  • Mad Max is set in Australia in a dystopian future where Earth’s oil supply has been nearly exhausted. Max Rockatansky is the top driver in the Main Force Patrol (basically the police). Gangs have taken over the highway. In a car chase, Max kills one of the gang members, so they want revenge. Honestly the story is sort of disjointed. The movie was edited in the home of one of the producers on a homemade editing machine, created by his father (an engineer).
  • Brian Cook told me a story of when they were first working on one of our cloud applications. It was memory bound. When a lot of data was being pumped through in batches it would actually clobber the machine. He would have to call someone in the data center at 2 in the morning to physically reboot the machine. Oh, and after doing this a few times he would always make sure to tell them to bring a pencil so they could actually get to the button.
  • Kalin Hicks and Brian Cook told me: Zabbix was originally installed to bridge the gap in our monitoring for the Sepsis project while we waited for a permanent solution; we just chose to use another monitoring tool instead of a bunch of scripts. It was a skunkworks project that went viral and was certainly never intended to become such a big project.
  • Necessity helps us create or adapt great fun things. David Eggby, responsible for much of the footage for Mad Max, had this to say about filming: “… [Shooting from the back of the Goose bike] I couldn't have a helmet on because you can't operate a camera, it gets in the way… They put a seat belt strap around us and we went for it, and you can see on the speedo that it's cracking 180kph.” From: … (“speedo” is ‘stra’in for speedometer…)
  • Unlike proprietary monitoring tools that we use now or have used in the past, we don’t have to worry about paying for a license for every stakeholder who has a business need to see the data. Fixes on the 2.0 line have so far been decently timely. With a community of hundreds of contributors, Linus’s Law applies: given enough eyeballs, all bugs are shallow. Zabbix is community based.
  • Community based means there are forums where we can ask questions and get answers ourselves, or see the answers to others’ questions. Yes, that is almost 40,000 posts to over 10,000 threads. We could never expect this level of interaction and support for an internally developed monitoring tool.
  • The number of users in the freenode IRC channel continues to grow, to nearly 200 people on average. This is a place to ask advanced questions in real time of users around the world. Oh, and this graph was created from data gathered in Zabbix over 7 years.
  • We provide health care solutions; if we can integrate tools that solve software and hardware problems, that gets us to our goal faster.
  • For those of you who now want to see the movie because of this talk, I don’t want to ruin it for you. But some bad things happen to people Max knows in this movie. This causes Max to quit the force, but he is talked into just taking a holiday instead. At this point Max is just a regular guy. He is trying to keep the peace and lead a good life with his girlfriend.
  • There are 4 steps to get your host connected to the Zabbix Server and using the Linux OS Template. However, 2 of them have likely been done for you on the Zabbix Server already. And soon we plan to automate application of the Template to the Host using auto-discovery of Linux nodes. So we are left with one step.
  • For those couple of steps you get (roughly, depending on the layout of the host): 11 applications, 90 items, 120 triggers, and 20 graphs.
  • As I said at the beginning, Mad Max made a ton of money for the amount of money spent: about 500 to 1000 dollars for every dollar spent. With the Zabbix Linux Template, we are talking about a couple of hours of work for 120 triggers. Once you’ve set this up before, it is really only about 10 minutes of work to set it up for future nodes.
  • The 80% full alerts have been extremely beneficial. In the case of disk space and inodes, these alerts give us the time and ability to troubleshoot the issue and decide whether to extend the logical volume or find the offending large file or process. If the volume reaches 100%, the only choice is to extend the LVM. In the RAM case I mentioned before that Brian Cook ran into, we can make better decisions on the size and number of nodes we need for MapReduce.
  • The entire Mad Max series is built on car chases, which are awesome to watch. So far it has been awesome to watch Zabbix grow so prolifically throughout Cerner.
  • What impresses me most about Zabbix and Mad Max is that something so simple and easy could gain so much mindshare. The creators of each poured time and effort into something that has universal and worldwide appeal. We are adapters of their work and I want to thank them.
  • So that is where we have been and how we got started. Now let’s talk about where we are now, using Mad Max 2: The Road Warrior.
  • Mad Max 2: The Road Warrior picks up a few years later. Max is older and hardened from the tragedy at the end of the first movie. Oil is still scarce. There are still street gangs. Max is now a lone wolf. He is looking for more ammunition for his sawed-off.
  • Oh, and the villains have slightly better costumes… more budget.
  • We have well over 2000 nodes in the production Zabbix 2.0 instance currently. And we believe we can scale that much higher with our current deployment structure.
  • A common setup for a highly available (HA) system is to have N+1 nodes. Here we see 2 proxy layer nodes fronting 3 service layer nodes.
  • If one of the service layer nodes goes down, that is a problem that needs to be addressed, and likely quickly. However, the system as a whole is still functioning.
  • However, if all 3 nodes go down, that is a disaster that needs to be addressed immediately, and someone needs to be paged to fix it.
  • John Breese was able to set this up for us on Semantic Solutions using templates. We receive high alerts in the event that any single node goes down. We receive disaster alerts in the event that all of the servers or proxies are down.
  • The alerts go to a uCern Space set up specifically for monitoring our system. Associates are free to subscribe or unsubscribe from this space as they need. The discussion can occur in the open, and the URL can quickly be pasted into other discussions or Jiras that are occurring on other related issues.
  • Brad Beam created these graphs that anyone who can access the production Zabbix system can see. Meaning if you have the need to see this, you only have to log an issue in Jira. This graph is monitoring the real-time processing of data through Storm. The Storm acknowledgement rates (or ack rates) are a way to gauge system health. A low ack rate together with a sufficient backlog in notifications is indicative of an issue. I’ll be honest, I am not sure exactly how these graphs were created, nor do I know many details about them. What I do know is that many people have been watching this information to understand the system behavior and improve it over the last couple of months.
  • Another dashboard created by Brad Beam. We currently have a bug in the JVM reuse for the M/R jobs: the resources for the finished JVMs wouldn't be reclaimed, which would eventually exhaust the resources on the box. So with this graph we can identify whether a server has bogus JVMs out there that need to be addressed. Development of basic monitoring features can now be measured in hours or days, as opposed to months. We need the freedom to change these metrics daily/weekly as we learn more.
  • Brahe Hubble is a Ruby gem created by Chris Rooney here at Cerner. Not to steal any thunder from Ben Brown and Kartik Vishwanath, who are presenting on Brahe later in this conference: Brahe is named after the astronomer Tycho Brahe (similar to the project Kepler, which many of you may be more familiar with). Brahe Solr is a cloud-based indexing application also created here at Cerner. It presents at least 2 replicas that are fronted by Brahe REST services to manage and query their state. Brahe Hubble uses these REST services to present a JSON document to be used by a Zabbix Template. So why not have Zabbix call the REST interface directly? Basically, the logic done by Brahe Hubble is too complicated for Zabbix to complete on its own.
  • With the help of Kalin and Brad Beam, Nidhi Bhargava worked through this for our Brahe Hubble deployment. You have your Host or Node and a Zabbix Server. First you have to get the Zabbix Agent installed (preferably through Chef). Then you need a script (or in the case of Brahe Hubble, a RubyGem) that does the gathering of information and outputs a JSON document. But how will the Zabbix Agent know about the script or command line? Easy: you configure the UserParameter for the Zabbix Agent (simple to do if you are using the zabbix_agent_chef cookbook). This will allow you to present a JSON document to the Zabbix Server. The Zabbix Server then uses this JSON document in a Template with a Macro.
  • In Templates, the important part is that this is created under “discovery”. In Discovery we created an item and a trigger. The item…
  • It is here where you can use the name-value pairs presented in JSON from the script or RubyGem.
  • Let me stop for a minute and tell you about my 2 favorite characters in Mad Max 2. Max meets this guy that we refer to as the “Gyro Captain”, because no one says his name in the movie and Max never asks. Oh, and probably because he flies a gyrocopter. Character development is starting to become part of the Mad Max movies this time around, even if names are not. I personally like names, and would love to celebrate the things you do with Zabbix, as I just did with the cool stuff I have seen done with Zabbix.
  • Names I have already said so far. There are many more, but notice that there are 3 dev and 3 ops. Each of us has learned a lot from one another.
  • There is also The Feral Kid, named for similar reasons. Max gives the Feral Kid a music box. Max’s heart is starting to soften some, and he decides to help this village of people protecting their oil try to get away from the road gang. Max has become more invested in the village. Over the past couple of years Zabbix has moved from that side project, or skunkworks project, to an investment in the health of our system.
  • Max tries to leave the village once, but does not make it. He comes back after a pretty severe beating.
  • Remember that Max was the best driver on the Main Force Patrol. Max is the only one who is going to be able to drive the tanker of oil out of the protected village. Oh, and there is an epic oil tanker chase scene. It goes on for like 20 minutes. In software we often refer to situations where only one or a few people can do something critical as having a low “bus factor”, which put simply is the total number of key developers who would need to be hit by a bus (or tanker) before the project would not be able to proceed.
  • I would describe Mad Max 2 as a dystopian future where the survival of many is in the hands of one man.
  • The Zabbix information model has a rather steep learning curve, but I believe it is one worth climbing. From
  • As I often do, I asked Kalin to talk to me like I'm a 3rd grader, and he boiled it down to this for me: a Host can be part of many Host Groups; a Host can have many Templates applied to it; a Template can have Graphs, Items, and Triggers; you can define Actions for Triggers. Kyle McGovern and Ben Hemphill mentioned yesterday that they are using Zabbix to restart Hadoop Region Servers. So, self-healing system of the future? We have that now.
  • The Road Warrior won critical acclaim, and is an incredibly better movie than the first. The story line is cohesive and somewhat compelling. Max truly comes out a hero. By putting in more work, we have a better story and have done some awesome stuff with Zabbix so far…
  • Let’s talk about where we want to go with Zabbix in the next couple years.
  • We want Tina Turner level success… In the third installment of the saga, Mad Max: Beyond Thunderdome, Tina Turner is the leader of Bartertown. She plays Aunty Entity.
  • Bartertown has regained some technology through the use of methane. Years have passed, and an aging Max has some of his supplies stolen and becomes involved in the local political power struggle.
  • Recently Nimesh Subramanian created a Skybox Labs virtual cluster with a Chef Server and a Zabbix Server. You can check this out, upload the cookbook for your app or service, and start playing around with Zabbix without affecting a shared domain where others are working. When you are finished you can just throw the image away.
  • Dashboards are an area that could use a lot of work. Each of these titles is available on Safari Online. The way people read books is a personal decision; I personally use my library card, and each of these 4 is available on Safari Online so I can read them on my iPad. How do we convey the most information in the least amount of space, so that only the real problems gain attention?
  • Zabbix has a full API. Many have already been pulling Jira and Splunk data into Dashing (from Shopify), which can be optimized. Pulling Zabbix data the same way should be rather trivial.
  • Zabbix does have some interesting features. A couple of weeks ago, on the blog, Zabbix Maps were explained fairly well. We have not made heavy use of this; however, it could potentially give us a graphical, relational way to reason about the data that Zabbix is gathering.
  • Seriously…
  • In Mad Max Beyond Thunderdome there is a cage match between Max and a huge opponent named Blaster. The crowd chants “Two men enter, one man leaves”.
  • Remember back to my example of High alerts vs. Disaster for the service layer? In the disaster scenario I get 4 alerts: one for each of the 3 hosts, and one for the disaster. However, this is likely all from one cause, meaning those alerts are correlated. But how do I get the system to email me only once? Sometimes a single cause can result in hundreds of emails from Zabbix. I heard one system engineer recently refer to this as “Getting Zabbixed”.
  • Straight from the Zabbix Documentation
  • Systems can get into states where they send Error and then immediately send OKs. A different monitoring system, Nagios, calls this “flap detection”. In these cases real-time alerts are not of much value, because the system is doing one of two things: correcting itself somehow faster than a human can intervene, or these are just the downstream effect of the network or another factor (for which we should be using the previously mentioned trigger dependency). Zabbix calls this hysteresis, pronounced “Historee Sis”.
  • Hysteresis is the dependence of a system not only on its current environment but also on its past environment. For alerts such as this, we can use the | operator to chain two conditions (in Zabbix 2.0 trigger expressions, | is a logical OR rather than a Unix pipe). Problem: being less than 10 GB for 5 minutes (notice you set this as a max over 5 minutes). Recovery: being more than 40 GB in the last 10 minutes (notice the min over 10 minutes).
  • From the Zabbix documentation (I have not fully tested this myself). First, check the box to schedule actions; this allows the actions on the right side. Next, set a period (maybe 120 seconds). Enable a recovery message. Make sure Trigger value = “PROBLEM”, or you will delay the recovery message. Step 1 is not defined, so nothing happens at first; step 2 happens after 120 seconds.
  • We need Thunderdome for our alerts: 100’s of related alerts enter, one causal alert leaves.
  • In discussing these methods of correlation, suppression, and delaying messages, I often get asked, “What if someone misses something?” A monitoring system that cries wolf too often is almost guaranteed not to get listened to. When I hear a car alarm these days, I unfortunately almost never think that someone is trying to steal a car. While this is a valid question, it is not the most interesting question to me. It seems like a question that could stunt progress. The Zabbix community is working on an Action Simulator that may be part of a future release of Zabbix. Look for the blog entry entitled “Why on earth was I not notified?!”
  • Trends of flapping are better dealt with in a holistic manner. Zabbix is not yet great at daily/weekly reports, but it appears that the community has made a lot of headway and that this will be in a near-future release.
  • So let’s return to my previous example. If I delay the notification by 120 seconds and the node recovers in time, then I get no notification. This is good, as it will cut down on the number of notifications. If the node does not recover in that time, the system as a whole is still up and I can deal with the problematic node individually.
  • If all 3 nodes are down at the same time, however, I would not delay the notification of the Disaster. In this case the system is not likely to recover in 2 minutes, so I would just be delaying the other 3 emails. I may be able to set up a trigger dependency; however, that would be sort of circular, in my current opinion. Remember, trigger dependency was for a separate host.
  • In Beyond Thunderdome, Max is banished from Bartertown. He is found by a tribe of children who have a “tell” that prophesies his arrival. Again Max becomes a reluctant hero to this tribe of people.
  • When Adam Jacob from Opscode was visiting our campus, he walked through an example that we had been working through with proxies. He mentioned Promise Theory. I am going to use an example I lifted from John Willis of the DevOps Café Podcast. A promise of B from agent 1 to agent 2.
  • There are promises to give and promises to receive. Let’s use + for give and – for receive. I (a1) promise to feed the cat of my neighbor (a2). My neighbor (a2) promises to grant me access to his house. Trust comes in: that my neighbor gave me the correct code and I will not get arrested; that I will not drink his 25-year-old scotch.
  • My Service promises to publish state.
  • If you think this subject is interesting, Mark Burgess (who wrote CFEngine, a precursor to Chef, well before its time) recently published a 303-page draft of his book on the subject.
  • I have had the opportunity to read many books and take classes on project management. We see this quote many times: nobody plans to fail, some just fail to plan. This is cute. But it is wrong.
  • (Read the slide.) Schedule strategic iteration time to work through monitoring… so you are not scheduling weekend war rooms.
  • The Phoenix Project is a novel about IT and DevOps. It is about a company on the brink of complete failure.
  • Beyond Thunderdome is yet again a dystopian future where the survival of many is in the hands of one man. It makes a great action movie, but not a great way to do business.
  • Our team is built on enablement. We are structured around understanding, harnessing and providing the capabilities needed to deliver software in the Big Data world. There are many tools already in use by a large number of teams. Each of the tools used has a large open community outside of Cerner. We are focused on building an ecosystem within Cerner to solve the large-scale problems we are facing with these large-scale deployments.
  • I have been asked many times in the past couple of months, “Have you seen monitoring tool X? It is awesome.” I am sure that it is. Please show me why it is awesome. We have set up a way that you can do this. Visit our Incubator link on the uCern wiki. We would like to collect the awesome DevOps tools you are looking into in a place where you can compare their capabilities, to make the best decisions on which ones should be applied to your team.
  • I had an architect recently refer to working on a monitoring solution as “technical debt” when his system was not yet in production. (Read the slide.)
  • The third installment closes with yet another epic chase in all sorts of vehicles and epic explosions. Max again comes out a hero…
  • So to relate this back to Chris Brown’s Keynote yesterday?
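
The hysteresis note above (problem below 10 GB, recovery above 40 GB) can be made concrete with a small simulation. This is a simplified sketch, not Zabbix trigger syntax: it assumes the value is free disk space in GB, it evaluates single samples instead of the 5 and 10 minute windows, and all function and parameter names are illustrative.

```python
def hysteresis_states(samples_gb, problem_below=10.0, recover_above=40.0):
    """Return the trigger state (True = PROBLEM) after each sample.

    The trigger enters PROBLEM when a sample drops below `problem_below`
    and only returns to OK once a sample rises above `recover_above`.
    """
    in_problem = False
    states = []
    for free_gb in samples_gb:
        if not in_problem and free_gb < problem_below:
            in_problem = True
        elif in_problem and free_gb > recover_above:
            in_problem = False
        states.append(in_problem)
    return states


def count_transitions(states):
    """Count state changes, starting from OK; each one would send an email."""
    transitions = 0
    previous = False
    for state in states:
        if state != previous:
            transitions += 1
        previous = state
    return transitions


if __name__ == "__main__":
    samples = [50, 9, 11, 9, 12, 8, 45, 50]
    naive = [value < 10 for value in samples]  # single threshold, no memory
    print(count_transitions(naive), count_transitions(hysteresis_states(samples)))
```

For a value hovering around 10 GB, the single threshold flips on nearly every sample, while the two-threshold version raises one problem and one recovery.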