Scaling a Global Support Team - Atlassian Summit 2012
  • Dots or numbers?
  • So why am I talking to you today? What did we get done that was so great it warrants 40 minutes of your time?

    Long story short, the GreenHopper team told us about this thing called Kanban. We got excited. We read up on it, took a long hard look at ourselves in the mirror, and then made some changes in one trial team. And we ran with that for a month.
  • A bit of background.

    How large is Support right now? 100+ people, up from around 70, and we're aiming for 150 this time next year.

    We are spread across five locations: Sydney, Kuala Lumpur, Amsterdam, Porto Alegre and San Francisco.

    We're available 24/7 for enterprise customers and critical issues.

    We get on average 1,500 issues a week across all four product families.

    This is up from around 1,000 this time last year.

    Every license we provide comes with support. Even the evaluation licenses.

    Dramatic growth in the popularity of the products is driving dramatic growth in the number of support issues being created.
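    A quick back-of-the-envelope check of these figures, as a hedged sketch. It treats "100+" as exactly 100 and assumes the "up from around 70" headcount covers the same year as the issue growth, which the notes don't state explicitly. Under those assumptions, issues grew about 50% while headcount grew about 43%.

```python
# Back-of-the-envelope check using the approximate figures quoted in the talk.
# Assumption: "100+" staff is treated as exactly 100, so staff growth is a lower bound,
# and both comparisons are assumed to cover the same one-year period.

def growth(before: float, after: float) -> float:
    """Fractional growth from `before` to `after` (0.5 means +50%)."""
    return (after - before) / before

issues_last_year, issues_now = 1000, 1500   # issues per week
staff_last_year, staff_now = 70, 100        # Support headcount (not all of them take issues)

issue_growth = growth(issues_last_year, issues_now)
staff_growth = growth(staff_last_year, staff_now)

print(f"issue growth: {issue_growth:.0%}")  # 50%
print(f"staff growth: {staff_growth:.0%}")  # 43%
print(f"issues per person per week: "
      f"{issues_last_year / staff_last_year:.1f} -> {issues_now / staff_now:.1f}")  # 14.3 -> 15.0
```

    Even as a rough estimate, the weekly load per person is creeping up rather than down, which is the scaling problem in a nutshell.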
  • But what was the problem we faced?
  • The problem is scale.

    The number of issues we are getting is growing faster than we are, driven by the growth of our customer base, new products, enterprise licenses, and competitive offerings that include support.

    At the same time, we want to keep improving the quality of the service we provide. We call ourselves legendary, so we need to actually be legendary. It would be easy enough to trade quality for scalability or vice versa, but we need to improve both. That means improving not just the quality of our work on individual issues, but also the quality of the work we do to keep improving support itself. That way, when we sit down today to work out how to improve and scale tomorrow, we have a better chance of finding a good solution.

    To top it all off, there is no moment in our week when we can tell everyone to sit down, take a break and stop working. We always need someone on the queue in case a JIRA instance somewhere in the world falls over, 24/7.

    So we need a way to make whatever we change work alongside our 24/7 service. If a change requires us to stop, it probably won't work, which makes rolling out process changes hard.

    Now factor in that we are spread across five locations, and you can see why this gets hard.
  • That's how the managers feel, and it gives a good overall picture of the problem. But how this affects the 100+ support engineers is an important factor too.

    Siloed: as we had no way to see the work others were doing, we were becoming increasingly siloed, focussing only on our own issues and not collaborating as much as we could.

    It's stressful. Customers are awesome, but customer support at Atlassian, as at a lot of technical places, is an interesting mix of highly technical skills and some very intangible people skills. Throw in a new release with crazy new technology that you can't Google because we made it up, a rising pool of new issues to handle, and the occasional critical issue to respond to, and it can be quite a stressful place if you don't have a good framework to handle it.

    Finally, the interruptions. Say I have an issue that is going to take me half a day to solve: a weird data integrity issue causing index corruption. I might need to restore data multiple times into multiple instances, understand the problem, replicate it, dig really deep to find the cause and then formulate a resolution. Then I test that resolution, after deploying the data yet again, and confirm it works before we can get back to the customer with an answer.

    But what if a critical issue comes into the queue, or a phone rings, or someone walks up to ask whether Support has seen any patterns around a particular issue, or a customer gets sick of waiting and escalates?

    We're interrupt-driven, and we need to be able to work with that. But at 100+ people, if we all jump on each interrupt at once, we'll never get anything done. Again, we need a framework to manage this.

    So we wanted to change. We don't have all the answers pre-baked, so we have been running experiments to find improvements that let us catch up with the issue growth and overtake it, rather than having to hire another 100 people.
  • Meanwhile, in development, the GreenHopper team had just finished putting the final touches on the Kanban Rapid Board. This was exciting for us, mainly because they told us what it was all about as they built it. Going to regular demos to see the shiny new features and the bugs that had been fixed, and being told along the way why these things were needed and what the use case was, really helped cement in our support minds what it all meant. This was great, because a lot of us had not had experience in a proper agile environment, and Kanban looked like the right combination of simplicity and flexibility to frame our work.

    So we read up on it some more and found that step one is always to visualise the workflow. That first step alone changed a lot about how we thought of our queue and the issues in it.
  • Traditionally, this is how our queue was described.

    Either it was a new issue that had just been created by a customer and was not assigned to anyone, or it didn't matter.
  • But at a more grassroots level, for the engineers, this is how it looked.

    And when we sat down and thought about every stage an issue can be in, we found it was pretty epic.
  • And it wasn't just the statuses themselves, but the movement through them.
  • An unassigned issue gets assigned and then goes into my issue pool. From there I respond to it, and it waits for the customer. The customer can respond, putting it back into my issue pool. Or I could decide it is too hard and escalate it.

    Or the customer could escalate because I am taking too long, or because my response was not suitable.

    Once it has been escalated, we send it back to the customer. From there, the customer can say it's solved, and it is closed. Once an issue is closed, we do a post-mortem and wrap it up, and only then do we finally consider it "Done".
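    The movements described above can be modelled as a small state machine. This Python sketch is our own reading of the walkthrough, not a JIRA or GreenHopper workflow export: the six status names are assumed labels for the stages described, and `move` is a hypothetical helper that rejects any transition the notes don't mention.

```python
from enum import Enum

class Status(Enum):
    UNASSIGNED = "Unassigned"
    MY_ISSUES = "My Issues"
    WAITING_FOR_CUSTOMER = "Waiting for Customer"
    ESCALATED = "Escalated"
    CLOSED = "Closed"
    DONE = "Done"

# Allowed moves, as read from the walkthrough (assumed, not an official workflow).
TRANSITIONS: dict[Status, set[Status]] = {
    Status.UNASSIGNED: {Status.MY_ISSUES},                   # issue gets assigned
    Status.MY_ISSUES: {Status.WAITING_FOR_CUSTOMER,          # engineer responds
                       Status.ESCALATED},                    # too hard, or customer escalates over delay
    Status.WAITING_FOR_CUSTOMER: {Status.MY_ISSUES,          # customer replies
                                  Status.ESCALATED,          # response was not suitable
                                  Status.CLOSED},            # customer says it is solved
    Status.ESCALATED: {Status.WAITING_FOR_CUSTOMER},         # escalation answered, back to the customer
    Status.CLOSED: {Status.DONE},                            # post-mortem and wrap-up
    Status.DONE: set(),
}

def move(current: Status, target: Status) -> Status:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target

# One possible path through the workflow, including a customer reply and an escalation.
path = [Status.UNASSIGNED, Status.MY_ISSUES, Status.WAITING_FOR_CUSTOMER,
        Status.MY_ISSUES, Status.ESCALATED, Status.WAITING_FOR_CUSTOMER,
        Status.CLOSED, Status.DONE]
state = path[0]
for nxt in path[1:]:
    state = move(state, nxt)
print(state.value)  # Done
```

    Writing the transitions down like this makes the loops obvious: Waiting for Customer can feed work straight back into My Issues, which is exactly the hidden return flow the board exposed.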
  • When we used GreenHopper to map our issues out like this, we nearly had a heart attack.

    The overwhelming majority of our issues were sitting in Waiting for Customer. This is essentially a hidden queue of ticking time bombs, waiting to go off and come back to us as more work. Or not: some of them might already be done. We didn't immediately know how many came back, because we'd never looked.

    So we took a moment to get some numbers on this column, and found that over a 24-hour period, one third of these issues come back to us.

    Normally our incoming workload was judged on the unassigned issues, plus a vague estimate of our existing issues in the My Issues column. Now we realised we were missing a giant chunk: one third of the Waiting for Customer issues should be in that estimate for the day.

    On top of this, the shift from a two-status concept to a six-status workflow helped us focus on pulling issues through to Done, rather than pushing issues into "everything else" as we had before.
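    The change in how the day's workload is estimated is simple arithmetic. A minimal sketch, assuming the measured one-third return rate over 24 hours holds for the day being planned; the board snapshot numbers below are hypothetical, not figures from the talk.

```python
RETURN_RATE_24H = 1 / 3  # measured: about a third of Waiting for Customer issues return within 24 hours

def old_estimate(unassigned: int, my_issues: float) -> float:
    """How the daily workload used to be judged: new issues plus a rough guess at My Issues."""
    return unassigned + my_issues

def new_estimate(unassigned: int, my_issues: float, waiting_for_customer: int,
                 return_rate: float = RETURN_RATE_24H) -> float:
    """The old estimate plus the share of Waiting for Customer issues expected to come back."""
    return old_estimate(unassigned, my_issues) + return_rate * waiting_for_customer

# Hypothetical board snapshot (not real Atlassian numbers).
unassigned, my_issues, waiting = 60, 40, 300
print(f"old estimate: {old_estimate(unassigned, my_issues):.0f} issues")           # 100
print(f"new estimate: {new_estimate(unassigned, my_issues, waiting):.0f} issues")  # 200
```

    With a large Waiting for Customer column, the returning third can easily be as big as everything the old estimate counted.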
  • Monday to Friday resolve rate: issues are commonly logged early in the week and resolved towards the end, which gives our week a shape.

    Inversely, we strive to time out issues only on weekdays, not on weekends, so the longer-running issues get a
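    The notes don't say how the timeout is scheduled, so this is only a sketch of the "weekdays only" rule under stated assumptions: `weekday_timeout` and its `days` parameter are hypothetical, and a timeout that would land on a Saturday or Sunday is simply rolled forward to the following Monday.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday(): Monday is 0, Sunday is 6

def weekday_timeout(last_update: date, days: int) -> date:
    """Hypothetical rule: time out `days` after the last update, but never on a weekend."""
    due = last_update + timedelta(days=days)
    while due.weekday() >= SATURDAY:  # Saturday or Sunday: roll forward a day
        due += timedelta(days=1)
    return due

# 1 June 2012 was a Friday.
print(weekday_timeout(date(2012, 6, 1), 8))  # 2012-06-11 (Saturday pushed to Monday)
print(weekday_timeout(date(2012, 6, 1), 4))  # 2012-06-05 (already a Tuesday)
```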
  • Next we took a look at where we were spending time in this process.

    One of the most important aspects of agile development is its focus on the importance of the team, and as a global team of 100+ people, we should really be leveraging that.

    The time we spend on each task is something we'd often thought about, but usually as individuals rather than as a team.
  • Traditionally, this is how it went. We'd look at the top issue in the queue, then decide whether we could solve it. Is it something I'm good at? Is it something Michael is good at? Is it something I know well but Michael could learn from? Do I have enough time for this and the other issues in my queue? How long has it been there, and what priority is it? Can it wait another hour, or does it need a response now? And so on.

    This was like a mini planning meeting going on inside each engineer's head, deciding whether this is the right issue to take.

    This system had major flaws. What if an issue was too hard for anyone? Or the best person to handle it was already overloaded with other issues? As the issue sat there longer and got closer to the SLA, we'd be more likely to hand it to the next available engineer, regardless of some or all of the earlier considerations. That might leave us with an engineer stressed by too many issues, or one stressed by an issue they don't feel comfortable with.
  • On top of this, the biggest waste here was that one issue could be reviewed multiple times over. Say it takes us 10 minutes to read the issue, understand the problem, peek at the logs and decide whether it's the issue for me or for someone else. If five engineers do this, we've lost 50 minutes, nearly an hour, right away. Our magic number for new issues per engineer per day is approximately five. If we're looking at 7 issues to find those five, we've potentially lost 10 to 20 minutes a day per engineer. If just 70 of the 100+ people in Support are full-time support engineers taking issues, then even at the low end of 10 minutes each we're losing over 11 man-hours a day!
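    The arithmetic above, spelled out. The inputs are the talk's own rough figures (10 minutes per review, 5 issues taken out of 7 reviewed, 70 full-time engineers); `team_hours_lost` is just an illustrative helper. Two extra reviews at 10 minutes each give the 20-minute upper bound, and the "over 11 man-hours" figure corresponds to the 10-minute lower bound.

```python
REVIEW_MINUTES = 10        # time to read an issue, check the logs and decide who should take it
ISSUES_TAKEN = 5           # new issues per engineer per day
ISSUES_REVIEWED = 7        # issues looked at to find those five
FULL_TIME_ENGINEERS = 70   # engineers taking issues, out of the 100+ in Support

def team_hours_lost(minutes_per_engineer: float, engineers: int) -> float:
    """Convert per-engineer daily waste (minutes) into team-wide hours per day."""
    return minutes_per_engineer * engineers / 60

one_issue_five_reviewers = 5 * REVIEW_MINUTES                   # 50 minutes: "nearly an hour"
upper = (ISSUES_REVIEWED - ISSUES_TAKEN) * REVIEW_MINUTES       # 20 minutes per engineer per day
lower = REVIEW_MINUTES                                          # one wasted review: the 10-minute low end

print(f"{team_hours_lost(lower, FULL_TIME_ENGINEERS):.1f}")  # 11.7
print(f"{team_hours_lost(upper, FULL_TIME_ENGINEERS):.1f}")  # 23.3
```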
  • We wanted to do more with less.
  • We now split the duties up to allow greater focus:

    * One person looking at issues
    * One person who knows the strengths and weaknesses of the team members
    * Assigns out issues
    * Rotating
  • Two to three times a night, we reach a point where the critical and higher-priority issues are all handled, but the non-critical issues are pooling at the bottom of the queue. We have between 8 and 24 hours to respond to these, so we intentionally leave them alone until we can bulk-dispatch them.

    The team gathers around the dispatcher's computer. The dispatcher, having already reviewed all the cases, goes through them one by one, calling out a headline summary ("LDAP configuration issue", "Plugin installation problem", etc.), and team members stick up their hands to grab them as they come through. If you are already working on an LDAP case, you have the right environment and tools set up and loaded, so it makes sense to grab the new LDAP one before anyone else does.

    Incidentally, this is the first time I saw support engineers leaping at the chance to troubleshoot LDAP problems.

    If there are two or more similar issues, we assign them to the same person. Whilst we don't cut and paste the response, the hard part of the work is investigating and finding an answer. Two customers means two sets of logs, and better opportunities to find patterns and rule configurations in or out. Less waste and better results in one move.

    Engineers leave this standup once they reach their capacity and return to their own personal backlogs.
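    The bulk-dispatch standup runs on people raising their hands, not on software, but its preference order can be sketched in code. A minimal simulation under our own assumptions: each hypothetical `Engineer` has a remaining `capacity` and a set of topics they already have environments for. An issue goes first to someone already working that topic (which also keeps similar issues with one person), otherwise to whoever has the most spare capacity, and engineers at capacity drop out, just as they leave the standup. Issue keys, names and capacities are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Engineer:
    name: str
    capacity: int                                        # issues they can still take this round
    open_topics: set[str] = field(default_factory=set)   # topics they already have environments for
    assigned: list[str] = field(default_factory=list)

def bulk_dispatch(queue: list[tuple[str, str]], team: list[Engineer]) -> list[tuple[str, str]]:
    """Hand out (issue_key, topic) pairs in the order the dispatcher calls them out.

    Returns the issues nobody had capacity for.
    """
    leftover: list[tuple[str, str]] = []
    for key, topic in queue:
        # Engineers at capacity have already left the standup.
        present = [e for e in team if e.capacity > 0]
        if not present:
            leftover.append((key, topic))
            continue
        # Prefer someone already on this topic: environment is set up, similar issues stay together.
        warm = [e for e in present if topic in e.open_topics]
        chosen = warm[0] if warm else max(present, key=lambda e: e.capacity)
        chosen.assigned.append(key)
        chosen.open_topics.add(topic)
        chosen.capacity -= 1
    return leftover

# Hypothetical standup: keys, names and capacities are made up.
eng_a = Engineer("Engineer A", capacity=2, open_topics={"LDAP"})
eng_b = Engineer("Engineer B", capacity=3)
queue = [("SUP-1", "LDAP"), ("SUP-2", "Plugin installation"), ("SUP-3", "LDAP"),
         ("SUP-4", "Plugin installation"), ("SUP-5", "LDAP")]
leftover = bulk_dispatch(queue, [eng_a, eng_b])
print(eng_a.assigned, eng_b.assigned, leftover)  # ['SUP-1', 'SUP-3'] ['SUP-2', 'SUP-4', 'SUP-5'] []
```

    Engineer B only picks up the last LDAP issue because Engineer A has already hit capacity and, in standup terms, gone back to their own backlog.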
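The bulk-dispatch standup is people raising their hands, but the rule of thumb it follows can be sketched in Python: group the waiting issues by topic, hand each group to one engineer, prefer whoever already has that topic loaded, and stop once people reach capacity. The `Engineer` type, the topic labels and the capacity numbers are hypothetical; the slides only say engineers carry about 5-6 issues each.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Engineer:
    name: str
    capacity: int                                   # max issues this engineer will take
    topics: Set[str] = field(default_factory=set)   # topics already set up and loaded
    assigned: List[str] = field(default_factory=list)

    def has_room(self, n: int) -> bool:
        return len(self.assigned) + n <= self.capacity


def bulk_dispatch(issues: List[Tuple[str, str]], engineers: List[Engineer]) -> List[str]:
    """Assign (issue_key, topic) pairs, keeping similar issues together.

    Each topic group goes to one engineer, preferring someone already
    working on that topic, then the least-loaded engineer. Returns the
    keys of groups nobody had room for.
    """
    by_topic = defaultdict(list)
    for key, topic in issues:
        by_topic[topic].append(key)

    left_over: List[str] = []
    for topic, keys in by_topic.items():
        candidates = [e for e in engineers if e.has_room(len(keys))]
        if not candidates:
            left_over.extend(keys)  # stays in the queue for the next standup
            continue
        # False sorts before True, so engineers who already know the topic win.
        chosen = min(candidates, key=lambda e: (topic not in e.topics, len(e.assigned)))
        chosen.assigned.extend(keys)
        chosen.topics.add(topic)
    return left_over


if __name__ == "__main__":
    team = [Engineer("alice", capacity=6, topics={"LDAP"}), Engineer("bob", capacity=6)]
    queue = [("SUP-1", "LDAP"), ("SUP-2", "Plugin installation"), ("SUP-3", "LDAP")]
    bulk_dispatch(queue, team)
    for engineer in team:
        print(engineer.name, engineer.assigned)
```

Here alice, who already has an LDAP case open, picks up both LDAP issues and bob takes the plugin one; a group that would push everyone past capacity is left for the next round rather than split.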
  • Once we put GreenHopper onto Support.atlassian.com (internally known as SAC), we could instantly create new prototype Rapid Boards and tweak filters, swimlanes and quick filters. Everything on the board could be changed and turned around in an instant.

    This is huge for us, because Support.atlassian.com, like our support service, is a 24/7 instance, and every minute it's down hurts. In the past we've needed to restart to apply changes, but since JIRA 4.4's reloadable plugin system and GreenHopper's Rapid Board have been in play, we've been able to make these changes much faster.

    We settled on using swimlanes to show assignees, with the top swimlane holding unassigned issues, and quick filters to show long-running or high-touch cases. Because we each handle multiple issues a day, these cases normally don't stand out during a traditional standup.

    No restarts, no database changes, no installing anything beyond the initial GreenHopper install, and we could test and iterate on these changes quickly.
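To make that board layout concrete, here is a Python sketch of what it shows: one swimlane per assignee with unassigned issues on top, plus two quick filters. It models the view only; it is not GreenHopper's configuration or API, and the `Issue` fields and the 7-day and 10-comment thresholds are hypothetical, since the talk does not define "long running" or "high touch".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

# Hypothetical thresholds: the talk does not define "long running" or "high touch".
LONG_RUNNING = timedelta(days=7)
HIGH_TOUCH_COMMENTS = 10


@dataclass
class Issue:
    key: str
    assignee: Optional[str]  # None means unassigned
    created: datetime
    comments: int


def swimlanes(issues: List[Issue]) -> Dict[str, List[str]]:
    """Group issue keys by assignee, with the unassigned lane on top."""
    lanes: Dict[str, List[str]] = {"Unassigned": []}
    by_assignee: Dict[str, List[str]] = {}
    for issue in issues:
        if issue.assignee is None:
            lanes["Unassigned"].append(issue.key)
        else:
            by_assignee.setdefault(issue.assignee, []).append(issue.key)
    for name in sorted(by_assignee):  # remaining lanes in alphabetical order
        lanes[name] = by_assignee[name]
    return lanes


def quick_filter(issues: List[Issue], keep: Callable[[Issue], bool]) -> List[str]:
    """Keys of the issues that pass the filter, in board order."""
    return [i.key for i in issues if keep(i)]


def long_running(now: datetime) -> Callable[[Issue], bool]:
    """Filter for issues open longer than LONG_RUNNING at time `now`."""
    return lambda i: now - i.created > LONG_RUNNING


def high_touch(i: Issue) -> bool:
    """Filter for issues with many comments (a rough proxy for back-and-forth)."""
    return i.comments >= HIGH_TOUCH_COMMENTS
```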
  • We went from this
  • Talk about WIP limits (trialling them)
    Talk about quick filters
    Talk about swimlanes
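WIP limits were still being trialled at the time of this talk and no numbers are given, so the Python sketch below uses hypothetical limits. It shows the basic Kanban check: count the open issues in each board column and flag any column that is over its limit.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical limits for two of the board's columns; the talk gives no numbers.
LIMITS = {"Unassigned Issues": 10, "My Issues": 30}


def wip_breaches(columns: Iterable[str], limits: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (column, current count, limit) for every column over its WIP limit.

    `columns` holds the board column of each open issue; columns without
    an entry in `limits` are treated as unlimited.
    """
    counts = Counter(columns)
    return [
        (column, counts[column], limit)
        for column, limit in limits.items()
        if counts[column] > limit
    ]


if __name__ == "__main__":
    board = ["My Issues"] * 32 + ["Unassigned Issues"] * 4 + ["Waiting for Customer"] * 50
    print(wip_breaches(board, LIMITS))  # [('My Issues', 32, 30)]
```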
  • So what did the detailed results look like?
  • Now, as with anything we do in support, our primary goal is always customer satisfaction, but we can gauge whether it is likely to improve by tracking the following metrics:
    * CSat
    * Response time: we commit to our customers that we will respond within a certain time frame, based on the priority of the ticket, and we measure response time as the percentage of tickets on which we meet that commitment.
    * Escalations: if we cannot answer a question, we escalate. This represents either a knowledge gap or, far less often, a severe product problem.

    Our goal was to improve these metrics that track the day-to-day handling of our tickets, in the hope that the improvements would flow on to customer satisfaction results.

    Once we got the ball rolling, we became very confident about these changes because of what the process told us about ourselves during that long, hard look. We were so confident that we put these changes into play whilst launching new major versions of two of our products, which we predicted would substantially increase support load.

    We then compared the results of the trial team against the rest of the global team to see whether we had made improvements. After a month, it looked a bit like this:
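The response-time metric described above is a percentage of tickets answered within a priority-based window. A minimal Python sketch of that calculation follows; the priority names and hour targets are hypothetical (the notes only say non-critical issues get "between 8 and 24 hours"), and skipping unanswered tickets whose deadline has not yet passed is our own choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical targets per priority; the priority names and hours are assumptions.
SLA: Dict[str, timedelta] = {
    "Critical": timedelta(hours=1),
    "Major": timedelta(hours=8),
    "Minor": timedelta(hours=24),
}


@dataclass
class Ticket:
    priority: str
    created: datetime
    first_response: Optional[datetime]  # None if nobody has replied yet


def response_rate(tickets: List[Ticket], now: datetime) -> Optional[float]:
    """Percentage of tickets answered within their priority's SLA.

    Unanswered tickets whose deadline has not passed yet are skipped,
    since they can still go either way. Returns None if nothing can be judged.
    """
    met = judged = 0
    for t in tickets:
        deadline = t.created + SLA[t.priority]
        if t.first_response is not None:
            judged += 1
            if t.first_response <= deadline:
                met += 1
        elif now > deadline:
            judged += 1  # no reply and the deadline has passed: a miss
    return 100.0 * met / judged if judged else None
```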
  • Traditionally we would measure CSat, and we still treat it as our primary metric. But if we want to know about something today, or understand a ticket that is not yet resolved, the best approach is to find immediately measurable stats and see how they correlate with CSat, to get an idea of whether the experiment is working.
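One simple way to check whether an immediately measurable stat tracks CSat is a Pearson correlation over matching periods, for example weekly SLA percentage against weekly CSat. The talk does not say which statistic the team used, so this Python sketch is an illustration, not their method; with small CSat samples the result should be read cautiously.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)
```

A value near +1 means periods with better SLA figures also had better CSat; it does not show that one causes the other.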
  • Our percentage of issues responded to within their SLA period has done nothing but increase steadily since we began this project. It has traditionally been higher than the other teams for the …
  • By grouping the issues by topic, we prevented some escalations and bundled the rest together.

    Additionally, fewer cases were escalated in the first place because we had more information for each problem, coming from multiple customers.

    Meanwhile, outside the trial, this number increased slightly. We were dealing with a new major release, JIRA 5.0, which meant we were discovering new things for the first time.
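The escalation metric and the "46.6% Drop!" on the results slide are both simple ratios. The Python sketch below shows the arithmetic with made-up counts. Because the escalation charts top out at 5%, the 46.6% is presumably a relative drop rather than a fall in percentage points.

```python
def escalation_rate(escalated: int, handled: int) -> float:
    """Percentage of handled issues that needed input from developers."""
    if handled <= 0:
        raise ValueError("handled must be positive")
    return 100.0 * escalated / handled


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, in percent (negative means a drop)."""
    if before == 0:
        raise ValueError("relative change from zero is undefined")
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Made-up counts for illustration only, not figures from the talk.
    january = escalation_rate(escalated=40, handled=1000)   # 4.0
    february = escalation_rate(escalated=20, handled=1000)  # 2.0
    print(relative_change(january, february))               # -50.0
```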
  • The non-trial team was not impacted in any way. In fact, most of the team in Sydney had no idea it was happening.
  • Where we are taking this
  • So how can you do this with your team?
  • Start by visualising this. If you're already using JIRA, install GreenHopper the minute you get back to your office. Grab a free 30-day trial if you need to.

    Create a Rapid Board containing the projects you want to look at.

    Make the columns you need and map your workflow to those columns.
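Mapping workflow statuses to columns is what makes a breakdown like the one on slide 31 (3%, 19%, 65% and so on) possible. A minimal Python sketch of the idea follows. The column names come from the talk's board; the JIRA status names mapped onto them are hypothetical, treating "Open" as unassigned is a simplification (in JIRA an unassigned issue is one with an empty assignee), and nothing here calls JIRA or GreenHopper.

```python
from collections import Counter
from typing import Dict, Iterable

# Column names from the talk's board; the statuses mapped onto them are hypothetical.
COLUMNS = ["Unassigned Issues", "My Issues", "Waiting for Customer",
           "Escalated", "Resolved", "Done"]
COLUMN_FOR_STATUS = {
    "Open": "Unassigned Issues",
    "In Progress": "My Issues",
    "Waiting for Customer": "Waiting for Customer",
    "Escalated": "Escalated",
    "Resolved": "Resolved",
    "Closed": "Done",
}


def column_share(statuses: Iterable[str]) -> Dict[str, float]:
    """Percentage of issues sitting in each board column.

    Statuses that are not mapped to any column are left off the board,
    so they are not counted.
    """
    counts = Counter(COLUMN_FOR_STATUS[s] for s in statuses if s in COLUMN_FOR_STATUS)
    total = sum(counts.values())
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in COLUMNS}
```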
  • Look at what you can see, not only with the Rapid Board visualisations but also with the reporting and charting.
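The charting slides annotate a "Monday-Friday resolve rate". As one example of the kind of question reporting can answer, here is a Python sketch that buckets resolution timestamps by day of the week and works out the Monday-to-Friday share. It operates on plain `datetime` values and is not a GreenHopper report or API; the function names are illustrative.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # datetime.weekday() order


def resolved_per_weekday(resolved_at: Iterable[datetime]) -> Dict[str, int]:
    """Count resolutions by day of the week."""
    counts = Counter(ts.weekday() for ts in resolved_at)
    return {DAY_NAMES[d]: counts[d] for d in range(7)}


def monday_to_friday_share(per_day: Dict[str, int]) -> float:
    """Percentage of resolutions that happened Monday to Friday."""
    total = sum(per_day.values())
    weekdays = sum(per_day[d] for d in DAY_NAMES[:5])
    return 100.0 * weekdays / total if total else 0.0
```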
  • By using what we’d learnt about Kanban, we’ve improved our support offerings in a way that will scale to our needs.

Transcript

  • 1. Scaling a Global Support Team: To resolve 1,500 requests a week using Kanban and GreenHopper
      Chris LePetit, Service Enablement Engineer, Atlassian
  • 2.
      • Background
      • The Experiment
      • The Results
      • Where to go next
  • 3. Background
  • 4. Who We Are
      • Grown from 70 to 100+ staff
      • Five Locations
      • 24/7 Availability
      • Up from 1k to 1.5k issues per week
      • Support with every product
  • 5. The Problem
  • 8. Our Problem
      • Scale
      [Chart: Staff Growth vs. New Issues Growth, 2011 to 2012]
  • 12. Effect on Engineers
      • Siloed
      • Stressed
      • More Interruptions
  • 13. The Experiment
  • 17. Where and when
      • One trial team: Kuala Lumpur JIRA Team
      • Trial during JIRA 5 / GreenHopper 5.9 launch
          • Support processes must withstand a launch
  • 18. 1. Visualise the Workflow
  • 19. [Board columns: Unassigned Issues | Everything Else]
  • 20. [Board columns: Unassigned Issues | My Issues | Everything Else]
  • 31. [Board columns: Unassigned Issues | My Issues | Waiting for Customer | Escalated | Resolved | Done]
      • Figures shown: 3%, 19%, 65%, < 1%, 7%
      • 33% return in 24 hours
  • 40. Charting
      [Chart annotations: Monday-Friday resolve rate; New Year; Monday - Friday; 7 Day Timeout]
  • 41. 2. Identify Waste
  • 43. How we used to work
      • Look at issue in detail
          • Can I solve this or will someone else be better?
          • Can I or someone else learn from this?
          • Are we going to respond in time?
      • Compare global and personal queue
          • Work on the busiest
  • 46. Wasteful
      • One issue could be reviewed multiple times
      • Five engineers taking 10 minutes each
      • No wonder it's busy
      10 minutes × 70 Support Engineers = Over 11 man hours lost a day!
  • 47. 3. Remove Waste
  • 50. Dispatcher vs Engineer
      • Dispatcher
          • Triage New Issues
          • Monitor Critical issues
          • Monitor Escalations
      • Engineers
          • Only their 5-6 Issues
  • 54. Standups
      • Run by the Dispatcher
      • Bulk assign issues
      • Reduce context switching
  • 59. GreenHopper Awesome
      • No downtime
      • Set up a Rapid Board
      • Live Prototyping and Configuration Changes
  • 60. Old Wallboard
  • 61. GreenHopper Rapid Board
  • 62. Results
  • 64. Measuring Success
      • Response Time
          • Percentage of issues responded to within SLA
      • Escalation Rate
          • Percentage of issues that need input from developers
  • 69. Why not Customer Satisfaction?
      • Sample size is smaller
      • Up to one month delay
      • Not suitable for measuring experiments
      • Direct correlations to Response time and Escalation rate
  • 72. Response Time
      [Chart: percentage of issues responded to within SLA, Pre-Trial and Weeks 1 to 4, Trial Team vs Non-Trial Team]
  • 76. Escalations
      [Charts: escalation rate in January and February, Trial Team and Non-trial Teams]
      • 46.6% Drop!
  • 81. Anecdotal Results
      • Engineers are feeling more in control
      • Less Stress
      • High energy after a Wallboard Standup
      • No impact on non-trial team
  • 83. Issues handled
      • JIRA 5.0 and GreenHopper 5.9 Release
      • 9.6% increase in issues handled during the trial month
          • Results still improved
  • 84. Future
  • 89. Support Kanban Experiment
      • Standard Process for two teams now
      • Rolling it out globally as the new standard
      • Trialling WIP limits
      • Reducing Timeout Length
  • 90. Where do I start?
  • 93. 1. Visualise
      • Install GreenHopper
      • Create a Rapid Board
  • 98. 2. Identify Waste
      • Find the biggest time vampires
      • Find Overlaps
      • Team vs Individuals
  • 103. 3. Remove Waste
      • Use a dispatcher
      • Bulk issues by type whenever possible
      • Look at the reality of the work
      • Re-learn the work you do
  • 107. Summary
      • Scaled our capacity
      • Improved Quality
      • No negative impact to other teams
  • 108. Contact me
      Twitter: @clepetit
      Email: clepetit@atlassian.com
      Chris LePetit, Service Enablement Engineer, Atlassian
  • 109. Thank you!