Data-Driven Operations: Practicing Real-Time Data Analysis

Collect data from logs and operations in real time, find problems instantly, and base every operational decision on data.

  1. Data-Driven Operations: Practicing Real-Time Data Analysis (@khsing)
  2. Who am I
     • Currently an operations architect at SINA.
     • Focused on automation tools and DevOps methods.
  3. What kind of data matters for operations?
  4. Before we talk about data
  5. What does a day in ops look like?
  6. • Check the dashboard; everything looks good.
     • Start work: write scripts or configurations.
     • Suddenly, an alert arrives by SMS/email, or customer support (CS) reports a problem.
     • Switch to handling the event/problem/outage.
  7. You are the fireman http://www.flickr.com/photos/40699207@N05/3838012090/
  8. Find the problem
     • Look at the dashboard, Nagios, and other monitoring.
     • grep logs from hundreds of hosts.
     • Watch the network diagram.
     • Guess what is going wrong.
  9. Driven by problems
  10. Passive
  11. Be active
  12. Let’s talk about data
  13. Data
     • Logs: access log, error log, exception log, step log
     • Configuration change log, release log
     • Performance measurements
     • Product operations data
  14. Logs
     • Success logs are useless.
     • Error logs are useful.
  15. Process logs
     • Real-time or near-real-time processing brings big benefits.
     • You can't afford to waste an hour when a problem really happens.
     • You have to sense a problem before too many users complain.
  16. Process logs
     • Categorise automatically.
  17. Normal logs
  18. Categorised logs
  19. Performance measurement
     • How fast is our website for end users?
     • Where do they come from?
     • Which datacenter do they hit?
     • What is the slow/fast user ratio?
  20. Product operations data
     • e.g. DAU (daily active users)
     • Drops, spikes, and increases are events that need action.
  21. Change/release log
     • Many problems come with a change or release.
     • Watch this data after you make a change or release.
     • The change/release log must be visible on the dashboard.
  22. Change/release log
  23. Be active
  24. Don't be defensive
  25. "Attack is the best form of defence" – Olbrich Desouza
  26. Tools
     • Splunk (commercial)
     • Logstash, ElasticSearch, Kibana
     • Graphite
     • StatsD
  27. Q&A
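Slides 8, 14 and 15 argue that error lines, not success lines, are what matter. They also argue that these lines should be processed in real time or near real time, rather than grepped by hand across hundreds of hosts. Below is a minimal Python sketch of that idea. It assumes a single local log file. It also assumes a hypothetical `ERROR_PATTERN` that treats ERROR/FATAL levels, exceptions, and HTTP 5xx access-log entries as "useful". This is not the author's pipeline; a real setup would ship logs from every host to a central collector such as Logstash.

```python
import os
import re
import time
from typing import Iterable, Iterator

# Hypothetical rule for "useful" lines: ERROR/FATAL levels, any exception name,
# or an access-log entry whose status code is 5xx (e.g. '..." 503 0').
ERROR_PATTERN = re.compile(r'\b(?:ERROR|FATAL)\b|Exception|" 5\d\d ')


def errors_only(lines: Iterable[str]) -> Iterator[str]:
    """Drop success lines; keep only lines that look like errors."""
    for line in lines:
        if ERROR_PATTERN.search(line):
            yield line.rstrip("\n")


def follow(path: str, poll_seconds: float = 0.5) -> Iterator[str]:
    """Yield lines appended to a file, similar to `tail -f`.

    Sketch only: no handling of log rotation or partially written lines.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        f.seek(0, os.SEEK_END)  # start at the current end of the file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)  # nothing new yet; wait and retry
```

For example, `for line in errors_only(follow("/var/log/app/access.log")): print(line)` would print error lines as they arrive. The path here is hypothetical.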
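Slide 16 proposes categorising logs automatically, turning the raw stream of slide 17 into the grouped view of slide 18. The talk does not specify how. One simple approach, assumed here, is to mask the variable parts of each message, such as IP addresses, hex IDs, and numbers. Messages that differ only in those parts then share a template, and you count occurrences per template. The masking rules in the hypothetical `_RULES` list are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative masking rules. Order matters: IPs and hex IDs are replaced
# before bare numbers, otherwise "10.0.0.1" would become "<NUM>.<NUM>.<NUM>.<NUM>".
_RULES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template(message: str) -> str:
    """Replace variable tokens with placeholders so similar messages match."""
    for pattern, placeholder in _RULES:
        message = pattern.sub(placeholder, message)
    return message.strip()


def categorise(messages: Iterable[str]) -> Counter:
    """Count messages per template; the most common categories surface first."""
    return Counter(template(m) for m in messages)
```

`categorise(...).most_common(10)` then gives a short "top problems" list in place of thousands of raw lines.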
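Slide 19's questions come down to two steps: group page-load samples by where they were served, then compute the share of slow requests in each group. The sketch below assumes each sample is a `(datacenter, load_ms)` pair. It uses a hypothetical 3000 ms cut-off between fast and slow. The same grouping works with a `(region, datacenter)` key to answer where users come from.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SLOW_THRESHOLD_MS = 3000  # hypothetical cut-off between "fast" and "slow"


def slow_ratio_by_datacenter(
    samples: Iterable[Tuple[str, float]],
    threshold_ms: float = SLOW_THRESHOLD_MS,
) -> Dict[str, float]:
    """Return, per datacenter, the fraction of samples slower than the threshold."""
    totals: Dict[str, int] = defaultdict(int)
    slow: Dict[str, int] = defaultdict(int)
    for datacenter, load_ms in samples:
        totals[datacenter] += 1
        if load_ms > threshold_ms:
            slow[datacenter] += 1
    # Every key in totals has at least one sample, so the division is safe.
    return {dc: slow[dc] / totals[dc] for dc in totals}
```

A single ratio per group is easy to plot on a dashboard. Percentiles (for example p95 load time) are a common complement if more detail is needed.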
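Slide 20 treats drops, spikes, and increases in product metrics such as DAU as events that need action. The slides do not name a detection method, so the one below is a substitute chosen for illustration. It compares each new value with the mean and standard deviation of the previous `window` values and flags it when the z-score crosses a threshold. The 7-point window and threshold of 3 are assumptions to tune against real data.

```python
import statistics
from typing import List, Sequence, Tuple


def detect_events(
    values: Sequence[float], window: int = 7, threshold: float = 3.0
) -> List[Tuple[int, str]]:
    """Flag indices whose value deviates strongly from the preceding window.

    Returns (index, "spike" | "drop") pairs.
    """
    events: List[Tuple[int, str]] = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            continue  # perfectly flat history: z-score undefined, skip this point
        z = (values[i] - mean) / stdev
        if z >= threshold:
            events.append((i, "spike"))
        elif z <= -threshold:
            events.append((i, "drop"))
    return events
```

This catches sudden jumps but not slow, steady increases, which need a trend comparison instead, such as week over week. A flagged value also stays in later windows and widens their standard deviation for a while.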
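Slide 21 notes that many problems follow a change or release. Assume the change/release log is available as `(timestamp, description)` pairs. A small helper can then list the changes that landed shortly before an alert, newest first, which is usually the first place to look. The function name and the one-hour lookback are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Release = Tuple[datetime, str]  # (when it happened, what changed)


def recent_changes(
    releases: Sequence[Release],
    event_time: datetime,
    lookback: timedelta = timedelta(hours=1),
) -> List[Release]:
    """Return changes/releases within `lookback` before the event, newest first."""
    start = event_time - lookback
    candidates = [r for r in releases if start <= r[0] <= event_time]
    return sorted(candidates, key=lambda r: r[0], reverse=True)
```

The same log, drawn as vertical markers on the dashboard graphs, makes the "did something just change?" question visible at a glance.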
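StatsD and Graphite, from slide 26, both accept simple text protocols. StatsD takes UDP datagrams of the form `name:value|type`, where `c` marks a counter, `ms` a timer, and `g` a gauge. Graphite's plaintext listener takes lines of `path value timestamp`. The sketch below only formats and sends such lines. The metric names are illustrative, as is the assumption that StatsD listens on its conventional UDP port 8125 on localhost.

```python
import socket
import time
from typing import Optional


def statsd_line(name: str, value: float, metric_type: str = "c") -> str:
    """Format a StatsD metric: <name>:<value>|<type> (c, ms, g, ...)."""
    return f"{name}:{value}|{metric_type}"


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format a Graphite plaintext-protocol line: "<path> <value> <unix time>\\n"."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric as a fire-and-forget UDP datagram (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For instance, a log processor could call `send_statsd(statsd_line("web.errors.5xx", 1))` for each 5xx entry it sees. StatsD then aggregates the counts and flushes them to Graphite for graphing. UDP keeps the sender cheap and non-blocking, at the cost of occasionally losing a sample.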
