
Kubernetes debug like a pro

Go fits perfectly inside containers: you can ship apps as tiny images on k8s and distribute them across the globe. Gianluca will show how InfluxData debugs containers running on Kubernetes so that sysadmins and developers can troubleshoot and replicate issues using core dumps, debuggers, and logs.

Go applications are a perfect fit for containers. You can build a single binary and a tiny Docker image, and ship them to your Kubernetes cluster. A successful production environment requires stability and simplicity: it needs to be easy to troubleshoot, and operators need to be able to gather all the information developers will need to fix a bug. In this talk, Gianluca will share what InfluxData is doing to help developers and system administrators work together, understand problems running live at scale on Kubernetes, and escalate them to software engineers, using logs, Delve, gdb, core dumps, and traces to replicate and fix issues.

Published in: Software

  1. ~ @gianarb - https://gianarb.it ~ Debug like a pro on Kubernetes. Golab, Florence, 2018
  2. Gianluca Arbezzano, Site Reliability Engineer @InfluxData ● http://gianarb.it ● @gianarb. What I like: ● I make dirty hacks that look awesome ● I grow my vegetables 🍅🌻🍆 ● I travel for fun and work
  3. 1. Yo n ! Your team knows and uses Docker for local development and testing. 2. Kub te ! Everyone talks about Kubernetes. 3. Hir ! You don’t know why, but you hired a DevOps engineer who kind of knows k8s. 4. Ex i m ! You are moving everything and everyone to Kubernetes.
  4. Inspired by a true story
  5. You need a good book: short, driven by experience, practical, easy
  6. We need to get our hands dirty
  7. Bring developers into the loop: spin up a cluster that you can break
  8. Bring developers into the loop: deploy CI on Kubernetes
  9. Bring developers into the loop: run your code in prod
  10. Don’t be scared to write your own tools!
  11. K8s as code: from YAML to code (Go). 1. You can use Go autocompletion as documentation and as a reference for every Kubernetes resource. 2. You feel less like a YAML engineer (great feeling, by the way). 3. Code is better than YAML! You can reuse it, compile it, and embed it in other projects.
  12. K8s as code: from YAML to code (Go). A tiny CLI to do the migration to Go, plus some manual refactoring.
  13. K8s as code: from YAML to code (Go). ● We continue to improve our CI to validate that the YAML and the Go files describe the same resources, and that the resources in Kubernetes match the Go file. ● Maybe we will be able to remove the YAML at some point.
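The CI check mentioned on slide 13 (the manifest and the Go definition must describe the same object) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the `Deployment` types below are hypothetical, heavily trimmed stand-ins for the real `k8s.io/api` types, and the manifest is JSON rather than YAML (Kubernetes accepts both; a real check would first convert the YAML to JSON, for example with a library such as `sigs.k8s.io/yaml`). `sameManifest` and `desiredDeployment` are illustrative names, not InfluxData's tooling.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Hypothetical, trimmed stand-ins for the real Kubernetes types
// (k8s.io/api/apps/v1.Deployment), used only to keep the sketch dependency-free.
type Metadata struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels,omitempty"`
}

type DeploymentSpec struct {
	Replicas int32 `json:"replicas"`
}

type Deployment struct {
	APIVersion string         `json:"apiVersion"`
	Kind       string         `json:"kind"`
	Metadata   Metadata       `json:"metadata"`
	Spec       DeploymentSpec `json:"spec"`
}

// desiredDeployment is the Go "source of truth" for the resource.
func desiredDeployment() Deployment {
	return Deployment{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Metadata:   Metadata{Name: "api", Labels: map[string]string{"app": "api"}},
		Spec:       DeploymentSpec{Replicas: 3},
	}
}

// sameManifest reports whether a JSON manifest describes the same object as
// the Go value. Both sides are decoded into generic values, so key order and
// whitespace in the manifest do not matter.
func sameManifest(manifest []byte, obj any) (bool, error) {
	var fromFile any
	if err := json.Unmarshal(manifest, &fromFile); err != nil {
		return false, err
	}
	encoded, err := json.Marshal(obj)
	if err != nil {
		return false, err
	}
	var fromGo any
	if err := json.Unmarshal(encoded, &fromGo); err != nil {
		return false, err
	}
	return reflect.DeepEqual(fromFile, fromGo), nil
}

func main() {
	manifest := []byte(`{"kind": "Deployment", "apiVersion": "apps/v1",
		"metadata": {"name": "api", "labels": {"app": "api"}},
		"spec": {"replicas": 3}}`)
	ok, err := sameManifest(manifest, desiredDeployment())
	fmt.Println(ok, err)
}
```

In CI, a mismatch would fail the build; the same comparison could be run against the object returned by the API server to catch drift in the cluster.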
  14. Examples ● Everything has an API because you should USE it TO build something good! (cURL is fine, but you can build something better.) ● Some of our tools: ○ A backup and restore operator for persistent volumes. ○ A service that creates isolated runtime environments, so developers can test and product people have a safe environment to demo and try things. We also use it for integration and smoke tests. ○ A tool that replicates an environment locally on minikube, installing and configuring all the dependencies.
  15. Instrumentation and Observability
  16. We need processes and tools that give us the ability to take a real-time picture of our system
  17. Observability: it is all about how we collect and aggregate the data
  18. Normal state vs. current state
  19. Bring developers into the loop: you need knowledgeable devs to drive the team
  20. Observability: events, metrics, logs and traces
  21. Golang and Kubernetes: OpenCensus ● An open source project sponsored by Google ● A spec plus a set of libraries in different languages to instrument your application ● It collects metrics, traces and events.
  22. Golang and Kubernetes: OpenCensus. A common interface to get stats and traces from your app, with different exporters to persist your data.
  23. Example of the Prometheus text exposition format:

          # HELP http_requests_total The total number of HTTP requests.
          # TYPE http_requests_total counter
          http_requests_total{method="post",code="200"} 1027 1395066363000
          http_requests_total{method="post",code="400"} 3 1395066363000

          # Escaping in label values:
          msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9

          # Minimalistic line:
          metric_without_timestamp_and_labels 12.47

          # A weird metric from before the epoch:
          something_weird{problem="division by zero"} +Inf -3982045

          # A histogram, which has a pretty complex representation in the text format:
          # HELP http_request_duration_seconds A histogram of the request duration.
          # TYPE http_request_duration_seconds histogram
          http_request_duration_seconds_bucket{le="0.05"} 24054
          http_request_duration_seconds_bucket{le="0.1"} 33444
          http_request_duration_seconds_bucket{le="0.2"} 100392
          http_request_duration_seconds_bucket{le="0.5"} 129389
          http_request_duration_seconds_bucket{le="1"} 133988
          http_request_duration_seconds_bucket{le="+Inf"} 144320
          http_request_duration_seconds_sum 53423
          http_request_duration_seconds_count 144320
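To make the exposition format concrete, here is a small, dependency-free sketch that renders a single sample line and applies the label-value escaping shown in the example (backslash, double quote and line feed). It is an illustration only: a real service would use a client library such as `github.com/prometheus/client_golang` rather than formatting lines by hand. `formatSample` and `escapeLabelValue` are hypothetical helpers, and they sort labels by name so the output is deterministic, which means their label order can differ from the example above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelEscaper escapes the three characters that the text format requires
// to be escaped inside label values: backslash, double quote and line feed.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

func escapeLabelValue(v string) string {
	return labelEscaper.Replace(v)
}

// formatSample renders `name{label="value",...} value` (no timestamp).
// Labels are sorted by name to keep the output deterministic.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	if len(keys) > 0 {
		b.WriteByte('{')
		for i, k := range keys {
			if i > 0 {
				b.WriteByte(',')
			}
			fmt.Fprintf(&b, `%s="%s"`, k, escapeLabelValue(labels[k]))
		}
		b.WriteByte('}')
	}
	fmt.Fprintf(&b, " %g", value)
	return b.String()
}

func main() {
	fmt.Println("# HELP http_requests_total The total number of HTTP requests.")
	fmt.Println("# TYPE http_requests_total counter")
	fmt.Println(formatSample("http_requests_total",
		map[string]string{"method": "post", "code": "200"}, 1027))
}
```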
  24. OpenMetrics: v2 of the Prometheus exposition format
  26. prom2json, FetchMetricFamilies (https://github.com/prometheus/prom2json/blob/master/prom2json.go#L123):

          func FetchMetricFamilies(url string, ch chan<- *dto.MetricFamily, certificate string, key string, skipServerCertCheck bool) error {
              defer close(ch)
              var transport *http.Transport
              if certificate != "" && key != "" {
                  cert, err := tls.LoadX509KeyPair(certificate, key)
                  if err != nil {
                      return err
                  }
                  tlsConfig := &tls.Config{
                      Certificates:       []tls.Certificate{cert},
                      InsecureSkipVerify: skipServerCertCheck,
                  }
                  tlsConfig.BuildNameToCertificate()
                  transport = &http.Transport{TLSClientConfig: tlsConfig}
              } else {
                  transport = &http.Transport{
                      TLSClientConfig: &tls.Config{InsecureSkipVerify: skipServerCertCheck},
                  }
              }
              client := &http.Client{Transport: transport}
              return decodeContent(client, url, ch)
          }
  27. More to read ● OpenMetrics: https://github.com/OpenObservability/OpenMetrics ● OpenMetrics mailing list: https://groups.google.com/forum/#!forum/openmetrics ● WIP branch of the Python client library: https://github.com/prometheus/client_python/tree/openmetrics ● Thanks, Richard, for your work! I took some slides from here: https://promcon.io/2018-munich/slides/openmetrics-transforming-the-prometheus-exposition-format-into-a-global-standard.pdf
  28. Tracing: how do you “tell stories” about concurrent systems?
  30. OpenTracing
  32. “I was just waiting for a new standard!” (quote from a troll)
  33. © 2017 InfluxData. All rights reserved. Typical problems with logs: ● Which library do I need to use? ● Every library has a different format. ● Every language exposes a different format.
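One way to tackle these problems in Go is to agree on a single structured format for every service. A minimal sketch, assuming Go 1.21 or later for the standard `log/slog` package: every line is a JSON object with the same keys, whichever service emits it. The `service` field and the `newLogger` and `render` helpers are hypothetical, and the timestamp is dropped only to keep the example output deterministic.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newLogger returns a JSON logger that tags every line with the service name.
// The top-level time attribute is removed only so this example's output is
// deterministic; keep it in production.
func newLogger(w io.Writer, service string) *slog.Logger {
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if a.Key == slog.TimeKey && len(groups) == 0 {
				return slog.Attr{} // a zero Attr is dropped from the output
			}
			return a
		},
	}
	return slog.New(slog.NewJSONHandler(w, opts)).With("service", service)
}

// render logs one record and returns the resulting JSON line.
func render(service, msg string, args ...any) string {
	var buf bytes.Buffer
	newLogger(&buf, service).Info(msg, args...)
	return buf.String()
}

func main() {
	fmt.Print(render("api", "soft error", "type", "cache timeout", "waited.millis", 1500))
}
```

Because the key names and encoding come from one shared constructor, every service produces lines that the same log pipeline can parse.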
  34. Tracing is not something new: ● There are vendors. ● Every vendor has its own format.
  35. [Diagram: a parent span with child spans, span context / baggage, and logs attached to the spans] ● Spans: the basic unit of timing and causality; they can be tagged with key/value pairs. ● Logs: structured data recorded on a span. ● Span context: a serializable format for linking spans across network boundaries; it carries baggage, such as request and client IDs. ● Tracers: anything that plugs into the OpenTracing API to record information, such as Zipkin, Jaeger, LightStep, and others. ● Also metrics (Prometheus) and logging.
  36. [Diagram: the OpenTracing API inside a microservice process (main()), alongside application logic, µ-service frameworks, Lambda functions, RPC & control-flow frameworks and existing instrumentation, connected to the tracing infrastructure (e.g. Instana, Jaeger)]
  37. OpenTracing: configure the GlobalTracer (https://github.com/opentracing/opentracing-go)

          import "github.com/opentracing/opentracing-go"
          import ".../some_tracing_impl"

          func main() {
              opentracing.SetGlobalTracer(
                  // tracing impl specific:
                  some_tracing_impl.New(...),
              )
              ...
          }
  38. OpenTracing: create a span from the context (https://github.com/opentracing/opentracing-go)

          // log is "github.com/opentracing/opentracing-go/log"
          func xyz(ctx context.Context, ...) {
              ...
              span, ctx := opentracing.StartSpanFromContext(ctx, "operation_name")
              defer span.Finish()
              span.LogFields(
                  log.String("event", "soft error"),
                  log.String("type", "cache timeout"),
                  log.Int("waited.millis", 1500))
              ...
          }
  39. OpenTracing: create a child span (https://github.com/opentracing/opentracing-go)

          func xyz(parentSpan opentracing.Span, ...) {
              ...
              sp := opentracing.StartSpan(
                  "operation_name",
                  opentracing.ChildOf(parentSpan.Context()))
              defer sp.Finish()
              ...
          }
  40. Golang and Kubernetes: pprof ● It is the native Go profiler ● You can use it via the `go tool pprof` command ● `import "runtime/pprof"` writes runtime profiling data ● `import "net/http/pprof"` serves runtime profiling data via an HTTP server
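  41. Golang and Kubernetes: pprof

          package main

          import (
              "log"
              "net/http"
              _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
          )

          func main() {
              // A nil handler means http.DefaultServeMux, where the pprof handlers live.
              log.Println(http.ListenAndServe("localhost:6060", nil))
          }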
  42. Fetching a heap profile with `go tool pprof`:

          $ go tool pprof http://localhost:6060/debug/pprof/heap
          Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
          Saved profile in /home/gianarb/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.009.pb.gz
          File: main
          Type: inuse_space
          Time: Oct 15, 2018 at 4:22pm (CEST)
          Entering interactive mode (type "help" for commands, "o" for options)
          (pprof) png
          Generating report in profile001.png
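`go tool pprof` also reads profiles from files, and when exposing an HTTP endpoint is not an option, the `runtime/pprof` package can write them directly. A minimal sketch, assuming the process can write `cpu.pprof` and `heap.pprof` to its working directory (in a container that usually means a writable volume); the file names, the 30-second duration and the helper names are hypothetical. The files can then be copied out of the pod, for example with `kubectl cp`, and opened with `go tool pprof`.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"os"
	"runtime/pprof"
	"time"
)

// captureCPUProfile samples the CPU for d and writes the profile to w.
func captureCPUProfile(w io.Writer, d time.Duration) error {
	if err := pprof.StartCPUProfile(w); err != nil {
		return err // for example, another CPU profile is already running
	}
	time.Sleep(d)
	pprof.StopCPUProfile()
	return nil
}

// heapProfile returns a heap profile snapshot in the gzip-compressed
// protobuf format that `go tool pprof` reads.
func heapProfile() ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	cpu, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	if err := captureCPUProfile(cpu, 30*time.Second); err != nil {
		log.Fatal(err)
	}
	if err := cpu.Close(); err != nil {
		log.Fatal(err)
	}

	heap, err := heapProfile()
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("heap.pprof", heap, 0o644); err != nil {
		log.Fatal(err)
	}
}
```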
  43. Golang and Kubernetes: pprof (https://github.com/influxdata/influxdb/blob/4cbdc197b8117fee648d62e2e5be75c6575352f0/services/httpd/pprof.go)

          // handleProfiles determines which profile to return to the requester.
          func (h *Handler) handleProfiles(w http.ResponseWriter, r *http.Request) {
              switch r.URL.Path {
              case "/debug/pprof/cmdline":
                  httppprof.Cmdline(w, r)
              case "/debug/pprof/profile":
                  httppprof.Profile(w, r)
              case "/debug/pprof/symbol":
                  httppprof.Symbol(w, r)
              case "/debug/pprof/all":
                  h.archiveProfilesAndQueries(w, r)
              default:
                  httppprof.Index(w, r)
              }
          }
  44. Golang and Kubernetes: pprof (https://github.com/influxdata/influxdb/blob/4cbdc197b8117fee648d62e2e5be75c6575352f0/services/httpd/pprof.go)

          // archiveProfilesAndQueries collects the following profiles:
          //   - goroutine profile
          //   - heap profile
          //   - blocking profile
          //   - mutex profile
          //   - (optionally) CPU profile
          //
          // It also collects the following query results:
          //
          //   - SHOW SHARDS
          //   - SHOW STATS
          //   - SHOW DIAGNOSTICS
          //
          // All information is added to a tar archive and then compressed, before being
          // returned to the requester as an archive file.

  45. Golang and Kubernetes: pprof (https://github.com/influxdata/influxdb/blob/4cbdc197b8117fee648d62e2e5be75c6575352f0/services/httpd/pprof.go)

          var allProfs = []*prof{
              {Name: "goroutine", Debug: 1},
              {Name: "block", Debug: 1},
              {Name: "mutex", Debug: 1},
              {Name: "heap", Debug: 1},
          }
  46. (excerpt from the same file)

          gz := gzip.NewWriter(&resp)
          tw := tar.NewWriter(gz)

          // Collect and write out profiles.
          for _, profile := range allProfs {
              if profile.Name == "cpu" {
                  if err := pprof.StartCPUProfile(&buf); err != nil {
                      http.Error(w, err.Error(), http.StatusInternalServerError)
                      return
                  }
                  sleep(w, time.Duration(profile.Debug)*time.Second)
                  pprof.StopCPUProfile()
              } else {
                  prof := pprof.Lookup(profile.Name)
                  if prof == nil {
                      http.Error(w, "unable to find profile "+profile.Name, http.StatusInternalServerError)
                      return
                  }
                  if err := prof.WriteTo(&buf, int(profile.Debug)); err != nil {
                      http.Error(w, err.Error(), http.StatusInternalServerError)
                      return
                  }
              }
              // ... (excerpt ends here)
  47. Golang and Kubernetes: pprof (https://github.com/influxdata/influxdb/blob/4cbdc197b8117fee648d62e2e5be75c6575352f0/services/httpd/pprof.go)

          // Collect and write out the queries.
          var allQueries = []struct {
              name string
              fn   func() ([]*models.Row, error)
          }{
              {"shards", h.showShards},
              {"stats", h.showStats},
              {"diagnostics", h.showDiagnostics},
          }

          tabW := tabwriter.NewWriter(&buf, 8, 8, 1, '\t', 0)
          for _, query := range allQueries {
              rows, err := query.fn()
              if err != nil {
                  http.Error(w, err.Error(), http.StatusInternalServerError)
              }
              // ... (excerpt ends here)
  48. Diagnostics as rows, sorted by key:

          func (h *Handler) showDiagnostics() ([]*models.Row, error) {
              diags, err := h.Monitor.Diagnostics()
              if err != nil {
                  return nil, err
              }

              // Get a sorted list of diagnostics keys.
              sortedKeys := make([]string, 0, len(diags))
              for k := range diags {
                  sortedKeys = append(sortedKeys, k)
              }
              sort.Strings(sortedKeys)

              rows := make([]*models.Row, 0, len(diags))
              for _, k := range sortedKeys {
                  row := &models.Row{Name: k}

                  row.Columns = diags[k].Columns
                  row.Values = diags[k].Rows
                  rows = append(rows, row)
              }
              return rows, nil
          }
  49. Remote debugging in Kubernetes: Delve, GDB
  50. Remote debugging in Kubernetes DOESN’T APPLY!
  51. That’s not entirely true. You can expose the debugger endpoint, and it works, but it doesn’t apply to production workloads.
  52. Thanks! @gianarb
