Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
Octo
AND THE DEVOPS EVOLUTION @ ORACLE
Ian Van Hoven, Oracle Engineering Services
InfluxDays NYC ✣ March 13, 2019
Let me tell you a (user) story
Oracle aims to achieve 99.99%+ availability and industry-leading performance across the Cloud portfolio
To delight existing customers, attract new ones,
and enable Oracle teams to innovate
Achieving our SLA goals requires evolutionary change in
how we monitor and measure our products
And – especially – a cultural shift toward metrics immersion,
supported and driven by ubiquitous, user-friendly tooling
Octo will leverage proven industry-standard
technology demonstrably scalable to Oracle's
global footprint and volume
While complementing existing M&M tech … and allowing us
to leverage innovation and attract/retain top talent
The 1st order priorities are to improve Oracle Cloud
customer satisfaction & protect revenue
Commercialization of Octo is explicitly a 2nd order priority
SOUNDS EASY
Scene from the Past
Scene from the Past
Looks good.
Let’s do that.
Modern Day Silos Aplenty (Teams, Tools, Data)
Modern Day I have that data. Also, you don’t get that data.
Modern Day PreDevOps Monitoring Alert
Legacy Mindset
Modern Day PreDevOps Math $fear{abstract} != $fear{concrete}
Why would devs ever want to be on-call?
It would eliminate our incentive to ship.
Modern Day PreDevOps Mindset
If you don’t like
something…
Octo Is
A stunningly innovative product acronym…
Oracle Cloud Telemetry (Oracle)
Octo Also Is
A real-time streaming metrics framework...
• Conceptualized at May 2017 leadership offsite; GA since Thanksgiving 2018
• Aimed directly at enabling culture change of product ownership & metrics immersion across Oracle
• Aggregator & cross-correlator of discrete time series data: metrics from diverse sources in one view
• Sharing and collaboration tool to connect colleagues from teams across all of Oracle
• Easy to...
• FIND: goto/octo (browser alias), /go octo (Slack command)
• USE: up & running in seconds (read) or minutes (write)
• SHARE: include graph URLs & images in Slack, email, ticket, preso, wall-screens, etc.
• Common UX: Octo for Fusion looks/works like Octo for TOA, RNT, Eloqua, Taleo, etc.
• Simple, secure, declarative, RESTful write API – abstracted from data pipeline & b/e data tier
• Built with proven, scalable industry-standard tech: nginx, memcached, Kafka, InfluxDB/TICK, Grafana, etc.
• Integrated with standard Cloud-wide tools: Slack, PagerDuty, Akamai, Dyn, Catchpoint, ThousandEyes, Jira, etc.
• Data source & product/property agnostic: collect data any way/where – then visualize & share w/ Octo
• Imperfect – which was the plan all along
Octo Is Not
• Monitoring tool or alerting framework
• Replacement for [insert monitoring solution here]
• Troubleshooting cure-all
• Data analytics warehouse
• Auto-remediation framework
• Application profiler / testbench
• Asset management system
• Runtime dependency
• Log processor/aggregator/analytics
• Big data (map/reduce) platform
• Reporting platform (canned or ad-hoc)
Octo Use Cases
• General Golden Signal Awareness: Business KPIs, Availability, RPS, Latency, Errors, Consumption
Every SaaS engineer (Dev, SRE, NOC, CS) should know peak weekday FE RPS – and be aware of
(and responsive to) any WoW divergence
• Troubleshooting Accelerator: quickly narrow down outage scope, identify next step/s & escalation PoC/s
“Back End” dashboard shows sharp DB read/write RPS plunges; anyone can infer something amiss at DB tier (net? sys? sw?);
pivot to specialized tool/s to discover precise failure RC/s
“SaaS Cloud Web FE” dashboard shows sharp increase in LB 4xx error RPS, sharp drop in LB 2xx OK RPS, drop to 0 RPS in HTTPD RPS;
pivot to LB forensics & log analysis to identify IP whitelist RC
• ChM Validation, Incremental Rollout A|B Test: ephemeral/point-in-time dashboarding
NetEng, SvcOps & GNC monitor one dashboard showing LB VIPs/throughput/RPS, app RPS/latency, business KPIs,
APM availability, etc. to confirm GBIC swap successful
Time shift (HoH, DoD, WoW, MoM) allows quick analysis of impact/delta for new code/infra/etc.
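The time-shift comparison above can be sketched as a lookup of the same metric one period earlier. This is a minimal illustration, not how Octo or Grafana implements it: the `SHIFTS` table, the function name, and the sample RPS values are assumptions, and MoM is approximated as 28 days so that weekdays line up.

```python
from datetime import datetime, timedelta

# Assumed offsets for the shift labels on the slide; MoM approximated as 28 days.
SHIFTS = {
    "HoH": timedelta(hours=1),
    "DoD": timedelta(days=1),
    "WoW": timedelta(weeks=1),
    "MoM": timedelta(days=28),
}


def shifted_delta(series, at, shift):
    """Compare the sample at `at` with the sample one `shift` earlier.

    series: dict mapping datetime -> numeric value (e.g. RPS samples).
    Returns (current, previous, percent_change), or None when either
    sample is missing or the previous value is zero.
    """
    prev_ts = at - SHIFTS[shift]
    if at not in series or prev_ts not in series:
        return None
    current, previous = series[at], series[prev_ts]
    if previous == 0:
        return None
    return current, previous, (current - previous) / previous * 100.0


# Illustrative data only: RPS now and one week earlier.
now = datetime(2019, 3, 13, 12, 0)
rps = {now: 900.0, now - timedelta(weeks=1): 1200.0}

wow = shifted_delta(rps, now, "WoW")   # (900.0, 1200.0, -25.0)
hoh = shifted_delta(rps, now, "HoH")   # None: no sample one hour earlier
```

A 25% week-over-week drop in front-end RPS is the kind of divergence the first use case says every engineer should notice.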
Octo Use Cases (cont’d)
• Presentations
• Product usage awareness (features)
• Pre-emptive capacity/scaling strategic planning (optimize or add capacity)
• Reactive troubleshooting (post-outage start)
• Slow degradation catch
• Corpus of “detailed” data for after-the-fact analysis (not necessarily on permanent graphs/dashboards)
• Per-customer usage/performance/availability
Octo Today
The Good…
• 8 “customers” onboarded; steady trickle of inbound activation requests; live in 5 PoPs (NorAm & EMEA)
• FE API latency dropping as RPS increases (~35ms at 50th %ile)
• “Shoot me the Octo graph” & “Put it on the Octo dash” heard more & more often in meetings/hallways
• Octo acquiring/retaining real estate on Oracle SaaS Cloud NOC “big board” & on hallway TVs
• Bi-weekly releases live to site, mostly error free
• Continued investments in o4o (“Octo for Octo”) – separate parallel ecosystem to customer Octo (thanks, Paul)
• Energized & ambitious team, collaborating closely with customers
The Imperfect…
• No customer self-onboarding; minimal onboarding automation; no read API; no DWH integration
• Time-shift CPU overhead driving investments in downsampling via CQ (near term) and Kapacitor (long term)
• Code and config living together in sin; general config messiness; documentation gaps
• Still in PIPD phase (evolutionary precursor to CICD)
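Downsampling, as mentioned above for continuous queries and Kapacitor, means replacing raw points with one aggregate per time window so that later reads (such as time-shifted graphs) scan far fewer points. The sketch below shows only that idea in plain Python, using a mean per fixed window; it is not InfluxDB's or Kapacitor's implementation, and the function name and sample data are illustrative.

```python
from collections import defaultdict


def downsample_mean(points, window_s):
    """Average (epoch_seconds, value) points into fixed windows.

    Each point is assigned to the window starting at the largest multiple
    of `window_s` not exceeding its timestamp. Returns a list of
    (window_start, mean_value) sorted by window start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Illustrative raw samples: five points collapse to two one-minute averages.
raw = [(0, 10.0), (20, 20.0), (59, 30.0), (60, 40.0), (119, 60.0)]
per_minute = downsample_mean(raw, 60)   # [(0, 20.0), (60, 50.0)]
```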
Why Influx[DB|Data]? (Non-Exhaustive Un-Sorted List)
1. Purpose built for our use case (streaming metrics)
2. Open source, with a strong/focused/visionary company & active community behind it (à la Elastic, Mongo, etc.)
3. Seamless integration with leading visualization & alerting/messaging platforms
4. So intuitive/easy even a VP can use it
5. TICK stack simplifies key considerations around collection/ingest & aggregation/downsampling
6. Horizontal scaling via TSDB clustering when needed
7. Not jerks; easy to work/collab with; have seen almost everything; very responsive; super patient *
8. Givers of sage advice ☛ Paul Dix: “Don’t monitor your monitoring system with your monitoring system”
9. See #1
Questions?