Insights: the definitive talk - William Leese

Creating great software just doesn’t cut it anymore. Nowadays you’re also responsible for running it. When things go wrong, you’re the one poring over all the data to find the fault. In this talk, we’ll take a step back, look at all these silos of data, and gain an understanding of why it all came to be. We’ll advise on how to use these systems to understand what’s going on in production. This presentation was given during the Spaces Summit, an internal IT conference by and for the engineers of bol.com.

Introduction
TAM Middleware -> SRT Retail -> SRT Platform / TES
Pretends to be British, but 日本語で話そう (“let’s speak Japanese”)

Introduction TES
Monitoring & Alerting: Thruk, Nagios, Opsgenie & Grafana
Logging: ELK Stack + lots of Glue
Metrics: Diamond, Statsd, Graphite
Dashboarding: Grafana, Kibana, Lookingglass
All the other stuff
The Platform

Metric
[ 1495735876, sys.cpu.idle_percent, 90 ]
3.6 TB, 4 Machines

Log
{ "line": "29", "class": "org.eclipse.jetty.EchoFormServlet", "source_host": "localhost", "thread_name": "qtp513694835-14", "message": "Got request from 0:0:0:0:0:0:0", "@timestamp": "1495735876", "level": "INFO" }
63 TB, 18 Machines
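The difference between the two record shapes on this slide can be sketched in a few lines of Python. This is only an illustration, not the platform's actual ingestion code: the names `MetricPoint`, `parse_metric` and `parse_log` are hypothetical, and the timestamps are assumed to be Unix epoch seconds in UTC.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetricPoint:
    """One numeric sample: when, which series, what value (hypothetical type)."""
    timestamp: datetime
    name: str
    value: float


def parse_metric(raw):
    """Turn a [timestamp, name, value] triple into a MetricPoint."""
    epoch, name, value = raw
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return MetricPoint(when, name, float(value))


def parse_log(raw_json):
    """Decode a JSON log event and convert its @timestamp to a datetime."""
    event = json.loads(raw_json)
    event["@timestamp"] = datetime.fromtimestamp(
        int(event["@timestamp"]), tz=timezone.utc
    )
    return event


# Sample values copied from the slide.
metric = parse_metric([1495735876, "sys.cpu.idle_percent", 90])
log = parse_log(
    '{"line": "29", "class": "org.eclipse.jetty.EchoFormServlet", '
    '"source_host": "localhost", "thread_name": "qtp513694835-14", '
    '"message": "Got request from 0:0:0:0:0:0:0", '
    '"@timestamp": "1495735876", "level": "INFO"}'
)
```

A metric is three small fields, while a log event carries free-form context per line, which is one plausible reason the log side needs far more storage and machines.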

Context
- Knowledge
- ... inner workings of micro services
- ... broad picture of the landscape
- ... specialised in logistics, finance, sales, etc.

Context
- Responsible for
- ... a few micro services
- ... an entire Space
- ... all bol.com network / compute / storage infrastructure

Continuity
- Standby shifts spread over SRT and scrum team
- Teams are dynamic, knowledge is not always retained

Monitoring Check Count
Scrum team (Team1b): 714
Engineer on Duty Middleware: 29,770
Multiply by the number of environments (N)
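To get a feel for the numbers on this slide, here is a small worked example. It assumes the Middleware figure means roughly thirty thousand checks (29770), and the environment count of 4 is purely hypothetical, since the slide leaves N open.

```python
def total_checks(checks_per_environment: int, environments: int) -> int:
    """Total number of monitoring checks across all environments."""
    return checks_per_environment * environments


ENVIRONMENTS = 4  # hypothetical value for N; the slide does not give one

team_total = total_checks(714, ENVIRONMENTS)
middleware_total = total_checks(29_770, ENVIRONMENTS)

print(f"Scrum team: {team_total} checks")
print(f"Engineer on Duty Middleware: {middleware_total} checks")
```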

Symptoms are great SLIs
Source: Google Site Reliability Engineering

Golden Signals
Source: Google Site Reliability Engineering
Latency
- Time it takes to service a request, differentiating between failed and successful requests
Traffic
- How much demand is placed on your service
Errors
- Rate of Failures
Saturation
- How “full” is your service
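The four signals above can be derived from very little data. The following is a minimal sketch, not a reference implementation: the `Request` type, the `golden_signals` function and the choice of the median as the latency statistic are all assumptions made for illustration, and saturation is simplified to in-flight requests divided by capacity.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Request:
    """One served request (hypothetical record shape)."""
    duration_ms: float
    ok: bool


def golden_signals(requests, window_seconds, in_flight, capacity):
    """Compute latency, traffic, errors and saturation for one time window."""
    ok = [r.duration_ms for r in requests if r.ok]
    failed = [r.duration_ms for r in requests if not r.ok]
    return {
        # Latency is reported separately for successful and failed requests.
        "latency_ok_ms": median(ok) if ok else None,
        "latency_failed_ms": median(failed) if failed else None,
        # Traffic: demand placed on the service, in requests per second.
        "traffic_rps": len(requests) / window_seconds,
        # Errors: fraction of requests that failed.
        "error_rate": len(failed) / len(requests) if requests else 0.0,
        # Saturation: how "full" the service is, from 0.0 to 1.0.
        "saturation": in_flight / capacity,
    }


sample = [Request(120, True), Request(80, True), Request(100, True), Request(5, False)]
signals = golden_signals(sample, window_seconds=2, in_flight=30, capacity=40)
```

Keeping failed requests out of the success latency matters here: the one failure returns in 5 ms and would otherwise make the service look faster than it is.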

Creating Alerts - Best Practices
Great Alerts Are:
- Simple
- Urgent
- Actionable
- Require human intervention

Life After the Alert - Some advice
Meta:
- Mitigate first
- Strict time period before escalation
Specifics:
- Dashboard with SLIs
- Link to other dashboards for drill down
- Logs for the audit trail
- Use $tool for more
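The "strict time period before escalation" advice can be expressed as a simple rule. This is a sketch under stated assumptions: the function name `should_escalate` and the 30-minute default are hypothetical and do not come from the talk.

```python
from datetime import datetime, timedelta, timezone


def should_escalate(fired_at, now, mitigated, limit=timedelta(minutes=30)):
    """Escalate when the alert is still unmitigated once the time limit is reached."""
    return not mitigated and (now - fired_at) >= limit


fired = datetime(2017, 5, 25, 18, 0, tzinfo=timezone.utc)

# Ten minutes in, still working on mitigation: keep going.
early = should_escalate(fired, fired + timedelta(minutes=10), mitigated=False)

# Forty-five minutes in without mitigation: hand it over.
late = should_escalate(fired, fired + timedelta(minutes=45), mitigated=False)
```

Fixing the limit up front means the decision to escalate does not depend on how optimistic the engineer on duty feels at that moment.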

Future of Monitoring at bol.com
- Separation of Alerting and Monitoring
- Improve Self Service Monitoring
- Focus on Metric Based Alerting
- Distributed Tracing

Insights: the definitive talk - William Leese

1. Insights The Definitive Talk

2. Introduction TAM Middleware -> SRT Retail -> SRT Platform / TES Pretends to be British, but 日本語で話そう (“let’s speak Japanese”)

3. Introduction TES Monitoring & Alerting: Thruk, Nagios, Opsgenie & Grafana Logging: ELK Stack + lots of Glue Metrics: Diamond, Statsd, Graphite Dashboarding: Grafana, Kibana, Lookingglass All the other stuff The Platform

4. Q&A

6. Black & White-box Monitoring

7. Functional & Technical Monitoring

8. Metric: [ 1495735876, sys.cpu.idle_percent, 90 ] (3.6 TB, 4 Machines) Log: { "line": "29", "class": "org.eclipse.jetty.EchoFormServlet", "source_host": "localhost", "thread_name": "qtp513694835-14", "message": "Got request from 0:0:0:0:0:0:0", "@timestamp": "1495735876", "level": "INFO" } (63 TB, 18 Machines)

9. Healthy Selfdiagnose page - Oh My..

10. Service Level Indicator

11. Service Level Objective

12. Bol.com Context

13. Context - Knowledge - ... inner workings of micro services - ... broad picture of the landscape - ... specialised in logistics, finance, sales, etc.

14. Context - Responsible for - ... a few micro services - ... an entire Space - ... all bol.com network / compute / storage infrastructure

15. Continuity - Standby shifts spread over SRT and scrum team - Teams are dynamic, knowledge is not always retained

16. Alerting Gone Wrong

17. Monitoring Check Count Scrum team (Team1b): 714 Engineer on Duty Middleware: 29,770 Multiply by the number of environments (N)

18. 1 day’s worth of alerts

19. Creating Alerts Best Practices

20. Symptoms are great SLIs Source: Google Site Reliability Engineering

21. Golden Signals Source: Google Site Reliability Engineering Latency - Time it takes to service a request, differentiating between failed and successful requests Traffic - How much demand is placed on your service Errors - Rate of Failures Saturation - How “full” is your service

22. Creating Alerts - Best Practices Great Alerts Are: - Simple - Urgent - Actionable - Require human intervention

23. Life after the Alert

24. Life After the Alert - Some advice Meta: - Mitigate first - Strict time period before escalation Specifics: - Dashboard with SLIs - Link to other dashboards for drill down - Logs for the audit trail - Use $tool for more

25. Future of Insights at bol.com

26. Future of Monitoring at bol.com - Separation of Alerting and Monitoring - Improve Self Service Monitoring - Focus on Metric Based Alerting - Distributed Tracing

27. Q&A
