The document discusses an automatic operation bot for Ceph clusters at eBay. It describes monitoring the clusters with tools like Prometheus and node-exporter. When issues arise, over 95% are due to failed disks, which the bot would automatically remove and replace. It would record failures and perform remediation steps like removing OSDs, offline disks, and lighting indicator LEDs. The bot design includes components for monitoring, alerting, a task queue, and executor to automate common operations and reduce manual work.
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu ChaiCeph Community
Cephalocon APAC 2018
March 22-23, 2018 - Beijing, China
Greg Farnum,Red Hat RADOS Core Developer
Josh Durgin, Red Hat RADOS Lead
Kefu Chai, Red Hat Senior Software Engineer
OpenStack and Ceph case study at the University of AlabamaKamesh Pemmaraju
The University of Alabama at Birmingham gives scientists and researchers a massive, on-demand, virtual storage cloud using OpenStack and Ceph for less than $0.41 per gigabyte. This is a session at the OpenStack summit given by Kamesh Pemmaraju at Dell and John Paul at University of Alabama. This will detail how the university IT staff deployed a private storage cloud infrastructure using the Dell OpenStack cloud solution with Dell servers, storage, networking and OpenStack, and Inktank Ceph. After assessing a number of traditional storage scenarios, the University partnered with Dell and Inktank to architect a centralized cloud storage platform that was capable of scaling seamlessly and rapidly, was cost-effective, and that could leverage a single hardware infrastructure for the OpenStack compute and storage environment.
Pulsar is used by a portfolio of products at Splunk for stream processing of different types of data, including metrics and logs. In this talk, Karthik Ramasamy will share how Splunk helped a flagship customer scale a Pulsar deployment to handle 10 PB/day in a single cluster. He will talk about the journey, the challenges faced, and the trade-offs made to scale Pulsar and operate it reliably and stably in Google Cloud Platform (GCP).
OSv presentation from Linux Foundation Collaboration SummitDon Marti
OSv is a lightweight operating system designed to improve performance and administration for applications deployed in the cloud. Learn about the speed and manageability wins from a brand-new OS that works on your private or public cloud.
Best Practices & Performance Tuning - OpenStack Cloud Storage with Ceph - In this presentation, we discuss best practices and performance tuning for OpenStack cloud storage with Ceph to achieve high availability, durability, reliability and scalability at any point of time. Also discuss best practices for failure domain, recovery, rebalancing, backfilling, scrubbing, deep-scrubbing and operations
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu ChaiCeph Community
Cephalocon APAC 2018
March 22-23, 2018 - Beijing, China
Greg Farnum,Red Hat RADOS Core Developer
Josh Durgin, Red Hat RADOS Lead
Kefu Chai, Red Hat Senior Software Engineer
OpenStack and Ceph case study at the University of AlabamaKamesh Pemmaraju
The University of Alabama at Birmingham gives scientists and researchers a massive, on-demand, virtual storage cloud using OpenStack and Ceph for less than $0.41 per gigabyte. This is a session at the OpenStack summit given by Kamesh Pemmaraju at Dell and John Paul at University of Alabama. This will detail how the university IT staff deployed a private storage cloud infrastructure using the Dell OpenStack cloud solution with Dell servers, storage, networking and OpenStack, and Inktank Ceph. After assessing a number of traditional storage scenarios, the University partnered with Dell and Inktank to architect a centralized cloud storage platform that was capable of scaling seamlessly and rapidly, was cost-effective, and that could leverage a single hardware infrastructure for the OpenStack compute and storage environment.
Pulsar is used by a portfolio of products at Splunk for stream processing of different types of data, including metrics and logs. In this talk, Karthik Ramasamy will share how Splunk helped a flagship customer scale a Pulsar deployment to handle 10 PB/day in a single cluster. He will talk about the journey, the challenges faced, and the trade-offs made to scale Pulsar and operate it reliably and stably in Google Cloud Platform (GCP).
OSv presentation from Linux Foundation Collaboration SummitDon Marti
OSv is a lightweight operating system designed to improve performance and administration for applications deployed in the cloud. Learn about the speed and manageability wins from a brand-new OS that works on your private or public cloud.
Best Practices & Performance Tuning - OpenStack Cloud Storage with Ceph - In this presentation, we discuss best practices and performance tuning for OpenStack cloud storage with Ceph to achieve high availability, durability, reliability and scalability at any point of time. Also discuss best practices for failure domain, recovery, rebalancing, backfilling, scrubbing, deep-scrubbing and operations
Lightning talk showing various aspectos of software system performance. It goes through: latency, data structures, garbage collection, troubleshooting method like workload saturation method, quick diagnostic tools, famegraph and perfview
This is part 1 in a series of talks covering Padawan Monica Beckwith’s hands-on practical experience over the last two decades. Monica, who has trained with many Knights and a few Masters, will cover what it means to be sympathetic to the underlying hardware in Scaling Up and Scaling Out scenarios. In addition, she will share examples to put cloud performance into perspective.
Tips from Support: Always Carry a Towel and Don’t Panic!Perforce
What should you do if you think you’ve got a problem with Perforce Helix? In this session, understand what the common issues are, what to look for, where to find help and how our Support engineers can assist you.
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, CitrixThe Linux Foundation
Storage systems continue to deliver better performance year after year. High performance solutions are now available off-the-shelf, allowing users to boost their servers with drives capable of achieving several GB/s worth of throughput per host. To fully utilise such devices, workloads with large queue depths are often necessary. In virtual environments, this translates into aggregate workloads coming from multiple virtual machines.
Having previously addressed the impact of low latency devices in virtualised platforms, we are now aiming at optimising aggregate workloads. We will discuss the existing memory grant technologies available in Xen and compare trade-offs and performance implications of each: grant mapping, persistent grants and grant copy. For the first time, we will present grant copy as an alternative and show measurements over 7 GB/s, maxing out a set of local SSDs.
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014Amazon Web Services
Tuning your EC2 web server will help you to improve application server throughput and cost-efficiency as well as reduce request latency. In this session we will walk through tactics to identify bottlenecks using tools such as CloudWatch in order to drive the appropriate allocation of EC2 and EBS resources. In addition, we will also be reviewing some performance optimizations and best practices for popular web servers such as Nginx and Apache in order to take advantage of the latest EC2 capabilities.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
7. Monitoring
&
Alerts
node-‐exporter
ceph
node
CPU Mem Disk Net
admin-‐sockets
megacli
iostat
ceph-‐exporter
pool
usage
cluster
reliability
cluster
availability
PG
status
ceph
admin
node
Prometheus
local
disk Prometheus
remote
disk
AlertManager Grafana
8. Remediation
• ~ 95%
warning/error
caused
by
bad
disks.
• ~3%
caused
by
host
down
• ~2%
other
errors,
such
as
network,
electric
power,
human
error
Percentage
Error Disk Host Down net/power/human error
9. • Ceph
in
eBay
• Ops
for
Ceph
Cluster
• Automatic
opertion
Bot
10. • Handbook
for
operation
• Interactive
chat
bot
• Automatic
operation
bot
How
to
build
a
automatic
bot?
11. ceph
pg
dump
|
grep
inconsistent
1 2
pg
id
3
osd.1 osd.2 osd.3
4
5
6
Swap
Disk
$
remove
OSD.2
from
ceph
$
Offline
disk
from
HW
$
Light
LED
on
the
Disk.
12. Write
operations
on
the
doc.
• Record
every
failure
of
disks/node
and
related
docs.
13. • Handbook
for
operation
• Interactive
chat
bot
• Automatic
operation
bot
How
to
build
a
automatic
bot?
14. Monitoring
&
Alerts
node-‐exporter
ceph
node
CPU Mem Disk Net
admin-‐sockets
megacli
iostat
ceph-‐exporter
pool
usage
cluster
reliability
cluster
availability
PG
status
ceph
admin
node
Prometheus
local
disk
AlertManager
16. • Handbook
for
operation
• Interactive
chat
bot
• Automatic
operation
bot
How
to
build
a
automatic
bot?
17. node-‐exporter
ceph
node
CPU Mem Disk Net
admin-‐sockets
megacli
iostat
ceph-‐exporter
pool
usage
cluster
reliability
cluster
availability
PG
status
ceph
admin
node
Prometheus
local
disk
AlertManager
18. node-‐exporter
ceph
node
CPU Mem Disk Net
admin-‐sockets
megacli
iostat
ceph-‐exporter
pool
usage
cluster
reliability
cluster
availability
PG
status
ceph
admin
node
Prometheus
local
disk
AlertManager
Task
Queue
Executor