The document discusses various high-availability and disaster-recovery options for deploying Kubernetes clusters across multiple data centers. It begins by covering HA layers at the application, node, control-plane, and data levels for a single cluster. It then examines options for multi-data-center deployments, including a GitOps approach with two synchronized clusters, a stretched cluster spanning three data centers, a stretched cluster with two data centers and an arbiter node, and a stretched cluster with DR provided by IaaS VM migration. Key criteria for choosing an option include the number of available data centers, the latency between sites, and the data-access requirements of applications. A three-site stretched cluster is typically the preferred model when feasible.
Storage tiering and erasure coding in Ceph (SCaLE13x) - Sage Weil
Ceph is designed around the assumption that all components of the system (disks, hosts, networks) can fail, and has traditionally leveraged replication to provide data durability and reliability. The CRUSH placement algorithm is used to allow failure domains to be defined across hosts, racks, rows, or datacenters, depending on the deployment scale and requirements.
Recent releases have added support for erasure coding, which can provide much higher data durability and lower storage overheads. However, in practice erasure codes have different performance characteristics than traditional replication and, under some workloads, come at some expense. At the same time, we have introduced a storage tiering infrastructure and cache pools that allow alternate hardware backends (like high-end flash) to be leveraged for active data sets while cold data are transparently migrated to slower backends. The combination of these two features enables a surprisingly broad range of new applications and deployment configurations.
This talk will cover a few Ceph fundamentals, discuss the new tiering and erasure coding features, and then discuss a variety of ways that the new capabilities can be leveraged.
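The replication-versus-erasure-coding trade-off described above can be quantified with simple arithmetic; the following is a minimal illustrative sketch (these functions are not part of Ceph):

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data with n-way replication."""
    return float(copies)

def erasure_coding_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with a k+m erasure code:
    each object is split into k data chunks plus m coding chunks."""
    return (k + m) / k

# 3-way replication stores 3.0x the user data and tolerates 2 losses;
# an 8+3 erasure code stores only 1.375x yet tolerates 3 chunk losses.
print(replication_overhead(3))        # 3.0
print(erasure_coding_overhead(8, 3))  # 1.375
```

The lower overhead is what makes erasure-coded pools attractive for cold data, while the extra read/reconstruct work is why a replicated or flash-backed cache tier is often placed in front for hot data.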
Ceph Object Storage Reference Architecture Performance and Sizing Guide - Karan Singh
Together with my colleagues on the Red Hat Storage team, I am very proud to have worked on this reference architecture for Ceph Object Storage.
If you are building Ceph object storage at scale, this document is for you.
How To Monetise & Bill CloudStack - A Practical Open Approach - ShapeBlue
This talk is for ISPs looking to bill CloudStack resources, and for software developers looking to build a billing solution around CloudStack. It covers multiple business and technical use cases for running a public cloud (for example: plans, catalogues, flexible billing, tiered offerings, and account management) and shows how these can be achieved using CloudStack. It does not delve into any specific billing system; instead, it focuses on an open approach to how ACS features can be leveraged to implement billing and monetise CloudStack.
Shiv is the Co-Founder and CTO of IndiQus Technologies Pvt. Ltd. and a CloudStack user turned evangelist since 2013. He loves tinkering on CloudStack and the possibilities it offers. He has deployed multiple public and private clouds running CloudStack in the South Asian region and has also integrated legacy systems with CloudStack. He would love to share his experiences with like-minded professionals.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria, and virtually. The event was a hybrid get-together of the global CloudStack community, hosting 370 attendees. It featured 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, including technical talks, user stories, and presentations of new features and integrations.
Slides from #PromCon2018 Munich.
https://promcon.io/2018-munich/talks/thanos-prometheus-at-scale/
Bartłomiej Płotka
Fabian Reinartz
The Prometheus monitoring system has been thriving for several years. Along with its powerful data model, operational simplicity and reliability have been key factors in its success. However, some questions remained largely unaddressed: How can we store historical data on the order of petabytes in a reliable and cost-efficient way? Can we do so without sacrificing responsive query times? And what about a global view of all our metrics and transparent handling of HA setups?
Thanos takes Prometheus's strong foundations and extends them into a clustered, yet coordination-free, globally scalable metric system. It retains Prometheus's simple operational model and even simplifies deployments further. Under the hood, Thanos uses highly cost-efficient object storage that is available in virtually all environments today. By building directly on top of the storage format introduced with Prometheus 2.0, Thanos achieves near-real-time responsiveness even for cold queries against historical data, all while having virtually no cost overhead beyond that of the underlying object storage.
We will show the theoretical concepts behind Thanos and demonstrate how it seamlessly integrates into existing Prometheus setups.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Service Function Chaining in OpenStack Neutron - Michelle Holley
Service Function Chaining (SFC) uses software-defined networking (SDN) capabilities to create a chain of connected network services (such as L4-7 services like firewalls, network address translation [NAT], and intrusion protection) and connect them in a virtual chain. Network operators can use this capability to set up suites or catalogs of connected services that enable the use of a single network connection for many services, each with different characteristics.
networking-sfc is a service plugin for OpenStack Neutron. The talk will go over the architecture, implementation, use cases and latest enhancements of networking-sfc (the APIs and implementation that support service function chaining in Neutron).
About the speaker: Farhad Sunavala is currently a principal architect/engineer working on network virtualization, cloud services, and SDN technologies at Huawei Technologies USA. He has led several wireless projects at Huawei, including virtual EPC and service function chaining. Prior to Huawei, he worked for 17 years at Cisco. Farhad received his MS in Electrical and Computer Engineering from the University of New Hampshire. His expertise includes L2/L3/L4 networking, network virtualization, SDN, cloud computing, and mobile wireless networks. He holds several patents in platforms, virtualization, wireless, service chaining and cloud computing. Farhad was a core member of networking-sfc.
This presentation covers the basics of Open vSwitch and its components. Open vSwitch is an open-source implementation of OpenFlow created by the Nicira team.
It also discusses the role of Open vSwitch in OpenStack Networking.
In this session, Diógenes gives an introduction to the basic concepts that make up OpenShift, giving special attention to its relationship with Linux containers and Kubernetes.
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution - Karan Singh
In this presentation, I explain how Ceph Object Storage performance can be improved drastically, together with some object storage best practices, recommendations and tips. I also cover the Ceph shared data lake, which is getting very popular.
Five common customer use cases for Virtual SAN - VMworld US / 2015 - Duncan Epping
This session was presented by Lee Dilworth and Duncan Epping at VMworld in the US in 2015. Five common customer use cases of the last 12-18 months are discussed in this deck.
OpenShift is Red Hat's Platform-as-a-Service (PaaS) that lets developers quickly develop, host, and scale Docker container-based applications. OpenShift enables a uniform and standardised approach to container management across all hosting options including AWS/EC2 and other private/public cloud and on/off-premise variants. At this session, you will learn how Red Hat's enterprise clients are using OpenShift to enable their digital transformation initiatives. Examples will cover how realising a hybrid cloud strategy can simplify and reduce the risk of migrating and transitioning application workloads to containers in the cloud.
Alex Smith, Solutions Architect, Amazon Web Services, ASEAN
Stephen Bylo, Senior Solution Architect, Red Hat Asia Pacific Pte Ltd
Ceph scale testing with 10 Billion Objects - Karan Singh
In this performance testing, we ingested 10 billion objects into the Ceph Object Storage system and measured its performance. We observed deterministic performance; check out this presentation for the details.
CRUSH is the powerful, highly configurable algorithm Red Hat Ceph Storage uses to determine how data is stored across the many servers in a cluster. A healthy Red Hat Ceph Storage deployment depends on a properly configured CRUSH map. In this session, we will review the Red Hat Ceph Storage architecture and explain the purpose of CRUSH. Using example CRUSH maps, we will show you what works and what does not, and explain why.
Presented at Red Hat Summit 2016-06-29.
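As a small illustration of what such a map contains, a decompiled CRUSH map expresses placement as rules; the hypothetical fragment below (not one of the session's examples) spreads each object's replicas across distinct racks:

```
rule replicated_racks {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type rack
    step emit
}
```

The `chooseleaf firstn 0 type rack` step is what pins the failure domain: with it, losing an entire rack costs at most one replica of any object.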
Synadia/NATS Team Presentations for NATS Connect Live on April 16, 2020. To see the recorded event, go to our NATS YouTube Channel https://youtube.com/c/nats_messaging
Gaetano Borgione's presentation from the 2017 Open Networking Summit.
Networking is vital for cloud-native apps where distributed computing and development models require speed, simplicity, and scale for massive number of ephemeral containers. Two of the most prevalent container networking models are CNI and CNM for developers using Docker, Mesos, or Kubernetes. This session will present an overview of distributed development, how CNI and CNM models work, and how container frameworks use these models for networking. Gaetano will also discuss the additional functions users need to consider in the control plane and data plane to achieve operational scale and efficiency.
Introducing github.com/open-cluster-management – How to deliver apps across c... - Michael Elder
Introducing Open Cluster Management, a community-driven project focused on multicluster and multicloud scenarios for Kubernetes apps. Open APIs are evolving within this project for cluster registration, work distribution, dynamic placement of policies and workloads and cluster and workload health management. In this session, Michael will introduce the project and demonstrate what you can do on OpenShift and Managed Kubernetes as a Service today from community operators on OperatorHub.io.
[WSO2Con Asia 2018] Architecting for Container-native Environments - WSO2
This slide deck explores architectural choices for making applications and integration services first class citizens in a container native environment.
Learn more: https://wso2.com/library/conference/2018/08/wso2con-asia-2018-architecting-for-container-native-environments/
Prometheus - Intro, CNCF, TSDB, PromQL, Grafana - Sridhar Kumar N
https://www.youtube.com/playlist?list=PLAiEy9H6ItrKC5PbH7KiELiSEIKv3tuov
-What is Prometheus?
-Difference Between Nagios vs Prometheus
-Architecture
-Alertmanager
-Time series DB
-PromQL (Prometheus Query Language)
-Live Demo
-Grafana
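To give a flavour of the PromQL topic above, here are two illustrative queries (`http_requests_total` is the standard example metric from the Prometheus documentation, not taken from these slides):

```promql
# Per-second request rate, averaged over the last 5 minutes,
# computed separately for every time series.
rate(http_requests_total[5m])

# The same rate, summed across instances so one series remains per job.
sum by (job) (rate(http_requests_total[5m]))
```

Counters in Prometheus only ever increase, which is why `rate()` (rather than the raw value) is the usual starting point for dashboards and alerts.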
A public call for Red Hat System z partners. It steps through the current Red Hat/IBM relationship, current System z campaigns, and a technical review of the RHEL roadmap. It also introduces several new customer references: (1) the City Government of Recife, Brazil; (2) Fratelli Carli in Italy; (3) EDB in the Nordics; (4) Salt River Project in the USA; (5) Bank of New Zealand.
Zero-downtime architectures based on the JEE platform. Almost every big enterprise with an online business tries to design its applications so that they are always online. But is that also the case when we upgrade the database cluster? When we switch the whole data center? Based on a customer project, we present common architecture principles that enable you to do all this without any service interruption and, most importantly, without any stress.
Pluggable Infrastructure with CI/CD and Docker - Bob Killen
The Docker cluster ecosystem is still young and highly modular. This presentation covers some of the challenges we faced in deciding what infrastructure to deploy, and a few tips and tricks for making both applications and infrastructure easily adaptable.
Red Hat multi-cluster management & what's new in OpenShift - Kangaroot
More and more organisations are not only using container platforms but starting to run multiple clusters of containers. And with that comes new headaches of maintaining, securing, and updating those multiple clusters. In this session we'll look into how Red Hat has solved multi-cluster management, covering cluster lifecycle, app lifecycle, and governance/risk/compliance.
Introduction to containers, k8s, Microservices & Cloud Native - Terry Wang
Slides built to upskill and enable internal teams and/or partners on the foundational infrastructure skills needed to work in a containerized world.
Topics covered
- Container / Containerization
- Docker
- k8s / container orchestration
- Microservices
- Service Mesh / Serverless
- Cloud Native (apps & infra)
- Relationship between Kubernetes and Runtime Fabric
Audiences: MuleSoft internal technical team, partners, Runtime Fabric users.
Reactive by example - at Reversim Summit 2015 - Eran Harel
Explaining the reactive manifesto through a real-world case study.
This is a cool story about the evolution of our monitoring infrastructure, from the naive approach to a super-resilient system.
How do we manage to handle 4M metrics / minute, and over 1K concurrent connections?
What strategies did we try to apply and where did it fail?
What are the techniques and technologies we use in order to achieve this?
How do we handle errors, and failures at this scale?
What can we still improve?
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia... - DataWorks Summit
Hadoop is becoming a standard platform for building critical financial applications such as risk reporting, trading and fraud detection. These applications require strict SLAs (service-level agreements) in terms of RPO (Recovery Point Objective) and RTO (Recovery Time Objective). To achieve these SLAs, organizations need to build a disaster recovery plan that covers several layers, ranging from the infrastructure to the clients, through the platform and the applications. In this talk, we will present the different architecture blueprints for disaster recovery as well as their corresponding SLA objectives. Then, we will focus on the stretch cluster solution that Crédit Agricole CIB is using in production. We will discuss the solution's advantages, drawbacks and the impact of this approach on the global architecture. Finally, we will explain in detail how to configure and deploy this solution and how to integrate each layer (storage layer, processing layer...) into the architecture.
Threat modeling is the systematic process of identifying, analyzing and documenting risks and threats to a system. In this talk we explain its benefits and how to apply it both in application development and in system design. The session is introductory and open to different profiles.
Talk at TechParty 2019.
The word DevOps is a combination of two words: Development and Operations. DevOps is neither an application nor a tool; rather, it is a culture that promotes carrying out the development and operations processes collaboratively. In other words, DevOps is the alignment of IT and development operations through better, improved communication.
Quick talk about the basics of hardening containers in Kubernetes / Openshift. Hosted by Santander.
https://www.youtube.com/watch?v=UvGUKRwcHFg&list=PLwjS7M0kkf3KsE5uFtSrLzJS_IY8Ug7Yv&index=42
Ángel Barrera (@AngelBarrera92) from Intelygenz and Fernando Álvarez (@methadata) from X by Orange, automation and infrastructure engineers, talk about the road travelled to create a cloud telco from scratch (in this case X by Orange on AWS, winner of the ADSLZone award for Best Innovation of the Year).
We review the methodology (immutable infrastructure), technology (infrastructure as code) and tooling (Terraform, Packer, Ansible, OpenShift ...) with which we developed this project.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
- See how to accelerate model training and optimize model performance with active learning
- Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
- Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
3. High Availability vs. Disaster Recovery
● HA, High Availability: a characteristic of a system that aims to ensure
an agreed level of operational performance, usually uptime, for a
higher-than-normal period even when some components of the overall
design are not functional (degraded). It is generally based on
Active/Active redundancy.
● DR, Disaster Recovery: a set of policies, tools and procedures to
enable the recovery or continuation of vital technology infrastructure
and systems following a natural or human-induced disaster. It generally
involves secondary sites and Active/Passive redundancy.
4. HA layers: Applications
●Application availability usually involves running
several identical application pods.
●Whenever one of the application pods fails, all
requests are redirected to the pods that are still
alive, without affecting the overall service level.
●Requests must be redirected to other application
pods transparently for the end user. The
application must not keep any local data that
would be lost if the instance fails (it must be
stateless).
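The bullets above map onto a standard Deployment: several replicas of a stateless pod, preferentially spread across nodes so that a single failure leaves replicas serving. A minimal sketch (all names, the image and the probe endpoint are illustrative):

```yaml
# Sketch: a stateless application deployed as several identical pods,
# preferentially spread across nodes (all names are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-a
spec:
  replicas: 3                      # several pods; any one can fail
  selector:
    matchLabels:
      app: app-a
  template:
    metadata:
      labels:
        app: app-a
    spec:
      affinity:
        podAntiAffinity:           # prefer one replica per node
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: app-a
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: registry.example.com/app-a:1.0
        readinessProbe:            # only Ready pods receive traffic
          httpGet:
            path: /healthz
            port: 8080
```

A Service in front of these pods performs the transparent redirection mentioned above: it only routes traffic to pods whose readiness probe passes.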
[Diagram: Kubernetes/OpenShift cluster in one DataCenter: worker nodes 1..n hosting Pods A/B/C, plus Masters 1-3]
5. HA layers: Node
●Worker nodes are responsible for hosting
application pods.
●When a worker node fails, the K8s cluster
redirects network traffic to the application pods on
other worker nodes. If necessary, additional
application pods are deployed automatically.
●The cluster must have enough spare system
resources (CPU/memory) to redistribute
application workloads upon worker node failures.
In a 5-node Kubernetes/OpenShift cluster, application workloads should not
consume more than 80% of the worker node system resources, so that new
pods can be allocated when a worker node fails.
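The 80% rule of thumb above can be sketched as a simple headroom check (the function and numbers are illustrative, assuming equally sized nodes and freely reschedulable workloads):

```python
# Illustrative headroom check: can the surviving nodes absorb the
# load of one failed worker? Assumes identical node capacities and
# workloads that can be rescheduled on any node.

def survives_node_failure(nodes: int, utilization_pct: int) -> bool:
    """True if the total load fits on (nodes - 1) nodes after one failure."""
    total_load = nodes * utilization_pct      # in "percent of one node" units
    surviving_capacity = (nodes - 1) * 100
    return total_load <= surviving_capacity

# A 5-node cluster at 80% average utilization is exactly at the limit:
print(survives_node_failure(5, 80))  # True
print(survives_node_failure(5, 85))  # False
```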
[Diagram: Kubernetes/OpenShift cluster in one DataCenter: worker nodes 1..n hosting Pods A/B/C, plus Masters 1-3]
6. HA layers: Control plane
● Master nodes host administration and
management services (such as the API and
Console pods).
● HA is provided through quorum. Generally three
master nodes are deployed; upon a node failure,
two masters remain alive and service is not
disrupted.
● Losing the entire control plane does not affect
application services; only OpenShift management
and provisioning are affected (the cluster
becomes effectively read-only).
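The quorum rule behind the three-master layout can be sketched as follows (illustrative helper names):

```python
# Illustrative quorum arithmetic for an etcd-style control plane.

def quorum(members: int) -> int:
    """Strict majority needed for the cluster to accept writes."""
    return members // 2 + 1

def tolerated_failures(members: int) -> int:
    """Members that can fail while quorum is preserved."""
    return members - quorum(members)

print(quorum(3))              # 2: with 3 masters, 2 survivors keep quorum
print(tolerated_failures(3))  # 1
# An even count buys nothing: 4 masters still tolerate only 1 failure,
# which is why 3 (or 5) masters are the usual choice.
print(tolerated_failures(4))  # 1
```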
[Diagram: Kubernetes/OpenShift cluster in one DataCenter: worker nodes 1..n hosting Pods A/B/C, plus Masters 1-3]
7. HA layers: Data
●Ceph introduction
○ Ceph is a Software-Defined Storage system
deployed on standard x86 servers, using the
CRUSH algorithm to distribute data evenly
across the cluster.
○ Ceph provides 3-in-1 interfaces for object,
block and file-level storage. It aims primarily
at completely distributed operation without a
single point of failure, scalable to the exabyte
level.
○ Ceph (by default) stores 3 replicas of each
client object.
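As an illustration of the 3-replica default, this is how a replicated block pool is declared through the Rook operator (the slides do not mention Rook; it is shown only as one common way to run Ceph on Kubernetes, with illustrative names):

```yaml
# Sketch: a replicated Ceph pool declared via the Rook operator
# (one common way to deploy Ceph on Kubernetes; names are illustrative).
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host    # CRUSH places each replica on a different host
  replicated:
    size: 3              # the 3 replicas per client object mentioned above
```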
[Diagram: Ceph cluster of three Ceph nodes exposing Block, FileSystem and Object interfaces]
8. HA layers: Conclusions
Up to this point, everything works perfectly.
Combining K8s/OCP with persistent storage
such as Ceph (or equivalent) in a single site, an
extraordinary service level can be
guaranteed,
but…
How can I protect my applications against
natural or human-induced disasters
affecting the entire DataCenter?
[Diagram: single-DataCenter cluster with worker nodes, Masters 1-3 and Storage Nodes 1-3]
9. Protection upon disasters
● Two different protection models:
○ Active/Active
■ Stretched cluster or multi-cluster.
■ Applications distributed between
clusters/DCs.
■ Data accessible from any cluster/DC.
○ Active/Passive
■ Applications
■ Data
[Diagram: single-DataCenter cluster with worker nodes, Masters 1-3 and Storage Nodes 1-3]
10. Multi-DataCenter deployments
What are the different alternatives?
1. GitOps: Two synchronized independent clusters (Active/Active)
2. Stretched Cluster (Active/Active)
3. Disaster Recovery (Active/Passive)
4. Recap and Conclusions
11. 1.- Multicluster / GitOps
● Two independent clusters synchronized directly by the
applications (Active/Active)
● Data HA managed directly by the applications
12. GitOps: Synchronized Clusters
Configuration and application synchronization
GitOps is an approach in which a Git repository always contains declarative descriptions of
the infrastructure desired in the production environment, and an automated process makes
the production environment match the state described in the repository.
The Git authorization mechanism can be used to restrict deployment permissions. This has
a huge impact in terms of security, as CI/CD applications do not need to interact with the
OpenShift cluster in the production environment.
[Diagram: GitOps flow: Source Code → Continuous Integration (build, test, etc.) → Container Registry; an application controller (AppCtrl) in each cluster monitors the Git configuration, applies changes, and downloads containers from the registry]
13. GitOps: Synchronized Clusters
ArgoCD is an "Application Controller" specifically
designed for Kubernetes that actively monitors
running applications and compares their current
state with the desired state (specified in the Git
repository).
An application that deviates from the desired state
is considered OutOfSync. ArgoCD reports and
visualizes the differences, offering several ways
to synchronize (automatically or manually) the
current state with the desired state.
Any modification to the desired state in the Git repository can be applied in
order to automatically synchronize the application state (per cluster/environment).
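A minimal Argo CD Application resource illustrating the automated synchronization described above (the repository URL, path and names are placeholders):

```yaml
# Sketch: an Argo CD Application with automated sync
# (repo URL, path and names are placeholders).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-a
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/org/app-manifests.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: app-a
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift (OutOfSync) automatically
```

One such Application per cluster/environment, pointing at a cluster-specific path or overlay, is a common way to keep two independent clusters synchronized from the same repository.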
14. GitOps: Synchronized Clusters
DATA management
GitOps can offer an application-management model where High Availability (with no
service disruption) is guaranteed in a multi-DC environment (RPO=0, RTO=0).
Requirement: application data must be shared across the different clusters/environments/sites.
● Multi-master design, where all DB instances are active and read-write (RW) and hold
exactly the same information (RPO=0, RTO=0).
● Single-master design, where all applications access a single RW node whose
information is replicated to the other DB instances (read-only or not accessible at all).
When the master instance fails, a secondary instance is promoted to master (RPO=0,
RTO≈0).
● Whether the DBs are deployed in containers or externally, access to the exact same
shared data from the applications deployed in the different sites must be ensured.
Further info: En.wikipedia.org EDB Crunchy MongoDB CouchBase Percona Microsoft
15. GitOps: Synchronized Clusters
Conclusion
As a recap, in a multi-DC environment, GitOps + data replication (at the application or DB
layer) can offer Active/Active services with RPO=0 and RTO=0 upon a natural or
human-induced disaster.
The main problem is identifying the DB technologies (SQL and NoSQL) to use and
standardizing how to deploy each of them.
The main advantages are:
- Deployable across two different sites/DCs (max RTT between sites is defined by the DB layer).
- Simplicity, as there are no dependencies on the underlying infrastructure.
- Cost: the solution can be deployed on bare-metal nodes, reducing licensing costs.
- Cost: being Active/Active, there are no underutilized resources in standby environments.
- An ideal solution if applications are stateless or use external data.
- RPO=0, RTO=0.
16. GitOps: Synchronized Clusters
Conclusion (Multi-DataCenter)
The main disadvantages are:
- Added complexity in designing and deploying applications.
- Distributed DBs usually work with 3 object replicas and 3 DCs.
- External unification of incoming network traffic (routers, DNS, LBs, ...).
- Submariner is not included in this solution, so network traffic between
clusters/sites/DCs must leave the OpenShift SDN.
- It is not possible to share Persistent Volumes in ODF between OpenShift clusters.
- There is no unified logging, monitoring, user access, security management, etc.;
external tools must be used.
- Max RTT is defined per application/DB.
18. 2.- Stretched Cluster (Active/Active)
● The GitOps model proposes deploying several independent,
self-managed clusters that are kept in a synchronized state.
● A Stretched Cluster is a K8s/OpenShift deployment model in which the
nodes of the cluster are distributed among several DataCenters.
● Although unlikely, the total loss of the master nodes does not imply a
loss of service; it only limits the management of the cluster.
19. 2.- Stretched Cluster (Active/Active)
Pros:
● By definition, a Stretched Cluster provides RTO=0 and RPO=0.
● With 3 symmetrical DCs (same size) and assuming 66% load, one DC can go down
without impacting service availability: the 66% load of the failed DC is split, 33% to
each of the two remaining DCs (66/66/66 => 0/100/100).
● The same balancing happens internally between the worker nodes of each DataCenter.
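The 66/66/66 => 0/100/100 arithmetic can be sketched as follows (illustrative function; assumes the failed DC's load spreads evenly over the survivors):

```python
# Illustrative redistribution: the load of a failed DC is split evenly
# across the surviving DCs.

def redistribute(loads_pct, failed_dc):
    """Per-DC load after one DC fails and its load is spread evenly."""
    survivors = [l for i, l in enumerate(loads_pct) if i != failed_dc]
    share = loads_pct[failed_dc] / len(survivors)
    return [l + share for l in survivors]

# Three symmetric DCs at 66% each; DC 0 fails and the other two
# absorb 33% each, ending close to full capacity:
print(redistribute([66, 66, 66], 0))  # [99.0, 99.0]
```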
Network latency requirements:
Latency between DataCenters ≤ 2ms (driven by the OCP master requirements)
● ≤ 2ms between OCP masters
● ≤ 4ms between Ceph storage nodes, including the arbiter (internal mode)
● ≤ 200ms between Ceph nodes and the arbiter node (external mode)
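Within a stretched cluster, application pods can be forced to spread across the DCs by labeling the nodes of each DC with the standard `topology.kubernetes.io/zone` label and adding a topology spread constraint to the pod template (the app name is illustrative):

```yaml
# Sketch: spread an application's pods evenly across the DCs of a
# stretched cluster, assuming each node carries the standard
# topology.kubernetes.io/zone label for its DC (app name illustrative).
# This snippet goes inside a Deployment's pod template spec:
topologySpreadConstraints:
- maxSkew: 1                               # at most 1 pod of imbalance
  topologyKey: topology.kubernetes.io/zone # one zone per DataCenter
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: app-a
```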
20. 2.1- K8s/OCP + internal storage in 3 DCs (rooms/failure domains)
A single K8s/OpenShift cluster spans DataCenters 1, 2 and 3. Each DataCenter hosts:
● Application nodes (with or without them, depending on the DC)
● A master node (it can be virtualized)
● CEPH with local physical storage
PROS: Autonomous and IaaS-agnostic model. Minimal underutilization of compute and memory (66/66/66 => 0/100/100).
CONS: Requires 3 DCs, each well connected with the other 2 DCs (latency ≤ 2ms).
- Check etcd behavior under network isolation between DCs.
- Applications may break as a result of etcd not being writeable.
[Diagram: one K8s/OpenShift cluster stretched across DataCenters 1-3, each with a master, a CEPH node with local storage, and worker nodes hosting Pods A/B/C]
21. 2.2- K8s/OCP + internal storage in 2 DC + Arbiter
DataCenters 1 and 2 each host:
● Application nodes
● A master node (it can be virtualized)
● CEPH with local physical storage
● 2 Ceph monitors
DataCenter 3 (arbiter site) hosts:
● No application nodes
● A master node (it can be virtualized)
● A CEPH arbiter node without local physical storage (metadata only)
PROS: Autonomous and IaaS-agnostic model. The DC hosting the arbiter requires very few infrastructure resources.
CONS:
- Minimal resources in a 3rd room or DataCenter, well connected with the other 2 DCs, are required.
- ≤ 4ms between Ceph nodes and the arbiter node (internal mode); the K8s/OCP masters require even lower RTT values.
- Underutilization of compute and memory (50/0/50 => 0/0/100).
[Diagram: K8s/OpenShift cluster across DataCenters 1 and 2 with workers, masters and CEPH nodes, plus a CEPH arbiter node (metadata only) and the third master at the arbiter site]
22. 2.3- HA based on IaaS - 2 DCs, virtual
● 100% virtualized environments in both DCs: virtualized masters, virtualized workers
and a virtualized CEPH/ODF arbiter.
● In the event of a DC1 crash, all VMs are started in DC2.
PROS: In the event of a DataCenter crash, all nodes are moved to the other DataCenter.
CONS: Replication bandwidth and time are required to replicate and synchronize the migrated nodes (approx. 1 to 5 min).
- Requires IaaS (RHV / VMware) + synchronous storage-array replication of all VMs from DC1 to DC2.
[Diagram: fully virtualized OpenShift cluster in DataCenter 1 (masters, workers, ODF nodes and arbiter, metadata only) restarted in DataCenter 2 after a failure]
24. Conclusion, decision criteria
As described in the previous slides, there are multiple deployment options
and alternatives on the table. Key points in the decision-making are the
number of DCs, their capacities, and the network latency between them.
The first option to consider should be a Stretched Cluster deployed
across 3 DCs with similar computational capabilities (CPU, memory and
storage) and network latency between them ≤ 2ms.
This solution significantly reduces infrastructure, operations and
software-licensing costs while offering the highest degree of service
availability.
25. Conclusion, decision criteria
In scenarios where a Stretched Cluster deployment across 3 DCs with
similar computational capabilities is not an option, an alternative may be a
Stretched Cluster across 2 DCs plus an arbiter DC with limited resources.
This solution reduces the infrastructure required for the arbiter DC, which
could be located in a corporate building or room separated from the 2 main
DCs, as long as the network latency requirements are met (≤ 2ms for the
masters).
26. Conclusion, decision criteria
When a Stretched Cluster deployment across 3 DCs is not possible but
latency ≤ 2ms is available, a Stretched Cluster across 2 DCs with DR based
on IaaS can be considered.
This model introduces a caveat, as it keeps 2 of the 3 nodes of the OCP
and OCS control planes in one of the DCs. If this "primary" DC goes down,
one node of each type must be transferred to the secondary DC in order to
restore quorum.
It requires designing and implementing the VM migration process.
27. Conclusion, decision criteria
When a Stretched Cluster deployment across 3 DCs is not an option, or the
network latency will always be higher than 2ms, a GitOps-based
deployment across 2 DCs can be considered.
This model requires that the instances of the same application, running in
any of the clusters, share the same data: either because the application
architecture allows it (for example, event-oriented), or because the
database is multi-master, or because a master-slave model allows
promoting the slave without loss of service.
28. Conclusion, decision criteria
Finally, when a Stretched Cluster deployment across 3 DCs cannot be
accomplished, the latency is not ≤ 2ms, and the data-access model of the
applications cannot be standardized, the only remaining option is a single
DC with DR.
This approach is possibly the least flexible and the one with the highest cost
per computing unit. It requires designing and implementing the associated
DR policies and processes, as well as the operations and maintenance
involved.
29. Conclusions, requirements per model

| Requirement | 3 DC Stretched Cluster | 2 DC Stretched Cluster + Arbiter site | 2 DC managed by GitOps | 2 DC Stretched Cluster + IaaS |
| --- | --- | --- | --- | --- |
| Three identical DCs with available scalability (CPU, memory and storage) | x | | | |
| Two identical DCs with available scalability (CPU, memory and storage) + technical room to deploy two bare-metal servers | | x | | |
| Two identical DCs with available scalability (CPU, memory and storage); no technical room available | | | x | x |
| RTT latency between DCs ≤ 2ms | x | x | | x |
| Applications deployed in two different OCP clusters capable of sharing the same external storage (or replicated by application/DB) | | | x | |
| IaaS infrastructure capable of migrating VMs between DCs | | | | x |
30. Conclusions, advantages per model

| Advantage | 3 DC Stretched Cluster | 2 DC Stretched Cluster + Arbiter site | 2 DC managed by GitOps | 2 DC Stretched Cluster + IaaS |
| --- | --- | --- | --- | --- |
| Active/Active (RTO=0, RPO=0) | x | x | x | x |
| Self-sufficient, does not require an IaaS environment | x | x | x | |
| Simple DR testing (no service availability impact) | x | x | x | x |
| Efficient system resource utilization (CPU and memory) (66/66/66 => 0/100/100) | x | | | |
| No additional effort to implement/maintain DR | x | x | | |
| Does not require manual intervention upon DC disaster | x | x | / | |
| Capable of being deployed on bare metal | x | x | x | / |
31. Thanks for your attention!
LinkedIn: https://www.linkedin.com/in/jvherrera/
Twitter: https://twitter.com/jvicenteherrera
Email: juanvi@redhat.com