Logging at OVHcloud
Logs Data Platform is OVHcloud's platform for centralized log collection, analysis and management. Its purpose is to address the challenges involved in indexing more than 4,000 billion logs for a company like OVHcloud. This presentation describes the overall architecture of Logs Data Platform around its central components, Elasticsearch and Graylog, and walks through the scalability, availability, performance and evolvability issues that the Observability team at OVHcloud deals with daily.
8. The Mission
“Provide a platform allowing OVH to collect, retrieve and analyze logs from any infrastructure or application” (end 2014)
9. The Mission
“Provide a platform allowing OVH to collect, retrieve and analyze logs from any infrastructure or application”.
● Available as a Service
● All OVH personas, multi-tenant
● Centralized, queryable, analytics capabilities
● Servers, network, devices
● Software from OVH and others
10. The Mission
● 2 people at the start.
● First PoC leveraging the Big Data ecosystem:
11. The Mission: PoC challenges
● Complexity
● Multi-tenancy
● Orchestration
12. The Mission
Too much work, so little time:
A wonderful person (@jedisct1) showed us:
14. The Mission: Graylog
✔ Elasticsearch As Backend
✔ Features : Search, Data Viz, Alerting, Extensible
✔ Built-in multi-tenancy
✔ Scalable By Design
✔ Standard formats (Syslog, GELF; see the sketch below)
✔ API Available
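GELF is just JSON with a handful of reserved keys. As a quick, hedged illustration (host names and the port are placeholders; production setups usually prefer TCP with TLS, and large UDP messages need chunking/compression), a minimal Python sender could look like this:

import json
import socket
import time

# Minimal GELF 1.1 payload: version, host and short_message are mandatory,
# custom fields are prefixed with an underscore.
gelf = {
    "version": "1.1",
    "host": "web-42.example.com",
    "short_message": "backend error",
    "timestamp": time.time(),
    "level": 3,                  # syslog severity (3 = error)
    "_service": "api-gateway",   # illustrative custom field
}

# 12201 is the conventional Graylog GELF input port.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(gelf).encode("utf-8"), ("graylog.example.com", 12201))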
18. The Mission: Alpha
● Alpha (early 2015):
○ 1 VM for the Graylog web interface
○ 3 VMs for MongoDB
○ 1 HAProxy
19. The Mission: Alpha
● The Good :
😁 Performance
😁 Practicality
😁 Stability
● The Bad:
☹️ Not Self Service
☹️ Mutualized Indexes
● The Ugly:
🤮 1 socket = 1 Graylog Server
25. The Mission: Beta
● The Good :
😁 Kafka/ZK/Flowgger/Graylog
😁 Users and use cases
● The Bad:
☹️ Retention is low
☹️ Logstash performance
● The Ugly:
🤮 Elasticsearch
26. The Mission: Beta
● Too many shards (250 indexes × 160 shards = 40,000 shards):
○ Initialization and Rebalancing issues.
○ Memory consumption in data structures.
○ Big Cluster State Update (slow recovery/slow pending tasks).
● CMS GC:
○ Long STW GC Pauses => nodes out of the cluster.
○ G1GC was not deemed prod ready for Lucene (LUCENE-5168/LUCENE-6098).
● Resource usage:
○ Big queries => I/O wait => lag
○ Indexing bursts => no search performance left
27. The Mission: Beta
Improvements:
● Hot-Warm architecture (sketched below):
○ Nodes dedicated to indexing and “recent” data searching
○ Nodes dedicated to search only
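A hedged sketch of how this hot-warm split is typically expressed in Elasticsearch (the box_type attribute name is a convention defined in each node's elasticsearch.yml, not an ES built-in; the URL and index names are illustrative):

import json
import requests

ES_URL = "http://localhost:9200"  # assumption: local test cluster

def allocate(index, box_type):
    # Pin all shards of `index` onto nodes tagged with the given box_type attribute.
    body = {"index.routing.allocation.require.box_type": box_type}
    return requests.put(ES_URL + "/" + index + "/_settings",
                        data=json.dumps(body),
                        headers={"Content-Type": "application/json"})

# A freshly rotated index keeps indexing and "recent data" searches on hot (SSD) nodes...
allocate("graylog_2015_10_21", "hot")
# ...while older, read-only indices are relocated to the warm tier.
allocate("graylog_2015_10_20", "warm")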
29. The Mission: Beta
Improvements:
● Elasticsearch:
○ Upgrade to 2.X: better, faster, stronger.
○ Divide the number of shards by 2.
○ Configuration changes: breakers, threadpool, index settings, mapping...
30. The Mission: Gamma
● From Beta to Gamma (2015-2017):
○ SSD on Hot-Nodes
○ Streams and Dashboards Sharing
○ Better performance on ES
○ Graylog upgrade and plugins
○ SailAbove to Mesos
○ Additional features: Cold Storage, Index as a Service
31. The Mission: Gamma
● But, big outages on the way:
○ Unexplained issues:
■ “ghost” indexes
■ hot spot
■ memory leaks
○ Explained issues:
■ OS, JVM, ES Settings
■ MongoDB
■ Bugs
32. The Mission: Gamma
● Problems:
○ Domain of failure
○ Different user needs
■ Low latency
■ High indexing write
■ High retention
○ Inefficiency
○ Scalability
38. Disclaimer
● “It works!™” for OUR use case: logging with mutualized indexes.
● “It works!™” until our next upgrade or our next rendezvous.
● “It works!™” within our budget:
○ Budget == infrastructure cost + SREs' time.
42. Deep Dives: Kafka
● I/O scheduler: prefer deadline / mq-deadline
● Rack awareness
● Compress on the producer side and on topics (ZSTD available since Kafka 2.1; see the producer sketch below)
● Keep the number of partitions as low as possible
● Set up I/O threads and network threads
● Monitor partition assignment
● Use modern consumers
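A hedged sketch of the producer-side compression point above (broker addresses and the topic name are made up; zstd needs kafka-python >= 2.0 and brokers >= 2.1, otherwise use lz4):

from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092"],  # placeholder brokers
    compression_type="zstd",   # compress on the producer, not on the broker
    linger_ms=50,              # small batching delay improves the compression ratio
    batch_size=256 * 1024,
    acks="all",                # do not lose log batches on a single broker failure
)

producer.send("logs-gelf", b'{"short_message": "hello from flowgger"}')
producer.flush()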
43. Deep Dives: MongoDB
● Primary only for R/W
● Indexes
● Journaled writes
● Write Concern (see the pymongo sketch below)
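A minimal pymongo sketch of those points (the connection string, database and collection names are illustrative, not Graylog's actual schema):

from pymongo import MongoClient, ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://mongo-1,mongo-2,mongo-3/?replicaSet=graylog")  # placeholder

db = client.get_database(
    "graylog",
    read_preference=ReadPreference.PRIMARY,            # read and write on the primary only
    write_concern=WriteConcern(w="majority", j=True),  # acknowledged, journaled writes
)

# Make sure frequently queried fields are indexed.
db.configs.create_index("stream_id")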
44. Deep Dives: Graylog
● Message Processing metrics
● Use Custom message processor
● Tune processbuffer+outputbuffer_processors, ring_sizes, batch_sizes
● Enable rest gzip
● Tune web+rest_selector_runners_count
● Tune rest_worker+proxied_request_threadpool_size
● Rotation Strategy: prefer size
● Number of shards -> number of indexing nodes/2
45. Deep Dives: Elasticsearch
● Indexing is CPU Heavy
● RAID 0 or SSD
● SSD: use deadline
● No Swap
● Tune net.ipv4.tcp_tw_reuse, fs.file-max, fs.nr_open, fs.aio-max-nr, vm.max_map_count
51. Deep Dives: Elasticsearch
Indices mapping:
● Use templates
● Deactivate norms and indexing where not needed
● Conventions, e.g. map any "*_double" field to a double via a dynamic template (see the sketch below):
{
  "double_suffix": {
    "mapping": {
      "type": "double"
    },
    "match": "*_double"
  }
},
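To make the convention concrete, here is a hedged sketch that registers such dynamic templates through the legacy _template API (ES 2.x-era syntax: newer versions use index_patterns and keyword fields instead of not_analyzed strings; the pattern and template name are illustrative):

import json
import requests

ES_URL = "http://localhost:9200"  # assumption: local test cluster

template = {
    "template": "logs-*",     # index name pattern ("index_patterns" on ES >= 6)
    "mappings": {
        "message": {          # Graylog stores documents under the "message" type
            "_all": {"enabled": False},
            "dynamic_templates": [
                # any "*_double" field becomes a double, per the naming convention
                {"double_suffix": {
                    "match": "*_double",
                    "mapping": {"type": "double"}}},
                # all other strings: not analyzed, norms deactivated
                {"raw_strings": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "string",
                                "index": "not_analyzed",
                                "norms": {"enabled": False}}}},
            ],
        }
    },
}

resp = requests.put(ES_URL + "/_template/logs-conventions",
                    data=json.dumps(template),
                    headers={"Content-Type": "application/json"})
resp.raise_for_status()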
52. Deep Dives: Improve
● Observability
○ System metrics
○ JVM GC Logging
○ jstack and jmap are your friends
○ Software KPI
53. Deep Dives: Improve
● Try new settings
○ Breaking a node must be easy
○ Breaking a cluster should be possible
○ Try/Fail/Try again
○ Try with real workload
56. Extra Bits
Extra Features:
● ES API to search streams
● Cold Storage on PCA
● Index as a Service
● Kibana as a Service
● Real-time tail over WebSocket
57. Extra Bits: Under the Hood
● Engine: 100k LOC
● Monitoring: Ganglia, Shinken, Opsgenie
● Metrics Data Platform for business metrics
58. Extra Bits
● Low Latency Cluster for SOC
○ 100-200 logs/sec => Small cluster (4 data nodes)
○ Must answer in < 200 ms to queries spanning millions of documents
○ One user login at OVH == one query
○ SSD + high cache sizes
○ Tweak queries toward the most efficient aggregations (see the sketch below)
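For example (illustrative field names and index pattern, not the real SOC queries): filtering first and asking only for the aggregation keeps such queries well under the latency budget.

import json
import requests

ES_URL = "http://localhost:9200"  # assumption

query = {
    "size": 0,  # aggregation only, no hits to fetch
    "query": {
        "bool": {
            "filter": [
                {"term": {"customer_id": "xyz"}},              # hypothetical field
                {"range": {"timestamp": {"gte": "now-15m"}}},  # recent data only
            ]
        }
    },
    "aggs": {"per_ip": {"terms": {"field": "source_ip", "size": 10}}},
}

resp = requests.post(ES_URL + "/soc-*/_search",
                     data=json.dumps(query),
                     headers={"Content-Type": "application/json"})
print(resp.json()["aggregations"]["per_ip"]["buckets"])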
59. Extra Bits
● High Writing Cluster for DNS
○ 800k logs/sec (burst > 1.2 M)
○ Hot-Warm cluster (54 hot/14 warm)
○ Hot CPU => 2× Xeon E5-2640 v3 (16 cores, 40-60 % CPU usage)
○ 737 billion DNS records
○ 150 TB of data for primaries
60. Extra Bits
● High Writing Cluster for Mail
○ 112k logs/sec (burst > 200k)
○ Hot-Warm cluster (30 hot/22 warm)
○ Hot CPU => 2× Xeon E5-2640 v3 (16 cores, 30-50 % CPU usage)
○ 152 billion logs
○ 135 TB of data for primaries
○ ~2 KB per message
61. Closing
● Know your users
○ Write Workload vs Low Latency vs Read Workload
○ Expectations (retention, performance)
○ Gather Feedback
○ Teach/Document good user practices
62. Closing
● Know your stack
○ Read documentation, read blogs
○ Read Code
○ Observe software metrics and logs
○ Try, fail, try, fail, try, fail...until success
○ Upgrade your software to latest versions
63. Closing
● Know your infrastructure
○ Prefer Bare Metal for predictability
○ Prepare for failure
○ Scale only when everything else fails
○ Observe system metrics