Taking Your Database Beyond the Border of a Single Kubernetes Cluster - Christopher Bradford
Deploying applications on Kubernetes is getting easier every day. From minimal deployments to distributed, service-mesh-enabled applications, with some planning and a little bit of YAML, resilient cloud-native applications are the norm. In this session, Christopher Bradford and Ty Morton will help answer the following questions: - What about the data behind these apps? - Are you running it in a multi-cluster environment, or sending everything back to a common location? - How do you modernize to a distributed, peer-to-peer data architecture? - How do you plan for this change? - Are there pitfalls on the road to enlightened data? Join this session to explore the key concepts needed when investigating multi-cluster deployments for data, including: - Cluster planning - Network design - Security - Failure handling
Real-Time Integration Between MongoDB and SQL Databases - Eugene Dvorkin
Many companies have huge investments in Data Warehouse and BI tools and want to leverage those investments to process data collected by applications in MongoDB. For example, a company may need to blend clickstream data collected by distributed MongoDB data storage with personal data from Oracle into a Data Warehouse system or analytics platform to provide timely marketing reports. Most of the time the job requires converting a MongoDB JSON document structure into a traditional relational model, and a traditional ETL (Extract, Transform, Load) process still needs to be developed to load and convert unstructured data into traditional analytical tools or Hadoop. In this talk we discuss how to develop a real-time, scalable, fault-tolerant ETL process that integrates MongoDB with traditional RDBMS storage using the open-source Twitter Storm project. We will capture data streamed by the MongoDB oplog or capped collections, transform it into tables, rows and columns, and load it into a SQL database. We will discuss the MongoDB oplog and the Storm architecture. The principles discussed in the talk can be used for many other applications, such as advanced analytics and continuous computations. We will use Java as our language of choice, but you can use the same software stack with any language.
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie... - Igor Sfiligoi
Presented at PEARC20.
This talk presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major cloud providers: Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. We provide the reasoning behind the cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ... - Igor Sfiligoi
The San Diego Supercomputer Center (SDSC) and the Wisconsin IceCube Particle Astrophysics Center (WIPAC) at the University of Wisconsin–Madison successfully completed a computational experiment as part of a multi-institution collaboration that marshalled all GPUs (graphics processing units) available for sale globally across Amazon Web Services, Microsoft Azure, and the Google Cloud Platform.
In all, some 51,500 GPU processors were used during the approximately 2-hour experiment conducted on November 16 and funded under a National Science Foundation EAGER grant.
The experiment – completed just prior to the opening of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19) in Denver, CO – was coordinated by Frank Würthwein, SDSC Lead for High-Throughput Computing, and Benedikt Riedel, Computing Manager for the IceCube Neutrino Observatory and Global Computing Coordinator at WIPAC. Igor Sfiligoi, SDSC’s lead scientific software developer for high-throughput computing, and David Schultz, a production software manager with IceCube, conducted the actual run.
This presentation was given at several booths during SC19 by Frank Würthwein.
Open Source Cloud Sync and Share software provides a synchronisation layer on top of a variety of backend storages, such as local filesystems and object storage. In the case of some software stacks, such as ownCloud, a SQL database is used to support the synchronisation requirements.
We tested how different technology stacks impact the ownCloud HTTP-based Synchronisation Protocol. An efficiency and scalability analysis was performed based on the benchmarking results, which were produced using the ClawIO framework prototype.
ClawIO is a cloud synchronisation benchmarking framework. The software provides a base architecture to stress different storage solutions against different cloud synchronisation protocols. This architecture is based on the IETF Storage Sync draft specification and CERN EOS. The synchronisation logic is divided into control and data servers; this separation is achieved with highly decoupled microservices connected to each other using high-performance communication protocols such as gRPC and HTTP/2.
A Fast and Efficient Time Series Storage Based on Apache Solr - QAware GmbH
OSDC 2016, Berlin: Talk by Florian Lautenschlager (@flolaut, Senior Software Engineer at QAware)
Abstract: How do you store billions of time series points and access them within a few milliseconds? Chronix! Chronix is a young but mature open source project that allows one, for example, to store about 15 GB (CSV) of time series in 238 MB, with average query times of 21 ms. Chronix is built on top of Apache Solr, a bulletproof distributed NoSQL database with impressive search capabilities. In this code-intense session we show how Chronix achieves its efficiency in both respects: by means of ideal chunking, by selecting the best compression technique, by enhancing the stored data with (pre-computed) attributes, and by specialized query functions.
Managing Data and Operation Distribution In MongoDB - Jason Terpko
In a sharded MongoDB cluster, scale and data distribution are defined by your shard keys. Even when you choose the correct shard key, ongoing maintenance and review can still be required to maintain optimal performance.
This presentation will review shard key selection and how the distribution of chunks can create scenarios where you may need to manually move, split, or merge chunks in your sharded cluster. Scenarios requiring these actions can occur with both optimal and sub-optimal shard keys. Example use cases will provide tips on shard key selection, detecting an issue, reasons why you may encounter these scenarios, and specific steps you can take to rectify the issue.
Burst data retrieval after 50k GPU Cloud run - Igor Sfiligoi
We ran a 50k GPU multi-cloud simulation to support IceCube science. This talk provides an overview of what happened to the associated data.
Presented at the Internet2 booth at SC19.
Object multifunctional indexing with an open API - akvalex
NBITSearch is a search engine with an open API for local stations, LANs and the Internet. Advantages over its counterparts:
1. Object indexing. It can index objects S of any type T.
2. Multifunctional indexing. It can index objects simultaneously by a set of arbitrary functions F(S).
3. Very fast search. It saves time and money.
For IceCube, a large amount of photon propagation simulation is needed to properly calibrate the natural ice. The simulation is compute intensive and ideal for GPU compute. This cloud run was more data intensive than previous ones, producing 130 TB of output data. To keep egress costs in check, we created dedicated network links via the Internet2 Cloud Connect Service.
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic... - Igor Sfiligoi
NRP Engagement webinar: description of the 380 PFLOP32s, 51k GPU multi-cloud burst that used HTCondor to run the IceCube photon propagation simulation.
Presented January 27th, 2020.
A database trigger is a stored procedure that is executed when specific actions occur within a database. Triggers fit naturally in a relational schema (foreign keys) and are implemented as built-in functionality in popular relational databases like MySQL.
MongoDB does not have any support for triggers, mainly due to the lack of support for foreign keys. Even if it is usually considered an antipattern, there are use cases in MongoDB that benefit from a partially relational schema. The lack of triggers is an obstacle for a partially relational schema, but there are workarounds for simulating trigger behavior.
This presentation will guide you through different ways to implement triggers in MongoDB. We will cover streams, tailable cursors, and hooks, demonstrate coding examples for each topic, and explain the pros and cons of each implementation.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
Adam Dagnall: Advanced S3 compatible storage integration in CloudStack - ShapeBlue
Adam's slides from his talk at the CloudStack European User group meetup, March 13, London. To provide tighter integration between the S3 compatible object store and CloudStack, Cloudian has developed a connector to allow users and their applications to utilize the object store directly from within the CloudStack platform in a single sign-on manner with self-service provisioning. Additionally, CloudStack templates and snapshots are centrally stored within the object store and managed through the CloudStack service. The object store offers protection of these templates and snapshots across data centres using replication or erasure coding.
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration - Denodo
Watch full webinar here: https://bit.ly/3ohtRqm
Companies with corporate data lakes also need a strategy for integrating them with their overall data fabric. To take full advantage of a data lake, data architects must determine what data belongs in the lake vs. other sources, how end users are going to find and connect to the data they need, and the best way to leverage the processing power of the data lake. This webinar will provide a deep-dive look at how the Denodo Platform for data virtualization enables companies to maximize the investment in their corporate data lake.
Watch this webinar on demand to learn:
- How to create a logical data fabric with Denodo
- How to leverage a data lake for MPP acceleration and summary views
- How to leverage Presto with Denodo for file-based data lakes (e.g. S3, ADLS, HDFS)
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ... - Amazon Web Services
AWS gives designers of enterprise storage systems a completely new set of options. Aimed at enterprise storage specialists and managers of cloud-integration teams, this session gives you the tools and perspective to confidently integrate your storage workloads with AWS. We show working use cases, a thorough TCO model, and detailed customer blueprints. Throughout we analyze how data-tiering options measure up to the design criteria that matter most: performance, efficiency, cost, security, and integration.
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain... - Big Data Spain
Session presented at Big Data Spain 2015 Conference
15th Oct 2015
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmadigital.com
Abstract: http://www.bigdataspain.org/program/thu/slot-7.html
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost - Databricks
The Weather Company (TWC) collects weather data across the globe at the rate of 34 million records per hour, and the TWC History on Demand application serves that historical weather data to users via an API, averaging 600,000 requests per day. Users are increasingly consuming large quantities of historical data to train analytics models, and require efficient asynchronous APIs in addition to existing synchronous ones which use ElasticSearch. We present our architecture for asynchronous data retrieval and explain how we use Spark together with leading edge technologies to achieve an order of magnitude cost reduction while at the same time boosting performance by several orders of magnitude and tripling weather data coverage from land only to global.
In Data Engineer’s Lunch #23, Rahul Singh will be covering the topics of Thanos and Cortex.
Accompanying Blog: https://blog.anant.us/data-engineers-lunch-23-thanos-and-cortex/
Accompanying YouTube: https://youtu.be/uCD7cetQ8z4
Deploying Prometheus is easy, and running a single instance can be sufficient for most deployments. We will talk about the scalability limits of a Prometheus instance; when and how to use sharding; what Trickster is and why you should use it too; and how Thanos can help you when all hope is lost.
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day - C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2mAKgJi.
Ian Nowland and Joel Barciauskas talk about the challenges Datadog faces as the company has grown its real-time metrics systems that collect, process, and visualize data to the point they now handle trillions of points per day. They also talk about how the architecture has evolved, and what they are looking to in the future as they architect for a quadrillion points per day. Filmed at qconnewyork.com.
Ian Nowland is the VP Engineering Metrics and Alerting at Datadog. Joel Barciauskas currently leads Datadog's distribution metrics team, providing accurate, low latency percentile measures for customers across their infrastructure.
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL... - NETWAYS
pg_stat_monitor is a statistics collection tool based on PostgreSQL's contrib module pg_stat_statements. pg_stat_statements provides only basic statistics, which is sometimes not enough. Its major shortcoming is that it accumulates all the queries and statistics but does not provide aggregated statistics or histogram information, so a user needs to calculate the aggregates themselves, which is quite expensive. pg_stat_monitor provides pre-calculated aggregates: it collects statistics and aggregates them on a per-bucket basis, with the size and number of buckets configured through GUCs (Grand Unified Configuration). The talk will cover the usage of pg_stat_monitor and how it improves on pg_stat_statements.
Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO... - SoftwareONEPresents
Slides from a webinar demonstrating the disaster recovery and storage management capabilities of Microsoft Azure and StorSimple.
The webinar was hosted on Friday 14th November 2014 and the recording can be viewed here:
http://1drv.ms/1vovwKF
In this presentation we show how Symantec's popular NetBackup backup software integrates seamlessly with the AWS cloud, providing a robust and reliable infrastructure for backing up both on-premises and cloud servers, and offering a complete solution for hybrid architectures.
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T... - InfluxData
In this session, Tim will cover principles, learnings, and practical advice from operating multiple cloud services at scale, including of course our InfluxDB Cloud service. What do we monitor, what do we alert on, and how did we architect it all? What are our underlying architectural and operational principles?
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off - Timescale
The earliest relational databases were monolithic on-premise systems that were powerful and full-featured. Fast forward to the Internet and NoSQL: BigTable, DynamoDB and Cassandra. These distributed systems were built to scale out for ballooning user bases and operations. As more and more companies vied to be the next Google, Amazon, or Facebook, they too "required" horizontal scalability.
But in a real way, NoSQL and even NewSQL have forgotten single node performance where scaling out isn't an option. And single node performance is important because it allows you to do more with much less. With a smaller footprint and simpler stack, overhead decreases and your application can still scale.
In this talk, we describe TimescaleDB's methods for single node performance. The nature of time-series workloads and how data is partitioned allows users to elastically scale up even on single machines, which provides operational ease and architectural simplicity, especially in cloud environments.
Learn how to reduce public cloud storage costs on the AWS and Azure marketplaces with SoftNAS Senior Director of Product Marketing, John Bedrick.
Similar to ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus», Lviv DevOps Conference 2019
ІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps: the next stage in the evolution of DevOps» - UA DevOps Conference
Free Online DevOps Conference 2020
АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solution for AWS» - UA DevOps Conference
Free Online DevOps Conference 2020
СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let's see how it works with Istio» - UA DevOps Conference
Lviv DevOps Conference 2019
Paketo Buildpacks: the best way to build OCI images? DevopsDa... - Anthony Dahanne
Buildpacks have been around for more than 10 years! At first, they were used to detect and build an application before deploying it to certain PaaS platforms. With their latest generation, Cloud Native Buildpacks (a CNCF incubating project), we can now build Docker (OCI) images. Are they a good alternative to a Dockerfile? What are the Paketo buildpacks? Which communities support them, and how?
Come find out in this ignite session.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Large Language Models and the End of Programming - Matt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Accelerate Enterprise Software Engineering with Platformless - WSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Cyaniclab: Software Development Agency Portfolio.pdf - Cyanic lab
CyanicLab, an offshore custom software development company based in Sweden, India and Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
Software Engineering, Software Consulting, Tech Lead. Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security, Spring Transaction, Spring MVC, Log4j, REST/SOAP web services.
How Recreation Management Software Can Streamline Your Operations.pptx - wottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
OpenFOAM solver for the Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam - takuyayamamoto1800
In these slides, we show a simulation example and how to compile the solver.
The Helmholtz equation can be solved with helmholtzFoam; the Helmholtz equation with uniformly dispersed bubbles can be simulated with helmholtzBubbleFoam.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... - Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. Here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
How to Position Your Globus Data Portal for Success: Ten Good Practices - Globus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... - Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Navigating the Metaverse: A Journey into Virtual Evolution - Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ... - Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but it did reach 63K downloads (possibly powering tens of thousands of websites).
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx - rickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Understanding Globus Data Transfers with NetSage - Globus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
4. INTRODUCTION
Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, and it can be added seamlessly on top of existing Prometheus deployments.
The current release, 0.5.0, is designed to store old metrics (those that have reached the retention period on the Prometheus nodes) on S3-like storage for the long term.
Collected metrics can be accessed for review via Grafana; the Prometheus query dashboard will show only the data stored on the Prometheus instances themselves.
VictoriaMetrics is a fast, cost-effective and scalable time-series database. It can be used as long-term remote storage for Prometheus. It uses its own data compression, which allows it to store more data in the same disk space.
Cortex provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
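Both VictoriaMetrics and Cortex ingest samples from Prometheus via the remote_write protocol. A minimal sketch of the Prometheus side, assuming a single-node VictoriaMetrics at a hypothetical hostname:

    # prometheus.yml fragment (sketch): forward samples to VictoriaMetrics.
    # The hostname "victoriametrics" is an assumption; /api/v1/write is the
    # single-node VictoriaMetrics remote-write endpoint on its default port.
    remote_write:
      - url: "http://victoriametrics:8428/api/v1/write"
        queue_config:
          max_samples_per_send: 10000   # optional batching tweak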
5. [Architecture diagram: monitored services 1…N are scraped by a Prometheus node with local storage; data is shipped to long-term storage after retention is reached. Grafana uses both Prometheus and the long-term storage as data sources, and alerts flow to Alertmanager.]
6. LONG-TERM STORAGE OVERVIEW
Why do we need long-term storage?
- To store historical data about your workloads
- To review incidents
- To plan scaling based on seasonal load
- To find bottlenecks in the infrastructure during continuous runs and load tests
Which solutions can be used for storing long-term historical time series?
Cortex, InfluxDB, Kafka, Graphite, …, Thanos, VictoriaMetrics*
* https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
7. THANOS ARCHITECTURE
[Diagram: a Prometheus pod containing Prometheus, the Prometheus config reloader, the configmap reloader and the Thanos sidecar; a Thanos Query pod; Thanos Compact; and a Thanos Store Gateway pod.]
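The Prometheus pod above pairs Prometheus with the Thanos sidecar, which uploads finished TSDB blocks to the bucket. A minimal Kubernetes sketch; the image tags, the secret name and the 2h block settings are assumptions, not taken from the deck:

    apiVersion: v1
    kind: Pod
    metadata:
      name: prometheus-with-thanos
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.11.1        # assumed version
          args:
            - --storage.tsdb.path=/prometheus
            # 2h min/max block duration lets the sidecar upload blocks promptly
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
          volumeMounts:
            - name: data
              mountPath: /prometheus
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.5.0   # assumed image location
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://localhost:9090
            - --objstore.config-file=/etc/thanos/bucket.yml
          volumeMounts:
            - name: data
              mountPath: /prometheus
            - name: objstore
              mountPath: /etc/thanos
      volumes:
        - name: data
          emptyDir: {}
        - name: objstore
          secret:
            secretName: thanos-objstore   # assumed secret holding bucket.yml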
9. [Diagram of the query path: Grafana or the Thanos UI sends queries to Thanos Query, which fans out to the sidecars of Prometheus 1 and Prometheus 2 and to the Thanos Store Gateway, which in turn reads older blocks from the bucket.]
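Because Thanos Query speaks the Prometheus HTTP API, Grafana needs only one Prometheus-type data source pointed at it to get this global view. A provisioning sketch; the service name and port are assumptions:

    # Grafana datasource provisioning (sketch)
    apiVersion: 1
    datasources:
      - name: Thanos
        type: prometheus            # Thanos Query exposes the Prometheus API
        access: proxy
        url: http://thanos-query:9090   # assumed service name and port
        isDefault: true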
10. ADVANTAGES AND DISADVANTAGES
Advantages:
- Infinite retention without reconfiguring storage
- Collected data stays available even if the infrastructure is recreated (the data is in the bucket)
- Global query view over data collected from multiple Prometheus instances and from the bucket
- Horizontal scalability
- Metrics compaction
- Full monitoring stack
Disadvantages:
- Complicated infrastructure
11. HOW IT WAS TESTED
[Diagram of the test layout: nodes NODE_0 … NODE_499, each exposing metrics METRIC_0 … METRIC_999.]
500 nodes, 1000 metrics per node, scraped every 15 seconds.
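A Prometheus scrape configuration matching this layout might look as follows; the job name, target names and port are hypothetical:

    scrape_configs:
      - job_name: synthetic-nodes
        scrape_interval: 15s          # as in the test: every 15 seconds
        static_configs:
          - targets:
              - node-0:9100   # hypothetical exporters, one per node,
              - node-1:9100   # each exposing 1000 metrics
              # ... up to node-499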
12. [Grafana screenshot: 24 hours of data, scroll bar over the 500 reporters.]
500 nodes × 1000 metrics × 4 scrapes per minute × 24 hours = 2,880,000,000 points
17. [Screenshots: memory usage stabilization on the cluster nodes, scrape duration, and GKE cluster details; note the memory allocation.]
With 30 seconds between scrapes there are two 15-second intervals in each window, and Prometheus needs about 4.37 s to scrape 1,000,000 metrics.
21. ADVANTAGES AND DISADVANTAGES
Advantages:
- Infinite retention, though storage must be reconfigured (resized)
- Global query view over the data collected in storage
- Horizontal scalability
- Metrics compaction (multiple times better, thanks to float-to-integer conversion)
- Simple infrastructure
Disadvantages:
- No integration with Alertmanager
- Cloud storages are not supported yet (https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129)
- More load on the hosts
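The "simple infrastructure" point is easy to see: a single-node VictoriaMetrics deployment is one process plus a disk, fed by the remote_write fragment shown earlier. A docker-compose sketch; the image tag and retention value are assumptions:

    version: "3"
    services:
      victoriametrics:
        image: victoriametrics/victoria-metrics:v1.28.0   # assumed tag
        command:
          - -storageDataPath=/storage
          - -retentionPeriod=12      # months of data to keep (assumed)
          - -httpListenAddr=:8428    # remote_write ingestion + query endpoint
        ports:
          - "8428:8428"
        volumes:
          - vmdata:/storage
    volumes:
      vmdata: {}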
22.–25. [Grafana screenshots for the VictoriaMetrics run: 24 hours of data, scroll bar over the 500 reporters; again 500 nodes, 4 times per minute, 24 hours = 2,880,000,000 points.]
26. PRICE COMPARISON: THANOS vs VICTORIAMETRICS
Thanos:
- 12–15 GiB of metrics per day (2.88 billion points)
- 16 GiB of memory used on the nodes
- 2.1–2.4 CPU cores used on the nodes
- Storage price (Cloud Storage*): 15 GiB × 365 days = 5475, ~5500 GiB; storage total: $126.50 per month, ~$1500 per year
  (* based on the retention, data can be moved to a Coldline storage class)
VictoriaMetrics:
- 2.8–3 GiB of metrics per day on each storage node (2.88 billion points)
- 16 GiB of memory used on the nodes
- 2.8–4 CPU cores used on the nodes
- Storage price (Persistent Disk Standard): 3 GiB × 365 days = 1095, ~1100 GiB; $52.80 per month per storage node, storage total: 52.80 × 3 = $158.40 per month*
  (* https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134; if one of the storage nodes is lost, part of the data becomes unavailable)
27. THANOS vs VICTORIAMETRICS
- Instance price: 3 × N-standard-4 (4 vCPU, 15 GB memory) at a $97.49 monthly estimate each, 3 × 97.49 = $292; Standard Provisioned Space: 1,500 GB, $60
- CPU usage: 50% (Thanos) vs 65% (VictoriaMetrics)
- Memory usage: 16 GB vs 16 GB
- Metrics volume per day: 15 GB vs 9 GB
- Metrics per minute: 2,000,000 vs 2,000,000
- Points per day: 2,880,000,000 vs 2,880,000,000
- Scrape interval (1M metrics): 4.373 s vs 4.553 s
- Historical data access (500 timeseries): 303–525 ms vs 179–492 ms
31. To produce downsampled data, the Compactor continuously aggregates series down to five-minute and one-hour resolutions. For each raw chunk, encoded with TSDB's XOR compression, it stores different types of aggregations, e.g. min, max, or sum, in a single block. This allows the Querier to automatically choose the aggregate that is appropriate for a given PromQL query.
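Downsampling and per-resolution retention are controlled by flags on the Compactor. A sketch of its arguments; the retention windows are illustrative, not from the deck:

    # thanos compact (sketch); run as a singleton against the bucket
    args:
      - compact
      - --data-dir=/var/thanos/compact
      - --objstore.config-file=/etc/thanos/bucket.yml
      - --retention.resolution-raw=30d   # keep raw data for 30 days
      - --retention.resolution-5m=90d    # keep 5m downsamples for 90 days
      - --retention.resolution-1h=365d   # keep 1h downsamples for 1 year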
34. VM Gorilla compression analysis
The only problem is that the result may exceed 64 bits, the default integer size in modern computers. How to deal with it? Normalize the integer by dividing by 10^M, where M is the minimal value that allows fitting all the time series values into 64 bits, removing common trailing decimal zeros.
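A small worked example of this normalization, with illustrative values:

    \[
      \{12.500,\ 13.000,\ 14.500\}
      \xrightarrow{\times 10^{3}} \{12500,\ 13000,\ 14500\}
      \xrightarrow{\div 10^{2}} \{125,\ 130,\ 145\}
    \]

Multiplying by 10^3 makes every value integral; dividing by 10^2 then strips the common trailing zeros, so the stored integers (and their small deltas, here 5 and 15) fit comfortably below 2^63.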