Migration strategies for a mission critical cluster

The document outlines a migration plan to improve the performance and scalability of an Elasticsearch cluster. The current cluster has performance issues due to a large inverted index, outdated software version, and lack of document purge policies. The plan involves defining requirements, measuring the new infrastructure needs, installing an updated version, defining index structures, performing a remote reindex to migrate data, and adding logic to avoid downtime during migration. The new cluster will have dedicated roles, monthly indices of optimal size, and policies to retain only one year of data.

whoami
Fram Souza
I have been working in IT for about 6 years;
I have a BSc in Computer Networks;
I have been working with Elasticsearch for about 2 years;
Currently, I am an IT Specialist at Nextel.
Migration strategies for a mission-critical cluster

Environment
● Elasticsearch version 2.4.6
● 5 instances, each combining several roles (master, node, ingest)
● shards distributed across 2 availability zones
● documents ingested through Lambda
● very large cluster state
● a single large index (2.4 TB)

The problems
● Very large inverted index
○ Very high response times
● Outdated version
○ It is important to keep the environment on a current version
● No purge policy was ever created
○ The index holds documents that are no longer needed
● The instances do not have dedicated roles
○ This is bad for this environment

Two major problems
● Very large shards
● Many rejected searches

Requirement
● No downtime

The plan
Step 01 [ Definition ]
● Define the business requirements
○ Define purge policies (keep only one year of documents)
● Understand the whole process (insert and search)
Step 02 [ Definition ]
● Size the new cluster (Elastic Rally)
○ Measure search/index rates
○ Run load tests
○ Measure infrastructure requirements
● Install and configure the new environment (v. 6.3) (infrastructure as code + automation)
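
The one-year purge policy from Step 01 can be sketched as a small selection function. This is only an illustration: it assumes time-based index names of the form `docs-YYYY.MM` (the `docs` prefix and the naming scheme are hypothetical, not taken from the slides) and returns the indices that fall outside a 12-month window. A tool such as Curator would then delete them.

```python
from datetime import date


def expired_indices(index_names, today, keep_months=12, prefix="docs"):
    """Return the monthly indices that fall outside the retention window.

    Assumes hypothetical index names like 'docs-2018.07'.
    """
    # Months are compared as a single integer: year * 12 + zero-based month.
    # The cutoff is the oldest month that is still kept.
    cutoff = today.year * 12 + (today.month - 1) - (keep_months - 1)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not one of our time-based indices
        try:
            year, month = name[len(prefix) + 1:].split(".")
            month_key = int(year) * 12 + (int(month) - 1)
        except ValueError:
            continue  # name does not follow the YYYY.MM convention
        if month_key < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["docs-2017.07", "docs-2017.08", "docs-2018.07", "other-2016.01"]
print(expired_indices(names, date(2018, 7, 15)))  # ['docs-2017.07']
```

With a 12-month window evaluated in July 2018, August 2017 through July 2018 are kept and anything older is selected for deletion.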

The plan
Step 03 [ Implementation ]
● Define the index structure
○ One index per month
● Create templates/mappings in the new cluster
● Define the monitoring/alerting plan
Step 04 [ Implementation ]
● Remote reindex (with a query)
Step 05 [ Implementation to avoid downtime ]
● Add two outputs to the Lambda (old and new cluster) and add Logstash to the stack
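
Steps 03 and 04 combine naturally: one remote reindex request per month, each with a range query, fills one monthly index. The sketch below only builds the request bodies for the `_reindex` API; it does not send them. The host, index names, and the `@timestamp` field are assumptions made for illustration, since the slides do not name them.

```python
import json
from datetime import date


def month_bounds(year, month):
    """Return the first day of the month and the first day of the next month."""
    start = date(year, month, 1)
    end = date(year + (month == 12), month % 12 + 1, 1)
    return start, end


def remote_reindex_body(old_host, old_index, new_prefix, year, month,
                        ts_field="@timestamp"):
    """Build a _reindex body that copies one month from the old cluster.

    All argument values are hypothetical; ts_field must be a date field
    that exists in the old index.
    """
    start, end = month_bounds(year, month)
    return {
        "source": {
            "remote": {"host": old_host},
            "index": old_index,
            # Only documents inside this month are pulled from the old cluster.
            "query": {"range": {ts_field: {"gte": start.isoformat(),
                                           "lt": end.isoformat()}}},
        },
        "dest": {"index": f"{new_prefix}-{year:04d}.{month:02d}"},
    }


body = remote_reindex_body("http://old-cluster:9200", "bigindex", "docs", 2017, 12)
print(json.dumps(body, indent=2))
```

Each body would be sent as `POST _reindex` to the new cluster, which must list the old host in `reindex.remote.whitelist` in its `elasticsearch.yml`. Running the months one at a time keeps each request small and makes it easy to retry a single month.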

Cluster details
● version 6.3
● X-Pack basic
● dedicated roles
○ 1 master (why just one master?) / 6 data nodes (500 GB disk / 32 GB RAM / 16 GB heap / 16 cores) / 1 Logstash
● 6 shards and 1 replica
● One index per month
○ index: 100 GB/month
■ shard: 16 GB/shard
● Keep one year of data
○ in 12 months:
■ 2.4 TB of data
■ 144 total shards
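
The totals above follow from simple arithmetic. The sketch below reproduces them, assuming that the 100 GB/month figure refers to primary data only (the slide does not say so explicitly, but the totals match under that reading).

```python
def size_cluster(gb_per_month=100, primaries=6, replicas=1, months=12):
    """Derive shard counts and storage from the per-month figures."""
    copies = 1 + replicas                  # each primary plus its replicas
    shards_per_index = primaries * copies
    return {
        "shard_gb": gb_per_month / primaries,         # size of one primary shard
        "shards_per_index": shards_per_index,
        "total_shards": shards_per_index * months,
        "total_gb": gb_per_month * copies * months,   # primaries and replicas
    }


sizing = size_cluster()
print(sizing["total_shards"])  # 144
print(sizing["total_gb"])      # 2400
```

Six primaries with one replica give 12 shards per monthly index, so 12 months yield 144 shards. Storage comes to 100 GB × 2 copies × 12 months = 2.4 TB, with each shard at roughly 16.7 GB.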

Cluster advanced details
● dynamic templates
● aliases
● Shrink API
● query performance using filters
● Curator
● shard allocation awareness
● watermark changed to 98% (2.94 TB)
● index refresh interval (30s)
● Logstash persistent queue
● send notifications
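
Several of these items end up as settings in an index template or in the cluster settings. The following is a minimal sketch of what such request bodies could look like on 6.x. The template pattern, alias name, dynamic template rule, and the `zone` attribute are all assumptions for illustration; the slides that showed the real examples were images and are not recoverable here.

```python
def index_template(prefix="docs", alias="docs-search"):
    """Body for PUT _template/<name>; names and the mapping rule are hypothetical."""
    return {
        "index_patterns": [f"{prefix}-*"],   # applies to every monthly index
        "settings": {
            "index": {
                "number_of_shards": 6,
                "number_of_replicas": 1,
                "refresh_interval": "30s",   # refresh less often than the 1s default
            }
        },
        "aliases": {alias: {}},              # searches go through one stable name
        "mappings": {
            "_doc": {
                "dynamic_templates": [
                    # Illustrative rule: map new string fields as keyword
                    {"strings_as_keywords": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"},
                    }}
                ]
            }
        },
    }


def awareness_settings(attribute="zone"):
    """Body for PUT _cluster/settings enabling shard allocation awareness."""
    return {"persistent": {
        "cluster.routing.allocation.awareness.attributes": attribute}}


def watermark_gb(disk_gb=500, nodes=6, percent=98):
    """Disk usage at which the watermark is reached, per node and cluster-wide."""
    per_node = disk_gb * percent / 100
    return per_node, per_node * nodes


print(watermark_gb())  # (490.0, 2940.0)
```

The awareness attribute must also be set on each data node (for example `node.attr.zone` in `elasticsearch.yml`). The watermark arithmetic confirms the slide's figure: 98% of 6 × 500 GB is 2.94 TB. The slide does not say which disk watermark (low, high, or flood stage) was changed, so the sketch leaves that setting out.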

example - persistent queue
● When an input has events ready to process, it writes them to the queue;
● When the write to the queue is successful, the input can send an acknowledgement to its data source;
● An event is recorded as processed if, and only if, the event has been processed completely by the Logstash pipeline.
[Diagram labels: ACK; ok; ACK - event completed]
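
The acknowledgement rules above give at-least-once delivery: an event stays in the queue until the pipeline has finished with it, so a crash in between causes a replay rather than a loss. The toy class below imitates that behaviour in memory. It is not Logstash code, and every name in it is hypothetical; in Logstash itself the feature is enabled with `queue.type: persisted` in `logstash.yml`.

```python
from collections import OrderedDict


class PersistentQueueSketch:
    """In-memory stand-in for a durable queue with explicit acknowledgements."""

    def __init__(self):
        self._pending = OrderedDict()  # seq -> event; stands in for on-disk pages
        self._next_seq = 0

    def write(self, event):
        """Store the event; once this returns, the input may ACK its source."""
        seq = self._next_seq
        self._pending[seq] = event
        self._next_seq += 1
        return seq

    def read_batch(self, size):
        """Hand out up to `size` events without removing them."""
        return list(self._pending.items())[:size]

    def ack(self, seqs):
        """Mark events as fully processed; only now are they dropped."""
        for seq in seqs:
            self._pending.pop(seq, None)

    def unacked(self):
        return list(self._pending.values())


queue = PersistentQueueSketch()
for event in ("a", "b", "c"):
    queue.write(event)

batch = queue.read_batch(2)
# Simulated crash here: nothing was acknowledged, so nothing is lost.
print(queue.unacked())  # ['a', 'b', 'c']

batch = queue.read_batch(2)           # after restart the same events are replayed
queue.ack([seq for seq, _ in batch])  # pipeline finished: events are completed
print(queue.unacked())  # ['c']
```

Because replayed events may be delivered twice, the outputs should tolerate duplicates, for example by indexing with a deterministic document ID.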

improvements
● disaster recovery plan
● coordinating nodes
● Kafka / Kinesis
● upgrade the X-Pack license
long-term planning
● tune the Logstash queue page settings
● control the circuit breaker (why?)
● hot/warm architecture

Finish :D
Contact: fram.souza14@gmail.com
LinkedIn: https://www.linkedin.com/in/francismarasouza/
