This document provides an overview of OpenSearch, including what it is, its benefits and uses, how to use it, and its key features. OpenSearch is an open source search and analytics engine. It was created as a fork of Elasticsearch 7.10.2 and is powered by the Apache Lucene library. The document discusses how to migrate from Elasticsearch to OpenSearch using various approaches like snapshots, rolling upgrades, or full cluster restarts. It also covers OpenSearch concepts like clusters, nodes, mappings, aggregations, data streams, and plugins.
OpenSearch is a community-driven, open source search and analytics suite derived from Elasticsearch 7.10.2 and Kibana 7.10.2. It was initiated by AWS and consists of the OpenSearch search engine and OpenSearch Dashboards visualization interface. OpenSearch aims to provide a true open source search and analytics engine following licensing changes by Elastic that removed Elasticsearch's open source status.
OpenSearch is a specification that describes how to build search services for the open web. It uses XML to describe search services and publishes search results in RSS or Atom formats. Key features include autodiscovery of services, search result descriptions, and URL templates that clients use to compose search requests to the service. Many websites and browsers support OpenSearch to allow discovery and use of new search services.
The document provides an overview of Red Hat OpenShift Container Platform, including:
- OpenShift provides a fully automated Kubernetes container platform for any infrastructure.
- It offers integrated services like monitoring, logging, routing, and a container registry out of the box.
- The architecture runs everything in pods on worker nodes, with masters managing the control plane using Kubernetes APIs and OpenShift services.
- Key concepts include pods, services, routes, projects, configs and secrets that enable application deployment and management.
Training for AWS Solutions Architect at http://zekelabs.com/courses/amazon-web-services-training-bangalore/.This slide describes about features of EC2, EC2 Options, family type, storage, EBS Volumes, EC2 Instance Store, Security Groups, Volumes and Snapshots, Amazon Machine Image (AMI), Elastic load balancer, Classic load balancer, Application load balancer, Network load balancer, AWS CLI and EC2 Metadata
___________________________________________________
zekeLabs is a Technology training platform. We provide instructor led corporate training and classroom training on Industry relevant Cutting Edge Technologies like Big Data, Machine Learning, Natural Language Processing, Artificial Intelligence, Data Science, Amazon Web Services, DevOps, Cloud Computing and Frameworks like Django,Spring, Ruby on Rails, Angular 2 and many more to Professionals.
Reach out to us at www.zekelabs.com or call us at +91 8095465880 or drop a mail at info@zekelabs.com
This document summarizes Fernando Rodriguez Olivera's presentation on Amazon Kinesis. Kinesis is a real-time data streaming service that can ingest large amounts of data from distributed producers. It partitions ingested data into shards that can be processed in parallel by consumer applications. The document outlines how to use the Kinesis APIs and SDKs to produce and consume data streams, and how the Kinesis Client Library (KCL) can help balance processing across consumer nodes.
Data Con LA 2022 - Pre- Recorded - OpenSearch: Everything You Need to Know Ab...Data Con LA
Seth Muthukaruppan, Consultant at Instacluster
Data Engineering
OpenSearch is an incredibly powerful search engine and analytics suite for ingesting, searching, visualizing, and analyzing your data and it is fully open source. This Apache 2.0-licensed and community-driven collection of technologies harnesses an architecture that combines the powers of Elasticsearch 7.10.2, Kibana 7.10.2 and Apache Lucene. With OpenSearch, users gain a distributed framework featuring particularly powerful scalability, high availability, and database-like capabilities. Attendees at this DataCon LA presentation will come away understanding OpenSearch's architecture and its building-block technology components, including: -- Apache Lucene utilization. Learn how this high-performance Java-based search library utilizes Lucene's inverted search index to delivers incredibly fast search results (while supporting natural language, wildcard, fuzzy, and proximity searches). -- OpenSearch cluster architecture. An OpenSearch cluster is a distributed and horizontally-scalable collection of nodes, which are differentiated based on the operations they perform. Attendees will learn the specific functions of master, master-eligible, data, client, ingest nodes. -- Data organization. Understand how OpenSearch organizes data into indices (which contain documents, which contain fields). -- Internal data structures. Get an in-depth look at how OpenSearch achieves scalability and reliability by breaking up indices into shards and segments, and utilizes translogs. -- Aggregations. See how OpenSearch enables its advanced built-in analytics capabilities through the power of aggregations.
Kafka is becoming an ever more popular choice for users to help enable fast data and Streaming. Kafka provides a wide landscape of configuration to allow you to tweak its performance profile. Understanding the internals of Kafka is critical for picking your ideal configuration. Depending on your use case and data needs, different settings will perform very differently. Lets walk through performance essentials of Kafka. Let's talk about how your Consumer configuration, can speed up or slow down the flow of messages to Brokers. Lets talk about message keys, their implications and their impact on partition performance. Lets talk about how to figure out how many partitions and how many Brokers you should have. Let's discuss consumers and what effects their performance. How do you combine all of these choices and develop the best strategy moving forward? How do you test performance of Kafka? I will attempt a live demo with the help of Zeppelin to show in real time how to tune for performance.
Learn tips and techniques that will improve the performance of your applications and databases running on Amazon EC2 instance storage and/or Amazon Elastic Block Store (EBS). This advanced session discusses when to use HI1, HS1, and Amazon EBS. We will share an "under the hood" view to tune the performance of your Elastic Block Store and best practices for running workloads on Amazon EBS, such as relational databases (MySQL, Oracle, SQL Server, Postgres) and NoSQL data stores, such as MongoDB and Riak.
OpenSearch is a community-driven, open source search and analytics suite derived from Elasticsearch 7.10.2 and Kibana 7.10.2. It was initiated by AWS and consists of the OpenSearch search engine and OpenSearch Dashboards visualization interface. OpenSearch aims to provide a true open source search and analytics engine following licensing changes by Elastic that removed Elasticsearch's open source status.
OpenSearch is a specification that describes how to build search services for the open web. It uses XML to describe search services and publishes search results in RSS or Atom formats. Key features include autodiscovery of services, search result descriptions, and URL templates that clients use to compose search requests to the service. Many websites and browsers support OpenSearch to allow discovery and use of new search services.
The document provides an overview of Red Hat OpenShift Container Platform, including:
- OpenShift provides a fully automated Kubernetes container platform for any infrastructure.
- It offers integrated services like monitoring, logging, routing, and a container registry out of the box.
- The architecture runs everything in pods on worker nodes, with masters managing the control plane using Kubernetes APIs and OpenShift services.
- Key concepts include pods, services, routes, projects, configs and secrets that enable application deployment and management.
Training for AWS Solutions Architect at http://zekelabs.com/courses/amazon-web-services-training-bangalore/.This slide describes about features of EC2, EC2 Options, family type, storage, EBS Volumes, EC2 Instance Store, Security Groups, Volumes and Snapshots, Amazon Machine Image (AMI), Elastic load balancer, Classic load balancer, Application load balancer, Network load balancer, AWS CLI and EC2 Metadata
___________________________________________________
zekeLabs is a Technology training platform. We provide instructor led corporate training and classroom training on Industry relevant Cutting Edge Technologies like Big Data, Machine Learning, Natural Language Processing, Artificial Intelligence, Data Science, Amazon Web Services, DevOps, Cloud Computing and Frameworks like Django,Spring, Ruby on Rails, Angular 2 and many more to Professionals.
Reach out to us at www.zekelabs.com or call us at +91 8095465880 or drop a mail at info@zekelabs.com
This document summarizes Fernando Rodriguez Olivera's presentation on Amazon Kinesis. Kinesis is a real-time data streaming service that can ingest large amounts of data from distributed producers. It partitions ingested data into shards that can be processed in parallel by consumer applications. The document outlines how to use the Kinesis APIs and SDKs to produce and consume data streams, and how the Kinesis Client Library (KCL) can help balance processing across consumer nodes.
Data Con LA 2022 - Pre- Recorded - OpenSearch: Everything You Need to Know Ab...Data Con LA
Seth Muthukaruppan, Consultant at Instacluster
Data Engineering
OpenSearch is an incredibly powerful search engine and analytics suite for ingesting, searching, visualizing, and analyzing your data and it is fully open source. This Apache 2.0-licensed and community-driven collection of technologies harnesses an architecture that combines the powers of Elasticsearch 7.10.2, Kibana 7.10.2 and Apache Lucene. With OpenSearch, users gain a distributed framework featuring particularly powerful scalability, high availability, and database-like capabilities. Attendees at this DataCon LA presentation will come away understanding OpenSearch's architecture and its building-block technology components, including: -- Apache Lucene utilization. Learn how this high-performance Java-based search library utilizes Lucene's inverted search index to delivers incredibly fast search results (while supporting natural language, wildcard, fuzzy, and proximity searches). -- OpenSearch cluster architecture. An OpenSearch cluster is a distributed and horizontally-scalable collection of nodes, which are differentiated based on the operations they perform. Attendees will learn the specific functions of master, master-eligible, data, client, ingest nodes. -- Data organization. Understand how OpenSearch organizes data into indices (which contain documents, which contain fields). -- Internal data structures. Get an in-depth look at how OpenSearch achieves scalability and reliability by breaking up indices into shards and segments, and utilizes translogs. -- Aggregations. See how OpenSearch enables its advanced built-in analytics capabilities through the power of aggregations.
Kafka is becoming an ever more popular choice for users to help enable fast data and Streaming. Kafka provides a wide landscape of configuration to allow you to tweak its performance profile. Understanding the internals of Kafka is critical for picking your ideal configuration. Depending on your use case and data needs, different settings will perform very differently. Lets walk through performance essentials of Kafka. Let's talk about how your Consumer configuration, can speed up or slow down the flow of messages to Brokers. Lets talk about message keys, their implications and their impact on partition performance. Lets talk about how to figure out how many partitions and how many Brokers you should have. Let's discuss consumers and what effects their performance. How do you combine all of these choices and develop the best strategy moving forward? How do you test performance of Kafka? I will attempt a live demo with the help of Zeppelin to show in real time how to tune for performance.
Learn tips and techniques that will improve the performance of your applications and databases running on Amazon EC2 instance storage and/or Amazon Elastic Block Store (EBS). This advanced session discusses when to use HI1, HS1, and Amazon EBS. We will share an "under the hood" view to tune the performance of your Elastic Block Store and best practices for running workloads on Amazon EBS, such as relational databases (MySQL, Oracle, SQL Server, Postgres) and NoSQL data stores, such as MongoDB and Riak.
Intro to open source observability with grafana, prometheus, loki, and tempo(...LibbySchulze
This document provides an introduction to open source observability tools including Grafana, Prometheus, Loki, and Tempo. It summarizes each tool and how they work together. Prometheus is introduced as a time series database that collects metrics. Loki is described as a log aggregation system that handles logs at scale without high costs. Tempo is explained as a tracing system that allows tracing from logs, metrics, and between services. The document emphasizes that these tools can be run together to gain observability across an entire system from logs to metrics to traces.
Cloud Storage Comparison: AWS vs Azure vs Google vs IBMRightScale
The document provides a comparison of cloud storage options across AWS, Azure, Google, and IBM. It summarizes the key services and features of block/disk storage, object storage, and file storage for each cloud. It includes details on pricing, performance characteristics, replication, availability, and encryption capabilities. Example scenarios are also provided to illustrate the monthly costs for common configurations on each cloud platform.
This document summarizes Netflix's global cloud edge architecture. Key points include:
- Netflix uses edge services and a global cloud infrastructure to deliver content to over 1000 device types in over 40 countries.
- Zuul is an open source framework that Netflix uses for dynamic routing, authentication, testing, and security across its edge services.
- Netflix's edge scripting tier allows device teams to rapidly deploy scripts that control endpoints, content formatting, and APIs for different devices.
- RxJava and Hystrix help make the edge service API asynchronous, fault tolerant, and able to handle high concurrency.
- Netflix's delivery pipeline uses techniques like canary analysis, debugging, and load testing to continuously and automatically deploy changes
This document provides release notes for OpenShift Container Platform 4.12. It includes information on new features such as default consoles determined by installation platform for RHCOS nodes, IBM Secure Execution technology preview for s390x architecture, and RHCOS now using RHEL 8.6 packages. It also covers installation and upgrade topics like specifying load balancer type in AWS and extending worker nodes to AWS Local Zones. Additional sections cover security updates, components, features, known issues and more.
Hybrid cloud : why and how to connect your datacenters to OVHcloud ? OVHcloud
Across our products or between OVHcloud and your own datacenters, Oliver Bédouet will detail network architectures you can build and their advantages, from vRack to OVHcloud Connect.
by Darin Briskman, Technical Evangelist, AWS
Elasticsearch is the most popular open-source search and analytics engine - it's easy to use, but not always easy to configure an manage. Learn about Amazon's fully managed service that provides easier deployment, operation, and scale for Elasticsearch. Level: 200
What is Yaml:
Human friendly, cross language, Unicode based data serialization language.
Pronounced in such a way as to rhyme with “camel”
Acronym for
YAML
Ain’t
A language used to convert or represent structured data or objects as a series of characters that can be stored on a disk.
Examples:
CSV – Comma separated values
XML – Extensible markup language
JSON – JavaScript object notation
YAML – YAML ain’t markup language
Markup
Language
** DevOps Training: https://www.edureka.co/devops **
This Edureka tutorial on "Jenkins pipeline Tutorial" will help you understand the basic concepts of a Jenkins pipeline. Below are the topics covered in the tutorial:
1. The need for Continuous Delivery
2. What is Continuous Delivery?
3. Features before the Jenkins Pipeline
4. What is a Jenkins Pipeline?
5. What is a Jenkinsfile?
6. Pipeline Concepts
7. Hands-On
Check our complete DevOps playlist here (includes all the videos mentioned in the video): http://goo.gl/O2vo13
This document provides an overview of Kubernetes including:
1) Kubernetes is an open-source platform for automating deployment, scaling, and operations of containerized applications. It provides container-centric infrastructure and allows for quickly deploying and scaling applications.
2) The main components of Kubernetes include Pods (groups of containers), Services (abstract access to pods), ReplicationControllers (maintain pod replicas), and a master node running key components like etcd, API server, scheduler, and controller manager.
3) The document demonstrates getting started with Kubernetes by enabling the master on one node and a worker on another node, then deploying and exposing a sample nginx application across the cluster.
This document provides an overview and introduction to Terraform, including:
- Terraform is an open-source tool for building, changing, and versioning infrastructure safely and efficiently across multiple cloud providers and custom solutions.
- It discusses how Terraform compares to other tools like CloudFormation, Puppet, Chef, etc. and highlights some key Terraform facts like its versioning, community, and issue tracking on GitHub.
- The document provides instructions on getting started with Terraform by installing it and describes some common Terraform commands like apply, plan, and refresh.
- Finally, it briefly outlines some key Terraform features and example use cases like cloud app setup, multi
Low Code Integration with Apache Camel.pdfClaus Ibsen
Design your integration flows using Camel and JBang for a better developer experience, and make it easily production grade using Quarkus.
Claus Ibsen, Apache Camel lead & Senior Principal Software Engineer, Red Hat
The document discusses Netflix's use of Elasticsearch for querying log events. It describes how Netflix evolved from storing logs in files to using Elasticsearch to enable interactive exploration of billions of log events. It also summarizes some of Netflix's best practices for running Elasticsearch at scale, such as automatic sharding and replication, flexible schemas, and extensive monitoring.
Communication between Microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, service outages. We will walk through stability patterns like timeouts, circuit breaker, bulkheads and discuss how they improve stability of Microservices.
Adapting the capacity of your compute infrastructure to the demands of your applications is the domain of Auto Scaling. Adding and removing Amazon EC2 instances is only part of the story, though – there is more to it than first meets the eye. This session introduces the basics of how to use Auto Scaling before moving on to more advanced topics such as mixing Spot and On-Demand instances to optimize cost or strategies for blue/green deployments. If you have used Auto Scaling before, you can learn about useful new features like lifecycle hooks and step scaling policies that make Auto Scaling even more widely applicable.
Chaos Engineering is the technique to simulate chaos and havoc in the live environment, which the engineers use to create a fault tolerant application. In the session we will talk about some of these scenarios and Chaos Mesh, a tool for Chaos engineering on Kubernetes.
Learning Objectives:
- Learn how to make decisions about the service and share best practices and useful tips for success
- Learn about Content based routing, HTTP/2, WebSockets
- Secure your web applications using TLS termination, AWS WAF on Application Load Balancer
The document discusses Kubernetes cluster autoscaler, including how it works, deployment steps, configuration, and limitations. It describes setting up the cluster autoscaler manager, preparing extra nodes, protecting nodes from scale down, and installing the autoscaler using Helm. Some key points are that it can automatically scale a cluster from 0 to 1000 nodes handling 30 pods each, but has restrictions like not supporting regional instance groups and taking 10 minutes to scale nodes down.
This document provides an introduction to Amazon Web Services (AWS) and AWS Lambda. It begins with an overview of cloud computing and an explanation of what AWS is and its core services. It then discusses AWS Lambda in more detail, describing it as a serverless computing service that allows users to run code without provisioning or managing servers. The document outlines benefits of AWS Lambda like scaling, availability, and low cost. It also provides examples of common use cases for Lambda and demonstrates its capabilities through an example application.
This document discusses how StreamSets and Spark can be used together for analytics insights in retail. Some key points:
- StreamSets Data Collector (SDC) is used to ingest IoT and sensor data from various sources into a common format and process the data in real-time using Spark evaluators.
- The Spark evaluator allows running Spark transformations on batches of data within SDC pipelines to do tasks like anomaly detection, sentiment analysis, and fraud detection.
- SDC can also be used to move data to and from Spark for end-of-batch processing using a Spark executor, such as running jobs on Databricks after files land in S3.
- Together, SDC and Spark
Analytic Insights in Retail Using Apache Spark with Hari ShreedharanDatabricks
Streamsets Data Collector is designed to make data ingest and processing easy. SDC integrates at several levels with Apache Spark to make data analysis using Spark very easy. SDC works with Databricks Cloud to trigger jobs based on incoming data.
In this talk, you will learn how a larger retail player with thousands of outlets is utilizing StreamSets to power Spark jobs on the Databricks cloud, combining real-time foot traffic data and historic behavioral & transaction data for analytic insights that improve revenue per square foot.
Intro to open source observability with grafana, prometheus, loki, and tempo(...LibbySchulze
This document provides an introduction to open source observability tools including Grafana, Prometheus, Loki, and Tempo. It summarizes each tool and how they work together. Prometheus is introduced as a time series database that collects metrics. Loki is described as a log aggregation system that handles logs at scale without high costs. Tempo is explained as a tracing system that allows tracing from logs, metrics, and between services. The document emphasizes that these tools can be run together to gain observability across an entire system from logs to metrics to traces.
Cloud Storage Comparison: AWS vs Azure vs Google vs IBMRightScale
The document provides a comparison of cloud storage options across AWS, Azure, Google, and IBM. It summarizes the key services and features of block/disk storage, object storage, and file storage for each cloud. It includes details on pricing, performance characteristics, replication, availability, and encryption capabilities. Example scenarios are also provided to illustrate the monthly costs for common configurations on each cloud platform.
This document summarizes Netflix's global cloud edge architecture. Key points include:
- Netflix uses edge services and a global cloud infrastructure to deliver content to over 1000 device types in over 40 countries.
- Zuul is an open source framework that Netflix uses for dynamic routing, authentication, testing, and security across its edge services.
- Netflix's edge scripting tier allows device teams to rapidly deploy scripts that control endpoints, content formatting, and APIs for different devices.
- RxJava and Hystrix help make the edge service API asynchronous, fault tolerant, and able to handle high concurrency.
- Netflix's delivery pipeline uses techniques like canary analysis, debugging, and load testing to continuously and automatically deploy changes
This document provides release notes for OpenShift Container Platform 4.12. It includes information on new features such as default consoles determined by installation platform for RHCOS nodes, IBM Secure Execution technology preview for s390x architecture, and RHCOS now using RHEL 8.6 packages. It also covers installation and upgrade topics like specifying load balancer type in AWS and extending worker nodes to AWS Local Zones. Additional sections cover security updates, components, features, known issues and more.
Hybrid cloud : why and how to connect your datacenters to OVHcloud ? OVHcloud
Across our products or between OVHcloud and your own datacenters, Oliver Bédouet will detail network architectures you can build and their advantages, from vRack to OVHcloud Connect.
by Darin Briskman, Technical Evangelist, AWS
Elasticsearch is the most popular open-source search and analytics engine - it's easy to use, but not always easy to configure an manage. Learn about Amazon's fully managed service that provides easier deployment, operation, and scale for Elasticsearch. Level: 200
What is Yaml:
Human friendly, cross language, Unicode based data serialization language.
Pronounced in such a way as to rhyme with “camel”
Acronym for
YAML
Ain’t
A language used to convert or represent structured data or objects as a series of characters that can be stored on a disk.
Examples:
CSV – Comma separated values
XML – Extensible markup language
JSON – JavaScript object notation
YAML – YAML ain’t markup language
Markup
Language
** DevOps Training: https://www.edureka.co/devops **
This Edureka tutorial on "Jenkins pipeline Tutorial" will help you understand the basic concepts of a Jenkins pipeline. Below are the topics covered in the tutorial:
1. The need for Continuous Delivery
2. What is Continuous Delivery?
3. Features before the Jenkins Pipeline
4. What is a Jenkins Pipeline?
5. What is a Jenkinsfile?
6. Pipeline Concepts
7. Hands-On
Check our complete DevOps playlist here (includes all the videos mentioned in the video): http://goo.gl/O2vo13
This document provides an overview of Kubernetes including:
1) Kubernetes is an open-source platform for automating deployment, scaling, and operations of containerized applications. It provides container-centric infrastructure and allows for quickly deploying and scaling applications.
2) The main components of Kubernetes include Pods (groups of containers), Services (abstract access to pods), ReplicationControllers (maintain pod replicas), and a master node running key components like etcd, API server, scheduler, and controller manager.
3) The document demonstrates getting started with Kubernetes by enabling the master on one node and a worker on another node, then deploying and exposing a sample nginx application across the cluster.
This document provides an overview and introduction to Terraform, including:
- Terraform is an open-source tool for building, changing, and versioning infrastructure safely and efficiently across multiple cloud providers and custom solutions.
- It discusses how Terraform compares to other tools like CloudFormation, Puppet, Chef, etc. and highlights some key Terraform facts like its versioning, community, and issue tracking on GitHub.
- The document provides instructions on getting started with Terraform by installing it and describes some common Terraform commands like apply, plan, and refresh.
- Finally, it briefly outlines some key Terraform features and example use cases like cloud app setup, multi
Low Code Integration with Apache Camel.pdfClaus Ibsen
Design your integration flows using Camel and JBang for a better developer experience, and make it easily production grade using Quarkus.
Claus Ibsen, Apache Camel lead & Senior Principal Software Engineer, Red Hat
The document discusses Netflix's use of Elasticsearch for querying log events. It describes how Netflix evolved from storing logs in files to using Elasticsearch to enable interactive exploration of billions of log events. It also summarizes some of Netflix's best practices for running Elasticsearch at scale, such as automatic sharding and replication, flexible schemas, and extensive monitoring.
Communication between Microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, service outages. We will walk through stability patterns like timeouts, circuit breaker, bulkheads and discuss how they improve stability of Microservices.
Adapting the capacity of your compute infrastructure to the demands of your applications is the domain of Auto Scaling. Adding and removing Amazon EC2 instances is only part of the story, though – there is more to it than first meets the eye. This session introduces the basics of how to use Auto Scaling before moving on to more advanced topics such as mixing Spot and On-Demand instances to optimize cost or strategies for blue/green deployments. If you have used Auto Scaling before, you can learn about useful new features like lifecycle hooks and step scaling policies that make Auto Scaling even more widely applicable.
Chaos Engineering is the technique to simulate chaos and havoc in the live environment, which the engineers use to create a fault tolerant application. In the session we will talk about some of these scenarios and Chaos Mesh, a tool for Chaos engineering on Kubernetes.
Learning Objectives:
- Learn how to make decisions about the service and share best practices and useful tips for success
- Learn about Content based routing, HTTP/2, WebSockets
- Secure your web applications using TLS termination, AWS WAF on Application Load Balancer
The document discusses Kubernetes cluster autoscaler, including how it works, deployment steps, configuration, and limitations. It describes setting up the cluster autoscaler manager, preparing extra nodes, protecting nodes from scale down, and installing the autoscaler using Helm. Some key points are that it can automatically scale a cluster from 0 to 1000 nodes handling 30 pods each, but has restrictions like not supporting regional instance groups and taking 10 minutes to scale nodes down.
This document provides an introduction to Amazon Web Services (AWS) and AWS Lambda. It begins with an overview of cloud computing and an explanation of what AWS is and its core services. It then discusses AWS Lambda in more detail, describing it as a serverless computing service that allows users to run code without provisioning or managing servers. The document outlines benefits of AWS Lambda like scaling, availability, and low cost. It also provides examples of common use cases for Lambda and demonstrates its capabilities through an example application.
This document discusses how StreamSets and Spark can be used together for analytics insights in retail. Some key points:
- StreamSets Data Collector (SDC) is used to ingest IoT and sensor data from various sources into a common format and process the data in real-time using Spark evaluators.
- The Spark evaluator allows running Spark transformations on batches of data within SDC pipelines to do tasks like anomaly detection, sentiment analysis, and fraud detection.
- SDC can also be used to move data to and from Spark for end-of-batch processing using a Spark executor, such as running jobs on Databricks after files land in S3.
- Together, SDC and Spark
Analytic Insights in Retail Using Apache Spark with Hari ShreedharanDatabricks
Streamsets Data Collector is designed to make data ingest and processing easy. SDC integrates at several levels with Apache Spark to make data analysis using Spark very easy. SDC works with Databricks Cloud to trigger jobs based on incoming data.
In this talk, you will learn how a larger retail player with thousands of outlets is utilizing StreamSets to power Spark jobs on the Databricks cloud, combining real-time foot traffic data and historic behavioral & transaction data for analytic insights that improve revenue per square foot.
Scaling Search at Lendingkart discusses how Lendingkart scaled their search capabilities to handle large increases in data volume. They initially tried scaling databases vertically and horizontally, but searches were still slow at 8 seconds. They implemented ElasticSearch for its near real-time search, high scalability, and out-of-the-box functionality. Logstash was used to seed data from MySQL and MongoDB into ElasticSearch. Custom analyzers and mappings were developed. Searches then reduced to 230ms and aggregations to 200ms, allowing the business to scale as transactional data grew 3000% and leads 250%.
PGConf APAC 2018 - High performance json postgre-sql vs. mongodbPGConf APAC
Speakers: Dominic Dwyer & Wei Shan Ang
This talk was presented in Percona Live Europe 2017. However, we did not have enough time to test against more scenario. We will be giving an updated talk with a more comprehensive tests and numbers. We hope to run it against citusDB and MongoRocks as well to provide a comprehensive comparison.
https://www.percona.com/live/e17/sessions/high-performance-json-postgresql-vs-mongodb
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaMushfekur Rahman
This document discusses building a unified logging layer using Fluentd, Elasticsearch, and Kibana. It describes what a unified logging layer is and why it is needed to collect, format, filter, and forward logs from multiple sources to storage. It then provides overviews of Fluentd for log collection, Elasticsearch for storage and querying, and Kibana for real-time visualization. Details are given on the architectures, plugins, and configurations of each tool and how they can work together in a scalable and highly available logging system.
Out of the box, Accumulo's strengths are difficult to appreciate without first building an application that showcases its capabilities to handle massive amounts of data. Unfortunately, building such an application is non-trivial for many would-be users, which affects Accumulo's adoption.
In this talk, we introduce Datawave, a complete ingest, query, and analytic framework for Accumulo. Datawave, recently open-sourced by the National Security Agency, capitalizes on Accumulo's capabilities, provides an API for working with structured and unstructured data, and boasts a robust, flexible, and scalable backend.
We'll do a deep dive into Datawave's project layout, table structures, and APIs in addition to demonstrating the Datawave quickstart—a tool that makes it incredibly easy to hit the ground running with Accumulo and Datawave without having to develop a complete application.
This summary provides an overview of the key points from the document in 3 sentences:
The document outlines the agenda for Season 3 Episode 1 of the Netflix OSS podcast, which includes lightning talks on 8 new projects including Atlas, Prana, Raigad, Genie 2, Inviso, Dynomite, Nicobar, and MSL. Representatives from Netflix, IBM Watson, Nike Digital, and Pivotal then each provide a 3-5 minute presentation on their featured project. The presentations describe the motivation, features and benefits of each project for observability, integration with the Netflix ecosystem, automation of Elasticsearch deployments, job scheduling, dynamic scripting for Java, message security, and developing microservices
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix use the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms and analyses, how they set up & keep schemas in sync between Hive, Presto, Redshift & Spark and make access easy for their data scientists, etc. Filmed at qconsf.com..
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
Initial presentation of swift (for montreal user group)Marcos García
Swift is an open source object storage system that provides scalable storage and retrieval of any amount of unstructured data over HTTP. It is designed to be scalable, reliable, and inexpensive for storing large amounts of unstructured data. Some key uses of Swift include storing backups, web content like images, and large scientific data objects. Swift uses a ring architecture to distribute and replicate data across multiple servers for high availability.
Serverless Clojure and ML prototyping: an experience reportMetosin Oy
This document summarizes a serverless Clojure and machine learning application for prototyping. It describes the team building the application, the serverless AWS infrastructure using Terraform, CI/CD pipelines with GitHub Actions, the Clojure applications including a dashboard SPA and event processor, and natural language processing using Haystack and SageMaker batch transforms. The application allows users to upload documents and questionnaires, select questions, and trigger inferences to find answers from the documents using ML models.
This document summarizes a presentation about using PostgreSQL's native full text search capabilities and the Sphinx search engine. It discusses when each option may be preferable, how to configure and use Sphinx to index PostgreSQL data, and some key Sphinx features like distributed searching, misspelling corrections, and autocompletion. Sphinx can be used to offload text searches for improved performance and scalability compared to native PostgreSQL searching.
This presentation introduces Google App Engine (GAE), covering its architecture, pricing, development process, and key services. GAE is a platform for building scalable web applications on Google's infrastructure. It provides automatic scaling, lower costs than traditional hosting, and services for user authentication, asynchronous tasks, storage, and more. Development involves using GAE's SDKs for local testing and the admin console for deploying and managing apps. Data can be stored in GAE's NoSQL datastore or relational Cloud SQL. Key services demonstrated include the datastore, blobstore, task queues, and transactions.
Introduction to Gatling performance testing tool and how we used it for testing Zonky's REST API. Example of running distributed performance tests in AWS Fargate with real-time monitoring with Logstash/ElasticSearch/Kibana stack.
This document discusses log management and analysis. It begins by explaining why logs are important for troubleshooting, security, and understanding infrastructure issues. It then discusses how tools like Logstash and Elasticsearch can be used to efficiently process, search, and visualize logs. The document provides examples of how these tools work together in a log management pipeline to ingest logs, parse and filter them, and store them in Elasticsearch for analysis and visualization. It concludes by discussing how these tools can be used to detect patterns and anomalies in logs.
High performance json- postgre sql vs. mongodbWei Shan Ang
PostgreSQL and MongoDB were benchmarked for performance on common operations like inserts, updates, and selects using a JSON document format. The key findings were:
1) PostgreSQL generally had lower latency but required extensive tuning to achieve high performance, while MongoDB delivered reasonable performance out of the box.
2) MongoDB showed unstable throughput and latency over time due to a cache eviction bug.
3) PostgreSQL did not scale well to large connection loads without connection pooling, while MongoDB scaled horizontally more easily.
4) Both databases had pros and cons for their data models, query capabilities, and upgrade processes. The optimal choice depends on an application's specific requirements.
A Day in the Life of a Druid Implementor and Druid's RoadmapItai Yaffe
This document summarizes a typical day for a Druid architect. It describes common tasks like evaluating production clusters, analyzing data and queries, and recommending optimizations. The architect asks stakeholders questions to understand usage and helps evaluate if Druid is a good fit. When advising on Druid, the architect considers factors like data sources, query types, and technology stacks. The document also provides tips on configuring clusters for performance and controlling segment size.
E-commerce Application Development Company.pdfHornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry, Quarterly Incident Report, provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
Atelier - Innover avec l’IA Générative et les graphes de connaissancesNeo4j
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Allez au-delà du battage médiatique autour de l’IA et découvrez des techniques pratiques pour utiliser l’IA de manière responsable à travers les données de votre organisation. Explorez comment utiliser les graphes de connaissances pour augmenter la précision, la transparence et la capacité d’explication dans les systèmes d’IA générative. Vous partirez avec une expérience pratique combinant les relations entre les données et les LLM pour apporter du contexte spécifique à votre domaine et améliorer votre raisonnement.
Amenez votre ordinateur portable et nous vous guiderons sur la mise en place de votre propre pile d’IA générative, en vous fournissant des exemples pratiques et codés pour démarrer en quelques minutes.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
What is Master Data Management by PiLog Groupaymanquadri279
PiLog Group's Master Data Record Manager (MDRM) is a sophisticated enterprise solution designed to ensure data accuracy, consistency, and governance across various business functions. MDRM integrates advanced data management technologies to cleanse, classify, and standardize master data, thereby enhancing data quality and operational efficiency.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
8 Best Automated Android App Testing Tool and Framework in 2024.pdfkalichargn70th171
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Crescat
Crescat is industry-trusted event management software, built by event professionals for event professionals. Founded in 2017, we have three key products tailored for the live event industry.
Crescat Event for concert promoters and event agencies. Crescat Venue for music venues, conference centers, wedding venues, concert halls and more. And Crescat Festival for festivals, conferences and complex events.
With a wide range of popular features such as event scheduling, shift management, volunteer and crew coordination, artist booking and much more, Crescat is designed for customisation and ease-of-use.
Over 125,000 events have been planned in Crescat and with hundreds of customers of all shapes and sizes, from boutique event agencies through to international concert promoters, Crescat is rigged for success. What's more, we highly value feedback from our users and we are constantly improving our software with updates, new features and improvements.
If you plan events, run a venue or produce festivals and you're looking for ways to make your life easier, then we have a solution for you. Try our software for free or schedule a no-obligation demo with one of our product specialists today at crescat.io
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
Odoo ERP software
Odoo ERP software, a leading open-source software for Enterprise Resource Planning (ERP) and business management, has recently launched its latest version, Odoo 17 Community Edition. This update introduces a range of new features and enhancements designed to streamline business operations and support growth.
The Odoo Community serves as a cost-free edition within the Odoo suite of ERP systems. Tailored to accommodate the standard needs of business operations, it provides a robust platform suitable for organisations of different sizes and business sectors. Within the Odoo Community Edition, users can access a variety of essential features and services essential for managing day-to-day tasks efficiently.
This blog presents a detailed overview of the features available within the Odoo 17 Community edition, and the differences between Odoo 17 community and enterprise editions, aiming to equip you with the necessary information to make an informed decision about its suitability for your business.
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfUndress Baby
The quest for the best AI face swap solution is marked by an amalgamation of technological prowess and artistic finesse, where cutting-edge algorithms seamlessly replace faces in images or videos with striking realism. Leveraging advanced deep learning techniques, the best AI face swap tools meticulously analyze facial features, lighting conditions, and expressions to execute flawless transformations, ensuring natural-looking results that blur the line between reality and illusion, captivating users with their ingenuity and sophistication.
Web:- https://undressbaby.com/
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Découvrez les dernières innovations de Neo4j, et notamment les dernières intégrations cloud et les améliorations produits qui font de Neo4j un choix essentiel pour les développeurs qui créent des applications avec des données interconnectées et de l’IA générative.
2. Agenda
● OpenSearch
○ What is it?
○ Benefits/ Uses
○ How to use it
○ Features
● Migrate from Elastic to OpenSearch
● Tools & Plugins
3. About Me
● Lead Dev
● Located in Florida
● Trainer
● Presenter
● .NET Developer
● Youtuber: Coach4Dev
● Husband/ Father
4. Amazon Elasticsearch
● Launched in 2015
● Gained popularity for log analytics usage
● Used open-source Elastic under Apache License v2
● Jan 2021
○ Elastic NV changed licensing strategy
○ After ElasticSearch 7.10.2 & Kibana 7.10.2
■ Not release under Apache License v2
■ Release under Elastic License
5. OpenSearch
● Sep 2021:
○ Renamed from ElasticSearch to OpenSearch
● OpenSource fork from Elastic 7.10.2 and Kibana 7.10.2
● Highly scalable
● Fast access & response to large volumes of data
● Powered by Apache Lucene Search library
6. Apache Lucene
● Apache Lucene project develops open-source search software
○ Releases a core search library named Lucene core
● Lucene Core
○ Java Library providing powerful indexing and search features
8. Solr vs ElasticSearch
● Similar performance mostly.
● ES has better support for scalability
○ due to horizontal scaling
■ Better cloud support too
● ES can support multiple doc types in a single index better
○ More difficult to do this in Solr
● ES supports native DSL (Domain Specific Language)
○ Need to program queries in Solr
● https://mindmajix.com/elasticsearch-vs-solr
9. Why OpenSearch
● Huge amount of machine generated data these days
○ Growing exponentially
● Getting insights is important
● Interactive log analytics
● Real-time application monitoring
● Website Search, etc.
10. OpenSearch Features
● Easy to set-up and configure
● In-place upgrades
● Enables data monitoring & setting alerts based on thresholds
● Supports authentication, encryption & compliance requirements
11. OpenSearch vs ElasticSearch
● OpenSearch was forked from Elastic Search
○ Now they are separate from each other
● Each is adding features separately
● OpenSearch
○ Inbuilt support from AWS
12. OpenSearch features not in ES (free version)
● Centralized user accounts / access control
● Cross-cluster replication
● IP filtering
● Configurable retention period
● Anomaly detection
● Tableau connector
● JDBC driver
● ODBC driver
● Machine learning features such as regression and classification
● Link
18. OpenSearch Cluster
● Synonymous to domain
● Domains are clusters with
○ settings,
○ instance types,
○ instance counts,
○ and storage resources that you specify.
● Group of nodes
○ With same cluster.name attribute
20. Getting Started
● Create a domain
● Size the domain appropriately for your workload
● Control access to your domain using a domain access policy or fine-grained
access control
● Index data manually or from other AWS services
● Use OpenSearch Dashboards to search your data and create visualizations
21. Custom Endpoint
● If we want easier to read or custom domain name
● Can use Https
○ Upload SSL certificate
22. Run OpenSearch locally
● Install docker
● wsl -d docker-desktop
● sysctl -w vm.max_map_count=262144
● Ctrl+C
● docker-compose up
● Visit http://localhost:5601/
● Use admin/admin to login and explore
● Link
28. Searching Data - URI
● GET Request
● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.am
azonaws.com/movies/_search?q=rebel&pretty=true
● Searches all the indices and properties
29. URI Search Specific fields
● Search movies index and title property
● GET
https://search-my-domain.us-west-1.es.amazonaws.com/movies/_search?q=ti
tle:house
30. Get Search Results - Command Line
● curl -XGET -u "master:XXXXX"
"https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.a
mazonaws.com/movies/_search?q=rebel&pretty=true"
31. Query DSL
● For more complex queries
○ OpenSearch Domain Specific Language (DSL)
● POST request with query body
●
32. Get Search Results - Dev Tools
● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.am
azonaws.com/_dashboards/app/dev_tools#/console
○ GET _search
○ {
○ "query": {
○ "match_all": {}
○ }
○ }
33. Search on only specific fields
GET _search
{
"size": 20,
"query": {
"multi_match": {
"query": "U.S.",
"fields": ["title", "actor", "director"]
}
}
}
38. Dashboard Query Language
● Use DQL in Dashboards
○ Search for data and visualizations
● Terms Query
○ Search for any text
■ E.g. www.example.com
○ Access object’s nested field
■ E.g. coordinates.lat:43.7102
○ Leading and trailing wildcards
■ host.keyword:*.example.com/*
● Operators
○ AND
○ OR
39. Dashboard Query Language
● Date and range Queries
○ bytes >= 15 and memory < 15
○ @timestamp > "2020-12-14T09:35:33"
● Nested field query
○ superheroes: {hero-name: Superman}
50. Snapshot Method
● Generate snapshot in ElasticSearch
● Save in shared directory
● Restore in OpenSearch
● Snapshot
○ Backup of entire cluster state
○ Useful for recovery from failure and migration
● Link
51. Snapshot Method
● Check Index compatibility
○ E.g.: Cant restore 7.6.0 snapshot into 7.5.0 cluster
● Link
● Fastest
● Easiest
● Most efficient
●
52. Rolling Upgrade
● Official way to migrate cluster
● Without interruption
● Rolling upgrades are supported:
○ Between minor versions
○ From 5.6 to 6.8
○ From 6.8 to 7.14.1
○ From any version since 7.14.0 to 7.14.1
56. OpenSearch Mapping
● Dynamic
○ When you index a document
○ Opensearch adds fields automatically
○ It deduces their types by itself
● Explicit
○ If you know your data types
○ Preferred way of doing things
57. OpenSearch Mapping
● If you do not define a mapping ahead of time, OpenSearch dynamically
creates a mapping for you.
● If you do decide to define your own mapping, you can do so at index creation.
● ONE mapping is defined per index. Once the index has been created, we can
only add new fields to a mapping. We CANNOT change the mapping of an
existing field.
● If you must change the type of an existing field, you must create a new index
with the desired mapping, then reindex all documents into the new index.
58. Text vs keyword data types
● Text type
○ Full text searches
● Keyword type
○ Exact searches
○ Aggregations
○ Sorting
65. Data Streams in OpenSearch
● Ingesting time series data
○ Logs
○ Events
○ Metrics, etc.
● Number of documents grows rapidly
● Append Only data
● Don't need to update older documents (Very rarely)
66. Rollover
● If data is growing rapidly
● Write to index upto certain threshold
○ Then create a new index
○ And start writing to it
● Optimize the active index for high ingest rates on high-performance hot
nodes.
● Optimize for search performance on warm nodes.
● Shift older, less frequently accessed data to less expensive cold nodes,
● Delete data according to your retention policies by removing entire indices.
67. Index Template
● Data Stream requires an index template
● A name or wildcard (*) pattern for the data stream.
● The data stream’s timestamp field. This field must be mapped as a date or
date_nanos field data type and must be included in every document indexed
to the data stream.
● The mappings and settings applied to each backing index when it’s created.
68. ILM Policy
● Index Lifecycle Management Policy
● Can be applied to any number of indices
● Usage
○ Allocate
○ Delete
○ Rollover
○ Read Only
○ Wait for snapshot
73. Index Template
● Tells ElasticSearch how to configure an index when it is created
● For data streams
○ Configures the stream’s backing indices
○ Configured prior to index creation
74. Templates Types
● Component Templates
○ Reusable building blocks that configure
■ mappings,
■ settings, and
■ Aliases
○ Not directly applied to indices
● Index Template
○ Collection of component templates
○ Directly applied to indices
○ Some defaults: metrics-*-*, logs-*-*
78. Create data stream
● Documents must contain timestamp field
● PUT _data_stream/my-data-stream
● Stream’s name must match one of your index template’s index patterns
79. Get Info About Data Stream
● GET _data_stream/my-data-stream
82. Cross Cluster Replication
● Cross Cluster replication plugin
○ Replicates indexes, mapping & metadata from one cluster to another
● Advantages
○ Continue to handle search requests if there is an outage
○ Can help reduce latency in application
■ Replicating data across geographically distant data centers
83. Replication
● Active passive model
○ Follower index pulls data from leader index
● It can be
○ Started
○ Paused
○ Stopped
○ Resumed
● Can be secured
○ Security plugin
○ Encrypt cross cluster traffic
88. Opensearch plugins
● Standalone components
○ That add features and capabilities
● Huge number of plugins available
● E.g.
○ Replication Plugin
○ Security plugin
○ Notification plugin
89. SQL Plugin
● Lets you run SQL queries on ESDB
● Add data
○ PUT movies/_doc/1
○ { "title": "Spirited Away" }
● Query data
○ POST _plugins/_sql
○ {
○ "query": "SELECT * FROM movies LIMIT 50"
○ }
○
90. SQL Plugin
● Delete data from ESDB Index
● Enable Delete via SQL plugin
○ PUT _plugins/_query/settings
○ {
○ "transient": {
○ "plugins.sql.delete.enabled": "true"
○ }
○ }
○
91. SQL PLugin - Delete
● To Delete the data
○ POST _plugins/_sql
○ {
○ "query": "DELETE FROM movies"
○ }
○
92. Asynchronous Search
● Large volumes of data
● Can take longer to search
● Async
○ Run searches in the background
○ Monitor progress of these searches
○ Get back partial results as they become available
93. Asynchronous Search
● POST _plugins/_asynchronous_search
● Response contents:
○ ID
■ Can be used to track the state of the search
■ Get partial results
○ State
■ Running
■ Completed
■ Persisted
● Link
100. Grafana
● An open source visualization tool
● Various sources can be used as data source:
○ InfluxDB
○ MySQL
○ ElasticSearch
○ PostgreSQL
● Better suited for metrics visualizations
● Does not allow full text data querying
101. Logstash
● Free/ Open-Source
● Data processing pipeline
● Ingests data from multitude of sources
● Transforms it
● Sends it to your favorite stash
102. Logstash - Ingestion
● Data of all shapes/ sizes/ source
○ Can be ingested
● It can parse/ transform your data
104. AWS OpenSearch Security
● Use multi-factor authentication (MFA) with each account.
● Use SSL/TLS to communicate with AWS resources. We recommend TLS 1.2
or later.
● Set up API and user activity logging with AWS CloudTrail.
● Use AWS encryption solutions, along with all default security controls within
AWS services.
● Use advanced managed security services such as Amazon Macie, which
assists in discovering and securing personal data that is stored in Amazon S3.
● If you require FIPS 140-2 validated cryptographic modules when accessing
AWS through a command line interface or an API, use a FIPS endpoint.
105. Summary
● Opensearch
○ Open Source Search solution
● Upcoming and supported by AWS
● Caters to most search use cases
○ Great Query performance
● Powerful tools
● Community Support
106. Connect with me
● Trainings on various tech topics
● For any questions:
○ https://linkedin.com/in/coach4dev