This document summarizes a presentation about RediSearch and CRDT.
The presentation covered:
1. An overview of RediSearch and how it can be used for full text search and as a secondary index.
2. A demonstration of RediSearch benchmarking where it indexed a Wikipedia dataset faster than Elasticsearch and returned search results faster.
3. How RediSearch supports a multi-tenant search application with isolated indexes for each tenant, and how it outperformed Elasticsearch in indexing 25 million documents across 50,000 tenants.
4. An explanation of CRDT and how it allows for consensus-free replication between RediSearch instances for an active-active multi-site search engine with
RediSearch / CRDT: Kyle Davis, Meir Shpilraien
1. PRESENTED BY
RediSearch / CRDT
Kyle Davis (@stockholmux)
Redis Labs, Head of Developer Advocacy
Meir Shpilraien (@Meir_Shpilraien)
Redis Labs, Senior Software Engineer
5. Data Lifecycle in RediSearch / Search and Aggregate
• Create a schema using four field types
– Text
– Numeric
– Tag
– Geospatial
• Add documents in real time
– Directly
– From a hash
– Index only
• Search & aggregate
• Delete documents as needed
• Drop the whole index
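The lifecycle above maps onto a handful of RediSearch commands. The sketch below only builds each command as an argument list and does not connect to Redis. The helper names, the `cars` index and its fields are hypothetical. The commands use the RediSearch 1.x syntax of this deck's era: FT.ADD with FIELDS, NOSAVE for index-only, and FT.ADDHASH for an existing hash. Later RediSearch releases deprecated FT.ADD and FT.DROP in favor of indexing hashes written with HSET and using FT.DROPINDEX, so check the documentation for the version you run.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers that build RediSearch 1.x-era commands as argument lists.
# Nothing here talks to Redis; a client would send each list as one command.


def ft_create(index: str, schema: List[Tuple[str, str]]) -> List[str]:
    """Create an index; each field type is TEXT, NUMERIC, TAG or GEO."""
    args = ["FT.CREATE", index, "SCHEMA"]
    for name, field_type in schema:
        args += [name, field_type]
    return args


def ft_add(index: str, doc_id: str, fields: Dict[str, object],
           score: float = 1.0, nosave: bool = False) -> List[str]:
    """Add a document directly; NOSAVE indexes it without storing the raw fields."""
    args = ["FT.ADD", index, doc_id, str(score)]
    if nosave:
        args.append("NOSAVE")
    args.append("FIELDS")
    for name, value in fields.items():
        args += [name, str(value)]
    return args


def ft_addhash(index: str, key: str, score: float = 1.0) -> List[str]:
    """Index an existing Redis hash stored under `key`."""
    return ["FT.ADDHASH", index, key, str(score)]


def ft_search(index: str, query: str) -> List[str]:
    return ["FT.SEARCH", index, query]


def ft_del(index: str, doc_id: str) -> List[str]:
    return ["FT.DEL", index, doc_id]


def ft_drop(index: str) -> List[str]:
    return ["FT.DROP", index]


lifecycle = [
    ft_create("cars", [("model", "TEXT"), ("year", "NUMERIC"),
                       ("condition", "TAG"), ("location", "GEO")]),
    ft_add("cars", "car:1", {"model": "ford ranger", "year": 2005}),
    ft_addhash("cars", "car:2"),
    ft_search("cars", "ford @year:[2001 2011]"),
    ft_del("cars", "car:1"),
    ft_drop("cars"),
]
for command in lifecycle:
    print(" ".join(command))
```

With redis-py, each list could be sent as `r.execute_command(*args)`.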
6. Query Language
• Goals
– Intentionally not SQL
– But familiar
– Exposable to end users
• Simple
– No knowledge of the data/structure needed
• Powerful
– With knowledge, zero in on data
9. Query Syntax – More Advanced
AND / OR / NOT / Exact Phrase / Geospatial / Tags / Prefix / Number Ranges / Optional Terms & more
And combine them all into one query:
(chev*|ford) -explorer ~truck @year:[2001 2011] @location:[74 40 100 km] @condition:{good | verygood}
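To make the combined query easier to read, here is a hypothetical Python builder. Each helper emits one piece of the syntax listed on the slide: prefix `*`, union `|`, negation `-`, optional `~`, numeric range, geo radius and tag set. Placing clauses side by side means AND. The helper names are illustrative. The builder does no escaping of special characters, which real user input would need.

```python
def prefix(term: str) -> str:
    return f"{term}*"  # chev* matches chevy, chevrolet, ...


def any_of(*alternatives: str) -> str:
    return "(" + "|".join(alternatives) + ")"  # OR


def exclude(term: str) -> str:
    return f"-{term}"  # NOT


def optional(term: str) -> str:
    return f"~{term}"  # not required; ranks higher when present


def num_range(field: str, low: int, high: int) -> str:
    return f"@{field}:[{low} {high}]"  # inclusive numeric range


def geo_radius(field: str, lon: float, lat: float, radius: float, unit: str = "km") -> str:
    return f"@{field}:[{lon} {lat} {radius} {unit}]"


def tag_any(field: str, *tags: str) -> str:
    return f"@{field}:{{" + " | ".join(tags) + "}"  # matches any of the tags


def all_of(*clauses: str) -> str:
    return " ".join(clauses)  # juxtaposition = AND


query = all_of(
    any_of(prefix("chev"), "ford"),
    exclude("explorer"),
    optional("truck"),
    num_range("year", 2001, 2011),
    geo_radius("location", 74, 40, 100),
    tag_any("condition", "good", "verygood"),
)
print(query)
```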
10. Full-text Search
• Stop words:
– “a fox in the woods” -> “fox woods”
• Stemming:
– Query “going” -> find “going”, “go”, “gone”
• Slop:
– Query: “glass pitcher”, slop 2 -> “glass gallon beer pitcher”
• With or without content:
– Query: “To be or not to be” -> Hamlet (without the whole play)
• Matched text highlight/summary:
– Query: “To be or not to be” -> Hamlet. <b>To be, or not to be</b> - that is the question
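Two of these behaviors are easy to sketch in plain Python. The stop list below is a Lucene-style English list, assumed to approximate RediSearch's default. The slop check counts the intervening tokens between phrase terms. As a simplification it also requires the terms to appear in order. RediSearch exposes slop and ordering through the SLOP and INORDER arguments of FT.SEARCH. The function names are hypothetical.

```python
from typing import List

# Lucene-style English stop list (assumed to approximate RediSearch's default).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def remove_stop_words(text: str) -> List[str]:
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def within_slop(doc: List[str], phrase: List[str], slop: int) -> bool:
    """True if the phrase terms occur in order with at most `slop`
    intervening tokens in total (an in-order simplification)."""
    for start, token in enumerate(doc):
        if token != phrase[0]:
            continue
        pos = start
        for term in phrase[1:]:
            try:
                # Earliest next occurrence keeps the matched span as short as possible.
                pos = doc.index(term, pos + 1)
            except ValueError:
                break
        else:
            # Span length minus the phrase's own gaps = intervening tokens.
            if (pos - start) - (len(phrase) - 1) <= slop:
                return True
    return False


print(remove_stop_words("a fox in the woods"))
print(within_slop("glass gallon beer pitcher".split(), ["glass", "pitcher"], 2))
```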
11. Full-text Search
• Synonyms
– Query “Bob” -> find documents with “Robert”
• Query spell check
– “a fxo in the woods” -> Did you mean “a fox in the woods”?
• Phonetic search
– “John Smith” -> “Jon Smyth”
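This block sketches spell checking by edit distance, plus synonym expansion. RediSearch's FT.SPELLCHECK suggests terms within a maximum Levenshtein DISTANCE. The dictionary, synonym groups and helper names here are hypothetical. The slide's typo "fxo" is a transposition, so it costs 2 under plain Levenshtein distance.

```python
from typing import Iterable, Optional, Set


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def suggest(term: str, dictionary: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Closest dictionary term within max_distance (ties broken alphabetically)."""
    scored = sorted((levenshtein(term, word), word) for word in dictionary)
    if scored and scored[0][0] <= max_distance:
        return scored[0][1]
    return None


def did_you_mean(query: str, dictionary: Set[str]) -> str:
    """Replace each unknown word with its closest suggestion, if any."""
    words = []
    for word in query.lower().split():
        if word in dictionary:
            words.append(word)
        else:
            words.append(suggest(word, dictionary) or word)
    return " ".join(words)


# Hypothetical synonym groups: a query for any member matches all members.
SYNONYM_GROUPS = [{"bob", "robert"}, {"car", "automobile"}]


def expand_synonyms(term: str) -> Set[str]:
    """Return the term plus every member of any synonym group containing it."""
    expanded = {term.lower()}
    for group in SYNONYM_GROUPS:
        if term.lower() in group:
            expanded |= group
    return expanded


print(did_you_mean("a fxo in the woods", {"a", "fox", "in", "the", "woods"}))
print(expand_synonyms("Bob"))
```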
12. Scoring, Weights, and Sorting
• Each field can have a weight that influences ranking in the returned results
• Each document can have a score to influence rank
• Built-in scoring functions
– Default: TF-IDF (term frequency–inverse document frequency)
• Variant: DOCNORM
• Variant: BM25
– DISMAX (Solr’s default)
– DOCSCORE
– HAMMING for binary payloads
• Fields can be independently sortable, which trumps any built-in scorer
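For reference, this block shows the textbook forms of two of these scorers, plus a field-weighted sum. These are the standard formulas, not RediSearch's exact implementation, whose normalization details may differ. The tiny corpus is made up. With N documents, term frequency tf and document frequency df, TF-IDF is tf × log(N / df). BM25 adds term-frequency saturation (k1) and document-length normalization (b). It is shown here with the common log(1 + ...) IDF variant.

```python
import math
from typing import Dict, List

Doc = List[str]


def tf_idf(term: str, doc: Doc, corpus: List[Doc]) -> float:
    """Raw term frequency times log(N / document frequency)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def bm25(term: str, doc: Doc, corpus: List[Doc], k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 for one term: saturates tf and normalizes by document length."""
    tf = doc.count(term)
    if tf == 0:
        return 0.0
    n = len(corpus)
    df = sum(1 for d in corpus if term in d)
    avgdl = sum(len(d) for d in corpus) / n
    idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))


def weighted_score(term: str, doc: Dict[str, Doc], corpus: List[Dict[str, Doc]],
                   weights: Dict[str, float]) -> float:
    """Sum of per-field TF-IDF scores, each multiplied by the field's weight."""
    return sum(w * tf_idf(term, doc[f], [d[f] for d in corpus])
               for f, w in weights.items())


corpus = [["red", "fox"], ["red", "red", "car"], ["blue", "car"]]
print(tf_idf("red", corpus[1], corpus), bm25("red", corpus[1], corpus))

# A match in the heavily weighted title outranks the same match in the body.
docs = [{"title": ["red", "fox"], "body": ["a", "fox"]},
        {"title": ["car"], "body": ["red", "car"]}]
weights = {"title": 5.0, "body": 1.0}
print([weighted_score("red", d, docs, weights) for d in docs])
```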
14. Aggregations
• Processes and transforms query results
• Same query language as search
• Can group, sort, and apply transformations
• Follows a pipeline of composable actions:
– Filter → Group (Reduce, Reduce) → Apply → Sort → Apply
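The pipeline can be pictured as plain list transformations. This sketch mirrors the FILTER, GROUPBY ... REDUCE, APPLY and SORTBY steps of FT.AGGREGATE on in-memory rows. The function names and the car listings are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, object]


def filter_rows(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    return [r for r in rows if pred(r)]


def group_by(rows: List[Row], key: str,
             reducers: Dict[str, Callable[[List[Row]], object]]) -> List[Row]:
    """Group rows by `key`; each reducer turns a group's rows into one value."""
    groups: Dict[object, List[Row]] = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    result = []
    for value, members in groups.items():
        row: Row = {key: value}
        for name, reduce_fn in reducers.items():
            row[name] = reduce_fn(members)
        result.append(row)
    return result


def apply(rows: List[Row], name: str, fn: Callable[[Row], object]) -> List[Row]:
    """Add a computed field to every row."""
    return [{**r, name: fn(r)} for r in rows]


def sort_by(rows: List[Row], key: str, desc: bool = False) -> List[Row]:
    return sorted(rows, key=lambda r: r[key], reverse=desc)


listings = [
    {"make": "ford", "year": 2005, "price": 8000},
    {"make": "ford", "year": 2010, "price": 12000},
    {"make": "chevrolet", "year": 2008, "price": 9000},
    {"make": "chevrolet", "year": 1999, "price": 3000},
]

rows = filter_rows(listings, lambda r: 2001 <= r["year"] <= 2011)   # Filter
rows = group_by(rows, "make", {                                      # Group + Reduce
    "count": len,
    "total": lambda members: sum(m["price"] for m in members),
})
rows = apply(rows, "avg_price", lambda r: r["total"] / r["count"])  # Apply
rows = sort_by(rows, "avg_price", desc=True)                        # Sort
print(rows)
```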
18. Autocomplete/Suggestions
• In the module, but separate storage
• Radix trie-based, optimized for real-time, as-you-type completions
• Simple API
– Add a suggestion (FT.SUGADD)
– Get a suggestion (FT.SUGGET)
– Delete a suggestion (FT.SUGDEL)
• Specify or increment the “score” of each item to create custom sortings
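Here is a hypothetical in-memory stand-in for the three suggestion commands. It uses a plain character trie with a score on each completed entry. The `incr` flag is modeled on FT.SUGADD's INCR option. A radix trie, as the slide describes, would also merge single-child chains into one edge; that step is omitted here for brevity. Class and method names are illustrative.

```python
from typing import Dict, List, Tuple

_SCORE = None  # non-string key marking "a suggestion ends here"; its value is the score


class SuggestionTrie:
    """Plain character trie standing in for FT.SUGADD / FT.SUGGET / FT.SUGDEL."""

    def __init__(self) -> None:
        self.root: Dict = {}

    def add(self, text: str, score: float, incr: bool = False) -> None:
        node = self.root
        for ch in text:
            node = node.setdefault(ch, {})
        node[_SCORE] = (node.get(_SCORE, 0.0) + score) if incr else score

    def delete(self, text: str) -> bool:
        node = self.root
        for ch in text:
            if ch not in node:
                return False
            node = node[ch]
        return node.pop(_SCORE, None) is not None

    def get(self, prefix: str, max_results: int = 5) -> List[str]:
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        found: List[Tuple[float, str]] = []
        self._collect(node, prefix, found)
        found.sort(key=lambda item: (-item[0], item[1]))  # highest score first
        return [text for _, text in found[:max_results]]

    def _collect(self, node: Dict, text: str, out: List[Tuple[float, str]]) -> None:
        for key, child in node.items():
            if key is _SCORE:
                out.append((child, text))  # child is the stored score here
            else:
                self._collect(child, text + key, out)


trie = SuggestionTrie()
trie.add("hello", 1)
trie.add("help", 3)
trie.add("helium", 2)
print(trie.get("hel"))
trie.add("hello", 5, incr=True)  # bump "hello" to score 6
print(trie.get("hel"))
```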
26. What Is Multi-tenant Search?
• Serving a multi-tenant application
• Each tenant has its own dedicated and isolated search index
• Number of docs per index: 500
• Total number of tenants: 50k
• Total number of indexed documents: 25M
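The slide's totals multiply out as 500 × 50,000 = 25,000,000. Below is a hypothetical in-memory sketch of the isolation property: each tenant gets its own index, and a search only ever consults the caller's index. In RediSearch this would correspond to one index per tenant. The class and tenant names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set

DOCS_PER_INDEX = 500
TENANTS = 50_000
TOTAL_DOCS = DOCS_PER_INDEX * TENANTS  # 25,000,000, the slide's 25M


class TenantIndexes:
    """Hypothetical stand-in: one isolated inverted index per tenant."""

    def __init__(self) -> None:
        # tenant -> term -> ids of that tenant's documents containing the term
        self._indexes: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, tenant: str, doc_id: str, text: str) -> None:
        for term in text.lower().split():
            self._indexes[tenant][term].add(doc_id)

    def search(self, tenant: str, term: str) -> List[str]:
        # Only the caller's own index is consulted, so tenants never see each other's data.
        index = self._indexes.get(tenant, {})
        return sorted(index.get(term.lower(), set()))


indexes = TenantIndexes()
indexes.add("acme", "d1", "red fox")
indexes.add("globex", "d2", "red car")
print(TOTAL_DOCS, indexes.search("acme", "red"))
```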
29. Multi-tenant Results
• Natively in memory (*Elasticsearch was running with cache enabled)
• C (RediSearch) vs. Java (Elasticsearch)
• An extremely optimized search engine built from the ground up vs. the less optimized, 20-year-old Lucene search engine
• Redis's lightweight RESP protocol vs. Elasticsearch's HTTP-based protocol
30. Setup
• Client & server: AWS c4.8xlarge (36 vCPU and 60 GB RAM)
[Diagram: client → Elastic; client → Redis (RediSearch)]
31. Configuration Settings
Elasticsearch:
• shards: 5
• JVM settings (Xms and Xmx)
• indices.memory.index_buffer_size
• index.refresh_interval (triggers flushes)
• index.number_of_replicas
RediSearch:
• Doc table size: 10M
• No thread concurrency (handled using the enterprise cluster)
33. Multi-site Active-Active Replication
Consensus-based replication
A single instance needs to know that a majority of parties agreed on an operation before applying it.
Advantages:
- Guaranteed strong consistency
- Well-known algorithms: Paxos, Raft, ...
Disadvantage:
- Takes time to reach agreement (especially at worldwide scale)
*Shapiro, Marc; Preguiça, Nuno; Baquero, Carlos; Zawirski, Marek (2011), Conflict-Free Replicated Data Types, Lecture Notes in Computer Science 6976
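The majority rule and its latency cost can be sketched in a few lines. The quorum function is the standard majority: any two majorities overlap, which is what gives strong consistency. The latency model is a deliberate simplification that assumes a single round trip from the proposing site. The round-trip times are made-up numbers.

```python
from typing import List


def quorum(n: int) -> int:
    """Smallest majority of n replicas; any two majorities share at least one replica."""
    return n // 2 + 1


def can_apply(acks: int, n: int) -> bool:
    """An operation may be applied once a majority has agreed to it."""
    return acks >= quorum(n)


def commit_latency_ms(rtts_ms: List[float]) -> float:
    """Simplified model: the proposer (rtt 0 to itself) waits one round trip,
    and the operation commits when the quorum-th fastest reply arrives."""
    return sorted(rtts_ms)[quorum(len(rtts_ms)) - 1]


# Hypothetical round-trip times from one site to five sites (itself included).
print(quorum(5), commit_latency_ms([0, 80, 150, 200, 250]))
```

The slower the middle replica, the longer every write waits. That is the "worldwide scale" disadvantage on the slide.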
34. What Is CRDT?
Conflict-Free Replicated Data Types
• A consensus-free technique that satisfies “eventual consistency” properties
- No need to coordinate with other parties in advance → increases performance
- After waiting long enough, all parties’ states align → strong eventual consistency
[Diagram: one replica runs INCRBY 5 and another runs DECRBY 3; after the replicas sync with each other, both read x = 2]
* Shapiro, Marc; Preguiça, Nuno; Baquero, Carlos; Zawirski, Marek (2011), Conflict-Free Replicated Data Types, Lecture Notes in Computer Science 6976
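The INCRBY 5 / DECRBY 3 example corresponds to the textbook state-based PN-Counter from the Shapiro et al. paper cited on the slide. The deck does not describe how Redis's CRDT counters are implemented internally. Treat this as an illustration of the idea, not of Redis's code. The class and method names are hypothetical.

```python
from typing import Dict


class PNCounter:
    """State-based PN-Counter: per-replica totals of increments and decrements.

    Amounts are assumed non-negative; a decrement is recorded in `n`, not as a negative `p`.
    """

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.p: Dict[str, int] = {}  # increments performed at each replica
        self.n: Dict[str, int] = {}  # decrements performed at each replica

    def incrby(self, amount: int) -> None:
        self.p[self.replica_id] = self.p.get(self.replica_id, 0) + amount

    def decrby(self, amount: int) -> None:
        self.n[self.replica_id] = self.n.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other: "PNCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of sync order or repetition.
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)


site_a, site_b = PNCounter("site-a"), PNCounter("site-b")
site_a.incrby(5)      # INCRBY 5 on one replica
site_b.decrby(3)      # DECRBY 3 concurrently on the other
site_a.merge(site_b)  # sync in both directions
site_b.merge(site_a)
print(site_a.value(), site_b.value())
```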
36. RediSearch & CRDT
RediSearch and CRDB (Redis CRDT) → a multi-site active-active search engine
• RediSearch (FT.ADD) saves the raw data as a hash.
• CRDT replicates the hash between the sites.
• When a hash is received, CRDT notifies RediSearch, which re-indexes the new data.
• RediSearch is notified only after CRDT has resolved any conflicts.
[Diagram, Site 1 → Site 2: FT.ADD idx doc1 1.0 FIELDS name Danny is stored as HSET doc1 name Danny; Site 1's CRDT replicates the data to the other replica; Site 2's CRDT notifies its RediSearch when the new data arrives]
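Here is a small simulation of the flow above. Each site stores hashes, replicates field writes and resolves conflicts, and only then re-indexes. The deck does not say which conflict-resolution rule applies to hash fields. This sketch assumes last-writer-wins, ordered by a Lamport-style counter with the site name as tie-breaker. The Site class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, str, str]  # (logical clock, site name, value)


@dataclass
class Site:
    """Hypothetical site: replicated hashes plus a local search index."""
    name: str
    clock: int = 0
    hashes: Dict[str, Dict[str, Entry]] = field(default_factory=dict)
    index: Dict[str, Set[str]] = field(default_factory=dict)  # term -> hash keys

    def hset(self, key: str, fld: str, value: str) -> Tuple[str, str, Entry]:
        """Local write; returns the message to replicate to the other sites."""
        self.clock += 1
        entry = (self.clock, self.name, value)
        self._apply(key, fld, entry)
        return key, fld, entry

    def receive(self, message: Tuple[str, str, Entry]) -> None:
        key, fld, entry = message
        self.clock = max(self.clock, entry[0])
        self._apply(key, fld, entry)

    def _apply(self, key: str, fld: str, entry: Entry) -> None:
        fields = self.hashes.setdefault(key, {})
        current = fields.get(fld)
        # Last-writer-wins: higher clock wins, site name breaks ties.
        if current is None or entry[:2] > current[:2]:
            fields[fld] = entry
            self._reindex(key)  # search is notified only after the conflict is resolved

    def _reindex(self, key: str) -> None:
        for keys in self.index.values():
            keys.discard(key)
        for _, _, value in self.hashes[key].values():
            for term in value.lower().split():
                self.index.setdefault(term, set()).add(key)

    def search(self, term: str) -> List[str]:
        return sorted(self.index.get(term.lower(), set()))


site1, site2 = Site("site1"), Site("site2")
site2.receive(site1.hset("doc1", "name", "Danny"))  # replicate, then re-index on site 2
print(site2.search("Danny"))

# Concurrent writes to the same field converge to the same winner on both sites.
a = site1.hset("doc2", "name", "Ann")
b = site2.hset("doc2", "name", "Bob")
site1.receive(b)
site2.receive(a)
print(site1.search("bob"), site2.search("bob"))
```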