A deep dive on RediSearch, the search engine built as a Redis Module. Originally given at the Silicon Valley Redis / Silicon Valley Big Data joint meetup
RedisDay London 2018 - RedisSearch Deep Dive (Redis Labs)
RediSearch can be used for full text search and secondary indexing. It allows users to create schemas, add documents in real time, search and aggregate data, and delete or drop indexes. RediSearch supports autocomplete, fuzzy search, spell checking, scoring, aggregations, and a query language that is familiar but not SQL.
This document discusses CMD2RDF, a project that converts metadata in the Component Metadata Infrastructure (CMDI) format to the Resource Description Framework (RDF) format to make the metadata queryable as Linked Open Data. The conversion includes mapping the CMDI schema to RDFS classes and properties, mapping specific CMDI profiles and components to RDFS subclasses and subproperties, and converting individual CMDI records to RDF instances. The converted RDF data is loaded into a Virtuoso triplestore and made available for SPARQL and REST queries through a browser interface. This allows linking between CMDI metadata and other Linked Data sources.
There is a lot of confusion out there about the various kinds of NoSQL and NewSQL technologies. Document stores, graph databases, columnar databases, and the list goes on. This confusion has led to a good deal of less-than-optimal deployments, pain, and, ultimately, antipathy.
In this talk, Dan will walk us through a high-level explanation of the various NoSQL technologies available to us, how they work, and provide some dos and don'ts for their implementation.
RDF.rb is a Ruby library for working with RDF and SPARQL. It allows querying RDF stores using SPARQL and supports various RDF serialization formats and storage backends. SPARQL queries can be run against SPARQL endpoints using the sparql-client gem. RDF.rb is pure Ruby, open source under the UNLICENSE, and supports CRuby and JRuby.
Apache Spark's Built-in File Sources in Depth (Databricks)
This document summarizes a presentation about Apache Spark's built-in file sources. It discusses various file formats including Parquet, ORC, Avro, JSON, CSV, text and binary. It explains the differences between column-oriented and row-oriented formats. It also covers data layout techniques like partitioning and bucketing. Regarding file readers, it describes how Spark analyzes data to skip unneeded portions. For writers, it explains how Spark uses a distributed, transactional approach by writing to temporary locations and committing outputs.
RedisSearch / CRDT: Kyle Davis, Meir Shpilraien (Redis Labs)
This document summarizes a presentation about RediSearch and CRDT.
The presentation covered:
1. An overview of RediSearch and how it can be used for full text search and as a secondary index.
2. A demonstration of RediSearch benchmarking where it indexed a Wikipedia dataset faster than Elasticsearch and returned search results faster.
3. How RediSearch supports a multi-tenant search application with isolated indexes for each tenant, and how it outperformed Elasticsearch in indexing 25 million documents across 50,000 tenants.
4. An explanation of CRDT and how it allows for consensus-free replication between RediSearch instances for an active-active, multi-site search engine.
Why MongoDB over other Databases (Habilelabs)
MongoDB is a fast-growing, open-source document database and a leading NoSQL database, with the scalability and flexibility that you want and the querying and indexing that you need. This document presents why to choose MongoDB over other databases.
Performance is a key ingredient of a good user experience, and caching information is often the best way to achieve it.
Redis goes far beyond the traditional cache that deals only with key-value pairs. Built as an open-source project, it is accessible from multiple languages and supports atomic operations such as appending to a string, incrementing a value in a hash, pushing to a list, computing set intersection, union, and difference, or getting the member with the highest ranking in a sorted set.
This session will introduce many features of the Azure Redis Cache service through a demo application.
Apache Solr is an open source search engine based on Apache Lucene. It provides features like hit highlighting, faceted search, indexing and searching large amounts of data from databases or XML files. Solr is written in Java and can be run on Tomcat, JBoss or Jetty. It allows searching and filtering of data through queries and faceted navigation. Solr supports sharding and replication across multiple servers for high performance and scalability.
Apache Drill is an open source SQL query engine for analysis of large scale datasets. It is modeled after Google's Dremel system and allows for interactive analysis of data stored in Hadoop files. Drill uses a columnar data format and execution engine to enable fast query performance on large datasets. It also supports pluggable components for different data formats, query languages, and data sources to provide a flexible and extensible platform.
This document provides an overview of the Apache Solr search engine. It begins with an introduction to full-text search and how it differs from basic SQL queries. It then covers the basic and advanced features of Solr, highlighting facets, language-specific processing, and geographic search. The document reviews how Solr uses Lucene for indexing and search capabilities. It concludes with discussing ways to get started with Solr, including downloading the software and importing sample data for testing.
AWS Senior Product Manager, Tina Adams, discusses Redshift's new feature, User Defined Functions.
Learn how the new User Defined Functions for Amazon Redshift works with Chartio for quick and dynamic data analysis.
This document provides an overview of SolrCloud on Hadoop. It discusses how SolrCloud allows for distributed, highly scalable search capabilities on Hadoop clusters. Key components that work with SolrCloud are also summarized, including HDFS for storage, MapReduce for processing, and ZooKeeper for coordination services. The document demonstrates how SolrCloud can index and query large datasets stored in Hadoop.
The document provides an overview and agenda for an Apache Solr crash course. It discusses topics such as information retrieval, inverted indexes, metrics for evaluating IR systems, Apache Lucene, the Lucene and Solr APIs, indexing, searching, querying, filtering, faceting, highlighting, spellchecking, geospatial search, and Solr architectures including single core, multi-core, replication, and sharding. It also provides tips on performance tuning, using plugins, and developing a Solr-based search engine.
This document provides an overview and instructions for setting up Elasticsearch. It discusses:
- How to set up the Elasticsearch workshop environment by installing required software and cloning the GitHub repository.
- Key concepts about Elasticsearch including its distributed and schema-free nature, how it is document oriented, and how indexes, types, documents, and fields relate to a relational database.
- Core components like clusters, nodes, shards, and replicas. It also distinguishes between filters and queries.
- Steps for connecting to Elasticsearch, inserting, searching, updating and deleting data.
- Advanced search techniques including filters, multi-field search, and human language processing using analyzers, stop words, synonyms and normalization
This document discusses information retrieval techniques. It begins by defining information retrieval as selecting the most relevant documents from a large collection based on a query. It then discusses some key aspects of information retrieval including document representation, indexing, query representation, and ranking models. The document also covers specific techniques used in information retrieval systems like parsing documents, tokenization, removing stop words, normalization, stemming, and lemmatization.
Effective and Efficient Entity Search in RDF data (Roi Blanco)
Triple stores have long provided RDF storage as well as data access using expressive, formal query languages such as SPARQL. The new end users of the Semantic Web, however, are mostly unaware of SPARQL and overwhelmingly prefer imprecise, informal keyword queries for searching over data. At the same time, the amount of data on the Semantic Web is approaching the limits of the architectures that provide support for the full expressivity of SPARQL. These factors combined have led to an increased interest in semantic search, i.e. access to RDF data using Information Retrieval methods. In this work, we propose a method for effective and efficient entity search over RDF data. We describe an adaptation of the BM25F ranking function for RDF data, and demonstrate that it outperforms other state-of-the-art methods in ranking RDF resources. We also propose a set of new index structures for efficient retrieval and ranking of results. We implement these results using the open-source MG4J framework.
This document discusses file organizations and indexing in databases. It describes different file organization approaches like heap files, sorted files, and clustered files. It also discusses alternative representations for data entries in indexes and different types of indexes like clustered vs unclustered indexes and single key vs composite indexes. The document provides analysis of the costs for different database operations under various file organizations and indexes.
Jose Portillo DevCon presentation 1138 (Jose Portillo)
This document discusses best practices for implementing Solr sharding in Alfresco. It defines what sharding is and explains that it involves splitting a single index into multiple parts or shards to improve search performance, distribute indexing load, and scale horizontally. The document outlines different types of sharding, considerations for the number of shards, high availability, backup procedures, and common configuration settings when using Solr sharding in Alfresco.
Redis is an open source, advanced key-value store that can be used as a data structure server. It supports various data structures like strings, hashes, lists, sets, and sorted sets. Redis has features like speed, simplicity, reliability and versatility. It can be installed by downloading from the Redis website. The Redis server listens on port 6379 by default. Popular programming languages have client libraries available to connect to Redis. Redis supports different data structures that can be used for applications like caching, sessions, messaging, and more. Data sets in Redis are stored in memory for fast access but can also be persisted to disk for durability.
This document discusses Apache Drill, an open source interactive analysis engine for large scale datasets. It was designed to provide interactive query performance over large datasets like Google's Dremel. Drill uses a columnar data format and distributed execution model to enable fast analysis. It supports pluggable query languages, data formats, and data sources to provide a flexible platform for interactive exploration of large datasets.
Apache Drill is a new open source Apache Incubator project for interactive analysis of large-scale datasets, inspired by Google's Dremel. It enables users to query terabytes of data in seconds. Apache Drill supports a broad range of data formats, including Protocol Buffers, Avro and JSON, and leverages Hadoop and HBase as data sources. Drill's primary query language, DrQL, is compatible with Google BigQuery. In this talk we provide an overview of the Drill project, including its design goals and architecture.
Presenter: Jason Frantz, Software Architect, MapR Technologies
1. The document summarizes new features in Oracle Text 11g and the roadmap for Oracle's search products, including Oracle Text and Secure Enterprise Search.
2. Key new features in Oracle Text 11g include composite domain indexes, automatic language recognition with context-sensitive stemming, and offline index creation. Oracle Text 11.2.0.2 introduces entity extraction, name search, and a result set interface that returns XML results.
3. The roadmap discusses merging Oracle Text and Secure Enterprise Search and bringing additional natural language processing, partitioning, faceted navigation, and performance improvements to Oracle's search products.
Important work-arounds for making ASS multi-lingual (Axel Faust)
Slides from my Alfresco DevCon 2018 Lightning Talk (5 min, 15 s per main slide, auto-advancing) about the Alfresco Search Services product, its current limitations with regard to use in an organisation with mixed user locales, and the work-arounds (as well as the long-term solution) for making it work nonetheless. The recording of the Lightning Talk session will be uploaded to the Alfresco YouTube channel in the coming days and weeks.
How Solr Search Works - a tech talk at the Atlogys Delhi office by our Senior Technologist Rajat Jain. The lecture takes a deep dive into Solr: what it is, how it works, what it does, and its built-in architecture. A wonderful technical session with many live examples, a sneak peek into Solr code and config files, and a live demo. Part of the Atlogys Academy Series.
4. • Create a schema using four types
–Text
–Numeric
–Tag
–Geospatial
• Add Documents in Real Time
–Directly
–From Hash
–Index only
• Search & Aggregate
• Delete documents as needed
• Drop the whole index
Data Lifecycle in RediSearch / Search and Aggregate
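A minimal sketch of that lifecycle using the RediSearch 1.x commands from redis-cli; the index name, field names, and document values below are hypothetical:

  FT.CREATE cars SCHEMA make TEXT model TEXT year NUMERIC SORTABLE condition TAG location GEO
  FT.ADD cars car:1 1.0 FIELDS make "Chevrolet" model "Tahoe" year 2004 condition "good" location "-73.98,40.75"
  FT.SEARCH cars "@make:chev* @year:[2001 2011]"
  FT.DEL cars car:1
  FT.DROP cars

FT.ADD indexes a document directly, FT.ADDHASH indexes an existing Redis hash, and FT.ADD with the NOSAVE option indexes without storing the content, matching the Directly / From Hash / Index only options above.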
6. • Goals
–Intentionally not SQL
–But familiar
–Exposable to end-users
• Simple
–No knowledge of data/structure needed
• Powerful
–With knowledge, zero in on data
Query Language
7. AND / OR / NOT / Exact Phrase / Geospatial / Tags / Prefix / Number Ranges / Optional Terms & more
Query Syntax
And combine them all into one query:
(chev*|ford) -explorer ~truck @year:[2001 2011] @location:[74 40 100 km] @condition:{good | verygood}
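Run against the hypothetical cars index from the earlier sketch, that combined query is passed to FT.SEARCH verbatim:

  FT.SEARCH cars "(chev*|ford) -explorer ~truck @year:[2001 2011] @location:[74 40 100 km] @condition:{good | verygood}"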
8. • Stop words:
–”a fox in the woods” -> “fox woods”
• Stemming:
–Query “going” -> find “going” ”go” “gone”
–Arabic, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, Turkish, Chinese
• Slop:
–Query: “glass pitcher”, slop 2 -> “glass gallon beer pitcher”
• With or without content:
–Query: “To be or not to be” -> Hamlet (without the whole play)
• Matched text highlight/summary:
–Query – “To be or not to be” -> Hamlet. <b>To be, or not to be</b>- that is the question
• Synonyms
–Query “Bob” -> Find documents with “Robert”
Full-text Search
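A few hedged examples of how these features surface in the 1.x search API; the plays index, body field, and synonym terms are assumptions for illustration:

  FT.SEARCH plays "glass pitcher" SLOP 2
  FT.SEARCH plays "to be or not to be" NOCONTENT
  FT.SEARCH plays "to be or not to be" HIGHLIGHT FIELDS 1 body TAGS "<b>" "</b>" SUMMARIZE
  FT.SYNADD plays Bob Robert

SLOP allows intervening words, NOCONTENT returns matching document ids without their content, HIGHLIGHT and SUMMARIZE return tagged fragments around the matched terms, and FT.SYNADD (the 1.x synonym command, later superseded by FT.SYNUPDATE) makes a query for Bob also find documents containing Robert.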
10. • Each field can have a weight which influences the rank in the returned result
• Each document can have a score to influence rank
• Built-in Scoring Functions
–Default: TF-IDF / term frequency–inverse document frequency
• Variant: DOCNORM
• Variant: BM25
–DISMAX (Solr’s default)
–DOCSCORE
–HAMMING for binary payloads
• Fields can be independently sortable, which trumps any built-in scorer
Scoring, Weights, and Sorting
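A sketch of how weights, document scores, scorers, and sortable fields are expressed, again with hypothetical names:

  FT.CREATE books SCHEMA title TEXT WEIGHT 5.0 body TEXT price NUMERIC SORTABLE
  FT.ADD books book:1 0.8 FIELDS title "Hamlet" body "To be, or not to be..." price 12
  FT.SEARCH books "hamlet" SCORER BM25 WITHSCORES
  FT.SEARCH books "hamlet" SORTBY price ASC

WEIGHT 5.0 biases matches in the title field, the 0.8 in FT.ADD is the per-document score, SCORER selects one of the built-in scoring functions, and SORTBY orders results by a SORTABLE field, bypassing the scorer entirely.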
12. Aggregations
• Processes and transforms
• Same query language as search
• Can group, sort and apply transformations
• Follows pipeline of composable actions:
Filter -> Group (Reduce, Reduce) -> Apply -> Sort -> Apply
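A hedged FT.AGGREGATE sketch over the hypothetical cars index, grouping by condition, counting per group, and sorting by the count (a single command, wrapped here for readability):

  FT.AGGREGATE cars "@year:[2001 2011]"
    GROUPBY 1 @condition REDUCE COUNT 0 AS num
    APPLY "upper(@condition)" AS condition_uc
    SORTBY 2 @num DESC

The stages run in the order given, mirroring the pipeline of composable actions above.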
16. • In the module, but separate storage
• Radix tree-based, optimized for real-time, as-you-type completions
• Simple API
–Add a suggestion (FT.SUGADD)
–Get a suggestion (FT.SUGGET)
–Delete a suggestion (FT.SUGDEL)
• Specify or increment the “score” of each item to create custom sortings
Autocomplete/Suggestions
The query will find:
– a chevy or a ford
– not an explorer
– optionally a truck
– from 2001 to 2011
– within 100 km of that location (NYC)
– with the condition tags of good or very good
Stop words – remove insignificant words
Stemming – query “going” matches “go” and “gone”
Slop – allow intervening words between query terms
Highlighting – retrieve the fragment of the full text surrounding the found words
Autocomplete is based on a radix tree (illustrated on the slide)
Separate from the search indexes – completely optional and customizable based on behaviour or data
Scoring goes beyond simple alphabetical autocomplete
Payloads allow richer autocomplete results
Three basic commands (SUGADD, SUGGET, SUGDEL) yield a very easy implementation, as sketched below
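A hedged sketch of those three commands; the key name and entries are hypothetical:

  FT.SUGADD autocomplete "chevrolet tahoe" 1
  FT.SUGADD autocomplete "chevrolet corvette" 1 INCR
  FT.SUGGET autocomplete "chev" FUZZY MAX 5 WITHSCORES
  FT.SUGDEL autocomplete "chevrolet corvette"

INCR bumps an existing suggestion's score instead of replacing it, which is how usage-based custom sortings are built up, and an optional PAYLOAD can attach richer data to each suggestion.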