Infinispan isn't just a scalable key/value grid platform: it simplifies execution of distributed query tasks in Map/Reduce style, and it integrates a powerful indexing engine to run full-text searches and efficiently extract information from your largest data collections.
Infinispan's search engine is built on Apache Lucene and exposes the Lucene API directly, so you can build powerful search applications that run on data sets residing in memory.
In this talk, Navin, author of the Infinispan Query module, will demonstrate how you can use full-text queries to search your data efficiently.
4. What are we talking about?
• What is Infinispan?
• The Query module
• Backend tech: Hibernate Search & Apache Lucene
• Setup and configuration
• Demo and code walkthrough
6. What is Infinispan?
• Distributed in-memory key/value data store
• Extension of java.util.Map
• Modes
• Library: embed into an EE/SE application
• Server: connect remotely
7. Some features
• Fully transactional (JTA, XA)
• Hibernate 2nd level caching
• Full-text querying
• Non-JVM clients for server mode
8. How do I use it?
• Cache: sits in front of your NoSQL data store
• In-memory DB: primary data store is in memory
• Clusterability: manage state that is distributed
9. … but we have a problem here
• How do I find my data?
• I don’t want to give out keys
• I might not know what I need to find
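To make the problem concrete: a plain map can only be read by key, so finding a value by its content means visiting every entry. The sketch below uses plain Java with no Infinispan dependency and puts that scan next to a toy inverted index, which is the general idea an indexing engine builds on. The class name, method names and sample values are hypothetical, and this is not how the Query module is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: finding values without knowing their keys.
final class ValueLookupSketch {

    // Without an index: visit every entry and test each word of the value.
    static List<Integer> scan(Map<Integer, String> data, String word) {
        List<Integer> keys = new ArrayList<>();
        String needle = word.toLowerCase(Locale.ROOT);
        for (Map.Entry<Integer, String> e : data.entrySet()) {
            for (String token : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(needle)) {
                    keys.add(e.getKey());
                    break;
                }
            }
        }
        return keys;
    }

    // With an index: map each lower-cased word to the keys whose value contains it.
    static Map<String, Set<Integer>> buildIndex(Map<Integer, String> data) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, String> e : data.entrySet()) {
            for (String token : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, t -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample data: key -> person name.
        Map<Integer, String> data = new HashMap<>();
        data.put(1, "Navin Surtani");
        data.put(2, "Jane Doe");
        data.put(3, "John Surtani");

        System.out.println(scan(data, "Surtani"));           // keys found by a full scan
        System.out.println(buildIndex(data).get("surtani")); // same keys from one index lookup
    }
}
```

The scan costs one pass over the data per query, while the index pays that cost once at build time and answers each query with a single lookup.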
10. Query module to the rescue
• Allows searching of values in the cache
• Original project: JBoss Cache Searchable in 2008
• Integration between Infinispan and Hibernate Search
• Became Query module in 2009
11. Full-text search
• Library example:
• Is the author name stored as Surname, Name?
• Or as Name, Surname?
• How do I deal with …
• Special characters?
• Typos?
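The questions above can be illustrated with a small standalone sketch in plain Java. It normalises names so that word order, case, accents and punctuation stop mattering, and it uses Levenshtein edit distance as one common way to tolerate typos. Lucene covers this ground with analyzers and fuzzy queries; the code below is not Lucene, and the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only: name normalisation and typo tolerance.
final class NameMatchingSketch {

    // Strip accents, fold case, drop punctuation and sort the tokens, so that
    // "Surname, Name" and "Name Surname" normalise to the same string.
    static String normalise(String raw) {
        String noAccents = Normalizer.normalize(raw, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
        String[] tokens = noAccents.toLowerCase(Locale.ROOT).trim().split("[^a-z0-9]+");
        Arrays.sort(tokens);
        return String.join(" ", tokens).trim();
    }

    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Both orderings normalise to the same string.
        System.out.println(normalise("Surtani, Navin").equals(normalise("Navin Surtani")));
        // A one-letter typo is one edit away from the intended word.
        System.out.println(editDistance("surtani", "surtni"));
    }
}
```

Sorting the tokens is a deliberate simplification: it discards which token is the surname, which a real index would usually keep in separate fields.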
14. Lucene
• Scalable high-performance indexing
• Small RAM requirement ~ 1MB heap
• Index size ~ 20-30% size of data
• 100% open source and written in Java
• Apache Licensing
• Ports to other languages exist
15. Lucene
• Optimised for searching and querying
• Rich feature-set for query types
• Typo-tolerant searches
• Similar keywords
• Document structure
• Unstructured data
• Documents stored in-memory or on disk
16. Two features we will look at
Facets
• Obtain counts, or frequencies, of a result
• O(1) to obtain counts
• Example: eBay result counts
Filters
• Filters are:
• Declarative
• Stackable
• Reusable
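Both ideas can be sketched in plain Java without any search library. The facet is a count per candidate, and the filters are small reusable predicates that stack with `Predicate.and`. The `Vote` type, its fields and the sample names are assumptions made for illustration; the Query module exposes faceting and filters through its own APIs, which are not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only: facet counts and stackable filters.
final class FacetFilterSketch {

    // Assumed value type; the field names are not taken from the talk.
    static final class Vote {
        final String office;
        final String candidate;
        final String region;

        Vote(String office, String candidate, String region) {
            this.office = office;
            this.candidate = candidate;
            this.region = region;
        }
    }

    // Reusable, declarative filters.
    static Predicate<Vote> forOffice(String office) {
        return v -> v.office.equals(office);
    }

    static Predicate<Vote> inRegion(String region) {
        return v -> v.region.equals(region);
    }

    // Facet: number of votes per candidate among the votes that pass the filter.
    static Map<String, Long> countByCandidate(List<Vote> votes, Predicate<Vote> filter) {
        return votes.stream()
                .filter(filter)
                .collect(Collectors.groupingBy(v -> v.candidate, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Vote> votes = Arrays.asList(
                new Vote("governor", "Adams", "north"),
                new Vote("governor", "Adams", "south"),
                new Vote("governor", "Baker", "north"),
                new Vote("senator", "Clark", "north"));

        // One filter, then the same filter stacked with a second one.
        System.out.println(countByCandidate(votes, forOffice("governor")));
        System.out.println(countByCandidate(votes, forOffice("governor").and(inRegion("north"))));
    }
}
```

This version counts by scanning the list on every call. Reading counts in constant time, as the slide claims, depends on the index rather than on a scan like this one.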
20. Programmatic Configuration
ConfigurationBuilder cb = new ConfigurationBuilder();
cb.indexing()
    .enable()
    .indexLocalOnly(true) // Only index entries changed on the local node
    .withProperties(properties); // java.util.Properties with index settings
EmbeddedCacheManager cm = new DefaultCacheManager(cb.build());
// My key is an Integer (generics cannot use the primitive int) and my value is a Person
Cache<Integer, Person> cache = cm.getCache();
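The configuration above passes a `properties` object that the slide does not show. A minimal sketch of what it might contain follows, assuming the Hibernate Search property key `default.directory_provider` with the value `ram` to keep the Lucene index in memory. The key and value actually used in the talk are not known, and the class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical illustration only: one possible content for `properties`.
final class IndexingPropertiesSketch {

    static Properties inMemoryIndexProperties() {
        Properties properties = new Properties();
        // Assumption: store the Lucene index in the JVM heap rather than on disk.
        properties.setProperty("default.directory_provider", "ram");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(inMemoryIndexProperties());
    }
}
```

An in-memory index is lost when the node stops, so it suits demos and caches whose data can be reloaded and re-indexed.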
22. Running queries
// I have a cache instance which is not empty
SearchManager sm = Search.getSearchManager(cache);
QueryBuilder qb = sm.buildQueryForClass(Person.class).get();
Query q = qb.keyword().onField("name").matching("Surtani").createQuery();
CacheQuery cq = sm.getQuery(q, Person.class);
24. How it all ties together …
• Web-application using Infinispan running on WildFly 9 CR
• App-server ships with Query module
• Use a web-form to vote in an ‘election’
• One vote for governor
• One vote for senator
25. Flow I: Query ‘warm-up’
• Story: ‘We don’t know who is running in the election’
• WebSocket endpoint delegates to a Worker object
• Worker object runs the query through CandidateCacheDao
• Returns results through WebSocket endpoint
26. Flow II: Voting form
• Story: ‘This is our ballot paper’
• Front-end creates JSON to go to WebSocket endpoint
• JSON gets parsed by BallotWorker object
• BallotWorker puts parsed JSON into Cache through VotingCacheDao
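The last step of this flow can be sketched in plain Java. An Infinispan `Cache` is a `ConcurrentMap`, so a `ConcurrentHashMap` stands in for it here. The `Ballot` fields, the `castBallot` method and the one-ballot-per-voter rule are assumptions made for illustration and are not taken from the demo code; JSON parsing is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only: a DAO as a thin wrapper around the cache.
final class VotingFlowSketch {

    // Assumed shape of a parsed ballot: one vote for governor, one for senator.
    static final class Ballot {
        final String voterId;
        final String governor;
        final String senator;

        Ballot(String voterId, String governor, String senator) {
            this.voterId = voterId;
            this.governor = governor;
            this.senator = senator;
        }
    }

    static final class VotingCacheDao {
        private final ConcurrentMap<String, Ballot> cache;

        VotingCacheDao(ConcurrentMap<String, Ballot> cache) {
            this.cache = cache;
        }

        // Stores the ballot keyed by voter id. Returns false if that voter
        // already has a ballot stored (assumed rule, not from the talk).
        boolean castBallot(Ballot ballot) {
            return cache.putIfAbsent(ballot.voterId, ballot) == null;
        }

        int ballotCount() {
            return cache.size();
        }
    }

    public static void main(String[] args) {
        VotingCacheDao dao = new VotingCacheDao(new ConcurrentHashMap<>());
        System.out.println(dao.castBallot(new Ballot("voter-1", "Adams", "Clark")));
        System.out.println(dao.castBallot(new Ballot("voter-1", "Baker", "Clark")));
    }
}
```

Keeping the cache behind a DAO means the worker objects never handle cache keys directly, which fits the earlier point about not giving out keys.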
27. Flow III: Faceted search
• Story: ‘We want to know who has won the election’
• Front-end asks for the result of an election (governor or senator)
• ElectionResultWorker object runs a query through the VotingCacheDao
• Result passed back to web-page as JSON
28. Flow III: Faceted search with Filter
• Story: ‘We would like to know who has received the most votes in a particular region’
• Essentially the same workflow as III, but we also pass a Filter to our query
• The query code is unchanged; the Filter restricts the results to the chosen region