Turning Search Upside Down 
with open source software 
Charlie Hull - Managing Director 
28th November 2014 
charlie@flax.co.uk 
www.flax.co.uk/blog 
+44 (0) 8700 118334 
Twitter: @FlaxSearch
Who are Flax? 
We design, build and support open source powered 
search applications 
Based in Cambridge U.K., technology agnostic & 
independent – but open source exponents & committers 
UK Authorized Partner of 
Customers include Reed Specialist Recruitment, Mydeco, 
NLA, Gorkana, Financial Times, News UK, EMBL-EBI, 
Accenture, University of Cambridge, UK Government...
Customers in recruitment, government, e-commerce, 
news & media, bioinformatics, consulting, law...
What I'll cover today 
Monitoring & classifying data with stored queries 
Media monitoring - how it used to be done 
What we can't change 
Turning search upside down with Flax's Luwak 
Case studies: 
Media monitoring in Scandinavia 
Who else is using Luwak? 
Conclusions
Monitoring & classifying with 
stored queries 
'Saved searches' / 'alerting' / 'stored profiles' 
Use cases: 
– Monitor news for client interests 
– Tag/classify incoming data according to certain rules 
Volume & speed 
– 10k → 1m stored queries 
– 10k → 1m new items a day 
– Latencies as low as 50-100ms
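To see why these volumes matter, a rough calculation helps. The sketch below assumes the naive approach of running every stored query against every new item, and that items arrive evenly through the day. Both are simplifying assumptions, and the function name is illustrative only.

```python
# Back-of-envelope cost of the naive approach, using the figures on the slide.
SECONDS_PER_DAY = 24 * 60 * 60

def naive_evaluations_per_second(stored_queries: int, items_per_day: int) -> float:
    """Query-document evaluations per second if every query runs on every item."""
    return stored_queries * items_per_day / SECONDS_PER_DAY

low = naive_evaluations_per_second(10_000, 10_000)
high = naive_evaluations_per_second(1_000_000, 1_000_000)
print(f"{low:,.0f} to {high:,.0f} evaluations per second")
# prints "1,157 to 11,574,074 evaluations per second"
```

At the upper end that is over eleven million evaluations every second, before any latency target is considered.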
Media monitoring 
Provide a news monitoring service to clients 
Usually also provide a searchable news archive 
People translate requirements & verify output... 
..but volume is handled by software 
UK/Europe – Gorkana, Precise, Cision; 
USA – BurellesLuce
Media monitoring 
– how it used to be done 
Manual process of translating client interests into stored 
queries 
– many pages of Boolean terms 
– 10k-250k characters 
– Evolved over many years, grown not built 
– Difficult to maintain and troubleshoot
Like this... 
((("!MOBILE PHONE*" OR "PHONE MAST*" OR "HANDSET*" OR "CELL* PHONE*" OR "3G" OR "GPRS" OR "G.P.R.S" 
OR "!GENERAL !RADIO PACKET SERVICE*" OR "GSM" OR "G.S.M" OR "!GLOBAL SYSTEM FOR !MOBILE COMM*" OR 
"HSDPA" OR "H.S.D.P.A" OR "HIGH SPEED DOWNLINK !PACKET ACCESS" OR "HSUPA" OR "H.S.U.P.A" OR "HIGH 
SPEED !UPLINK !PACKET ACCESS" OR "UMTS" OR "U.M.T.S" OR "MVNO" OR "M.V.N.O" OR "SMS" OR "SHORT 
MESSAGE !SERVICE*" OR "MMS" OR "!MULTIMEDIA MESSAGE !SERVICE*" OR "!MOBILES" OR "!CELLPHONE*" OR 
"!TELECOM*" OR "!LANDLINE*" OR "!TELEPHONE*" OR "PHONE*" OR "!TELEKOM*" OR "TELCO*" OR "VODAFONE" OR 
"T-MOBILE" OR "TMOBILE" OR "!TELEFONICA" OR "BT" OR "!MOBILE USER*" OR "TEXT MESSAG*" OR 
"SMARTPHONE*" OR "!VIRGIN !MEDIA*" OR "CABLE & !WIRELESS" OR "CABLE AND !WIRELESS") W/48 
(("PROFIT*" OR "LOSS*" OR "BAN" OR "BANNED" OR "PREMIUM RATE*" OR "FINANC*" OR "!REFINANC*" OR 
"OFFICE OF FAIR TRADING" OR "MERGER*" OR "!ACQUISIT*" OR "ACQUIR*" OR "TAKEOVER*" OR "BUYOUT*" OR 
"BUY-OUT*" OR "NEW PRODUCT*" OR "INVEST*" OR "SHARES" OR "MARKET*" OR "ACCOUNT*" OR "MONEY" OR 
"CASH*" OR "SECURIT*" OR "!ENTERPRIS*" OR "!BUSINESS*" OR "PRICE*" OR "JOINT*" OR "NEW VENTURE*" OR 
"PRICING" OR "COST*" OR "CHAIRM?N" OR "APPOINT*" OR "!EXECUTIVE" OR "SALE*" OR "SELL*" OR "FULL 
YEAR" OR "REGULAT*" OR "!DIRECTIVE*" OR "LAW" OR "LAWS" OR "!LEGISLAT*" OR "GREEN PAPER" OR "WHITE 
PAPER*" OR "!MEDIAWATCH" OR "MORAL*" OR "ETHIC*" OR "ADVERT*" OR "AD" OR "ADS" OR "MARKETING" OR 
"!COMPLAIN*" OR "MIS-SOLD" OR "MIS-SELL*" OR "SPONSOR" OR "COSTCUT*" OR "COST CUT*" OR "CUT* COST*" 
OR "FIBRE OPTIC*" OR "TAX" OR "TAXES" OR "TAXED" OR "EXPAND*" OR "!EXPANSION" OR "EMPLOY*" OR 
"STAFF" OR "WORKER*" OR "SPOKESM?N" OR "DEBUT" OR "BRAND*" OR "DIRECTOR*") OR (("FAIR" OR 
"UNFAIR" OR "%UNSCRUPULOUS" OR "NOT FAIR" OR "UNJUST*" OR "!PENALISE*") W/12 ("CHARG*" OR "TARIFF*" 
OR "PRICE PLAN*" OR "GLOBAL")))) AND NOT ("EXPRESS OFFER" OR "TIMES OFFER" OR "READER OFFER" OR 
(("CALLS COST") W/6 ("FROM A LANDLINE" OR "FROM LANDLINE*" OR "BT LANDLINE*")))) 
…and that's an easy one!
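The core of a query like this is the proximity test between two groups of terms. Here is a minimal sketch of that one operator, assuming W/n means "within n words of" and a trailing * is a prefix wildcard. It handles single-word terms only and ignores the ! and % prefixes, whose meaning the slide does not explain. The function names are illustrative, not part of any real search engine.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_positions(tokens: list[str], term: str) -> list[int]:
    """Positions where a single-word term (optionally ending in *) occurs."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return [i for i, t in enumerate(tokens) if t.startswith(prefix)]
    return [i for i, t in enumerate(tokens) if t == term]

def within(tokens: list[str], left_terms, right_terms, n: int) -> bool:
    """True if any left term occurs within n words of any right term."""
    left = [p for term in left_terms for p in term_positions(tokens, term)]
    right = [p for term in right_terms for p in term_positions(tokens, term)]
    return any(abs(a - b) <= n for a in left for b in right)

doc = tokenize("Vodafone reported a sharp rise in full-year profits on Tuesday")
print(within(doc, ["VODAFONE", "HANDSET*"], ["PROFIT*", "LOSS*"], 48))  # True
print(within(doc, ["VODAFONE"], ["PROFIT*"], 3))                        # False
```

A real stored query nests hundreds of these tests inside Boolean operators, which is why the full text runs to many pages.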
Media monitoring 
– how it used to be done 
Complex pipelines with many sources monitored 
– print & scan & OCR 
– XML feeds (e.g. NLA media access in UK) 
– Web content, Twitter etc. 
Lots of search engines running in parallel 
– dtSearch on 100+ servers, then merged 
– 'Stored query' functionality built in e.g. Verity
What we can't change... 
Syntax of the stored queries (sometimes) 
– Rewriting these is a huge, risky job for the business 
Volume keeps increasing 
The need to test new and old queries on archive data - 
using the same syntax 
Source data will often be low quality 
– Scans 
– PDF extracts 
– Re-typed XML
Speaking the same language 
Parse the old query syntax into a new form 
– Either on the fly (dtSearch, Verity) 
– Or off-line (Verity) 
– In theory we can translate any old syntax 
– A neutral syntax protects against lock-in 
• Why not? 
Testing very, very important 
– Agree a test set 
– Watch for broken queries (in several ways) 
– Use the client's resources to help
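One way to work with an agreed test set is to diff the matches from the old system against those from the new one, document by document. The sketch below is a hypothetical helper, not part of Flax's tooling. It assumes each system's output can be reduced to a mapping from document id to the ids of the stored queries that matched.

```python
def compare_matches(old: dict, new: dict) -> dict:
    """Report, per document, the query ids the two systems disagree on."""
    report = {}
    for doc_id in sorted(set(old) | set(new)):
        old_hits = set(old.get(doc_id, ()))
        new_hits = set(new.get(doc_id, ()))
        if old_hits != new_hits:
            report[doc_id] = {
                "missing": sorted(old_hits - new_hits),  # old system matched, new did not
                "extra": sorted(new_hits - old_hits),    # new system matched, old did not
            }
    return report

# Made-up example data: document ids mapped to matching query ids.
old_system = {"doc1": ["q1", "q2"], "doc2": ["q3"]}
new_system = {"doc1": ["q1"], "doc2": ["q3"], "doc3": ["q4"]}
print(compare_matches(old_system, new_system))
```

Each entry in the report is a prompt for a person to decide whether the old query was broken or the translation is wrong.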
Turning search upside down.. 
[Diagram: Docs, Query, Stored Queries, Result] 
1 million queries, some 250k long, complex rules 
1 million new documents a day 
Within 5-100ms 
$$$$$$ 
[Diagram, two stages: 1. Pre Query: Doc → subset of ~200 stored queries; 2. Search: subset against Doc → Result]
Turning search upside down.. 
How to avoid running 1 million searches on 1 million 
documents? 
– Don't run all of them! 
– 1. Presearch - Find the likely candidates first by 
searching the stored queries using the document 
– 2. Search - Then perform a normal search, but with 
far fewer queries, and do it in memory for speed
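The two stages can be sketched in a few lines. This is a toy illustration of the idea, not Luwak's implementation. It assumes every stored query is a plain AND of single terms, and the class and method names are invented for the example.

```python
from collections import defaultdict

class Monitor:
    """Toy monitor: each stored query is a set of terms that must ALL appear."""

    def __init__(self):
        self.queries = {}                     # query id -> required terms
        self.query_index = defaultdict(set)   # term -> ids of queries indexed under it

    def register(self, query_id, terms):
        required = frozenset(t.lower() for t in terms)
        self.queries[query_id] = required
        # Index the query under ONE required term only. A document lacking
        # that term cannot match, so the query never needs to be run on it.
        self.query_index[min(required)].add(query_id)

    def presearch(self, doc_terms):
        """Stage 1: use the document's terms to look up candidate queries."""
        candidates = set()
        for term in doc_terms:
            candidates |= self.query_index.get(term, set())
        return candidates

    def match(self, text):
        """Stage 2: fully evaluate only the candidates against the document."""
        doc_terms = set(text.lower().split())
        candidates = self.presearch(doc_terms)
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)

monitor = Monitor()
monitor.register("telecoms-profit", ["vodafone", "profit"])
monitor.register("bank-merger", ["bank", "merger"])
monitor.register("mobile-tax", ["mobile", "tax"])

text = "vodafone reports record profit despite mobile price cuts"
print(monitor.presearch(set(text.split())))  # two candidates; bank-merger is skipped
print(monitor.match(text))                   # ['telecoms-profit']
```

The presearch may return false positives (here "mobile-tax"), but it must never miss a real match. The second stage removes the false positives.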
Introducing Luwak 
Based on a (slight) fork of Apache Lucene 
A library for efficiently running many stored queries 
Two stages: 
– Presearcher (find candidate queries) 
– Monitor Searcher (run candidate queries against a 
single document in-memory index) 
Very fast – 70-100k queries per second when first tested 
and 3-4 times faster now 
https://github.com/flaxsearch/luwak
The gory details... 
Custom QueryParsers for dtSearch, Verity VQL/OTL etc. 
Hacked Lucene to expose positional information 
LUCENE-3831, LUCENE-3827 in Lucene 4.x 
LUCENE-2878 'Nuke Spans' coming in Lucene 5.0 
Works with standard Lucene queries 
TermQuery, BooleanQuery, RegexpQuery... 
Can be extended to work with custom query types 
Lots (and lots) of performance tuning
Using Luwak in a larger system 
Run multiple instances of Luwak to handle throughput 
Use message passing platforms (e.g. RabbitMQ, Apache 
Kafka) 
Build a separate index of articles – a searchable archive – 
for testing new stored queries and/or for customers 
Use RESTful Web Service APIs everywhere 
Test against known matches from older systems
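The "multiple instances fed by a message queue" pattern can be sketched with the Python standard library. Here queue.Queue and threads stand in for a real broker such as RabbitMQ and for separate Luwak instances. The matcher is a made-up placeholder and none of these names come from Luwak.

```python
import queue
import threading

def run_workers(articles, match_fn, worker_count=2):
    """Feed (id, text) articles through a shared queue to several workers."""
    inbox = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = inbox.get()
            if item is None:              # sentinel: no more work for this worker
                inbox.task_done()
                return
            article_id, text = item
            matches = match_fn(text)
            with lock:                    # results dict is shared between threads
                results[article_id] = matches
            inbox.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for article in articles:
        inbox.put(article)
    for _ in threads:
        inbox.put(None)                   # one sentinel per worker
    inbox.join()
    for t in threads:
        t.join()
    return results

def toy_match(text):
    """Placeholder matcher: one keyword per named rule (hypothetical rules)."""
    rules = {"telecoms": "vodafone", "banking": "bank"}
    words = set(text.lower().split())
    return sorted(name for name, term in rules.items() if term in words)

articles = [(1, "Vodafone buys a bank"), (2, "Weather stays mild")]
print(run_workers(articles, toy_match))
```

Because each article is matched independently, throughput scales by adding workers rather than by making any single instance faster.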
Likely problems 
Some old queries will be broken 
– Replicate the broken behaviour, or fix it? 
Some queries will need troubleshooting 
– Queries can be very large 
– Possibly in a foreign language 
Performance tuning is hard 
– JVM settings, caching, storage types 
Remember this is a different system – 1 to 1 matching 
correspondence is impossible!
Case study – 
media monitoring in Scandinavia 
“The leading Danish provider of media intelligence, i.e. 
media search, media monitoring and media analysis” 
Old system based on Verity (monitoring) and Autonomy 
IDOL (archive search) 
Issues: 
Systems at full capacity - even with lots of hardware 
Archive search very, very slow 
Complex workflow with some very old parts (FTP, batch 
files...) 
Proprietary query format (Verity VQL)
Case study – 
media monitoring in Scandinavia 
Flax engaged in 2013 as part of a complete re-architecture 
Phased approach: 
1. Translate all old queries to new in-house syntax 'IQL' 
2. New monitoring pipeline (RabbitMQ, RESTful Web 
Services, Flax's Luwak) 
3. New archive search (Apache Solr) 
Performance/volume targets: 
Monitoring 20 articles/second, aim for 100 eventually 
100 qps for archive search, response within 250ms 
17000 stored queries, 85 million articles in archive
Case study – 
media monitoring in Scandinavia 
Monitoring using 2 servers, Archive search using 8 
servers (including failover) 
6 cores, 18GB per server 
launch more servers to scale the system 
All stored queries translated to new IQL syntax 
Including the ones Verity didn't tell us weren't working... 
Monitoring customers are being migrated in stages from 
Oct 2014, Archive search beta testing during Oct 2014, 
completion by end 2014
Case study – 
media monitoring in Scandinavia 
The benefits: 
– No vendor lock-in (open source, IQL) 
– Industry standard software (Apache Lucene/Solr, 
RabbitMQ) 
– Predictable scaling & costs
Who else is using Luwak? 
Bloomberg 
http://www.flax.co.uk/blog/2014/03/24/london-search-meetup-serious-solr-at-bloomberg-elasticsearch-
Booz Allen Hamilton 
http://www.slideshare.net/BryanBende/realtime-inverted-search-nyc-aslug-oct-2014 
Early versions are installed at 
We know of users in other sectors e.g. recruitment 
We're hoping to make it a standard part of Lucene/Solr 
and Elasticsearch (i.e. to improve the Percolator) 
Apache Samza integration? https://github.com/romseygeek/samza-luwak
Conclusions 
To solve a hard problem, sometimes you need to turn it 
upside down 
When you migrate to open source, you can keep some of 
the parts of your system (if you must) 
Open source helps you scale - with predictable costs 
Leading companies are doing it 
Go and turn Search Upside Down!
Another thing... 
BioSolr – a funded project to improve the way search 
technology is used in bioinformatics 
– EBI & NCBI are involved 
– http://www.flax.co.uk/blog/2014/10/02/biosolr-begins-with-a-workshop-day/ 
– Already producing Solr patches e.g. SOLR-1387 
– Come and join us!
Thank you! 
Any questions? 
charlie@flax.co.uk 
www.flax.co.uk/blog 
+44 (0) 8700 118334 
Twitter: @FlaxSearch

Search Solutions 2015: Towards a new model of search relevance testing
 
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
 
Bio solr building a better search for bioinformatics
Bio solr   building a better search for bioinformaticsBio solr   building a better search for bioinformatics
Bio solr building a better search for bioinformatics
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 

Recently uploaded

MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 

Turning search upside down with powerful open source search software

  • 1. Turning Search Upside Down with open source software Charlie Hull - Managing Director 28th November 2014 charlie@flax.co.uk www.flax.co.uk/blog +44 (0) 8700 118334 Twitter: @FlaxSearch
  • 2. Who are Flax? We design, build and support open source powered search applications
  • 3. Who are Flax? We design, build and support open source powered search applications Based in Cambridge U.K., technology agnostic & independent – but open source exponents & committers
  • 4. Who are Flax? We design, build and support open source powered search applications Based in Cambridge U.K., technology agnostic & independent – but open source exponents & committers UK Authorized Partner of
  • 5. Who are Flax? We design, build and support open source powered search applications Based in Cambridge U.K., technology agnostic & independent – but open source exponents & committers UK Authorized Partner of Customers include Reed Specialist Recruitment, Mydeco, NLA, Gorkana, Financial Times, News UK, EMBL-EBI, Accenture, University of Cambridge, UK Government...
  • 6. Who are Flax? We design, build and support open source powered search applications Based in Cambridge U.K., technology agnostic & independent – but open source exponents & committers UK Authorized Partner of Customers in recruitment, government, e-commerce, news & media, bioinformatics, consulting, law...
  • 7. Who are Flax? We design, build and support open source powered search applications Based in Cambridge U.K., technology agnostic & independent – but open source exponents & committers UK Authorized Partner of Customers in recruitment, government, e-commerce, news & media, bioinformatics, consulting, law...
  • 8. What I'll cover today Monitoring & classifying data with stored queries Media monitoring - how it used to be done What we can't change Turning search upside down with Flax's Luwak Case studies: Media monitoring in Scandinavia Who else is using Luwak? Conclusions
  • 9. Monitoring & classifying with stored queries 'Saved searches' / 'alerting' / 'stored profiles'
  • 10. Monitoring & classifying with stored queries 'Saved searches' / 'alerting' / 'stored profiles' Use cases: – Monitor news for client interests – Tag/classify incoming data according to certain rules
  • 11. Monitoring & classifying with stored queries 'Saved searches' / 'alerting' / 'stored profiles' Use cases: – Monitor news for client interests – Tag/classify incoming data according to certain rules Volume & speed – 10k → 1m stored queries – 10k → 1m new items a day – Latencies as low as 50-100ms
  • 12. Media monitoring Provide a news monitoring service to clients
  • 13. Media monitoring Provide a news monitoring service to clients Usually also provide a searchable news archive
  • 14. Media monitoring Provide a news monitoring service to clients Usually also provide a searchable news archive People translate requirements & verify output...
  • 15. Media monitoring Provide a news monitoring service to clients Usually also provide a searchable news archive People translate requirements & verify output... ..but volume is handled by software
  • 16. Media monitoring Provide a news monitoring service to clients Usually also provide a searchable news archive People translate requirements & verify output... ..but volume is handled by software UK/Europe – Gorkana, Precise, Cision; USA – BurellesLuce
  • 17. Media monitoring – how it used to be done Manual process of translating client interests into stored queries
  • 18. Media monitoring – how it used to be done Manual process of translating client interests into stored queries – many pages of Boolean terms – 10k-250k characters – Evolved over many years, grown not built – Difficult to maintain and troubleshoot
  • 19. Like this... ((("!MOBILE PHONE*" OR "PHONE MAST*" OR "HANDSET*" OR "CELL* PHONE*" OR "3G" OR "GPRS" OR "G.P.R.S" OR "!GENERAL !RADIO PACKET SERVICE*" OR "GSM" OR "G.S.M" OR "!GLOBAL SYSTEM FOR !MOBILE COMM*" OR "HSDPA" OR "H.S.D.P.A" OR "HIGH SPEED DOWNLINK !PACKET ACCESS" OR "HSUPA" OR "H.S.U.P.A" OR "HIGH SPEED !UPLINK !PACKET ACCESS" OR "UMTS" OR "U.M.T.S" OR "MVNO" OR "M.V.N.O" OR "SMS" OR "SHORT MESSAGE !SERVICE*" OR "MMS" OR "!MULTIMEDIA MESSAGE !SERVICE*" OR "!MOBILES" OR "!CELLPHONE*" OR "! TELECOM*" OR "!LANDLINE*" OR "!TELEPHONE*" OR "PHONE*" OR "!TELEKOM*" OR "TELCO*" OR "VODAFONE" OR "T-MOBILE" OR "TMOBILE" OR "!TELEFONICA" OR "BT" OR "!MOBILE USER*" OR "TEXT MESSAG*" OR "SMARTPHONE*" OR "!VIRGIN !MEDIA*" OR "CABLE & !WIRELESS" OR "CABLE AND !WIRELESS") W/48 (("PROFIT*" OR "LOSS*" OR "BAN" OR "BANNED" OR "PREMIUM RATE*" OR "FINANC*" OR "!REFINANC*" OR "OFFICE OF FAIR TRADING" OR "MERGER*" OR "!ACQUISIT*" OR "ACQUIR*" OR "TAKEOVER*" OR "BUYOUT*" OR "BUY-OUT*" OR "NEW PRODUCT*" OR "INVEST*" OR "SHARES" OR "MARKET*" OR "ACCOUNT*" OR "MONEY" OR "CASH*" OR "SECURIT*" OR "!ENTERPRIS*" OR "!BUSINESS*" OR "PRICE*" OR "JOINT*" OR "NEW VENTURE*" OR "PRICING" OR "COST*" OR "CHAIRM?N" OR "APPOINT*" OR "!EXECUTIVE" OR "SALE*" OR "SELL*" OR "FULL YEAR" OR "REGULAT*" OR "!DIRECTIVE*" OR "LAW" OR "LAWS" OR "!LEGISLAT*" OR "GREEN PAPER" OR "WHITE PAPER*" OR "!MEDIAWATCH" OR "MORAL*" OR "ETHIC*" OR "ADVERT*" OR "AD" OR "ADS" OR "MARKETING" OR "!COMPLAIN*" OR "MIS-SOLD" OR "MIS-SELL*" OR "SPONSOR" OR "COSTCUT*" OR "COST CUT*" OR "CUT* COST*" OR "FIBRE OPTIC*" OR "TAX" OR "TAXES" OR "TAXED" OR "EXPAND*" OR "!EXPANSION" OR "EMPLOY*" OR "STAFF" OR "WORKER*" OR "SPOKESM?N" OR "DEBUT" OR "BRAND*" OR "DIRECTOR*") OR (("FAIR" OR "UNFAIR" OR "%UNSCRUPULOUS" OR "NOT FAIR" OR "UNJUST*" OR "!PENALISE*") W/12 ("CHARG*" OR "TARIFF*" OR "PRICE PLAN*" OR "GLOBAL")))) AND NOT ("EXPRESS OFFER" OR "TIMES OFFER" OR "READER OFFER" OR (("CALLS COST") W/6 ("FROM A LANDLINE" OR "FROM LANDLINE*" OR "BT LANDLINE*")))) …and that's an easy one!
  • 20. Media monitoring – how it used to be done Complex pipelines with many sources monitored – print & scan & OCR – XML feeds (e.g. NLA media access in UK) – Web content, Twitter etc. Lots of search engines running in parallel – dtSearch on 100+ servers, then merged – 'Stored query' functionality built in e.g. Verity
  • 21. What we can't change... Syntax of the stored queries (sometimes) – Rewriting these is a huge, risky job for the business
  • 22. What we can't change... Syntax of the stored queries (sometimes) – Rewriting these is a huge, risky job for the business Volume keeps increasing
  • 23. What we can't change... Syntax of the stored queries (sometimes) – Rewriting these is a huge, risky job for the business Volume keeps increasing The need to test new and old queries on archive data - using the same syntax
  • 24. What we can't change... Syntax of the stored queries (sometimes) – Rewriting these is a huge, risky job for the business Volume keeps increasing The need to test new and old queries on archive data - using the same syntax Source data will often be low quality – Scans – PDF extracts – Re-typed XML
  • 25. Speaking the same language Parse the old query syntax into a new form – Either on the fly (dtSearch, Verity) – Or off-line (Verity) – In theory we can translate any old syntax – A neutral syntax protects against lock-in • Why not?
  • 26. Speaking the same language Parse the old query syntax into a new form – Either on the fly (dtSearch, Verity) – Or off-line (Verity) – In theory we can translate any old syntax – A neutral syntax protects against lock-in • Why not? Testing very, very important – Agree a test set – Watch for broken queries (in several ways) – Use the client's resources to help
  • 27. Turning search upside down.. Docs Result Query Stored Queries
  • 28. Turning search upside down.. Docs Result Query Stored Queries 1 million queries Some 250k long Complex rules 1 million new documents a day Within 5-100ms
  • 29. Turning search upside down.. Docs Result Query Stored Queries 1 million queries Some 250k long Complex rules 1 million new documents a day $$$$$$ Within 5-100ms
  • 30. Turning search upside down.. Docs Result Query Stored Queries 1 million queries Some 250k long Complex rules 1 million new documents a day $$$$$$ Within 5-100ms
  • 31. Turning search upside down.. Docs Query Stored Queries 1. Pre Query Subset 1 million queries Some 250k long Complex rules ~200 Doc 1 million new documents a day
  • 32. Turning search upside down.. Docs Query Stored Queries 1. Pre Query Subset Doc Result 1 million queries Some 250k long Complex rules ~200 2. Search 1 million new documents a day
  • 33. Turning search upside down.. How to avoid running 1 million searches on 1 million documents?
  • 34. Turning search upside down.. How to avoid running 1 million searches on 1 million documents? – don't run all of them! – 1. Presearch - Find the likely candidates first by searching the stored queries using the document
  • 35. Turning search upside down.. • How to avoid running 1 million searches on 1 million documents? – don't run all of them! – 1. Presearch - Find the likely candidates first by searching the stored queries using the document – 2. Search - Then perform a normal search, but with far fewer queries, and do it in memory for speed
  • 36. Introducing Luwak Based on a (slight) fork of Apache Lucene A library for efficiently running many stored queries Two stages: – Presearcher (find candidate queries) – Monitor Searcher (run candidate queries against a single document in-memory index) Very fast – 70-100k queries per second when first tested and 3-4 times faster now https://github.com/flaxsearch/luwak
  • 37. The gory details... Custom QueryParsers for dtSearch, Verity VQL/OTL etc. Hacked Lucene to expose positional information LUCENE-3831, LUCENE-3827 in Lucene 4.x LUCENE-2878 'Nuke Spans' coming in Lucene 5.0 Works with standard Lucene queries TermQuery, BooleanQuery, RegexpQuery... Can be extended to work with custom query types Lots (and lots) of performance tuning
  • 38. Using Luwak in a larger system Run multiple instances of Luwak to handle throughput Use message passing platforms (e.g. RabbitMQ, Apache Kafka) Build a separate index of articles – a searchable archive – for testing new stored queries and/or for customers Use RESTful Web Service APIs everywhere Test against known matches from older systems
  • 39. Likely problems Some old queries will be broken – Replicate the broken behaviour, or fix it?
  • 40. Likely problems Some old queries will be broken – Replicate the broken behaviour, or fix it? Some queries will need troubleshooting – Queries can be very large – Possibly in a foreign language
  • 41. Likely problems Some old queries will be broken – Replicate the broken behaviour, or fix it? Some queries will need troubleshooting – Queries can be very large – Possibly in a foreign language Performance tuning is hard – JVM settings, caching, storage types
  • 42. Likely problems Some old queries will be broken – Replicate the broken behaviour, or fix it? Some queries will need troubleshooting – Queries can be very large – Possibly in a foreign language Performance tuning is hard – JVM settings, caching, storage types Remember this is a different system – 1 to 1 matching correspondence is impossible!
  • 43. Case study – media monitoring in Scandinavia “The leading Danish provider of media intelligence, i.e. media search, media monitoring and media analysis” Old system based on Verity (monitoring) and Autonomy IDOL (archive search) Issues: Systems at full capacity - even with lots of hardware Archive search very, very slow Complex workflow with some very old parts (FTP, batch files...) Proprietary query format (Verity VQL)
  • 44. Case study – media monitoring in Scandinavia Flax engaged in 2013 as part of a complete re-architecture Phased approach: 1. Translate all old queries to new in-house syntax 'IQL' 2. New monitoring pipeline (RabbitMQ, RESTful Web Services, Flax's Luwak) 3. New archive search (Apache Solr)
  • 45. Case study – media monitoring in Scandinavia Flax engaged in 2013 as part of a complete re-architecture Phased approach: 1. Translate all old queries to new in-house syntax 'IQL' 2. New monitoring pipeline (RabbitMQ, RESTful Web Services, Flax's Luwak) 3. New archive search (Apache Solr) Performance/volume targets: Monitoring 20 articles/second, aim for 100 eventually 100 qps for archive search, response within 250ms 17000 stored queries, 85 million articles in archive
  • 46. Case study – media monitoring in Scandinavia Monitoring using 2 servers, Archive search using 8 servers (including failover) 6 cores, 18GB per server launch more servers to scale the system
  • 47. Case study – media monitoring in Scandinavia Monitoring using 2 servers, Archive search using 8 servers (including failover) 6 cores, 18GB per server launch more servers to scale the system All stored queries translated to new IQL syntax Including the ones Verity didn't tell us weren't working...
  • 48. Case study – media monitoring in Scandinavia Monitoring using 2 servers, Archive search using 8 servers (including failover) 6 cores, 18GB per server launch more servers to scale the system All stored queries translated to new IQL syntax Including the ones Verity didn't tell us weren't working... Monitoring customers are being migrated in stages from Oct 2014, Archive search beta testing during Oct 2014, completion by end 2014
  • 49. Case study – media monitoring in Scandinavia The benefits: – No vendor lock-in (open source, IQL) – Industry standard software (Apache Lucene/Solr, RabbitMQ) – Predictable scaling & costs
  • 50. Who else is using Luwak? Bloomberg http://www.flax.co.uk/blog/2014/03/24/london-search-meetup-serious-solr-at-bloomberg-elasticsearch- Booz Allen Hamilton http://www.slideshare.net/BryanBende/realtime-inverted-search-nyc-aslug-oct-2014 Early versions are installed at We know of users in other sectors e.g. recruitment We're hoping to make it a standard part of Lucene/Solr and Elasticsearch (i.e. to improve the Percolator) Apache Samza integration? https://github.com/romseygeek/samza-luwak
  • 51. Conclusions To solve a hard problem, sometimes you need to turn it upside down
  • 52. Conclusions To solve a hard problem, sometimes you need to turn it upside down When you migrate to open source, you can keep some of the parts of your system (if you must)
  • 53. Conclusions To solve a hard problem, sometimes you need to turn it upside down When you migrate to open source, you can keep some of the parts of your system (if you must) Open source helps you scale - with predictable costs
  • 54. Conclusions To solve a hard problem, sometimes you need to turn it upside down When you migrate to open source, you can keep some of the parts of your system (if you must) Open source helps you scale - with predictable costs Leading companies are doing it
  • 55. Conclusions To solve a hard problem, sometimes you need to turn it upside down When you migrate to open source, you can keep some of the parts of your system (if you must) Open source helps you scale - with predictable costs Leading companies are doing it Go and turn Search Upside Down!
  • 56. Another thing... BioSolr – a funded project to improve the way search technology is used in bioinformatics – EBI & NCBI are involved – http://www.flax.co.uk/blog/2014/10/02/biosolr-begins-with-a-workshop-day/ – Already producing Solr patches e.g. SOLR-1387 – Come and join us!
  • 57. Thank you! Any questions? charlie@flax.co.uk www.flax.co.uk/blog +44 (0) 8700 118334 Twitter: @FlaxSearch
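
A footnote to slides 33-36: the two-stage idea (presearch to find candidate queries, then a full search against only those candidates) can be sketched in a few lines of Python. This is a toy illustration only, not Luwak's API. The class `TinyMonitor`, its methods and the sample queries are hypothetical, and it assumes every stored query is a plain AND of single terms, which is far simpler than the Boolean and proximity queries shown on slide 19.

```python
from collections import defaultdict


class TinyMonitor:
    """Toy stand-in for the presearch/search idea; not Luwak's actual API."""

    def __init__(self):
        self.queries = {}                 # query id -> set of required terms
        self.by_term = defaultdict(set)   # presearch index: term -> query ids

    def register(self, query_id, terms):
        required = {t.lower() for t in terms}
        if not required:
            raise ValueError("a stored query needs at least one term")
        self.queries[query_id] = required
        # Assumption: index each AND-query under one of its terms only
        # (the longest, ties broken alphabetically). A query can match
        # only if that term occurs in the document.
        key = max(sorted(required), key=len)
        self.by_term[key].add(query_id)

    def presearch(self, doc_terms):
        """Stage 1: use the document's terms to look up candidate queries."""
        candidates = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        return candidates

    def match(self, text):
        """Stage 2: fully evaluate only the candidates against the document."""
        doc_terms = set(text.lower().split())
        candidates = self.presearch(doc_terms)
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)


monitor = TinyMonitor()
monitor.register("telecoms-merger", ["vodafone", "merger"])
monitor.register("tax", ["tax"])
monitor.register("phones", ["smartphone", "sales"])

doc = "Vodafone confirms merger talks as smartphone demand grows"
candidates = monitor.presearch(set(doc.lower().split()))
matches = monitor.match(doc)
```

Here the presearch stage discards the "tax" query without evaluating it, and the second stage rejects "phones" because "sales" is missing from the document. With a million stored queries, the saving comes from how few candidates survive the first stage.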