FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to a fast, scalable and flexible open-source platform

How media monitoring organisation Infomedia upgraded their Autonomy IDOL and Verity search engines to open source Apache Solr and Luwak

Published in: Software

  1. FIBEP World Media Intelligence Congress, 17-20 November 2015, Vienna (www.wmicongress.com)
     Session Title: How Infomedia upgraded their closed-source search engine to a fast, scalable and flexible open-source platform
     Speakers: Kristian Schou, Infomedia & Charlie Hull, Flax
     Twitter: @InfomediaDK @Flaxsearch
     Web: www.flax.co.uk
     Date: 2015-11-19
  2. About Infomedia
     • Founded in 2003
     • The leading Danish provider of media monitoring and media analysis
     • Largest and oldest Danish media archive, with access to approximately 75 million searchable articles
     @_FIBEP #_FIBEP #WMIC15, 2015-11-19
  3. About Flax
     • Founded in 2001 in Cambridge, U.K.
     • Independent, honest advice and analysis
     • Expert design & development, Apache Solr committers
     • Test-driven relevancy and performance tuning
     • Custom training & mentoring for your staff
     • Flexible support up to 24/7/365 with SLAs
     • Some of our clients: [client logos]
  4. The situation at Infomedia in 2013
     • Very old media monitoring system based on Verity
     • Verity was put into production in 2001 at the company that would later become Infomedia!
     • Slightly less old installation of Autonomy IDOL used for Infomedia's Media Archive
     • Put into production at Infomedia in 2009/10
     • Drawbacks:
       – Verity at almost max capacity, needing constant attention
       – Old and complex workflow for receiving and processing articles
       – Different platforms for monitoring and archive searches meant we were 'bi-lingual', using two different query languages in-house
       – Verity no longer supported by the owning company (HP)
       – Verity not scalable!
  5. What to do?
     • Different upgrading options explored throughout 2011-2012
     • Upgrade everything to Autonomy IDOL?
     • Switch to another commercial search engine?
     • Go open-source?
     • Recommendations and internal testing drew us to Apache Solr, an open source enterprise search platform
     • Advantages:
       – Transparency (going from commercial to open-source)
       – Rapid maturity of Solr: development moving very fast
       – Large and active Solr community
       – Customizability
       – Solr is known to be fast and highly scalable
       – No license fees
  6. Defining the project with Flax
     • Infomedia searched for Solr expertise in Denmark/Scandinavia but could not find an option that we were comfortable with
     • Introduced to Flax through networking and recommendations
       – Experience from similar upgrade projects with Gorkana and AAP
       – Very impressed with Flax's insight, knowledge and credentials
       – Actual committer to Apache Solr
     • Project began in autumn of 2013 with the goals of:
       – Building a completely new search architecture to replace Verity and IDOL
       – Defining Infomedia's own query language, IQL, owned and controlled by Infomedia
       – Translating old monitoring queries (approx. 8,000) to this new IQL syntax
  7. Replacing Verity
     • Verity replaced by Flax Monitor
       – Parses IQL to Lucene queries
       – Runs on 2 servers
       – Uses Luwak, Flax's 'stored search' library:
         • Built on Apache Lucene (as is Solr)
         • Also used by Bloomberg, Booz Allen Hamilton & others
         • In use for 1m stored searches (some 250k characters), 1m stories/day
         • 40x faster than Elasticsearch Percolator
         • Open source at https://github.com/flaxsearch/luwak
  8. Turning search upside down
     [Diagram. Labels: Docs, Query, Stored Queries, Result; cost markers ($$$).]
  9. Turning search upside down
     [Same diagram, annotated: 1 million queries, some 250k long, complex rules; 1 million new documents a day; within 5-100ms.]
  10. Turning search upside down
     [Same diagram and annotations, with more cost markers ($$$$$$).]
  11. Turning search upside down
     [Same diagram and annotations as slide 10.]
  12. Turning search upside down
     [Diagram. Step 1, Pre Query: an incoming Doc selects a Subset of the Stored Queries (labelled ~200). Annotations: 1 million queries, some 250k long, complex rules; 1 million new documents a day.]
  13. Turning search upside down
     [Same diagram, adding step 2, Search: the Subset is run against the Doc to produce the Result.]
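The two steps named on slides 12 and 13 (a pre-query that narrows the stored queries to a subset, then a search of only that subset against the incoming document) can be sketched as follows. This is a minimal illustration, not Luwak's API: the `StoredQuery` and `Monitor` names are hypothetical, queries are simplified to "all of these terms must appear", and the pre-query step is a plain inverted index from terms to query ids.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical stored query: matches when every required term is in the document."""
    query_id: str
    required_terms: frozenset


class Monitor:
    """Toy stored-search matcher (illustrative only, not the Luwak API)."""

    def __init__(self):
        self._queries = {}                # query id -> StoredQuery
        self._by_term = defaultdict(set)  # term -> ids of queries requiring that term

    def register(self, query):
        self._queries[query.query_id] = query
        for term in query.required_terms:
            self._by_term[term].add(query.query_id)

    def candidates(self, doc_terms):
        """Step 1, pre-query: ids of stored queries sharing at least one term with the document."""
        ids = set()
        for term in doc_terms:
            ids |= self._by_term.get(term, set())
        return ids

    def match(self, text):
        """Step 2, search: evaluate only the candidate subset against the document."""
        doc_terms = set(text.lower().split())
        return sorted(
            qid
            for qid in self.candidates(doc_terms)
            if self._queries[qid].required_terms <= doc_terms
        )


monitor = Monitor()
monitor.register(StoredQuery("q1", frozenset({"solr", "denmark"})))
monitor.register(StoredQuery("q2", frozenset({"verity"})))
monitor.register(StoredQuery("q3", frozenset({"solr", "percolator"})))
matches = monitor.match("Solr adoption in Denmark")
```

Here the document shares a term with `q1` and `q3`, so only those two are evaluated in step 2, and `q2` is never touched. The saving comes from skipping the bulk of the stored queries for each document, which matters when there are around a million of them.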
  14. Replacing Autonomy IDOL
     • Autonomy IDOL replaced by Apache Solr
       – Parses IQL to Lucene queries
       – SolrCloud distributes the index & queries across several servers
       – Setup: 75 million documents hosted on 8 servers, 6 cores/24 GB memory and 125 GB storage per server
       – This setup is doubled to have full redundancy
       – Features added to standard Solr by Flax:
         • Custom highlighting
         • Framework to handle multiple languages
         • Extended error logging
         • Cluster management
         • Performance enhancements for complex wildcard queries
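As a rough back-of-the-envelope reading of the setup figures on this slide, the totals can be worked out as below. Two assumptions are not stated in the slides: that "doubled" means two identical sets of 8 servers, and that documents are spread evenly across the servers in a set.

```python
# Figures taken from the slide.
DOCUMENTS = 75_000_000
SERVERS_PER_SET = 8
CORES_PER_SERVER = 6
MEMORY_GB_PER_SERVER = 24
STORAGE_GB_PER_SERVER = 125

# Assumption: "doubled" means two identical sets of servers.
REPLICA_SETS = 2


def cluster_totals():
    """Return aggregate figures for the whole installation."""
    servers = SERVERS_PER_SET * REPLICA_SETS
    return {
        # Assumption: an even spread of documents within one set.
        "docs_per_server": DOCUMENTS // SERVERS_PER_SET,
        "servers": servers,
        "cores": servers * CORES_PER_SERVER,
        "memory_gb": servers * MEMORY_GB_PER_SERVER,
        "storage_gb": servers * STORAGE_GB_PER_SERVER,
    }


totals = cluster_totals()
```

Under those assumptions each server holds about 9.4 million documents, and the whole installation comes to 16 servers, 96 cores, 384 GB of memory and 2,000 GB of storage.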
  15. Benefits of the project
     • Articles indexed and searchable within minutes of receiving them
     • New, much smarter tools for constructing and comparing monitoring queries
     • The Flax Monitor is an extremely smart and performant monitoring solution
     • Huge benefits from defining the Infomedia Query Language, IQL
       – Extremely enlightening and empowering process to analyze what we actually need from a query language
       – We fully understand and have documented how IQL works
       – IQL is designed to match Infomedia's demands and preferences
       – We can revise and expand IQL as new needs and opportunities arrive
       – Not bound to any search platform. We can take it with us
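One way to picture a query language that is "not bound to any search platform" is to keep queries as a neutral tree and render that tree into whatever syntax the current engine expects. The sketch below is purely illustrative: the slides do not show IQL's syntax, so the node types here (`Term`, `Phrase`, `And`, `Or`, `Not`) are hypothetical and are not IQL. The renderer targets the classic Lucene query parser syntax and does not escape special characters.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    text: str


@dataclass(frozen=True)
class Phrase:
    text: str


@dataclass(frozen=True)
class And:
    parts: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    parts: Tuple["Node", ...]


@dataclass(frozen=True)
class Not:
    part: "Node"


Node = Union[Term, Phrase, And, Or, Not]


def to_lucene(node):
    """Render a query tree as a classic Lucene query string (no escaping)."""
    if isinstance(node, Term):
        return node.text
    if isinstance(node, Phrase):
        return f'"{node.text}"'
    if isinstance(node, And):
        return "(" + " AND ".join(to_lucene(p) for p in node.parts) + ")"
    if isinstance(node, Or):
        return "(" + " OR ".join(to_lucene(p) for p in node.parts) + ")"
    if isinstance(node, Not):
        return "NOT " + to_lucene(node.part)
    raise TypeError(f"unsupported node: {node!r}")


query = And((
    Term("solr"),
    Or((Phrase("media monitoring"), Term("archive"))),
    Not(Term("verity")),
))
rendered = to_lucene(query)
```

Moving to a different engine would then mean writing another renderer for the same tree, leaving the stored queries themselves untouched.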
  16. Learnings/Where are we now?
     • A challenging, complex, time-consuming but ultimately rewarding project
     • The ripple effect: we have had to revisit and update a lot of legacy systems
     • Customization is great, but can also mean more specification
     • Open source prevents lock-in but demands investment in education; otherwise it is still just a magic box
     • Flax's expert knowledge has been invaluable
     • A successful migration
     • More than 90% of Infomedia's monitoring queries have been migrated to IQL with practically no negative change in precision or recall
     • The collaboration with Flax continues
     • As Infomedia develops, so do new ideas and feature requests
     • A customized open source platform also means continuous improvement
     • Currently updating to Solr 5.3
     • Still experimenting with different ways to scale our Solr installation
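The precision and recall mentioned on this slide can be computed per query by comparing the articles a migrated query returns with a reference set of articles it should return. The slides do not say how Infomedia measured this, so the sketch below shows only the standard definitions; the function name and the choice of reference set are assumptions.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for one query.

    relevant:  ids of articles the query should match (the reference set)
    retrieved: ids of articles the migrated query actually matched
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    # Convention chosen here: an empty denominator counts as a perfect score.
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


precision, recall = precision_recall(
    relevant={"a1", "a2", "a3", "a4"},
    retrieved={"a2", "a3", "a4", "a5"},
)
```

In this example three of the four retrieved articles are relevant and three of the four relevant articles are retrieved, so both values are 0.75. If the old engine's results are used as the reference set, the numbers measure agreement with the old system, which is not the same as correctness when some old queries were silently broken.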
  17. Other lessons
     • You can also keep your old query language: Flax have written dtSearch & Verity parsers for Lucene
     • Some of your old queries might not be working, e.g. Verity doesn't always tell you when queries are broken!
     • Open source can help future-proof your search, and you have control of the software
     • Engage with the open source community:
       – User groups
       – Mailing lists
       – Contribute back if you can
  18. Thanks for listening - any questions?
     Kristian Schou, Infomedia & Charlie Hull, Flax
     Twitter: @InfomediaDK @Flaxsearch
     Web: www.flax.co.uk
  19. Something else you might like
     Think outside the search box! 2DSearch is a patent-pending, radical alternative to traditional keyword search. Instead of a one-dimensional search box, concepts are expressed and manipulated as objects on a two-dimensional canvas. So you spend less time worrying about Boolean strings, and more time creating semantically transparent queries and effective search strategies. Sign up to gain early access at www.2dsearch.com
