Building SaaS Solutions for Online Media Using Apache Solr - By Alberto Mijares

See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011

SaaS applications have the advantage of remote web deployment, which lets potentially any
consumer on the internet use them instantly, and of the cost reduction that a web-based
deployment provides. In this talk, the speaker explains the architecture of an innovative SaaS
solution built for the Axel Springer media group (Switzerland). The application remotely
extracts the content of articles from multiple online newspapers, analyzes and classifies them,
determines which articles are most similar to a given one, and integrates the results back into
the article to provide users with a “related articles” feature. The core components of the
analysis process are language-specific tools (used to filter out superfluous terms) and semantic
knowledge bases such as Wikipedia (used to enrich the indexed information with new
context-specific terms or to disambiguate the extracted terms). On the more technical side, the
speaker explains the criteria for selecting the emerging enterprise search framework Apache Solr
as the platform, and how it drastically reduced the required development effort.
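The two kinds of analysis components described above can be illustrated with a toy analysis chain. This is a minimal sketch. It is not Lucene's analyzer API and not the pipeline built for Axel Springer. `GERMAN_STOPWORDS` is a tiny, hypothetical excerpt of a real German stopword list. `KNOWLEDGE_BASE` is a hypothetical, hand-written stand-in for context terms mined from a resource such as Wikipedia. Stemming is omitted.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny German stopword list; real lists are much longer.
GERMAN_STOPWORDS = {"der", "die", "das", "den", "und", "im", "in", "ist", "mit", "von", "zu"}

# Hypothetical knowledge base: surface term -> additional context terms.
# Stand-in for concepts that could be derived from a source such as Wikipedia.
KNOWLEDGE_BASE = {
    "ubs": ["bank", "finanzplatz"],
    "smi": ["aktienindex", "börse"],
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def analyze(text: str) -> list[str]:
    """Toy analysis chain: tokenize, drop stopwords, then append knowledge-base terms."""
    tokens = [token for token in tokenize(text) if token not in GERMAN_STOPWORDS]
    enriched = list(tokens)
    for token in tokens:
        enriched.extend(KNOWLEDGE_BASE.get(token, []))
    return enriched


print(analyze("Die UBS und der SMI im Fokus"))
# -> ['ubs', 'smi', 'fokus', 'bank', 'finanzplatz', 'aktienindex', 'börse']
```

In Solr, the same idea is configured declaratively: a field type's analyzer in the schema combines a tokenizer with a sequence of filters and can be applied at both index time and query time.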

  1. Building SaaS solutions with Apache Solr. Alberto Mijares, Canoo Engineering AG, alberto.mijares@canoo.com, 26/05/2011, Twitter: @lemaiol
  2. <ul><li>Bullet point time!</li></ul>
  3. What I Will Cover <ul><li>Practical applications of Apache Solr and Apache Lucene: how to increase the time a user spends on a website and how to do website “cross-selling”.</li></ul><ul><li>Use case: how Canoo helped Axel Springer Switzerland increase page impressions, the time users spend on the site, and traffic on their online financial newspapers.</li></ul><ul><li>Key concepts:</li></ul><ul><ul><li>How to achieve this using Lucene & Solr</li></ul></ul><ul><ul><li>How to profit from a SaaS business model</li></ul></ul>
  4. Who I am <ul><li>Alberto Mijares</li></ul><ul><li>Canoo Engineering AG</li></ul><ul><li>Background in web applications and standards:</li></ul><ul><ul><li>Participated in the W3C Semantic Web interest group (SWEO)</li></ul></ul><ul><ul><li>Previously led the development of web standards compliance tools (Web Accessibility and Mobile Web)</li></ul></ul><ul><ul><li>More recently led enterprise information retrieval projects</li></ul></ul><ul><ul><li>Currently coaching Google Web Toolkit development projects</li></ul></ul>
  5. Who is Canoo <ul><li>People:</li></ul><ul><ul><li>Dirk Koenig: Groovy founder</li></ul></ul><ul><ul><li>Andres Almiray: Griffon project lead and Java Champion</li></ul></ul><ul><ul><li>Hamlet D’Arcy: Groovy committer and enthusiast</li></ul></ul><ul><ul><li>… almost 40 more top software engineers</li></ul></ul><ul><li>Products:</li></ul><ul><ul><li>WebTest: framework for web functional testing</li></ul></ul><ul><ul><li>RIA Suite (aka ULC): Java based RIA framework</li></ul></ul><ul><ul><li>FindIT: information retrieval and search tools</li></ul></ul><ul><ul><li>WMTrans: language analysis tools</li></ul></ul>
  6. Canoo FindIT <ul><li>http://www.canoo.com/videos/FindIT.html</li></ul>
  7. <ul><li>Stop “bullet-pointing”!</li></ul>
  8. The facts <ul><li>Axel Springer group is a market leader</li><li>Bilanz, Handelszeitung and Stocks</li><li>In Switzerland, financials are important!</li><li>Financial language is German</li><li>Online media is the future</li></ul>
  10. The gap <ul><li>Make the online versions more profitable</li><li>Make all newspapers “market leaders”</li></ul>
  12. The how <ul><li>Workshop</li><li>“Related articles”</li><li>“Cross-selling”</li></ul>
  14. The analysis <ul><li>Find a funding model</li><li>Use Lucene’s “More like this”</li><li>Integrate the suggestions back into the articles</li><li>Implement a selection mechanism</li></ul>
  16. The issues <ul><li>“More like this” was “experimental”</li><li>Works out-of-the-box only in English</li><li>Without “semantics”, the results do not always make sense</li><li>Indexing full pages produces noise</li></ul>
  18. The key
  20. The functional requirements <ul><li>Discover and index articles</li><li>Extract only content</li><li>Simple and flexible query service</li></ul>
  22. The funding model
  23. The business model <ul><li>SaaS</li></ul>
  24. The “other” requirements <ul><li>Lucene-based analysis pipeline</li><li>Web-oriented platform</li><li>Multi-application platform</li><li>Reliable, fast and scalable</li><li>Plan B?</li></ul>
  26. The search <ul><li>Wraps Lucene in a nice way</li><li>It is mature and Open Source</li><li>Supports scheduling, REST API, DIH (DataImportHandler),…</li><li>Scalability out-of-the-box</li><li>Well documented and has professional support</li></ul>
  28. The plan <ul><li>From POC to PROD in “80 days”</li></ul>
  30. The results <ul><li>Google Analytics</li></ul>
  32. The conclusions
  33. The Q&A. Thanks!
  34. Sources <ul><li>Links</li></ul><ul><ul><li>http://people.canoo.com/share</li></ul></ul><ul><ul><li>http://www.canoo.com</li></ul></ul><ul><ul><li>http://www.canoo.net</li></ul></ul><ul><ul><li>http://www.leo.org</li></ul></ul><ul><ul><li>http://www.bilanz.ch</li></ul></ul><ul><ul><li>http://www.handelszeitung.ch</li></ul></ul><ul><ul><li>http://www.stocks.ch</li></ul></ul>
  35. Contact <ul><li>Alberto Mijares</li></ul><ul><ul><li>alberto.mijares@canoo.com</li></ul></ul><ul><ul><li>Twitter: @lemaiol</li></ul></ul>
  36. Architecture <ul><li>Platform: Apache Solr 1.4.1</li></ul><ul><li>Architecture:</li></ul>[Diagram: a Solr container (Springer Solr, Customer 2 Solr, Customer 3 Solr) and a Web container (Springer WebApp, Customer 2 WebApp, Customer 3 WebApp); labels: external access, internal access, requests]
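Slide 14 (“The analysis”) proposes Lucene’s “More like this” for the “related articles” feature. Roughly, MoreLikeThis picks the most distinctive terms of the source document by tf·idf and runs them as a query against the index. The sketch below is a simplified illustration of that idea, not Lucene's implementation. The documents are hypothetical, already-analyzed token lists. The idf follows Lucene's classic `1 + ln(numDocs / (docFreq + 1))`. Candidates are scored by summing the weights of the matched terms rather than by Lucene's full scoring. `min_tf` and `max_terms` loosely mirror MoreLikeThis's minimum term frequency and maximum query terms settings.

```python
from __future__ import annotations

import math
from collections import Counter


def idf(term: str, corpus: list[list[str]]) -> float:
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    doc_freq = sum(1 for tokens in corpus if term in tokens)
    return 1.0 + math.log(len(corpus) / (doc_freq + 1))


def interesting_terms(tokens, corpus, min_tf=1, max_terms=5):
    """Rank the source document's terms by tf * idf and keep the top max_terms."""
    tf = Counter(tokens)
    scored = [(term, n * idf(term, corpus)) for term, n in tf.items() if n >= min_tf]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # highest weight first, ties by term
    return scored[:max_terms]


def more_like_this(source_id, docs, k=2, **term_options):
    """Return up to k other document ids that share the source's interesting terms."""
    corpus = list(docs.values())
    terms = interesting_terms(docs[source_id], corpus, **term_options)
    scores = {}
    for doc_id, tokens in docs.items():
        if doc_id == source_id:  # never suggest an article as related to itself
            continue
        present = set(tokens)
        score = sum(weight for term, weight in terms if term in present)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:k]


# Hypothetical, already-analyzed articles.
docs = {
    "a1": ["ubs", "bank", "gewinn", "quartal"],
    "a2": ["ubs", "bank", "boni"],
    "a3": ["smi", "börse", "quartal"],
    "a4": ["wetter", "zürich"],
}
print(more_like_this("a1", docs))  # -> ['a2', 'a3']
```

Excluding the source article and cutting the list at `k` is only the most basic form of the “selection mechanism” named on slide 14. The slides do not say which criteria were actually used.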
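Slides 16 and 20 note that indexing full pages produces noise, so only the article content should be extracted. The sketch below shows this with Python's standard library. It assumes, hypothetically, that the article text sits inside `<div class="article-body">`; real pages need site-specific rules. The second function wraps the result in Solr's XML update message (`<add><doc><field name="...">`). That message would then be posted to the core's update handler and committed. The field names `id`, `url` and `body` are hypothetical schema fields.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ArticleBodyExtractor(HTMLParser):
    """Collects text only inside <div class="article-body"> (hypothetical page markup)."""

    def __init__(self, body_class: str = "article-body") -> None:
        super().__init__()
        self.body_class = body_class
        self.depth = 0  # > 0 while inside the article div; counts nested <div>s
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1
        elif self.body_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


def extract_body(html: str) -> str:
    """Return the article text, dropping navigation, footers and other page chrome."""
    parser = ArticleBodyExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def solr_add_xml(doc_id: str, url: str, body: str) -> str:
    """Build a Solr XML <add> message for one document (field names are hypothetical)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("url", url), ("body", body)):
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


page = """<html><body>
<div class="nav">Home | Börse | Login</div>
<div class="article-body"><h1>UBS meldet Gewinn</h1><p>Die Bank steigert den Gewinn.</p></div>
<div class="footer">Impressum</div>
</body></html>"""

body = extract_body(page)
print(body)  # -> UBS meldet Gewinn Die Bank steigert den Gewinn.
print(solr_add_xml("a1", "https://example.com/articles/a1", body))
```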
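Slide 26 lists Solr's REST API among its selection criteria, and slide 20 asks for a simple and flexible query service. The sketch below shows how such a service could ask Solr for related articles over HTTP. It assumes a `solr.MoreLikeThisHandler` has been registered at `/mlt` in the core's `solrconfig.xml`; it may not be enabled by default. `q`, `fl`, `rows`, `wt`, `mlt.fl`, `mlt.mintf` and `mlt.mindf` are standard Solr request parameters. The host, the core name `springer` and the `id`/`url`/`body` fields are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def mlt_url(solr_base: str, core: str, article_id: str, rows: int = 5,
            fields: tuple = ("body",)) -> str:
    """Build a MoreLikeThis request URL for the article with the given id."""
    params = {
        "q": f"id:{article_id}",     # selects the source article (no query-syntax escaping here)
        "mlt.fl": ",".join(fields),  # fields used to pick the interesting terms
        "mlt.mintf": 1,              # minimum term frequency in the source document
        "mlt.mindf": 1,              # minimum number of documents containing the term
        "fl": "id,url",              # fields returned for each similar article
        "rows": rows,                # number of similar articles to return
        "wt": "json",                # response format
    }
    return f"{solr_base.rstrip('/')}/{core}/mlt?{urlencode(params)}"


url = mlt_url("http://localhost:8983/solr", "springer", "a1", rows=3)
print(url)
print(parse_qs(urlsplit(url).query))  # decoded parameters, for inspection
```

Keeping the query service as a thin URL builder leaves all similarity logic in Solr. The web application then only has to render the returned ids and URLs as “related articles”.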
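Slide 36 shows the SaaS layout: a Solr container with one Solr instance per customer (Springer, Customer 2, Customer 3) next to a web container with one WebApp per customer, with separate external and internal access. The sketch below shows the request routing this layout could imply. It assumes, hypothetically, that each customer's web app is exposed under its own path prefix and talks only to its own Solr core on an internal host. `Tenant`, `TENANTS`, the host name and the paths are illustrative, not taken from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    name: str
    webapp_prefix: str   # externally reachable path of the customer's web app
    solr_core_url: str   # internal-only Solr core used by that web app


# Hypothetical deployment: one Solr core and one web app per customer.
TENANTS = [
    Tenant("springer", "/springer", "http://solr.internal:8983/solr/springer"),
    Tenant("customer2", "/customer2", "http://solr.internal:8983/solr/customer2"),
    Tenant("customer3", "/customer3", "http://solr.internal:8983/solr/customer3"),
]


def route(request_path: str) -> Tenant:
    """Return the tenant whose web app owns the external request path."""
    for tenant in TENANTS:
        prefix = tenant.webapp_prefix
        if request_path == prefix or request_path.startswith(prefix + "/"):
            return tenant
    raise LookupError(f"no tenant for {request_path!r}")


tenant = route("/springer/related?id=a1")
print(tenant.name, tenant.solr_core_url)
```

With one core per customer, each customer's index and configuration (for example, its language-specific analysis) stays isolated. The cost is a separate set of resources for every customer.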