CMS Integration of Apache Solr: How We Did It

Presented by Ingo Renner, Software Engineer, Infield Design

TYPO3 is an Open Source Content Management System that is very popular in Europe, especially in the German market, and gaining traction in the U.S., too.

TYPO3 is a good example of how to integrate Solr with a CMS. The challenges we faced are typical of any CMS integration. We came up with solutions and ideas for these challenges, and we hope they will help other CMS integrations as well.

That includes content indexing, file indexing, keeping track of content changes, handling multi-language sites, search and faceting, access restrictions, result presentation, and keeping all of these things flexible and reusable across many different sites.

For all of this we used a couple of additional Apache projects. We would like to show how we use them and how we contributed back to them while building our Solr integration.

Published in: Education, Technology

Transcript of "CMS Integration of Apache Solr: How We Did It"

  1. APACHE SOLR CMS INTEGRATION
     Ingo Renner, Software Engineer
  2. ID INFIELD DESIGN, we build smart.
     MAY.01.2013, LUCENE/SOLR REVOLUTION
     TYPO3 CMS and Solr. How we did it.
     APACHE SOLR CMS INTEGRATION
  3. ABOUT ID
     What we do and who we do it for
     • Strategy Planning
     • Design
     • UX
     • Development & Integration
  4. WHO IS THIS GUY?
     • Committer TYPO3 CMS
     • Committer and PMC member Apache Tika
     • Release Manager TYPO3 CMS 4.2
     • New San Franciscan
     • Snowboarding, mountain biking
     • Software Engineer, Architect at Infield Design
     - Caution - TYPO3-Evangelist
  5. TYPO3 CMS
  6. TYPO3 CMS
     • Free and Open Source Enterprise CMS
     • Estimated 500,000+ installations worldwide
     • 6,000+ public extensions
     • 6,000,000+ downloads
     • Content Management Framework
     • Multi-Site, Multi-Language, Versioning, Workflows, ...
     • Stable, Secure, Scalable
  7. TYPO3 COMMUNITY
     • Community driven development
     • Conferences in North America, Europe, Asia
     • Barcamps, Developer Days, Snowboard Tour
     • 4 times Google Summer of Code participant
     • Backed by TYPO3 Association
     • Several other projects under the TYPO3 brand
  8. SOLR & CMS INTEGRATION
  9. Integration Challenges & Solutions: PAGE RENDERING
     • Different template engines
     • (too) flexible page rendering engine
     • Identify relevant content on websites
     • Exclude navigation and common page elements
     • Content generated by plugins
  10. Integration Challenges & Solutions: INDEX QUEUE
     • Index Queue to track and index content
     • Record Monitor to update Index Queue
     • Crawl pages, index unstructured content marked relevant
     • Exclude pages with plugin-generated content
     • Index structured plugin data directly from DB
  11. Integration Challenges & Solutions: ACCESS RIGHTS
     • Intranet, Extranet, ...
     • Not everybody may see everything
     • Flexible user groups and permissions
     • Permissions extended to sub-pages
  12. Integration Challenges & Solutions: SOLR ACCESS FILTER PLUGIN
     • Custom Solr access filter plugin
     • Query Parser and Filter
     • User group IDs stored in documents
     • Current user’s groups submitted with query
     • Plugin matches document groups with user’s groups
  13. Integration Challenges & Solutions: FILE INDEXING
     • Finding file links in page content
     • Core file links vs. plugin file links
     • Track files for indexing
     • Reading file content
     • Separate tools for different file formats
  14. Integration Challenges & Solutions: FILE INDEXING
     • File Detectors & File Index Queue
     • File system abstraction layer
     • Apache Tika
     • Knows 1,200+ file formats, reads about half of them
     • Content & metadata extraction
     • Language detection
  15. Integration Challenges & Solutions: THE REST
     • PHP people vs. Java technology
     • Talking to Solr
     • Learning from mistakes
  16. Integration Challenges & Solutions: THE REST
     • Fully automated bash install script
     • SolrPhpClient
     • Separate your languages
  17. EXT:solr - Apache Solr for TYPO3: FEATURES
     • Faceted Search
     • File Indexing
     • Multi-Language & Multi-Site Support
     • Did you mean, More Like This
     • Search Word Highlighting
     • Auto Complete
     • Access Rights Support
     • Many More ...
  18. ID INFIELD DESIGN, we build smart.
     QUESTIONS?
  19. ID INFIELD DESIGN, we build smart.
     THANKS.
  20. ID INFIELD DESIGN, we build smart.
     T3CON North America, San Francisco, May 30-31
     20% off regular ticket price, use: LUCENETYPO3
     INFIELD DESIGN is hiring!
  21. CONFERENCE PARTY
     The Tipsy Crow: 770 5th Ave
     Starts after Stump The Chump
     Your conference badge gets you in the door
     TOMORROW
     Breakfast starts at 7:30
     Keynotes start at 8:30
     CONTACT
     @irnnr
     ingo@typo3.org, ingo@apache.org
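
The access filter slide describes the matching idea only in bullet points: group IDs are stored with each document, the current user's groups travel with the query, and the plugin compares the two. The sketch below illustrates that comparison in plain Python on in-memory data. It is not the actual plugin, which the slides describe as a Solr query parser and filter. The names `Document`, `is_visible` and `filter_results` are hypothetical, and treating a document without any groups as public is an assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for an indexed document."""
    doc_id: str
    title: str
    # IDs of the user groups that may see this document.
    # Assumption for this sketch: an empty set means "public".
    allowed_groups: frozenset = field(default_factory=frozenset)


def is_visible(doc, user_groups):
    """Return True if the document is public or shares a group with the user."""
    if not doc.allowed_groups:
        return True
    return not doc.allowed_groups.isdisjoint(user_groups)


def filter_results(docs, user_groups):
    """Keep only the documents the given user groups are allowed to see."""
    groups = frozenset(user_groups)
    return [doc for doc in docs if is_visible(doc, groups)]


if __name__ == "__main__":
    index = [
        Document("1", "Public home page"),
        Document("2", "Intranet news", frozenset({2})),
        Document("3", "Extranet price list", frozenset({3, 4})),
    ]
    # A user who belongs to group 2 only.
    for doc in filter_results(index, {2}):
        print(doc.doc_id, doc.title)
```

Running the comparison where the search happens, rather than filtering results afterwards in the CMS, means that counts, facets and paging only ever reflect documents the user may see. That is presumably one reason the slides place this logic in a Solr plugin.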
