Search As A Service

    Search As A Service - Presentation Transcript

    1. Full-text searching with Marjory
       Markus Wolff
    2. What's Marjory?
       A webservice for full-text indexing and searching of documents
       - Written in PHP
       - Based on Zend Framework
       - (Very) roughly comparable to Solr
       - BSD-licensed, available on Google Code
    3. How does Marjory work?
       Your application
       - sends document data, or a document location, via POST
       - sends search terms via GET
       - gets results back in the desired output format (default: XML)
       Marjory (REST-based webservice)
       - stores document data in the search engine
       - queries the search engine
       - gets the query results back
       Search engine (default: Lucene)
    4. Features
       - Search engine abstraction
         - Use the engine that suits your needs; just write a small adapter class
         - Zend_Search_Lucene built in by default
       - Multiple search catalogs
         - Index many sites with one dedicated search server
         - Put all documents matching any criteria into separate search indexes to speed up searches
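
Marjory itself is written in PHP, and the slides do not show its adapter interface. The Python sketch below only illustrates the general idea of an engine abstraction with several catalogs. The class names `SearchEngineAdapter` and `InMemoryAdapter`, their methods, and the `tokenize` helper are all hypothetical. The in-memory engine is a toy stand-in for something like Zend_Search_Lucene.

```python
import re
from abc import ABC, abstractmethod


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into word tokens."""
    return set(re.findall(r"\w+", text.lower()))


class SearchEngineAdapter(ABC):
    """Hypothetical adapter interface; the service code talks only to this."""

    @abstractmethod
    def add(self, catalog: str, doc_id: str, fields: dict[str, str]) -> None:
        """Index the document's fields under doc_id in the given catalog."""

    @abstractmethod
    def search(self, catalog: str, query: str) -> list[str]:
        """Return the ids of matching documents in the given catalog."""


class InMemoryAdapter(SearchEngineAdapter):
    """Toy engine: one dict of token sets per catalog, AND-matching of terms."""

    def __init__(self) -> None:
        self._catalogs: dict[str, dict[str, set[str]]] = {}

    def add(self, catalog: str, doc_id: str, fields: dict[str, str]) -> None:
        # Each catalog is a separate index, created on first use.
        tokens = tokenize(" ".join(fields.values()))
        self._catalogs.setdefault(catalog, {})[doc_id] = tokens

    def search(self, catalog: str, query: str) -> list[str]:
        terms = tokenize(query)
        docs = self._catalogs.get(catalog, {})
        # A document matches when it contains every query term.
        return sorted(doc_id for doc_id, tokens in docs.items() if terms <= tokens)
```

To swap in a different engine, you would write another subclass. Callers keep using `add` and `search` unchanged, and catalogs stay isolated from one another.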
    5. More features
       - Two ways to index documents:
         - submit an XML snippet containing any content you want to index, or
         - just submit a URI (any valid PHP stream resource) and let Marjory extract the content from the document
       - HTML supported by default (for now)
         - add your own document parser class to extract plain text from any other document format (or special markup structures)
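
A document parser, in this sense, turns a fetched document into plain text for indexing. The slides do not show Marjory's parser class interface. The Python sketch below therefore uses the standard library's `html.parser` to collect the visible text of an HTML page while skipping `<script>` and `<style>` content. The names `PlainTextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Hypothetical document parser: collects visible text from an HTML page."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text runs that are not script or style source.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the plain text of an HTML document, ready for indexing."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```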
    6. Even more features
       - Index documents asynchronously, using Dropr as a messaging service
         - Dropr: a PHP-based durable messaging service
         - Example webservice and Dropr client included with Marjory
         - The application does not need to wait for document retrieval, parsing and adding to the index
       - More info: www.dropr.org
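
The slides do not show Dropr's client API. The Python sketch below uses the standard library's `queue` and `threading` modules as an in-process stand-in, only to show the decoupling: the application enqueues document URIs and returns at once, and a worker handles the slow retrieval, parsing and indexing. Unlike Dropr, this queue is neither durable nor shared between processes. The names `index_worker` and `enqueue_and_drain` are hypothetical.

```python
import queue
import threading
from typing import Optional


def index_worker(jobs: "queue.Queue[Optional[str]]", indexed: list[str]) -> None:
    """Consume document URIs until a None sentinel arrives."""
    while True:
        uri = jobs.get()
        try:
            if uri is None:
                return
            # Placeholder for the slow part: fetch the document, extract text, add to the index.
            indexed.append(uri)
        finally:
            jobs.task_done()


def enqueue_and_drain(uris: list[str]) -> list[str]:
    """Submit URIs without waiting on indexing, then stop the worker and report what it indexed."""
    jobs: "queue.Queue[Optional[str]]" = queue.Queue()
    indexed: list[str] = []
    worker = threading.Thread(target=index_worker, args=(jobs, indexed), daemon=True)
    worker.start()
    for uri in uris:
        jobs.put(uri)  # returns immediately; the caller does not wait for indexing
    jobs.put(None)  # sentinel: stop after all queued jobs
    worker.join()
    return indexed
```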
    7. Latest additions
       - Search results as a Dojo.Data-compatible JSON data source
       - API exposure via JSON-RPC as an alternative to XML over REST (experimental!)
    8. How to add a catalog
       Send a POST request to:
         http://marjory.example.com/rest/catalog/
       containing this XML snippet:
         <add catalog="MyGloriousCatalog" />
       Et voilà, you've got yourself a new search index.
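
From a Python client, the same call could look like the sketch below. It builds the request with the standard library but does not send it. The host is the slides' example host. The `Content-Type` header is an assumption, because the slides do not say which one Marjory expects. `build_add_catalog_request` is a hypothetical helper.

```python
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://marjory.example.com/rest"  # example host from the slides


def build_add_catalog_request(name: str) -> urllib.request.Request:
    """Build, but do not send, the POST request that creates a new catalog."""
    root = ET.Element("add", catalog=name)
    # ElementTree takes care of attribute quoting and escaping.
    body = ET.tostring(root, encoding="unicode").encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/catalog/",
        data=body,
        # Assumption: the slides do not say which Content-Type Marjory expects.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would actually send it.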
    9. Adding a document
       Make a POST request to:
         http://marjory.example.com/rest/add/
       Send the document content as XML, like this:
         <add catalog="default">
           <doc uri="MyUniqueDocumentId">
             <field name="title">Marjory: Search as a service</field>
             <field name="abstract">
               An epic novel about full-text indexing in an SOA environment
             </field>
             <field name="content">Lorem ipsum dolor sit amet... (to be continued)</field>
           </doc>
         </add>
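
If you build this body with an XML library instead of string concatenation, field values containing `&` or `<` are escaped correctly. The Python sketch below uses `xml.etree.ElementTree`, with element and attribute names taken from the snippet above. The helper name `build_add_document_xml` is hypothetical. The resulting string would be POSTed to `/rest/add/`.

```python
import xml.etree.ElementTree as ET


def build_add_document_xml(catalog: str, doc_id: str, fields: dict[str, str]) -> str:
    """Serialize one document into the <add><doc><field/></doc></add> format."""
    add = ET.Element("add", catalog=catalog)
    doc = ET.SubElement(add, "doc", uri=doc_id)
    for name, value in fields.items():  # dicts keep insertion order
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")
```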
    10. Adding a document, the easy way
        Or, if Marjory should retrieve and parse the document:
          <add catalog="default">
            <doc src="http://my.website.tld/my/document.html" />
          </add>
        If you have many and/or complex documents, you are better off using Dropr to send messages to Marjory.
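
The "easy way" request carries only a `src` attribute, so a client sketch stays short. As before, this Python code builds the request without sending it. The helper `build_add_by_uri_request` and the `Content-Type` value are assumptions, not taken from the slides.

```python
import urllib.request
import xml.etree.ElementTree as ET

ADD_URL = "http://marjory.example.com/rest/add/"  # example endpoint from the slides


def build_add_by_uri_request(catalog: str, src: str) -> urllib.request.Request:
    """Build, but do not send, a POST asking Marjory to fetch and parse `src` itself."""
    add = ET.Element("add", catalog=catalog)
    # No <field> children: Marjory retrieves the document and extracts the content.
    ET.SubElement(add, "doc", src=src)
    body = ET.tostring(add, encoding="unicode").encode("utf-8")
    return urllib.request.Request(
        ADD_URL,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed content type
        method="POST",
    )
```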
    11. Searching for documents
        Make a GET request including the query terms:
          http://marjory.example.com/rest/select?q=Marjory
        Additional parameters to...
        - limit the number of results
        - include only specific fields in the response
        - specify a search catalog
        Default catalog name: "default" (who would have guessed?)
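
On the client side, build the query string with a URL-encoding helper rather than by hand, so that spaces and reserved characters survive. In the Python sketch below, only the `q` parameter comes from the slides. The slides do not name the parameters for limiting results, selecting fields or choosing a catalog, so the sketch does not guess them and leaves them to the caller in a hypothetical `extra` mapping.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

SELECT_URL = "http://marjory.example.com/rest/select"  # example endpoint from the slides


def build_search_url(query: str, extra: Optional[Mapping[str, str]] = None) -> str:
    """Build the GET URL for a search; `extra` holds any additional parameters verbatim."""
    params = {"q": query}
    if extra:
        params.update(extra)
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{SELECT_URL}?{urlencode(params)}"
```

For example, `build_search_url("Marjory")` yields the URL shown on the slide.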
    12. Search response format
          <?xml version="1.0" encoding="UTF-8"?>
          <response>
            <responseHeader>
              <status>0</status><QTime>1</QTime>
            </responseHeader>
            <result numFound="2" start="0">
              <doc>
                <str name="id">MA147LL/A</str>
                <str name="name">Apple 60 GB iPod Black</str>
              </doc>
              <doc>
                <str name="id">EN7800GTX/2DHTV/256M</str>
                <str name="name">ASUS Extreme N7800GTX</str>
              </doc>
            </result>
          </response>
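
Any XML parser can read this response. The Python sketch below uses `xml.etree.ElementTree`. It treats every child of `<doc>` as a string field keyed by its `name` attribute, which covers the `<str>` fields shown above but does not convert other field types. The helper `parse_search_response` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_search_response(xml_text: str) -> tuple[int, list[dict[str, str]]]:
    """Return (numFound, documents) from a Solr-style <response> document."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    num_found = int(result.get("numFound", "0"))
    # One dict per <doc>, mapping each field's name attribute to its text.
    docs = [
        {child.get("name"): (child.text or "") for child in doc}
        for doc in result.findall("doc")
    ]
    return num_found, docs
```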
    13. Looks familiar?
        Blatantly stolen from Solr :-)
        - Why reinvent the wheel?
        - Makes switching between the two projects easy, if need be
        Don't like it? Try JSON-RPC instead.
    14. Access control
        - No access control provided by Marjory
        - Use your webserver's authentication and ACL capabilities
        - There are currently no plans to add anything built-in, unless someone convinces me otherwise :-)
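
Suppose, for example, that the webserver in front of Marjory is configured for HTTP Basic authentication. A client then only has to send an `Authorization` header. The Python sketch below does that with the hypothetical helper `with_basic_auth`. The right scheme depends on your webserver setup. Basic credentials are only Base64-encoded, so they should travel over HTTPS.

```python
import base64
import urllib.request


def with_basic_auth(req: urllib.request.Request, user: str, password: str) -> urllib.request.Request:
    """Attach an HTTP Basic Authorization header (RFC 7617) to a request."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req
```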
    15. Things to do
        - Fully unit-test the beast
        - Add a nice admin GUI (currently in progress)
        - Add other engines
        - Support more document formats out of the box (PDF likely to be the next addition)
        - Fine-tuning (how about renaming or removing catalogs, for example?)
    16. Is it production-ready?
        Yes, and it's already being used on production websites.
    17. That's all, folks!
        More information:
        - http://code.google.com/p/marjory/
        - http://www.dropr.org/
        - http://blog.wolff-hamburg.de/
    Author: Markus Wolff

    A presentation I gave at the International PHP Conf

    CC Attribution-NonCommercial-ShareAlike License
