In Search Of: Integrating Site Search (PHP Barcelona)

Despite being a key method of navigation on many sites, search functionality often gets the short end of the stick in development, either by handing the job over to Google or just enabling full text search on the appropriate column in the database. In this talk we will look at how full text search actually works, how to integrate local text search engines into your PHP application, and how it’s possible to actually provide better and more relevant results than Google itself, at least for your own site.

Transcript

  • 1. In Search Of... Integrating Site Search. Ian Barber @ianbarber http://phpir.com ian@ibuildings.com Friday, 29 October 2010
  • 2. How Search Works, Integrating Search, Improving Results, Using Search, Search Performance, Questions
  • 3. [image slide, no text]
  • 4. [Diagram with components: Query, Query Parser, Index, Result, Analyser, Document]
  • 5. Tokenisation: “With AT&T’s help, the F.B.I Miami-Dade office had recovered $1.1 million from O’Healy’s Ponzi scheme, 10-15% more than expected.”
  • 6. PHP Tokenisation
      function tokenise($string) {
          $string = strtolower($string);
          // \w+ matches runs of letters, digits and underscores
          preg_match_all('/\w+/', $string, $matches, PREG_OFFSET_CAPTURE);
          // each match is array(token, byte offset)
          return $matches[0];
      }
  • 7. Document Term Pairs
      Document ID   Term
      1             the
      1             best
      1             of
      1             the
      ...           ...
      204           and
      204           what
      204           would
  • 8. Inverted Index
      Term    Documents
      best    1 (4, 16), 4 (422), 129 (344) ...
      what    24 (50, 98), 75 (33, 208) ...
      would   99 (32, 599), 201 (344) ...
      ...     ...
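A minimal sketch of how an inverted index like the one on slide 8 could be built: each term maps to the documents containing it, and each document to the word positions where the term occurs. The helper names `tokeniseWords` and `buildInvertedIndex` and the two sample documents are assumptions made for illustration; they are not from the talk.

```php
<?php
// Hypothetical helper: lower-case the text and split it into word tokens.
function tokeniseWords($text) {
    preg_match_all('/\w+/', strtolower($text), $m);
    return $m[0];
}

// Build term => (docID => list of word positions).
function buildInvertedIndex(array $documents) {
    $index = array();
    foreach ($documents as $docID => $text) {
        foreach (tokeniseWords($text) as $position => $term) {
            $index[$term][$docID][] = $position;
        }
    }
    ksort($index); // keep terms in sorted order, as on the slide
    return $index;
}

// Sample documents are invented for illustration.
$index = buildInvertedIndex(array(
    1 => 'The best of the west',
    2 => 'What would the best do',
));
print_r($index['best']);
```

Positions here are word offsets rather than byte offsets; either works as long as indexing and querying agree.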
  • 9. Boolean Query Merge
      Query: Best Western Hotel
      best      1   4   129  298  305  338
      western   4   95  194  204  298  305
      working   4   298 305
      hotel     2   40  200  298  355  402
      Result: Document 298
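The merge on slide 9 can be sketched as a linear intersection of sorted posting lists. This is one common way to do it, not necessarily what the talk used; the function name `intersectPostings` is an assumption, while the posting lists are copied from the slide.

```php
<?php
// Intersect two posting lists of ascending document IDs in one linear pass.
function intersectPostings(array $a, array $b) {
    $result = array();
    $i = 0;
    $j = 0;
    while ($i < count($a) && $j < count($b)) {
        if ($a[$i] === $b[$j]) {
            $result[] = $a[$i]; // document contains both terms
            $i++;
            $j++;
        } elseif ($a[$i] < $b[$j]) {
            $i++;
        } else {
            $j++;
        }
    }
    return $result;
}

// Posting lists copied from the slide.
$best    = array(1, 4, 129, 298, 305, 338);
$western = array(4, 95, 194, 204, 298, 305);
$hotel   = array(2, 40, 200, 298, 355, 402);

$working = intersectPostings($best, $western); // the "working" row on the slide
$result  = intersectPostings($working, $hotel);
print_r($result);
```

Because both lists are sorted, each step advances at least one pointer, so the cost is linear in the combined list length.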
  • 10. [Slide of overlapping placeholder (lorem ipsum) documents; no recoverable text]
  • 11. TF-IDF
      // $term is the posting list for one term: docID => positions
      function getWeight($docID, $term, $total) {
          $tf = count($term[$docID]);            // occurrences in this document
          $idf = log($total / count($term), 2);  // rarer terms weigh more
          return $tf * $idf;
      }
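A small worked example of the weighting on slide 11, assuming the posting-list shape from slide 8 (docID => positions). The figures (8 documents in total, the term "socket" appearing in 2 of them) are invented for illustration.

```php
<?php
// Same weighting as the slide: $term maps docID => list of positions.
function getWeight($docID, $term, $total) {
    $tf  = count($term[$docID]);           // occurrences in this document
    $idf = log($total / count($term), 2);  // rarer terms score higher
    return $tf * $idf;
}

// Invented figures: 8 documents in total, "socket" appears in 2 of them.
$socket = array(
    1 => array(4, 16, 30), // three occurrences in document 1
    7 => array(2),         // one occurrence in document 7
);

echo getWeight(1, $socket, 8), "\n"; // tf = 3, idf = log2(8 / 2) = 2
echo getWeight(7, $socket, 8), "\n"; // tf = 1, idf = 2
```

A term that appears in every document gets an idf of log2(1) = 0, so it contributes nothing to the score however often it occurs.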
  • 12. Document Vector
               socket   what   heavy   steel   ...
      Doc 1    0.02     0.3    0.001   0       ...
      Doc 2    0        0      0       0       ...
      Doc 3    0.001    0.2    0       0       ...
      Doc 4    0        0      0.002   0.003   ...
  • 13. Ranked Query Merge
      best      23     42     179    246    333    703
      weight    0.008  0.002  0.023  0.039  0.014  0.001
      western   42     88     120    179    246    798
      weight    0.003  0.004  0.023  0.001  0.034  0.004
      1 - 246: 0.073
      2 - 179: 0.024
      3 - 120: 0.023
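  • 14. PHP Similarity
      function score($queryString, $index) {
          $query = tokenise($queryString);
          $matches = array();
          foreach ($query as $qterm) {
              $term = $qterm[0]; // tokenise() returns array(token, offset) pairs
              if (!isset($index[$term])) {
                  continue; // term not in the index
              }
              foreach ($index[$term] as $id => $posting) {
                  if (!isset($matches[$id])) {
                      $matches[$id] = 0;
                  }
                  $matches[$id] += $posting['score'];
              }
          }
          arsort($matches); // sorts in place, highest score first
          return $matches;
      }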
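As a check on the ranked merge from slide 13, here is a self-contained sketch that sums the per-term weights and sorts by total. The function name `rankedMerge` and the flat docID => weight arrays are assumptions chosen to keep the example short; the weights themselves are copied from the slide.

```php
<?php
// Sum per-term weights for every document, then sort by total score.
function rankedMerge(array $postingLists) {
    $scores = array();
    foreach ($postingLists as $postings) {
        foreach ($postings as $docID => $weight) {
            if (!isset($scores[$docID])) {
                $scores[$docID] = 0.0;
            }
            $scores[$docID] += $weight;
        }
    }
    arsort($scores); // highest score first, document IDs kept as keys
    return $scores;
}

// Weights copied from the "Ranked Query Merge" slide.
$best = array(23 => 0.008, 42 => 0.002, 179 => 0.023,
              246 => 0.039, 333 => 0.014, 703 => 0.001);
$western = array(42 => 0.003, 88 => 0.004, 120 => 0.023,
                 179 => 0.001, 246 => 0.034, 798 => 0.004);

// Keep the three best documents, preserving document IDs as keys.
$top = array_slice(rankedMerge(array($best, $western)), 0, 3, true);
foreach ($top as $docID => $score) {
    printf("%d: %.3f\n", $docID, $score);
}
```

Unlike the boolean merge, a document does not need to contain every query term: document 120 ranks third on the strength of "western" alone.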
  • 15. Integrating Search
  • 16. MySQL Full Text Search
      CREATE TABLE example (
        id INT(11) NOT NULL auto_increment,
        title VARCHAR(255),
        content TEXT,
        PRIMARY KEY(id),
        FULLTEXT(title,content)
      ) Engine=MyISAM;

      INSERT INTO example (title, content) VALUES
        ('Mikko & Bacon','Mikko loves bacon'),
        ('Marcello & Bacon','Marcello hates bacon'),
        ('Jo & Sausages','Johanna loves sausages'),
        ('Hollywood & Garlic','Lorenzo hates garlic'),
        ('James & Cheddar','James is keen on cheeses');
  • 17. MySQL FTI Query
      SELECT * FROM example
      WHERE MATCH(title,content) AGAINST('loves bacon');
      +----+------------------+------------------------+
      | id | title            | content                |
      +----+------------------+------------------------+
      |  1 | Mikko & Bacon    | Mikko loves bacon      |
      |  2 | Marcello & Bacon | Marcello hates bacon   |
      |  3 | Jo & Sausages    | Johanna loves sausages |
      +----+------------------+------------------------+
      3 rows in set (0.00 sec)
  • 18. Sphinx http://www.sphinxsearch.com
  • 19. Sphinx Configuration
      source posts {
        type = mysql
        sql_host = localhost
        sql_user = user
        sql_pass = password
        sql_db = search
        sql_query = SELECT id, title, content FROM example;
        sql_attr_multi = uint tag from query; SELECT example_id, tag_id FROM tags;
      }
  • 20. index posts {
        source = posts
        path = /var/data/sphinx/example
        morphology = stem_en
        min_word_len = 3
        min_prefix_len = 3
        min_infix_len = 0
        enable_star = 1
      }
  • 21. Stemming http://tartarus.org/~martin/PorterStemmer
      happening - happen
      happened - happen
      happens - happen
  • 22. Command Line Searching
      indexer --config /etc/sphinx.conf --all
      search --config /etc/sphinx.conf love bacon
        displaying matches:
        1. document=1, weight=3, tag=(1,2)
            id=1
            title=Mikko & Bacon
            content=Mikko loves bacon
        words:
        1. 'love': 2 documents, 2 hits
        2. 'bacon': 2 documents, 4 hits
      searchd --config /etc/sphinx.conf
  • 23. Sphinx From PHP
      $cl = new SphinxClient();
      $cl->SetServer('localhost', 3312);
      $cl->SetMatchMode(SPH_MATCH_ANY);
      // prefix search across all documents
      $result = $cl->Query('bac*');
      $docIDs = array_keys($result["matches"]);
      // same search, restricted to documents with tag 1
      $cl->SetFilter('tag', array(1));
      $result = $cl->Query('bac*');
      $docIDs = array_keys($result["matches"]);
  • 24. Swish-E http://swish-e.org
      pecl install swish-beta
  • 25. Filesystem Index With Swish-E
      /usr/local/bin/swish-e -S fs -c fs-swish-e.conf

      fs-swish-e.conf:
        IndexDir /var/data/documents
        IndexFile fs-swish-e.index
        IndexOnly .doc .docx .pdf
        FuzzyIndexingMode Stemming_en1
        FileFilter .pdf /usr/local/bin/swish_filter.pl
        FileFilter .doc /usr/local/bin/swish_filter.pl
  • 26. Crawling Content
      /usr/local/bin/swish-e -S prog -c www-swish-e.conf

      www-swish-e.conf:
        IndexDir /usr/local/lib/swish-e/spider.pl
        IndexFile www-swish-e.index
        SwishProgParameters default http://phpir.com/
        FuzzyIndexingMode Stemming_en1
        DefaultContents HTML
  • 27. Swish-E With Multiple Indices
      $swish = new Swish('www-swish-e.index fs-swish-e.index');
      $search = $swish->prepare();
      $queryStr = 'search string goes here';
      $result = $search->execute($queryStr);
      $total = $result->hits;
      while ($r = $result->nextResult()) {
          echo $r->swishdocpath; // url
      }
  • 28. Lucene
  • 29. Build Index
      $index = Zend_Search_Lucene::create('idx');
      foreach ($documents as $title => $content) {
          $doc = new Zend_Search_Lucene_Document();
          $doc->addField(
              Zend_Search_Lucene_Field::Text('title', $title));
          $doc->addField(
              Zend_Search_Lucene_Field::UnStored('content', $content));
          $index->addDocument($doc);
      }
  • 30. Query Zend Search Lucene
      $results = $index->find('loves bacon');
      foreach ($results as $result) {
          echo $result->score, " ";
          echo $result->title, "\n";
      }
      Output:
        0.81656279309067 Mikko and Bacon
        0.24800278854758 Marcello & Bacon
  • 31. Index HTML
      $file = file_get_contents($url);
      $doc = Zend_Search_Lucene_Document_Html::loadHTML($file);
      $doc->addField(
          Zend_Search_Lucene_Field::Text('url', $url));
      $index->addDocument($doc);
  • 32. Solr http://lucene.apache.org/solr/
  • 33. Solr Search Index
      $options = array(
          'hostname' => 'localhost',
          'port' => 8983
      );
      $client = new SolrClient($options);
      $doc = new SolrInputDocument();
      $doc->addField('id', $id);
      $doc->addField('cat', $category);
      $doc->addField('title', $title);
      $doc->addField('text', $text);
      $response = $client->addDocument($doc);
      $client->commit();
  • 34. Solr Search Client
      $client = new SolrClient($options);
      $query = new SolrQuery('bacon');
      $response = $client->query($query);
      $r = $response->getResponse();
      foreach ($r['response']['docs'] as $d) {
          echo $d->title[0] . "\n";
      }
  • 35. Xapian http://xapian.org
  • 36. Xapian In PHP
      $db = new XapianWritableDatabase(
          'idx', Xapian::DB_CREATE_OR_OPEN);
      $i = new XapianTermGenerator();
      $i->set_stemmer(new XapianStem("english"));
      $doc = new XapianDocument();
      $doc->set_data($content);
      $doc->add_value(1, $title);
      $i->set_document($doc);
      $i->index_text($content);
      $db->add_document($doc);
  • 37. Xapian Search In PHP
      $database = new XapianDatabase('idx');
      $enquire = new XapianEnquire($database);
      $qp = new XapianQueryParser();
      $qp->set_stemmer(new XapianStem("english"));
      $qp->set_database($database);
      $qp->set_stemming_strategy(
          XapianQueryParser::STEM_SOME);
      $query = $qp->parse_query($queryString);
      $enquire->set_query($query);
  • 38. $matches = $enquire->get_mset(0, 10);
      $i = $matches->begin();
      while (!$i->equals($matches->end())) {
          $n = $i->get_rank() + 1;
          $data = $i->get_document()->get_data();
          $title = $i->get_document()->get_value(1);
          $score = $i->get_percent();
          $i->next();
      }
  • 39. Improving Results
  • 40. Anchor Text
  • 41. Parse Anchor Text
      $p = file_get_contents('http://phpir.com');
      // suppress warnings from real-world, non-valid HTML
      libxml_use_internal_errors(true);
      // loadHTML() is an instance method, so create the document first
      $dom = new DOMDocument();
      $dom->loadHTML($p);
      $links = $dom->getElementsByTagName('a');
      foreach ($links as $link) {
          $href = $link->getAttribute('href');
          $text = $link->nodeValue;
      }
  • 42. Zone Weighting [diagram labelled 1, 2, 3]
  • 43. ZSL Zone Weighting
      $doc = new Zend_Search_Lucene_Document();
      $tfield = Zend_Search_Lucene_Field::Text('title', $title);
      $tfield->boost = 1.3; // matches in the title count for more
      $doc->addField($tfield);
      $doc->addField(
          Zend_Search_Lucene_Field::UnStored('content', $content));
      $index->addDocument($doc);
  • 44. Document Authority
  • 45. Document Weights in ZSL
      $doc = new Zend_Search_Lucene_Document();
      $doc->addField(
          Zend_Search_Lucene_Field::Text('title', $title));
      $doc->addField(
          Zend_Search_Lucene_Field::UnStored('content', $content));
      // more comments means a higher document boost
      $doc->boost = 1 + ($numComments / 100);
      $index->addDocument($doc);
  • 46. Using Search
  • 47. Summaries & Highlighting
  • 48. Sphinx Extract & Highlight
      $cl = new SphinxClient();
      $cl->SetServer("localhost", 3312);
      $q = 'bacon';
      $r = $cl->Query($q);
      $text = array();
      foreach ($r["matches"] as $doc => $info) {
          $text[$doc] = getTextFromDB($doc);
      }
      // build highlighted excerpts for the matched documents
      $extracts = $cl->BuildExcerpts($text, 'posts', $q);
      foreach ($extracts as $extract) {
          echo $extract;
      }
  • 49. [image slide, no text]
  • 50. Xapian Spelling Correction
      Indexer:
        $indexer = new XapianTermGenerator();
        $indexer->set_database($database);
        $indexer->set_flags(
            XapianTermGenerator::FLAG_SPELLING);
      Searcher:
        $queryString = "strreplace or str_cmp";
        $q = new XapianQueryParser();
        $q->set_database($database);
        $query = $q->parse_query($queryString,
            XapianQueryParser::FLAG_SPELLING_CORRECTION);
        echo "Did you mean: " .
            $q->get_corrected_query_string() . "\n";
  • 51. Spelling Correction Output
      php xapsearch.php
      Did you mean: str_replace or strcmp
      4644 results found for “strreplace or str_cmp”:
      1: 2% docid=572 [phpdocs/html/cc.license.html]
      2: 2% docid=7169 [phpdocs/html/imagick.constants.html]
      3: 2% docid=10086 [phpdocs/html/sqlite3result.fetcharray.html]
      4: 2% docid=6132 [phpdocs/html/function.swf-posround.html]
  • 52. Results Sorting
  • 53. Sorting in ZSL
      $q = Zend_Search_Lucene_Search_QueryParser::parse('search string');
      // second argument: sort results by the title field
      $results = $index->find($q, 'title');
      foreach ($results as $result) {
          echo '<h3>', $result->title, "</h3>\n";
          $doc = getDocumentFromDB($result->did);
          echo $q->htmlFragmentHighlightMatches($doc);
      }
  • 54. Faceted Search
  • 55. Faceted Search In Solr
      $client = new SolrClient($options);
      $query = new SolrQuery('bacon');
      // facet options must be set before the query is sent
      $query->setFacet(true);
      $query->addFacetField('cat');
      $response = $client->query($query);
      $r = $response->getResponse();
      $f = $r['facet_counts']['facet_fields'];
      foreach ($f['cat'] as $facet => $count) {
          echo $facet . " " . $count . "\n";
      }
  • 56. More Like This
  • 57. More Like This
      $rset = new XapianRset();
      $rset->add_document(5959); // str_replace
      $e = $enquire->get_eset(40, $rset);
      $qs = array();
      for ($t = $e->begin(); !$t->equals($e->end()); $t->next()) {
          $qs[] = new XapianQuery($t->get_term(),
              intval($t->get_weight()));
      }
      $query = new XapianQuery(XapianQuery::OP_OR, $qs);
  • 58. More Like This Example
      php xapsim.php
      1656 results found:
      1: 100% docid=5959 [phpdocs/html/function.str-replace.html]
      2: 47% docid=5956 [phpdocs/html/function.str-ireplace.html]
      3: 24% docid=5328 [phpdocs/html/function.preg-replace.html]
      4: 18% docid=5958 [phpdocs/html/function.str-repeat.html]
  • 59. Search Performance
  • 60. Index Updates [Diagram labels: New Docs, Docs, Delta, Main, Query]
  • 61. Search Speed
      Zend Search Lucene:
        $index = Zend_Search_Lucene::open('index');
        $index->optimize();
      Sphinx:
        indexer --merge main delta --rotate
      Solr:
        $client = new SolrClient($options);
        $client->optimize();
      Xapian:
        xapian-compact xapindex xapindex2
  • 62. Distributing Search [Diagram labels: Document, Index, Application]
  • 63. Large Scale Search http://www.nutch.org http://hadoop.apache.org
  • 64. Image Credits
      Title: http://www.flickr.com/photos/generated/2084287794/
      What Do You Want: http://www.flickr.com/photos/the_justified_sinner/2498066986/
      You Are Here: http://www.flickr.com/photos/alecvuijlsteke/2692475420/
      Integrating Search: http://www.flickr.com/photos/squeaks2569/3700355684/
      Sphinx: http://www.flickr.com/photos/generated/2084287794/
      Lucene: http://www.flickr.com/photos/mypanda/7731447/
      Swish-e: http://www.flickr.com/photos/ryan_fung/2239687100/
      Solr: http://www.flickr.com/photos/m-j-s/2724756177/
      Xapian: http://www.flickr.com/photos/olibac/3522056495/
      Using Search: http://www.flickr.com/photos/eneas/175027945/
      Improving Search: http://www.flickr.com/photos/x-ray_delta_one/3928200642/
      Search Performance: http://www.flickr.com/photos/maisonbisson/1634408/
      Large Scale Search: http://www.flickr.com/photos/zedzap/3663508847/
  • 65. Questions?
  • 66. Thank You! Ian Barber @ianbarber http://phpir.com ian@ibuildings.com