Dev in Santos - How NOT to Do Searches Using LIKE

Talk given at the Dev in Santos event in November 2013. It shows how most developers implement search incorrectly, how broad the field of text and document classification and retrieval is, and the best solutions available on the market today.

Transcript

  • 1. How NOT to Do Searches Using LIKE. Fabio Akita, @akitaonrails
  • 2-13. www.codeminer42.com
  • 14. Search is everywhere
  • 15. (Camisetas = T-shirts, Calças = pants)
      SELECT * FROM PRODUCTS
      WHERE NAME LIKE '%Camisetas%'
        AND DESCRIPTION LIKE '%Camisetas%'
        AND NAME NOT LIKE '%Calças%'
        AND DESCRIPTION NOT LIKE '%Calças%'
  • 16-18.
      Pattern        Access path   Speed
      Camisetas      INDEX SEEK    Fast
      Camisetas%     INDEX SCAN    Fairly fast
      %Camisetas%    TABLE SCAN    Going backwards
  • 19. Indexes won't help you
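Why the three patterns on slides 16-18 behave so differently can be illustrated without a database. The Ruby sketch below is only an analogy: it assumes a sorted array stands in for a B-tree index on NAME. The product names and both helper methods (`prefix_search`, `infix_search`) are hypothetical, and this is not how any particular database implements LIKE.

```ruby
# Hypothetical helpers: a sorted array plays the role of an index on NAME.

# LIKE 'prefix%': binary-search the first candidate, then read neighbours
# only while they still share the prefix (a range scan over the index).
def prefix_search(sorted_names, prefix)
  start = sorted_names.bsearch_index { |name| name >= prefix }
  return [] if start.nil?
  sorted_names[start..].take_while { |name| name.start_with?(prefix) }
end

# LIKE '%term%': the sort order says nothing about where the term may
# appear inside a value, so every entry has to be inspected (a full scan).
def infix_search(sorted_names, term)
  sorted_names.select { |name| name.include?(term) }
end

names = ["Meias", "Camisetas Polo", "Regata Camisetas",
         "Bermudas", "Camisetas Regata", "Calças Jeans"].sort

p prefix_search(names, "Camisetas")
p infix_search(names, "Camisetas")
```

The prefix lookup stops as soon as an entry no longer starts with the prefix, while the infix lookup always visits every entry, which is the cost the slides label TABLE SCAN.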
  • 20. WordPress wp-includes/taxonomy.php (lines 1256 to 1545)
  • 21.
      <?php
      function get_terms($taxonomies, $args = '') {
        ...
        if ( !empty($name__like) ) {
          $name__like = like_escape( $name__like );
          $where .= $wpdb->prepare( " AND t.name LIKE %s",
            '%' . $name__like . '%' );
        }
        if ( ! empty( $description__like ) ) {
          $description__like = like_escape( $description__like );
          $where .= $wpdb->prepare( " AND tt.description LIKE %s",
            '%' . $description__like . '%' );
        }
        ...
        if ( ! empty( $search ) ) {
          $search = like_escape( $search );
          $where .= $wpdb->prepare( ' AND ((t.name LIKE %s) OR (t.slug LIKE %s))',
            '%' . $search . '%', '%' . $search . '%' );
        }
        ...
      }
      ?>
  • 22. Magento AbstractHelper.php
  • 23.
      <?php
      public function getCILike($field, $value, $options = array()) {
        $quotedField = $this->_getReadAdapter()->quoteIdentifier($field);
        return new Zend_Db_Expr($quotedField . ' LIKE ' .
          $this->addLikeEscape($value, $options));
      }
      ?>
  • 24-29. Ranking, Relevance; Phrases, Proximity, Ranges; Synonyms, "Stemmer"; "More Like This"; "Did you mean …?"; Faceting (Terms, Geolocation, etc.)
  • 30-35. Unstructured Search; Suggestions; Sorting; Terms Facet; Aggregation; Pagination
  • 36.
      -- MySQL full-text search
      SELECT * FROM PRODUCTS
      WHERE MATCH (NAME, DESCRIPTION)
        AGAINST ('+Camisetas -Calças' IN BOOLEAN MODE)
  • 37. Magento CatalogSearch/Model/Resource/Helper.php
  • 38.
      <?php
      public function chooseFulltext($table, $alias, $select) {
        $field = new Zend_Db_Expr(
          'MATCH (' . $alias . '.data_index) AGAINST (:query IN BOOLEAN MODE)');
        $select->columns(array('relevance' => $field));
        return $field;
      }
      ?>
  • 39.
      -- SQL Server full-text search
      SELECT * FROM PRODUCTS
      WHERE CONTAINS( (NAME, DESCRIPTION), 'Camisetas AND NOT Calças')
  • 40.
      -- PostgreSQL full-text search
      SELECT * FROM PRODUCTS
      WHERE TO_TSVECTOR(NAME || ' ' || DESCRIPTION)
        @@ TO_TSQUERY('Camisetas & !Calças')
  • 41-44. Markov Chains; Inverted Indexes; Vector Space Model; Okapi BM25
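Of the techniques listed, the inverted index is the one that removes the need for a table scan: each term maps directly to the documents that contain it. The following Ruby sketch is a minimal illustration, assuming whitespace/alphanumeric tokenization and the `+term -term` query style from slide 36. The helper names (`index_terms`, `build_index`, `postings`, `boolean_search`) and the sample products are hypothetical, and real engines such as Lucene add analysis, scoring and compression on top of this idea.

```ruby
require "set"

# Split text into lowercase alphanumeric terms (hypothetical tokenizer).
def index_terms(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

# Build the inverted index: term => Set of ids of documents containing it.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  docs.each do |id, text|
    index_terms(text).each { |term| index[term] << id }
  end
  index
end

# Look up a term without creating an entry for unknown terms.
def postings(index, word)
  index.fetch(word.downcase, Set.new)
end

# "+term" or "term" must be present, "-term" must be absent.
def boolean_search(index, doc_ids, query)
  required, excluded = query.split.partition { |word| !word.start_with?("-") }
  matches = Set.new(doc_ids)
  required.each { |word| matches &= postings(index, word.delete_prefix("+")) }
  excluded.each { |word| matches -= postings(index, word.delete_prefix("-")) }
  matches.to_a.sort
end

docs = {
  1 => "Camisetas polo",
  2 => "Camisetas e calças",
  3 => "Calças jeans"
}
index = build_index(docs)
p boolean_search(index, docs.keys, "+Camisetas -Calças") # => [1]
```

Each query term costs one hash lookup plus set operations on the matching ids, regardless of where the term appears inside the text.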
  • 45. Vector Space Model http://u.akita.ws/vsm_example (Simplified example)
  • 46. Documents:
      d1  "new york times"
      d2  "new york post"
      d3  "los angeles times"
  • 47. Inverse document frequency, log2(number of documents / documents containing the term):
      angeles  log2(3/1) = 1.584
      los      log2(3/1) = 1.584
      new      log2(3/2) = 0.584
      post     log2(3/1) = 1.584
      times    log2(3/2) = 0.584
      york     log2(3/2) = 0.584
  • 48. Term counts per document:
            angeles  los    new    post   times  york
      d1    0        0      1      0      1      1
      d2    0        0      1      1      0      1
      d3    1        1      0      0      1      0
  • 49. Term counts weighted by inverse document frequency:
            angeles  los    new    post   times  york
      d1    0        0      0.584  0      0.584  0.584
      d2    0        0      0.584  1.584  0      0.584
      d3    1.584    1.584  0      0      0.584  0
  • 50. Query q = "new new times", weighted as (term count / highest term count in q) * inverse document frequency:
      angeles  0
      los      0
      new      (2/2)*0.584 = 0.584
      post     0
      times    (1/2)*0.584 = 0.292
      york     0
  • 51. Vector lengths:
      Length of d1  sqrt(0.584^2+0.584^2+0.584^2) = 1.011
      Length of d2  sqrt(0.584^2+1.584^2+0.584^2) = 1.786
      Length of d3  sqrt(1.584^2+1.584^2+0.584^2) = 2.316
      Length of q   sqrt(0.584^2+0.292^2) = 0.652
  • 52. Cosine similarity between each document and q:
      cosSim(d1,q) = (0*0+0*0+0.584*0.584+0*0+0.584*0.292+0.584*0) / (1.011*0.652) = 0.776
      cosSim(d2,q) = (0*0+0*0+0.584*0.584+1.584*0+0*0.292+0.584*0) / (1.786*0.652) = 0.292
      cosSim(d3,q) = (1.584*0+1.584*0+0*0.584+0*0+0.584*0.292+0*0) / (2.316*0.652) = 0.112
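The arithmetic on slides 46-52 can be reproduced in a few lines of Ruby. This is a sketch of the simplified example only, assuming the same weighting the slides use (term count divided by the highest term count in the text, multiplied by log2 of the inverse document frequency). The method names (`vsm_tokens`, `idf_table`, `weights`, `vector_length`, `cosine`, `rank`) are hypothetical and not part of any library.

```ruby
def vsm_tokens(text)
  text.downcase.split
end

# idf(term) = log2(number of documents / documents containing the term)
def idf_table(docs)
  doc_freq = Hash.new(0)
  docs.each_value { |text| vsm_tokens(text).uniq.each { |term| doc_freq[term] += 1 } }
  doc_freq.transform_values { |df| Math.log2(docs.size.to_f / df) }
end

# weight(term) = (term count / highest term count in the text) * idf(term)
def weights(text, idf)
  counts = vsm_tokens(text).each_with_object(Hash.new(0)) { |term, h| h[term] += 1 }
  highest = counts.values.max.to_f
  counts.each_with_object({}) do |(term, count), vector|
    vector[term] = (count / highest) * idf[term] if idf.key?(term)
  end
end

def vector_length(vector)
  Math.sqrt(vector.values.sum { |w| w * w })
end

# Cosine of the angle between two sparse term vectors.
def cosine(a, b)
  dot = a.sum { |term, w| w * b.fetch(term, 0.0) }
  denominator = vector_length(a) * vector_length(b)
  denominator.zero? ? 0.0 : dot / denominator
end

# Returns [[doc_id, score], ...] with the most similar document first.
def rank(query, docs)
  idf = idf_table(docs)
  q = weights(query, idf)
  docs.map { |id, text| [id, cosine(weights(text, idf), q)] }
      .sort_by { |_, score| -score }
end

docs = {
  "d1" => "new york times",
  "d2" => "new york post",
  "d3" => "los angeles times"
}
rank("new new times", docs).each { |id, score| puts format("%s %.3f", id, score) }
```

The ranking comes out as d1, d2, d3, matching the slides. The scores are about 0.77, 0.29 and 0.11, slightly different from the slide values because the slides truncate the logarithms to three decimals before computing.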
  • 53. Douglass Cutting | Lucene, Nutch, Hadoop | Tika, Solr, ElasticSearch
  • 54. Apache Lucene: indexing at 150 GB/hour; index size roughly 20%-30% of the size of the indexed text
  • 55. HTML, XHTML, OOXML, ODF, XML, RSS, OLE2, iWorks (Pages, Numbers, Keynote), PDF, EPUB, RTF, Commons Compress (ar, cpio, Unix dump, tar, zip, gzip, XZ, Pack200, bzip2, 7z, arj and lzma), Audio (javax.sound, MIDI, Mp3), Image (javax.imageio, Tiff, Jpeg), Video (FLV, Flash), Mail (Mbox, RFC822), DWG, Font (TrueType), HDF, and plugins.
  • 56.
      import java.io.*;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.parser.AutoDetectParser;
      import org.apache.tika.parser.ParseContext;
      import org.apache.tika.parser.Parser;
      import org.apache.tika.sax.BodyContentHandler;
      import org.xml.sax.ContentHandler;
      // Open the document to be parsed
      InputStream is = new BufferedInputStream(
          new FileInputStream(new File("sample.pdf")));
      // Detect the format automatically and write the extracted text to stdout
      Parser parser = new AutoDetectParser();
      ContentHandler handler = new BodyContentHandler(System.out);
      Metadata metadata = new Metadata();
      parser.parse(is, handler, metadata, new ParseContext());
      // Print every metadata field that was extracted
      for (String name : metadata.names()) {
          String value = metadata.get(name);
          if (value != null) {
              System.out.println("Metadata Name: " + name);
              System.out.println("Metadata Value: " + value);
          }
      }
  • 57. http://localhost:8983/solr/query?q=title:black
  • 58.
      http://localhost:8983/solr/query?
        q=*:*
        &fl=id,title,series_s,pubyear_i
        &sort=pubyear_i desc
        &group=true
        &group.main=true
        &group.field=series_s
        &facet=true
        &facet.field=cat
  • 59.
      curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" \
        --data-binary @tutorial.html \
        -H 'Content-type:text/html'
  • 60-68. Solr vs. ElasticSearch
                                     Solr          ElasticSearch
      Coordination                   ZooKeeper     Zen Discovery
      Shard Splitting                Yes           No
      Automatic Shard Rebalancing    No            Yes
      Schema                         +/-           Yes
      Nested Typing                  No            Yes
      Queries                        Key / Value   JSON
      Distributed Group By           Yes           No
      Percolation Queries            No            Yes
  • 69. Setup
      cd ~
      sudo apt-get update
      sudo apt-get install openjdk-7-jre-headless -y
      ### http://www.elasticsearch.org/download/
      wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.deb
      sudo dpkg -i elasticsearch-0.90.7.deb
      sudo service elasticsearch start
  • 70. Setup
      # Bonsai
      heroku addons:add bonsai
      heroku config:add ELASTICSEARCH_URL=`heroku config:get BONSAI_URL`
      # Found
      heroku addons:add foundelasticsearch
      heroku config:add ELASTICSEARCH_URL=`heroku config:get FOUNDELASTICSEARCH_URL`
      # SearchBox
      heroku addons:add searchbox:starter
      heroku config:add ELASTICSEARCH_URL=`heroku config:get SEARCHBOX_URL`
      # reindex
      heroku run rake searchkick:reindex CLASS=Product
  • 71. Setup
      # Gemfile - bundle install
      gem "searchkick"
      # app/models/product.rb
      class Product < ActiveRecord::Base
        searchkick
      end
      # config/initializers/elasticsearch.rb
      ENV["ELASTICSEARCH_URL"] = "http://username:password@api.searchbox.io"
      # in the shell
      rails r "Product.reindex"
  • 72-73.
      # Simple search
      products = Product.search "Camisetas"
      products.each do |product|
        puts product.name
      end
      # Search with fields
      Product.search "Camisetas",
        fields: [:name, :description],
        where: {
          in_stock: true,
          expires_at: {gt: 1.week.from_now},
          or: [
            [{in_stock: true}, {backordered: true}]
          ]
        },
        order: {_score: :desc}, # relevant first
        limit: 10, offset: 50
        # , page: params[:page], per_page: 20
  • 74-75.
      # Synonyms
      class Product < ActiveRecord::Base
        searchkick synonyms: [
          ["pc", "computador pessoal"],
          ["word", "microsoft office"]
        ]
      end
      # Suggestions
      class Product < ActiveRecord::Base
        searchkick suggest: ["name"]
      end
      products = Product.search "cold miner ", suggest: true
      products.suggestions # ["codeminer"]
  • 76.
      class City < ActiveRecord::Base
        searchkick autocomplete: ["name"]
      end
      City.search "Sao P", autocomplete: true
  • 77-78.
      # app/controllers/cities_controller.rb
      class CitiesController < ApplicationController
        def autocomplete
          render json: City.search(params[:query],
            autocomplete: true, limit: 10).map(&:name)
        end
      end
      # partial
      <input type="text" id="query" name="query" />
      <script src="jquery.js"></script>
      <script src="typeahead.js"></script>
      <script>
        $("#query").typeahead({
          name: "city",
          remote: "/cities/autocomplete?query=%QUERY"
        });
      </script>
  • 79.
      products = Product.search "GPS", facets: [:type, :brand, :screen_size]
      puts products.facets
  • 80.
      class City < ActiveRecord::Base
        searchkick locations: ["location"]
        def search_data
          attributes.merge location: [latitude, longitude]
        end
      end
      City.search "Codemi", where: { location: {near: [-23, -46], within: "10mi"} } # or 16km
  • 81. Next Chapters
  • 82-83. SELECT … LIKE '%'
  • 84. THANK YOU! slideshare.net/akitaonrails codeminer42.com @akitaonrails