Java Search Engine Framework

Flavio Marchi talks about Java search engine frameworks during Appsterdam TalkLab.

Published in: Technology

  1. Java Search Engine Framework
  2. Solutions
     •  Regular expression (can be slow and memory hungry)
     •  Lucene (full-text search engine library)
     •  Solr (standalone full-text search server)
     •  SolrJ (Java client for Solr)
  3. Regular expression
     •  (what it is) a sequence of symbols (that is, a string) that identifies a set of strings
     •  (what it does) defines a function that takes a string as input and returns a yes/no value, depending on whether or not the string follows a given pattern.
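The yes/no function described above can be sketched with `java.util.regex`. This is only a minimal illustration: the class name `RegexPredicateSketch`, the helper `fullMatch`, the pattern, and the sample strings are assumptions made for the example and do not come from the talk.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical example class: wraps a regular expression as a yes/no function.
final class RegexPredicateSketch {

    // Returns a function answering "does the whole string follow the pattern?"
    static Predicate<String> fullMatch(String regex) {
        Pattern p = Pattern.compile(regex); // compile once, reuse for every test
        return s -> p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Illustrative pattern: three letters, an optional "/" or "&", three letters
        Predicate<String> isPair = fullMatch("[a-z]{3}[/&]?[a-z]{3}");
        System.out.println(isPair.test("eurusd"));
        System.out.println(isPair.test("eur&usd"));
        System.out.println(isPair.test("lago di como"));
    }
}
```

Compiling the pattern once and reusing the resulting predicate avoids recompiling the expression on every call, which matters when many strings are tested.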
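  4. Regular expression (example)

         import java.util.regex.Matcher;
         import java.util.regex.Pattern;

         // "eur*usd" would mean "eu", zero or more "r", then "usd", which never matches "eur&usd";
         // ".*" allows any characters between "eur" and "usd".
         Pattern p = Pattern.compile("eur.*usd");
         Matcher m = p.matcher("In quel ramo del lago di eUr&uSd".toLowerCase());
         if (m.find()) {
             // found! But where in the string?
         }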
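The question raised in the example's comment (where in the string was the match found?) can be answered with `Matcher.start()` and `Matcher.end()`. The sketch below is one possible way to do it; the class name `RegexLocateSketch` and the helper `locate` are hypothetical, and `Pattern.CASE_INSENSITIVE` is used here in place of lowercasing the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: reports where the first match occurs.
final class RegexLocateSketch {

    // Returns {start, end} of the first match (end is exclusive), or null if there is none.
    static int[] locate(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        String text = "In quel ramo del lago di eUr&uSd";
        int[] span = locate("eur.*usd", text);
        if (span != null) {
            // Print the matched text together with its position in the input
            System.out.println(text.substring(span[0], span[1])
                    + " at [" + span[0] + ", " + span[1] + ")");
        }
    }
}
```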
  5. Lucene
     •  Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
     •  Apache Software Foundation
     •  Stable release: 4.3.0 (May 6, 2013)
     •  Development status: Active
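To give an idea of what a full-text search library does at its core, here is a toy inverted index in plain Java: it maps each term to the documents that contain it, so a lookup does not have to scan every document the way a regular expression does. This is a deliberately simplified sketch, not Lucene's implementation; the class `InvertedIndexSketch`, its methods, and the sample documents are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index (illustrative only): term -> ids of the documents containing it.
final class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Splits the text on non-word characters, lowercases it, and records each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of the documents containing the term (empty if none).
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "EUR/USD rises");
        index.add(2, "USD falls");
        System.out.println(index.search("usd"));
        System.out.println(index.search("eur"));
    }
}
```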
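  6. Lucene (example)

         import org.apache.lucene.analysis.Analyzer;
         import org.apache.lucene.analysis.core.KeywordAnalyzer;
         import org.apache.lucene.index.IndexWriter;
         import org.apache.lucene.index.IndexWriterConfig;
         import org.apache.lucene.store.Directory;
         import org.apache.lucene.store.RAMDirectory;
         import org.apache.lucene.util.Version;

         // Analyzer, in-memory index, and writer (Lucene 4.3 API); reused in the following examples.
         // Alternative analyzer: new StandardAnalyzer(Version.LUCENE_43)
         Analyzer analyzer = new KeywordAnalyzer();
         Directory index = new RAMDirectory();
         IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_43, analyzer);
         IndexWriter w = new IndexWriter(index, config);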
  7. Lucene (example 2)

         import org.apache.lucene.document.Document;
         import org.apache.lucene.document.Field;
         import org.apache.lucene.document.StringField;

         // Adds one document with three stored, non-tokenized fields.
         private void addDoc(long time, String value, String flag) throws Exception {
             Document doc = new Document();
             doc.add(new StringField("time", String.valueOf(time), Field.Store.YES));
             doc.add(new StringField("value", value, Field.Store.YES));
             doc.add(new StringField("flag", flag, Field.Store.YES));
             w.addDocument(doc);
         }

         w.commit(); // to be run at the end of the batch
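  8. Lucene (example 3)

         import org.apache.lucene.document.Document;
         import org.apache.lucene.index.DirectoryReader;
         import org.apache.lucene.queryparser.classic.QueryParser;
         import org.apache.lucene.search.IndexSearcher;
         import org.apache.lucene.search.ScoreDoc;
         import org.apache.lucene.search.TopDocs;

         IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(index));

         // Single default field. To search several default fields, use instead:
         // new MultiFieldQueryParser(Version.LUCENE_43, new String[] {"time", "value", "flag"}, analyzer)
         QueryParser queryParser = new QueryParser(Version.LUCENE_43, "value", analyzer);

         // Field names are case-sensitive: the field was indexed as "value", not "VALUE".
         TopDocs hits = searcher.search(queryParser.parse("value:(+eurusd)"), 50);
         System.out.println(hits.totalHits);
         for (ScoreDoc scoreDoc : hits.scoreDocs) {
             Document doc = searcher.doc(scoreDoc.doc);
             System.out.println(doc.toString());
         }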
  9. Solr
     •  Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty.
     •  Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
     •  Apache Software Foundation
     •  Stable release: 4.3.0 (May 6, 2013)
     •  Development status: Active
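Because Solr is queried over plain HTTP, a client in any language only needs to build a URL for the `/select` handler and read the response. The sketch below builds such a URL with the standard `q`, `rows`, and `wt` parameters; it does not contact a server. The class name `SolrSelectUrlSketch`, the helper `selectUrl`, and the core name `collection1` are assumptions for the example, and the base URL simply reuses the local address that appears in the SolrJ slides.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the URL of a Solr select request (no network access).
final class SolrSelectUrlSketch {

    // baseUrl: e.g. "http://localhost:8983/solr"; core: assumed core name; query: Solr query string.
    static String selectUrl(String baseUrl, String core, String query, int rows) {
        try {
            return baseUrl + "/" + core + "/select"
                    + "?q=" + URLEncoder.encode(query, "UTF-8") // escape ":" and other reserved characters
                    + "&rows=" + rows
                    + "&wt=json"; // ask for a JSON response
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr", "collection1", "name:doc1", 10));
    }
}
```

The resulting URL can be fetched with any HTTP client; SolrJ wraps this kind of request behind a Java API.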
  10. SolrJ
     •  SolrJ is a Java client for accessing Solr.
     •  It offers a Java interface to add, update, and query the Solr index.
     •  Last version: 1.4.X
  11. SolrJ (example)

         import java.util.ArrayList;
         import java.util.Collection;
         import org.apache.solr.client.solrj.SolrQuery;
         import org.apache.solr.client.solrj.SolrServer;
         import org.apache.solr.client.solrj.impl.HttpSolrServer;
         import org.apache.solr.client.solrj.response.QueryResponse;
         import org.apache.solr.common.SolrInputDocument;

         SolrServer server = new HttpSolrServer("http://localhost:8983/solr/");
         server.deleteByQuery("*:*"); // CAUTION: deletes everything!

         // Build two documents
         SolrInputDocument doc1 = new SolrInputDocument();
         doc1.addField("id", 23425);
         doc1.addField("name", "doc1");
         doc1.addField("price", 100980);
         SolrInputDocument doc2 = new SolrInputDocument();
         doc2.addField("id", 63432);
         doc2.addField("name", "doc2");
         doc2.addField("price", 205345);

         // Send them in one batch and commit
         Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
         docs.add(doc1);
         docs.add(doc2);
         server.add(docs);
         server.commit();

         // Query: name ending in "c1" and price equal to 100980
         SolrQuery query = new SolrQuery();
         query.setQuery("+name:*c1 +price:100980");
         QueryResponse rsp = server.query(query);
  12. SolrJ (example, continued)

         import java.util.List;
         import org.apache.solr.common.SolrDocument;
         import org.apache.solr.common.SolrDocumentList;

         // Read the results as generic documents
         SolrDocumentList docsr = rsp.getResults();
         for (SolrDocument document : docsr) {
             Object id = document.getFieldValue("id");
             System.out.println(id);
         }

         // Or map them onto annotated beans
         List<Product> products = rsp.getBeans(Product.class);
         for (Product product : products) {
             System.out.println(product.getId());
         }
  13. SolrJ (Product class)

         import org.apache.solr.client.solrj.beans.Field;

         public class Product {
             private String id;

             public String getId() {
                 return id;
             }

             // Binds the Solr field "id" to this setter
             @Field("id")
             public void setId(String id) {
                 this.id = id;
             }
             // …the same for the price and name attributes.
         }
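How an annotation on a setter can drive the mapping from a document to a bean may be easier to see in a small reflection-based sketch. It uses only the JDK and a hypothetical `@BoundField` annotation standing in for SolrJ's `@Field`; it is not SolrJ's actual binder, and the class `BeanBindingSketch` and method `bind` are names invented for the illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: annotation-driven binding with plain reflection.
final class BeanBindingSketch {

    // Hypothetical stand-in for SolrJ's @Field annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface BoundField {
        String value();
    }

    // Same shape as the Product class in the slides, using the stand-in annotation.
    public static class Product {
        private String id;

        public String getId() {
            return id;
        }

        @BoundField("id")
        public void setId(String id) {
            this.id = id;
        }
    }

    // Creates a bean and calls every annotated setter whose field name is present in the document.
    static <T> T bind(Map<String, Object> doc, Class<T> type) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Method m : type.getMethods()) {
                BoundField field = m.getAnnotation(BoundField.class);
                if (field != null && doc.containsKey(field.value())) {
                    m.invoke(bean, doc.get(field.value()));
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot bind " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "23425");
        Product product = bind(doc, Product.class);
        System.out.println(product.getId());
    }
}
```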
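  14. SolrJ (file indexing)

         import java.io.File;
         import org.apache.solr.client.solrj.SolrQuery;
         import org.apache.solr.client.solrj.SolrServer;
         import org.apache.solr.client.solrj.impl.HttpSolrServer;
         import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
         import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
         import org.apache.solr.client.solrj.response.QueryResponse;

         public static void indexPdfWithSolrJ(String fileName, String solrId) throws Exception {
             String urlString = "http://localhost:8983/solr";
             SolrServer solr = new HttpSolrServer(urlString);
             // Send the file to the extracting request handler
             ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
             up.addFile(new File(fileName), "application/pdf");
             up.setParam("literal.id", solrId);
             up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
             solr.request(up);
             QueryResponse rsp = solr.query(new SolrQuery("*:*"));
             System.out.println(rsp);
         }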
  15. References
     •  Lucene & Solr: http://lucene.apache.org/solr/
     •  SolrJ: http://wiki.apache.org/solr/Solrj
     •  Tika: http://tika.apache.org/