Lucene

    Lucene - Presentation Transcript

    1. Open source indexing and search engine
    2. Web scale
    3. Lucene Inverted index
    4. Lucene Inverted index Results
    5. Lucene Inverted index Servlet container J2EE application server
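The slides name the inverted index without showing one. As a rough illustration only (this is not how Lucene stores its index), the idea can be sketched with plain Java collections: each term maps to the set of documents that contain it, so a lookup is a single map access rather than a scan of every document. The class name `TinyInvertedIndex` and the sample texts are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of document ids containing it.
// Hypothetical illustration only; Lucene's real index format is far more elaborate.
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Stores the text, indexes its lower-cased terms, and returns the new document id.
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Returns the ids of documents containing the term (an empty set if there are none).
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("BRCA2 breast cancer gene");
        index.add("processed pseudogene");
        index.add("BRCA2 homologue in mouse");
        System.out.println(index.search("brca2")); // prints [0, 2]
    }
}
```

Lower-casing on both the indexing and the query side stands in, very loosely, for the job an analyzer does in a real search engine.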
    6. WARNING Java approaching!
    7. Java is strongly object orientated
    8. Perl:
           my @gene_names = ();
           push(@gene_names, $gene);
           print @gene_names;
       Java:
           ArrayList<Gene> gene_names = new ArrayList<Gene>();
           gene_names.add(gene);
           System.out.println(gene_names.toString());
    9. Perl:
           my $gene = Gene->new('ENS12345');
           $gene->set_name('BRCA2');
       Java:
           Gene gene = new Gene("ENS12345");
           gene.set_name("BRCA2");
    10. Java is strongly typed
    11. Perl:
            my $number = "100";
            $number = $number + 400;
            print $number;
        Java:
            Integer number = new Integer(100);
            number = number + 400;
            System.out.println(number);
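To make the typing contrast concrete: Perl quietly treats the string "100" as a number, whereas Java needs an explicit conversion, and `+` applied to a `String` concatenates instead of adding. The following is a small sketch; the class and method names are invented for illustration.

```java
// Hypothetical demo of Java's typing rules for the "100" + 400 example.
class TypingDemo {
    // Converts the text to an int explicitly before adding.
    static int addAsNumber(String text, int amount) {
        return Integer.parseInt(text) + amount;
    }

    // With a String operand, '+' means concatenation, not arithmetic.
    static String addAsText(String text, int amount) {
        return text + amount;
    }

    public static void main(String[] args) {
        System.out.println(addAsNumber("100", 400)); // prints 500
        System.out.println(addAsText("100", 400));   // prints 100400
    }
}
```

`Integer.parseInt` throws a `NumberFormatException` for text that is not a valid integer, so a bad value fails loudly instead of being silently coerced.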
    12. Java is good at error handling
    13. Perl:
            eval { $gene->transform };
            warn $@ if $@;
        Java:
            try {
                gene.transform();
            } catch (IOException e) {
                e.printStackTrace();
            }
    14. Java is surprisingly easy to learn
    15. Perl: conditionals and loops; variables have scope; extras from CPAN; performance is important
        Java: conditionals and loops; variables have scope; extras available as JAR files; performance is important
    16. Recipe 1: Indexing a collection of documents
    17. org.ensembl.lucene.Writer
    18. public static void main(String[] args) {
            // Read the command line as "-flag value" pairs.
            HashMap<String, String> arguments = new HashMap<String, String>();
            String key = null;
            for (String s : args) {
                if (key == null) {
                    key = s;
                } else {
                    arguments.put(key, s);
                    key = null;
                }
            }

            Writer writer = new Writer();
            writer.setIndexLocation(arguments.get("-index"));
            writer.setInputLocation(arguments.get("-input"));
            // The merge settings are optional.
            if (arguments.get("-mergefactor") != null) {
                writer.setMergeFactor(Integer.valueOf(arguments.get("-mergefactor")));
            }
            if (arguments.get("-maxmergedocs") != null) {
                writer.setMaxMergeDocs(Integer.valueOf(arguments.get("-maxmergedocs")));
            }

            try {
                writer.index();
            } catch (IOException e) {
                e.printStackTrace();
            }
            System.out.println("Indexing complete");
        }
    21. Max-merge-docs: the largest segment, counted in documents, that Lucene will still merge with other segments
    22. Merge-factor: how often Lucene merges index segments when adding documents
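How the two settings interact is easier to see in a toy model. The sketch below is not Lucene's merge policy; it assumes a deliberately simplified rule: every added document starts as a one-document segment, and whenever the newest `mergeFactor` segments have the same size they are merged into one, unless the result would exceed `maxMergeDocs`. The class `MergeSimulation` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of segment merging (an assumed, simplified rule, not Lucene's actual policy).
class MergeSimulation {
    // Returns the segment sizes, oldest first, after adding docCount documents one by one.
    static List<Integer> simulate(int docCount, int mergeFactor, int maxMergeDocs) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>();
        for (int d = 0; d < docCount; d++) {
            segments.add(1); // each new document starts as its own segment
            while (segments.size() >= mergeFactor) {
                int n = segments.size();
                int size = segments.get(n - 1);
                boolean allEqual = true;
                for (int i = n - mergeFactor; i < n; i++) {
                    if (segments.get(i) != size) {
                        allEqual = false;
                        break;
                    }
                }
                long merged = (long) size * mergeFactor;
                if (!allEqual || merged > maxMergeDocs) {
                    break; // nothing to merge, or the merged segment would be too large
                }
                // Replace the newest mergeFactor segments with one merged segment.
                segments.subList(n - mergeFactor, n).clear();
                segments.add((int) merged);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(simulate(25, 10, Integer.MAX_VALUE)); // prints [10, 10, 1, 1, 1, 1, 1]
        System.out.println(simulate(100, 10, 50));
    }
}
```

In this model a larger merge factor means fewer merges during indexing but more segments left lying around, while a low max-merge-docs caps how big any merged segment can grow.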
    24. public void index() throws IOException {
            File index = new File(getIndexLocation());
            File location = new File(getInputLocation());
            // true = create a new index rather than append to an existing one
            IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), true);
            writer.setMergeFactor(getMergeFactor());
            writer.setMaxMergeDocs(getMaxMergeDocs());
            indexDocuments(writer, location);
            writer.optimize();
            writer.close();
        }

        // Walks the input location recursively and indexes every readable file.
        private static void indexDocuments(IndexWriter writer, File location) throws IOException {
            if (location.canRead()) {
                if (location.isDirectory()) {
                    String[] files = location.list();
                    if (files != null) {
                        for (int i = 0; i < files.length; i++) {
                            indexDocuments(writer, new File(location, files[i]));
                        }
                    }
                } else {
                    System.out.println("Indexing  " + location);
                    try {
                        GeneFileDocument.index(writer, location);
                    } catch (FileNotFoundException e) {
                        System.out.println("Caught exception: " + e);
                    }
                }
            }
        }
    28. org.ensembl.lucene.GeneFileDocument
    29. public static void index(IndexWriter writer, File f) throws IOException {
            // Column order of the tab-separated input file.
            String fields[] = {"subtype", "id", "url", "keywords", "description"};
            BufferedReader bufRead = new BufferedReader(new FileReader(f));
            try {
                String line = bufRead.readLine();
                while (line != null) {
                    // One input line becomes one Lucene document.
                    Document doc = new Document();
                    String terms[] = line.split("\t");
                    // Ignore any columns beyond the known field names.
                    for (int count = 0; count < terms.length && count < fields.length; count++) {
                        doc.add(new Field(fields[count], terms[count], Field.Store.YES, Field.Index.TOKENIZED));
                    }
                    writer.addDocument(doc);
                    line = bufRead.readLine();
                }
            } finally {
                bufRead.close();
            }
        }
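The field mapping in that method can be tried out without Lucene on the classpath. The sketch below uses only the standard library and pairs the five column names from the slide with the tab-separated values of one line. The class `GeneLineParser` is hypothetical, and the URL, keywords, and description in the sample line are placeholders, not real Ensembl data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone version of the column-to-field mapping, without Lucene.
class GeneLineParser {
    static final String[] FIELDS = {"subtype", "id", "url", "keywords", "description"};

    // Maps each tab-separated value to its field name, ignoring surplus columns.
    static Map<String, String> parse(String line) {
        Map<String, String> record = new LinkedHashMap<>();
        String[] terms = line.split("\t");
        int n = Math.min(terms.length, FIELDS.length);
        for (int i = 0; i < n; i++) {
            record.put(FIELDS[i], terms[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        // Placeholder values for url, keywords and description.
        String line = "Vega_havana processed_pseudogene Gene\tOTTHUMG00000000423\t"
                + "http://example.org/gene\tpseudogene\tplaceholder description";
        System.out.println(parse(line).get("id")); // prints OTTHUMG00000000423
    }
}
```

Short lines simply yield fewer fields, and extra columns are dropped rather than causing an `ArrayIndexOutOfBoundsException`.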
    32. Quite a lot of memory: ~1.5 GB
    33. Creates index
    34. Merge indices to form master search index
    35. Recipe 2: Finding documents containing a search term
    36. Easy
    37. org.ensembl.lucene.Search
    38. public static void main(String args[]) {
            Timer timer = new Timer();
            String index = "index";
            try {
                // Time how long it takes to open the index.
                timer.start();
                Searcher searcher = new IndexSearcher(index);
                timer.stop();
                System.out.println("Loaded " + searcher.maxDoc() + " documents in " + timer.elapsed() + "ms");

                search(searcher, "subtype", "Vega_havana processed_pseudogene Gene");
                search(searcher, "id", "OTTHUMG00000000423");
                searcher.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    42. private static void search(Searcher searcher, String field, String queryString) throws ParseException, IOException {
            Timer timer = new Timer();
            timer.start();
            System.out.println("Search (" + field + "): " + queryString);

            // Parse the query text against the given default field.
            QueryParser parser = new QueryParser(field, new StandardAnalyzer());
            Query query = parser.parse(queryString);
            Hits hits = searcher.search(query);

            // Print the stored id and subtype of every hit.
            int count = 1;
            Iterator<Hit> hiterator = hits.iterator();
            while (hiterator.hasNext()) {
                Hit hit = hiterator.next();
                Document document = hit.getDocument();
                System.out.println(count + ": ID: " + document.get("id"));
                System.out.println(count + ": Subtype: " + document.get("subtype"));
                count++;
            }

            int hitCount = hits.length();
            timer.stop();
            System.out.println("Hits: " + hitCount);
            System.out.println("Completed in " + timer.elapsed() + "ms");
        }
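The `Timer` used in the search code exposes `start()`, `stop()` and `elapsed()`, which `java.util.Timer` does not offer, so it is presumably a small helper class that the slides do not show. A minimal stand-in might look like the sketch below; the name `SimpleTimer` and the millisecond unit are assumptions based on the "ms" in the printed messages.

```java
// Hypothetical stand-in for the Timer helper that the slides use but do not show.
class SimpleTimer {
    private long startedAt;
    private long stoppedAt;

    // Marks the start of the measured interval.
    void start() {
        startedAt = System.nanoTime();
        stoppedAt = startedAt;
    }

    // Marks the end of the measured interval.
    void stop() {
        stoppedAt = System.nanoTime();
    }

    // Returns the time between start() and stop() in whole milliseconds.
    long elapsed() {
        return (stoppedAt - startedAt) / 1_000_000L;
    }

    public static void main(String[] args) {
        SimpleTimer timer = new SimpleTimer();
        timer.start();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i; // some work to measure
        }
        timer.stop();
        System.out.println("Summed to " + sum + " in " + timer.elapsed() + "ms");
    }
}
```

`System.nanoTime()` is used here because it is monotonic, whereas `System.currentTimeMillis()` can jump if the system clock is adjusted.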
    48. Recipe 3: Querying a remote document index
    49. Wrap everything into a single file
    50. Copy that file to an application server
    51. Restart the application server
    52. Voilà!
    53. (almost never that easy)
    54. You will need...
    55. Bonus recipe! Automate tasks with Ant
    56. XML based configuration
    57. Automated compiles
    58. Automated test runner
    59. Automated deployment
    60. Platform independent
    61. Flexible (but complex)
    62. ant deploy
    63. clean code, clean index, compile, build index, build jar, build war, deploy
    64. Could this work for Ensembl?
    65. lucene.apache.org
    66. Java IDEs rock: get stuck in
    67. Thank you

    Matt Wood, 3 years ago

    Brief introduction to Lucene, including a Java for more