Falcon
Full Text Search Engine
Mar 28, 2015
Adamson University
Master in Information Technology
Advanced Object Oriented Programming
Hideshi Ogoshi
What is Falcon
 The name represents its speed and strength
 Lightweight full-text search engine
 Command-line application
 Provides an HTTP server mode
 Written in the Python programming language
 Only 1 file and 421 lines of code
 Data is stored in an SQLite3 database
 https://github.com/hideshi/Falcon
What is full text search engine
 Storage for text documents
 Much faster than an SQL query that uses a LIKE ‘%%’ partial-match expression
 Composed of an index manager, an index builder and a search function
 Has its own data structure, called an ‘inverted index’
 Text is split into tokens by a ‘tokenizer’
What is tokenizer
 Splits text into tokens at the spaces between words
 A token is a group of characters
 This is a book -> ‘This’, ‘is’, ‘a’, ‘book’
 Useful for the many languages that separate words with spaces, such as English, French and Tagalog
 Causes problems when applied to languages such as Japanese or Chinese, because these languages don’t use spaces in their sentences
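The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Falcon's actual code; the function name tokenize_by_spaces is an assumption made for the example.

```python
def tokenize_by_spaces(text):
    """Split text into tokens wherever whitespace occurs."""
    return text.split()


english = tokenize_by_spaces("This is a book")
japanese = tokenize_by_spaces("これは本です")

print(english)   # one token per word
print(japanese)  # a single token, because the sentence contains no spaces
```

The Japanese sentence comes back as one long token, so a search for a single word inside it could never match an index entry.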
What is ngram tokenizer
 A family of tokenizers that split words or sentences into tokens
 Each token has a fixed number of characters
 The number of characters depends on the type of n-gram tokenizer
 Unigram (1 character), bigram (2), trigram (3), etc.
What is bigram
 How a bigram tokenizer splits a sentence into tokens
 Each token has two characters
 English
 This is a book -> ‘Th’, ‘hi’, ‘is’, ‘si’, ‘is’, ‘sa’, ‘ab’, ‘bo’, ‘oo’, ‘ok’ (spaces removed)
 Japanese
 これは本です -> ‘これ’, ‘れは’, ‘は本’, ‘本で’, ‘です’
 Chinese
 这是书 -> ‘这是’, ‘是书’
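A minimal sketch of an n-gram tokenizer follows, assuming that whitespace is dropped before the text is cut into overlapping windows. The name ngram_tokenize and its parameter n are hypothetical, and Falcon's own tokenizer may differ in detail.

```python
def ngram_tokenize(text, n=2):
    """Return every run of n consecutive characters in text.

    Assumption: whitespace is removed first, so tokens may span word boundaries.
    """
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


japanese = ngram_tokenize("これは本です")   # bigrams (n=2 is the default)
chinese = ngram_tokenize("这是书")
trigrams = ngram_tokenize("book", n=3)

print(japanese)
print(chinese)
print(trigrams)
```

Because the window slides one character at a time, no dictionary or word boundary is needed, which is why the approach suits Japanese and Chinese.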
What is inverted index
 A data structure that provides a faster way to retrieve data
 Each word in the dictionary points to a posting list of the positions where it occurs
 Example text: This is a book. That is a pen.

Dictionary    Posting List
This          0
is            1, 5
a             2, 6
book          3
That          4
pen           7
What is inverted index
“government of the people, by the people, for the people,
shall not perish from the earth.”
Each entry below has the form {token, number of occurrences, {document ID: [positions]}}.
{“by”, 1, {1: [4]}}, {“earth”, 1, {1: [15]}}, {“for”, 1, {1: [7]}},
{“from”, 1, {1: [13]}}, {“government”, 1, {1: [0]}},
{“not”, 1, {1: [11]}}, {“of”, 1, {1: [1]}},
{“people”, 3, {1: [3, 6, 9]}}, {“perish”, 1, {1: [12]}},
{“shall”, 1, {1: [10]}}, {“the”, 4, {1: [2, 5, 8, 14]}}
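The entries above can be reproduced with a short sketch. It assumes that tokens are lowercased, that surrounding punctuation is stripped, and that the key 1 is a document ID. The names build_inverted_index and entry are hypothetical and are not taken from Falcon's source.

```python
import string
from collections import defaultdict

# Assumption: ASCII punctuation and curly quotes are stripped from word edges.
PUNCTUATION = string.punctuation + "“”‘’"


def build_inverted_index(documents):
    """Map each token to {doc_id: [positions]} for a dict of {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        words = [w.strip(PUNCTUATION).lower() for w in text.split()]
        words = [w for w in words if w]  # drop tokens that were only punctuation
        for position, word in enumerate(words):
            index[word].setdefault(doc_id, []).append(position)
    return dict(index)


def entry(index, token):
    """Return (token, total occurrences, {doc_id: [positions]})."""
    postings = index.get(token, {})
    frequency = sum(len(positions) for positions in postings.values())
    return (token, frequency, postings)


documents = {
    1: "government of the people, by the people, for the people, "
       "shall not perish from the earth."
}
index = build_inverted_index(documents)
people = entry(index, "people")
the = entry(index, "the")
print(people)
print(the)
```

Looking up a word is then a single dictionary access, instead of a scan over every document as with LIKE.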
Table Definition
INDICES
    TOKEN         TEXT PRIMARY KEY
    POSTING_LIST  BLOB
DOCUMENTS
    ID            INTEGER PRIMARY KEY
    TITLE         TEXT
    CONTENT       BLOB
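As a sketch, the two tables could be created with Python's built-in sqlite3 module as shown below. The column names and types come from the slide. The helper create_tables, the IF NOT EXISTS clauses and the in-memory connection are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS INDICES (
    TOKEN        TEXT PRIMARY KEY,
    POSTING_LIST BLOB
);
CREATE TABLE IF NOT EXISTS DOCUMENTS (
    ID      INTEGER PRIMARY KEY,
    TITLE   TEXT,
    CONTENT BLOB
);
"""


def create_tables(connection):
    """Create the INDICES and DOCUMENTS tables if they do not exist yet."""
    connection.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
create_tables(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```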
Class Diagram
Performance Tuning
 Tokens that contain stop words composed of symbols, such as !”#$%&’()-=^~¥|@`[{;+:*]},<.>/?_, are ignored by the tokenizer to reduce the time needed for creating the index and searching.
 Document contents are compressed with the bzip2 algorithm to reduce query time. The compression ratio (compressed size relative to the original) is 38.6% at best and 79.3% on average.
 journal_mode and synchronous are turned off so that no unnecessary files are created when records are inserted. This improves speed by 8%.
 Bulk insert is used instead of executing an INSERT statement for each record. This improves speed by 11%.
 Falcon provides an in-memory database mode powered by SQLite3. While the index is being created, Falcon writes new records to memory to reduce the time spent on I/O. Once the index is complete, the in-memory database is saved to a file. This improves speed by 17%.
 Memory usage of the inverted index objects is checked constantly. When it exceeds the limit, the data is stored in the database and deleted from memory. This improves speed by 380%.
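Several of these techniques can be combined in one short sketch: bzip2 compression, the two PRAGMA settings, bulk insert, and an in-memory database that is saved to a file afterwards. This is an illustration under stated assumptions, not Falcon's implementation. The function names, the sample documents and the file name falcon_sketch.db are all hypothetical, and Connection.backup requires Python 3.7 or later.

```python
import bz2
import os
import sqlite3
import tempfile


def build_in_memory(documents):
    """Load (title, content) pairs into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # These settings matter for file-backed databases; they are harmless in memory.
    conn.execute("PRAGMA journal_mode = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute(
        "CREATE TABLE DOCUMENTS (ID INTEGER PRIMARY KEY, TITLE TEXT, CONTENT BLOB)")
    # Compress each document body with bzip2 before storing it.
    rows = [(title, bz2.compress(content.encode("utf-8")))
            for title, content in documents]
    # Bulk insert: one executemany call instead of one INSERT per record.
    conn.executemany("INSERT INTO DOCUMENTS (TITLE, CONTENT) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def save_to_file(memory_conn, path):
    """Copy the whole in-memory database into a database file."""
    file_conn = sqlite3.connect(path)
    memory_conn.backup(file_conn)
    file_conn.close()


docs = [("Falcon", "full text search " * 50),
        ("Groonga", "全文検索エンジン " * 50)]
memory_conn = build_in_memory(docs)
path = os.path.join(tempfile.mkdtemp(), "falcon_sketch.db")
save_to_file(memory_conn, path)
memory_conn.close()

restored = sqlite3.connect(path)
title, blob = restored.execute(
    "SELECT TITLE, CONTENT FROM DOCUMENTS WHERE ID = 1").fetchone()
text = bz2.decompress(blob).decode("utf-8")
restored.close()
print(title, len(blob), len(text.encode("utf-8")))
```

The sample text is highly repetitive, so it compresses far better than real articles would. The percentages on the slide are the author's own measurements and are not reproduced by this sketch.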
Performance Test
 Japanese Wikipedia / 10,265 articles / 130 MB
 MySQL LIKE ‘%%’
     Project started on May 23, 1995
     Number of contributors: 57, including Oracle and Google
     Number of search words: 1, 2, 3
     Execution time (sec): 2.71, 2.25, 2.02
 Groonga (full-text search engine)
     Project started on Jan 11, 2009
     Number of contributors: 30
     Number of search words: 1, 2, 3
     Execution time (sec): 0.013, 0.016, 0.059
 Falcon
     Project started on Mar 8, 2015
     Number of contributors: 1
     Number of search words: 1, 2, 3
     Execution time (sec): 0.137, 0.132, 0.170
Points to be improved
 Pursue scalability and higher performance
 Implement a normalizer
 Search results should be sorted by the relevance of the contents to the search words
 Develop an application using Falcon
 Highlight
 Snippet
 Keyword suggestion
 Possibility suggestion
 Error correction
 Pagination
Thank you
