Document Search Optimization using Lucene API
Harsha Ummerpillai (harsha07@msu.edu)
Shyam Gedela (gedela@msu.edu)
Michigan State University
11/13/2014
At KD 2013 we presented MSU’s approach to improving the
performance of Rice document search using the Lucene API.
We are back this year to share the lessons learned and the
results of our implementation.
Background
Kuali Days 2014
Indianapolis
Introduction
• Background
• Goals for Lucene implementation
• Technical Recap
• Implementation
• Performance Results
• How to
• Demo
Background
• Document Search - Why is it important
• MSU implementation
– Go Live - Jan 1 2011
• Rice 2.1.9
• KFS 5.0.x
• KMM 1.x
• OOI 1.x
– ~4 yrs of operation
– ~4 million documents
– ~50 million search attributes
Goals
• Goals for Lucene implementation
– Fast – Improved and consistent response times
– Configurable – can be enabled/disabled using configuration
– Seamless – No change to user screen
– Scalable
– Customizable
Technical Details Recap
Document search
• Client applications define searchable attributes in the Data Dictionary.
• Rice extracts these attributes and builds the index, saving key-value pairs into the DB.
• Attributes are saved into 4 different tables based on data type.
• Existing structure – 1 document to n indexed records
• Standard searchable fields
– Status codes
– Initiator
– Approver
– Action dates.
• Custom attributes defined by document types
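The storage model described in this slide can be sketched roughly as follows. This is a minimal illustration, not Rice's actual code: the names `AttributeType`, `SearchableAttributeRow` and `AttributeRouter` are hypothetical, and the choice of string, date/time, float and long as the four data types is an assumption. The slide states only that attributes are split across four tables by data type and that one document yields n indexed records.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Assumed data types; the slide only says "4 different tables based on data types".
enum AttributeType { STRING, DATE_TIME, FLOAT, LONG }

// One key-value row of the index. A single document produces n of these.
final class SearchableAttributeRow {
    final String documentId;
    final String key;
    final Object value;

    SearchableAttributeRow(String documentId, String key, Object value) {
        this.documentId = documentId;
        this.key = key;
        this.value = value;
    }
}

// Routes each extracted attribute to an in-memory stand-in for the table of its type.
final class AttributeRouter {
    private final Map<AttributeType, List<SearchableAttributeRow>> tables =
            new EnumMap<>(AttributeType.class);

    AttributeRouter() {
        for (AttributeType type : AttributeType.values()) {
            tables.put(type, new ArrayList<>());
        }
    }

    // Picks the table from the runtime type of the value; anything unknown is a string.
    static AttributeType typeOf(Object value) {
        if (value instanceof LocalDateTime) return AttributeType.DATE_TIME;
        if (value instanceof Long || value instanceof Integer) return AttributeType.LONG;
        if (value instanceof Number) return AttributeType.FLOAT;
        return AttributeType.STRING;
    }

    // Stores one row per attribute, so one document fans out into n rows.
    void index(String documentId, Map<String, Object> attributes) {
        for (Map.Entry<String, Object> entry : attributes.entrySet()) {
            tables.get(typeOf(entry.getValue()))
                  .add(new SearchableAttributeRow(documentId, entry.getKey(), entry.getValue()));
        }
    }

    int rowCount(AttributeType type) {
        return tables.get(type).size();
    }

    int totalRows() {
        int total = 0;
        for (List<SearchableAttributeRow> rows : tables.values()) {
            total += rows.size();
        }
        return total;
    }
}
```

With the figures quoted earlier (about 4 million documents and about 50 million search attributes), this fan-out works out to roughly a dozen rows per document on average.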
[Figure: Document search]
[Figure: Rice Doc Search]
[Figure: Rice Doc Index w Lucene (2 slides)]
Key aspects of the implementation
Implementation
Technical features
• Documents are queued for Lucene indexing with four separate stages
– WAIT_FOR_REALTIME("0"),
– READY_FOR_REALTIME("1"),
– WAIT_FOR_MASTER("2"),
– READY_FOR_MASTER("3")
• Two indexes: master and real-time
• Master index refreshed 3 times a day
• Real-time index refreshed every 5 seconds
• A single master node writes the index to shared file storage
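The four queue stages listed above read like Java enum constants, so one plausible shape for them is sketched below. The constant names and string codes come from the slide; the enum name `IndexQueueStatus`, the `fromCode` lookup and the `isForRealTimeIndex` helper are assumptions added for illustration and may not match the CONTRIB-95 patch.

```java
import java.util.Optional;

// Constants and codes are taken from the slide; everything else is hypothetical.
enum IndexQueueStatus {
    WAIT_FOR_REALTIME("0"),
    READY_FOR_REALTIME("1"),
    WAIT_FOR_MASTER("2"),
    READY_FOR_MASTER("3");

    private final String code;

    IndexQueueStatus(String code) {
        this.code = code;
    }

    // The code as it would be stored in a queue table column.
    String getCode() {
        return code;
    }

    // Maps a stored code back to its stage; empty if the code is unknown.
    static Optional<IndexQueueStatus> fromCode(String code) {
        for (IndexQueueStatus status : values()) {
            if (status.code.equals(code)) {
                return Optional.of(status);
            }
        }
        return Optional.empty();
    }

    // Assumption based on the names: the first two stages feed the real-time index.
    boolean isForRealTimeIndex() {
        return this == WAIT_FOR_REALTIME || this == READY_FOR_REALTIME;
    }
}
```

Returning an `Optional` from the lookup is a design choice of this sketch: it lets a caller decide how to handle an unexpected code in the queue instead of failing inside the enum.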
Auto warming
Index Readers are auto warmed and queued in all nodes
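The slide does not show how warming is implemented, so the following is only a generic warm-then-swap sketch using the Java standard library. `WarmedReaderHolder`, the `opener` and the `warmer` are hypothetical names; in a real setup the reader type would be a Lucene reader or searcher and the warmer would run a few representative queries. Lucene also ships its own helpers for this pattern (for example `SearcherManager` with a `SearcherFactory`), which this sketch does not use. The queuing of readers mentioned on the slide is not modelled here.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Opens a new reader, warms it, and only then publishes it to searching threads.
final class WarmedReaderHolder<R> {
    private final AtomicReference<R> current = new AtomicReference<>();
    private final Supplier<R> opener;   // hypothetical: opens a reader on the latest index
    private final Consumer<R> warmer;   // hypothetical: runs warm-up work on a reader

    WarmedReaderHolder(Supplier<R> opener, Consumer<R> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        refresh();
    }

    // Searches keep using the old reader until the new one has finished warming.
    void refresh() {
        R fresh = opener.get();
        warmer.accept(fresh);
        current.set(fresh);
    }

    // Returns the most recently published (already warmed) reader.
    R get() {
        return current.get();
    }
}
```

Calling `refresh()` on a schedule on each node would give every node an already-warmed reader after each index update, so the first search after a refresh does not pay the warm-up cost.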
Index Storage
• Directory structure within Lucene Index store
• temp: Storage location before merge into active index
• meta-info: Index stats and message files
Results
Performance Test Scenarios
• 7 business scenarios
• Invaluable for daily operations
– E.g. how many payment requests are department approved but have not been extracted by PDP (vendors not paid)
Performance Charts - Comparison
[Figure: chart comparing "No Lucene" and "Lucene" for the scenarios ACCT, Approver, PO, REQS, PCDO, PREQ and CM; axis from 0 to 350000, units not stated]
We have created an open contribution JIRA, CONTRIB-95, and are
happy to provide the latest fixes and patches from our production system.
How To
How to guide
• Visit the contribution JIRA: https://jira.kuali.org/browse/CONTRIB-95
• Download the patch file and apply it to the Rice workspace (base version 2.1.9).
• Add the Lucene configuration properties to the Rice application configuration file.
• Set up the shared file store location where the index will be saved and shared.
• Add the Lucene index queue table using lucene-setup.sql.
• Build and start the Rice application with the Lucene configuration enabled.
• Go to “Administration > Lucene Administration” and click “Build Master Index”.
• Click the refresh link to see the status. When the index.ready file is listed, the master index is ready for use.
• Create a document and check that it is available in search. If the real-time indexer is working, the new document should appear in search results within 5 to 10 seconds.
• Use the administration page to see the latest status and manage the index.
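The readiness check in the steps above could also be scripted rather than done through the admin page. The sketch below assumes that the `index.ready` marker is a plain file inside some index directory on the shared file store; the slides name the file but not its location, so the directory passed in and the `MasterIndexStatus` class are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Checks for the marker file that the admin page lists once the master index is built.
final class MasterIndexStatus {
    // File name taken from the how-to steps.
    static final String READY_MARKER = "index.ready";

    private MasterIndexStatus() {
    }

    // indexDir is an assumed location; adjust it to wherever the patch writes the marker.
    static boolean isReady(Path indexDir) {
        return Files.isRegularFile(indexDir.resolve(READY_MARKER));
    }
}
```

A deployment script could poll `MasterIndexStatus.isReady(...)` after triggering "Build Master Index" and only then run the search smoke test from the next step.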
[Figure: Technical – Admin page]
References
• Lucene http://lucene.apache.org/core/
• KD 2013 Presentation https://jira.kuali.org/secure/attachment/77886/KD-2013-Optimizing-Document-Search-using-Lucene.pptx
• CONTRIB-95 https://jira.kuali.org/browse/CONTRIB-95
