MontySolr: Embedding CPython in Solr
Roman Chyla, CERN, roman.chyla@cern.ch, May 26, 2011
Lucene Revolution 2011: MontySolr presentation

See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011

Notes
  • Mention the transition/collaboration: CERN, DESY, Fermilab, SLAC
  • Paradigm of a full result set
  • Python: fast prototyping, easy for students (who write a lot of the code)
  • X - do not spend time on the code ("I was waiting for the point 'this is the solution'" :)
      - ad #1: solr.client.solrj.embedded.EmbeddedSolrServer
          - Solr is running as an embedded process, not inside a servlet container
          - the default/recommended way
      - ad #2: solr.servlet.DirectSolrConnection
          - like the previous one, but simpler
          - all the queries are sent as strings, everything is just a string
          - very flexible and probably suitable for quick integration
  • I don't mention some options, like writing JNI ourselves or using intermediaries other than remote objects (e.g. shared memory, if that would be possible)
  • Everybody thinks Jython, right? No!
  • These are only some of the important features; omitted are simplicity and beauty (JEPP eval is just an ugly way of doing things), documentation, community, support, etc.
  • Make sure it is clear that processes can have threads; on the slide it is not visible what is a process and what is a thread
  • Truly bi-directional
      - We can call Python functions and pass Java objects
      - From inside Python we can call Java objects/methods
  • The real-code example is in appendix #3
  • The real code is in appendix #4
  • Note: don't forget to mention how multiprocessing saves memory on Linux systems (due to copy-on-write after forking). This is effectively an alternative to Python WSGI that cannot run multiprocessing. We show that it is possible to use multiprocessing effectively.
  • The real code is in appendix #3
  • More precise: MontySolr intro (include)
  • TODO:
      - Invenio is the same as Django
      - Today, Solr can do 2nd-order operations

    1. MontySolr: Embedding CPython in Solr
        Roman Chyla, CERN, roman.chyla@cern.ch, May 26, 2011
    2. Why should I care?
        - Our challenge is to connect Python and Java
        - Without compromises
        - We created the MontySolr extension
          - Robust, tested (will be used by our system)
          - But works for any Python application (e.g. Django)
          - And for any C/C++ app that Python understands!
          - Open source (GPL v2)
        - Try it out!
          - https://github.com/romanchyla/montysolr
    3. Outline
        ‣ Context
        - The Challenge
        - Key components
          - Available technologies
          - Our approach
          - Problems solved
        - Evaluation
        - Wrap-up
    4. CERN
        - European Organization for Nuclear Research
          - Switzerland, Geneva
        - The largest laboratory for High Energy Physics
        - Home to the Large Hadron Collider
        - 40-50K HEP scientists worldwide
    12. SPIRES
        - Stanford Linear Accelerator Center (SLAC)
        - High-Energy Physics Literature Database
        - Started December 1991
          - The first website outside Europe/CERN
          - The first database on the web
    16. Invenio
        - Integrated digital library software behind INSPIRE
        - Used by very large institutional repositories
          - http://repositories.webometrics.info/toprep_inst.asp
        - Customizable virtual collections
        - Flexible management of metadata
          - 3 000 authors per article
        - Powerful search engine
          - Incl. citation map analysis
        - Written in Python (since 2001)
          - 290 000 lines of code
    17. Outline (‣ The Challenge)
    18. The Challenge
        - HEP scientific community
          - Searches are metadata oriented
        - However, fulltexts are changing the situation
        - And we want to provide even better service
          - Bigger volumes of data
          - NLP processing
          - Semantic search
    19. The Challenge (diagram, built up over several slides)
        - Query: supersymmetry AND author:ellis
        - Invenio
        - fulltext:supersymmetry
        - IDs: 1;2;3;9.... (1-6M IDs)
        - Problems marked on the diagram:
          1. only IDs, no score = no ranking
          2. score merging difficult (if available)
          3. push IDs? (e.g. faceting)
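The ranking problem on this diagram can be illustrated with a small sketch. It is not taken from Invenio or MontySolr: `merge_hits`, the sample IDs, and the scores are all made up for illustration. It assumes one side returns bare record IDs and the other returns (ID, score) pairs, so the merged list can only be ranked by the one score that exists.

```python
def merge_hits(unscored_ids, scored_hits):
    """Intersect a bare ID list with scored hits (AND semantics).

    unscored_ids: record IDs with no relevance information
    scored_hits:  (record_id, score) pairs from the engine that ranks
    """
    allowed = set(unscored_ids)
    # Keep only the records that both sides returned.
    merged = [(rid, score) for rid, score in scored_hits if rid in allowed]
    # Only one side carries a score, so it alone decides the order.
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged


# Hypothetical data: IDs for author:ellis, scored hits for the fulltext term.
author_ids = [1, 2, 3, 9]
fulltext_hits = [(2, 0.41), (9, 0.87), (5, 0.93)]

result = merge_hits(author_ids, fulltext_hits)
print(result)
```

If both sides returned only IDs, the intersection would still be correct, but there would be nothing to sort on, which is the "no score = no ranking" point on the slide.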
    30. What is the "best" solution?
        - We love Python...
        - ...and our applications are written in Python...
        - But what if Solr is the master search engine?
        - Merge results inside Solr?
          - Typical size: 1-10 million IDs
          - Expected latency: 1-2 s
        - What we want to achieve:
          - Fast transfer of hits from Invenio to Solr
          - Leverage the power of both (no compromises)
          - Developer-friendly integration, simplicity
        - Additional concerns:
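To put the "1-10 million IDs in 1-2 seconds" target in perspective: at 4 bytes per ID, 10 million IDs come to roughly 40 MB. The sketch below packs IDs into a contiguous buffer with the standard `array` module. It is only a back-of-the-envelope illustration; `pack_ids` and `unpack_ids` are hypothetical helpers, not part of MontySolr, and the item size of type code `"i"` is platform dependent (typically 4 bytes).

```python
from array import array


def pack_ids(ids):
    """Pack record IDs into one contiguous buffer of C signed ints."""
    return array("i", ids).tobytes()


def unpack_ids(payload):
    """Restore the list of IDs from a buffer produced by pack_ids()."""
    ids = array("i")
    ids.frombytes(payload)
    return ids.tolist()


ids = list(range(1, 100001))          # 100,000 sample IDs
payload = pack_ids(ids)
bytes_per_id = array("i").itemsize    # typically 4

print("payload size in bytes:", len(payload))
print("estimate for 10 million IDs in MB:", 10000000 * bytes_per_id / 1e6)
```

A flat integer buffer like this maps naturally onto a Java `int[]`, which is the form the hits take on the Java side in the handler example later in the deck.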
    31. Outline
        - Context
        - The Challenge
        ‣ Key components
          - Available technologies
          - Our approach
        - Evaluation
        - Demonstration
        - Wrap-up
    32. To embed Solr (in Java app)
        - Your app simulates a Java web container?
          - use EmbeddedSolrServer
        - It knows nothing about Java servlets?
          - use the DirectSolrConnection class
        - Maybe we are too lazy?
          - Embed the web container (in my case Jetty)
          - Seemed strange (webserver inside webserver)
          - ... but it worked well
    37. To use Solr in non-Java app
        - Solr is already usable via HTTP requests, but we need something else here...
        - Remote objects/calls?
          - Pyro, execnet, CORBA, SOAP...
          - or simply pipes?
        - Access Python from Java?
          - Jython
          - JEPP
        - Access Java from Python?
          - JPype
          - JCC
    38. Jython?
        - Implementation of Python in 100% Java
        - Both Java and Python code
        - Truly multithreaded
        - C modules will not work
          - but see http://bit.ly/iTRYbb
        - Slower than CPython
    41. JEPP - Java Embedded Python
        - Python code runs inside the Python interpreter
        - Embeds the CPython interpreter in Java via Java Native Interface (JNI)
        - http://jepp.sourceforge.net/
          - recently updated (27-Jan)
          - but JCC is more active
    42. JEPP - Java Embedded Python (diagram)
    43. JCC
        - Embeds JVM in Python
        - C++ code generator
        - C++ object interface wraps a Java library
        - C++ wrappers conform to Python's C type system
        - result: complete Python extension module
    44. JCC (diagram)
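    47. To use Solr in non-Java app
        Comparison table with columns Jython, JCC, JEPP. The marks are listed in slide order; which column each mark belongs to cannot be recovered from the transcript.
        - Python C modules: ✓ ✓
        - Speed: ✓ ?
        - No code changes: ✓ ✓
        - Access from Python: ✓ ✓
        - Access from Java: ✓ ... ✓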
    48. The first try (diagram: Invenio, Solr, JCC)
    49. The devil is in the details...
    50. GIL - Global Interpreter Lock
        - Unfortunately, a Python webapp is not like Java...
        - We can have 200 threads, but only 4 will run at a time...
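The effect of the GIL on CPU-bound work can be seen with a short, self-contained comparison. This sketch is not from the talk: `busy_sum` and `run` are illustrative names, and the timings depend on the machine and the Python build. On a standard CPython build, the thread pool usually takes about as long as running the jobs one after another, while the process pool can use several cores.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy_sum(n):
    """CPU-bound work: a pure-Python loop that holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run(executor_cls, jobs, workers=4):
    """Run all jobs on the given executor type; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(busy_sum, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [2000000] * 4
    thread_results, thread_time = run(ThreadPoolExecutor, jobs)
    process_results, process_time = run(ProcessPoolExecutor, jobs)
    # Both strategies compute the same answers; only the wall time differs.
    assert thread_results == process_results
    print("threads:   %.2fs" % thread_time)
    print("processes: %.2fs" % process_time)
```

This is the reason the deck pairs threading on the Java side with multiprocessing on the Python side.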
    53. Fortunately, a solution exists
        - JCC can embed Python inside Java
          - Special thanks to Andi Vajda! (JCC creator)
        - We write 'empty' classes in Java ...
        - ... and implement them in Python
        - Diagram labels: Python w/ Java inside; Java w/ Python inside
    54. The second try (diagram: Solr w/ Invenio (backend), Invenio frontend, XML, JCC)
    55. Implementing the bridge
        - Special Java class
        - With method pythonExtension()
        - Native method pythonDecRef()
          - JCC provides its implementation
        - And a number of other native methods
          - These will be implemented using Python
        - Like writing JNI Java/C code, but without compilation...
    56. MontySolr extension
        - JCC has great potential, but also added complexity...
        - So the MontySolr project was born
          - Modules must be built in shared mode
          - JCC dynamic library loaded and started from the main thread
          - Simple mechanism of the Python bridge and message
          - Configurable handlers on the Python side
          - Secured dereferencing of the native objects
          - Threading on the Java side
          - Multiprocessing on the Python side
          - Easy ant targets (compilation) ...
    57. Hello World - Java part

        import org.apache.jcc.PythonVM;

        public class MontySolrBridge extends BasicBridge implements PythonBridge {

            // Handle of the Python object that implements the native methods
            private long pythonObject;

            public void pythonExtension(long pythonObject) {
                this.pythonObject = pythonObject;
            }

            public long pythonExtension() {
                return this.pythonObject;
            }

            public void finalize() throws Throwable {
                pythonDecRef();
            }

            // Implementation provided by JCC
            public native void pythonDecRef();

            public void sendMessage(PythonMessage message) {
                PythonVM vm = PythonVM.get();
                vm.acquireThreadState();
                try {
                    receive_message(message);
                } finally {
                    // Release the thread state even if the Python side throws
                    vm.releaseThreadState();
                }
            }

            // Implemented on the Python side
            public native void receive_message(PythonMessage message);
        }
    58. Hello World - Python part

        from montysolr import MontySolrBridge

        class SimpleBridge(MontySolrBridge):

            def __init__(self):
                super(SimpleBridge, self).__init__()

            def receive_message(self, message):
                # Read the parameter that was set on the Java side
                query = message.getParam('query')
                # Java reads the results back from the same message object
                message.setResults('Hello world!')
                print('Python received from Java: %s' % query)
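The round trip in the two Hello World slides can be tried without JCC or a JVM by swapping in plain-Python stand-ins. In this sketch `FakeMessage` is a hypothetical replacement for the Java `PythonMessage`; it offers only the `getParam`, `setResults` and `getResults` methods that appear on the slides, and `SimpleBridge` drops the `MontySolrBridge` base class.

```python
class FakeMessage:
    """Stand-in for the Java PythonMessage (illustration only)."""

    def __init__(self, **params):
        self._params = dict(params)
        self._results = None

    def getParam(self, name):
        return self._params.get(name)

    def setResults(self, value):
        self._results = value

    def getResults(self):
        return self._results


class SimpleBridge:
    """Same receive_message body as on the slide, without the JCC base class."""

    def receive_message(self, message):
        query = message.getParam("query")
        message.setResults("Hello world!")
        print("Python received from Java: %s" % query)


# The "Java side": build a message, hand it to the bridge, read the results.
message = FakeMessage(query="refersto:author:ellis")
SimpleBridge().receive_message(message)
print(message.getResults())
```

The point of the pattern is that the message object is the only thing crossing the boundary: parameters go in on one side and results come back on the same object.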
    59. Example - running MontySolr
        - Java side
          - JRE (32/64 bit)
          - Standard Solr/Lucene jars
          - JCC dynamic library
        - Python side
          - Python interpreter (32/64 bit)
          - 4 Python modules (jcc, solr, lucene, montysolr)
        - In the main thread
          - First we load JCC
          - Then start the Python interpreter ...
          - ... load Python handlers
    60. Solr as search service (diagram: Solr w/ Invenio (backend), Invenio frontend, XML, JCC)
    61. Example (diagram: query refersto:author:ellis arrives at Solr, MyCustom Handler)
    63. Example - Solr custom handler

        // Build a message for the Python target "perform_search" of sender "Invenio"
        PythonMessage msg = MontySolrVM.INSTANCE
            .createMessage("perform_search")
            .setSender("Invenio")
            .setParam("query", "refersto:author:ellis");

        MontySolrVM.INSTANCE.sendMessage(msg);

        // The Python handler stores its results in the message
        Object result = msg.getResults();
        if (result != null) {
            int[] hits = (int[]) result;
        }
    64. Example - JNI connection (diagram: query refersto:author:ellis, Solr, MyCustom Handler, Python Bridge, Invenio wrappers)
    66. Example - Python side

        # search time - called from Java
        def perform_search(message):
            query = message.getParam("query")
            hits = call_real_search(query)
            # cast Python list into Java array
            message.setResults(JArray_ints(hits))

        # handler is made 'visible' at startup
        SolrpieTarget('Invenio:perform_search', perform_search)
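How a message might find its handler can be sketched as a dictionary lookup on a "sender:name" key. This is an assumption based on the `Invenio:perform_search` label on the slide, not MontySolr's actual implementation: `Message`, `Dispatcher`, and the toy `CITATIONS` data are all hypothetical, and a plain Python list stands in for the Java array.

```python
class Message:
    """Hypothetical stand-in for the Java-side PythonMessage."""

    def __init__(self, name, sender, **params):
        self.name = name
        self.sender = sender
        self._params = params
        self._results = None

    def getParam(self, key):
        return self._params.get(key)

    def setResults(self, value):
        self._results = value

    def getResults(self):
        return self._results


class Dispatcher:
    """Routes a message to the handler registered under 'sender:name'."""

    def __init__(self):
        self._targets = {}

    def register(self, key, handler):
        self._targets[key] = handler

    def send(self, message):
        key = "%s:%s" % (message.sender, message.name)
        handler = self._targets.get(key)
        if handler is not None:
            handler(message)
        # With no matching handler the results stay None, the case the
        # Java example guards against with `result != null`.


# Toy data so the example runs without Invenio.
CITATIONS = {"refersto:author:ellis": [12, 47, 301]}


def perform_search(message):
    query = message.getParam("query")
    message.setResults(CITATIONS.get(query, []))


dispatcher = Dispatcher()
dispatcher.register("Invenio:perform_search", perform_search)

hit = Message("perform_search", "Invenio", query="refersto:author:ellis")
dispatcher.send(hit)
print(hit.getResults())

miss = Message("unknown_target", "Invenio", query="anything")
dispatcher.send(miss)
print(miss.getResults())
```

Keeping the registry on the Python side matches the "configurable handlers on the Python side" item in the MontySolr feature list: Java only needs to know the sender and the target name.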
    67. Example (diagram: query refersto:author:ellis, Solr, MyCustom Handler, Python Bridge, Invenio wrappers, several Invenio processes)
    68. Example - Java side again

        PythonMessage msg = MontySolrVM.INSTANCE
            .createMessage("perform_search")
            .setSender("Invenio")
            .setParam("query", "refersto:author:ellis");

        MontySolrVM.INSTANCE.sendMessage(msg);

        // Back in Java: read the hits that the Python handler stored
        Object result = msg.getResults();
        if (result != null) {
            int[] hits = (int[]) result;
        }
    69. Solr as search service (diagram: Solr w/ Invenio (backend), Apache webserver, XML, Invenio processes, JCC)
    70. Outline (‣ Evaluation)
    71. Memory and garbage collection (chart)
    72. Comparing speed and load... (chart)
    73. The effect of cache (chart)
    74. Robust?
        - Extensive siege tests show very good performance and stability under high load
          - 100-200 users, complex searches
          - 50 concurrent users, citation analysis
          - JCC incurs a small overhead
        - We detected no memory leaks
          - The same as dbpedia.org
        - But watch out for errors in C
          - An error in a C module brings down the whole JVM
          - (errors in a pure Python module can be handled)
    75. Easy to develop/maintain?
        - Added complexity
          - Java in the toolbox
          - Need to compile C++ extensions
          - Python/OS version dependencies
        - For this we get
          - Easy integration with Invenio
          - The best of two applications
          - A lot of features for free
          - And we can control Solr from Python!
    76. Outline (‣ Wrap-up)
    77. Wrap-up
        - Our challenge was to connect two different languages/systems
        - And we wanted to get the best of the two...
          - So we had to plug Python into Solr
          - And now our Solr knows citation analysis!
        - We created the MontySolr extension
          - Robust, tested (will be used by INSPIRE)
          - Works for any Python application (e.g. Django)
          - And for any C/C++ app that Python understands!
          - Free software license
        - Try it out! Help us make it better!
          - https://github.com/romanchyla/montysolr
    78. Questions?
        - MontySolr
          - https://github.com/romanchyla/montysolr
        - Roman Chyla
          - Fellow, CERN Scientific Information Service
          - roman.chyla@cern.ch
          - @rchyla
          - https://svnweb.cern.ch/trac/rcarepo
    79. Additional information
    80. Links
        - Invenio platform
          - http://invenio-software.org/
        - INSPIRE Digital library
          - http://inspirebeta.net/
        - Diagrams of JCC and JEPP
          - Andreas Schreiber: Mixing Java and Python
          - http://www.slideshare.net/onyame/mixing-python-and-java
        - On Jython C Extension API
          - http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython
        - Demo of a running service:
          - http://insdev01.cern.ch
    81. #1 - How to embed Solr (standard)
        - solr.client.solrj.embedded.EmbeddedSolrServer
    82. #2 - How to embed Solr (simplified)
        - solr.servlet.DirectSolrConnection
        - like the previous one, but simpler
        - all the queries are sent as strings, everything is just a string
        - very flexible and probably suitable for quick integration
    84. #3 - Example of a Solr custom handler (code listing, not included in the transcript)
    85. #4 - Example Python handler (code listing, not included in the transcript)