CPython embedded in Solr - By Roman Chyla


See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011

Published in: Technology

  1. 1. MontySolr: Embedding CPython in Solr. Roman Chyla, CERN, roman.chyla@cern.ch. Thursday, May 26, 2011
  2. 2. Why should I care?
      - Our challenge is to connect Python and Java
        - Without compromises
      - We created the MontySolr extension
        - Robust, tested (will be used by our system)
        - But works for any Python application (e.g. Django)
        - And for any C/C++ app that Python understands!
      - Open source (GPL v2)
      - Try it out!
        - https://github.com/romanchyla/montysolr
  3. 3. Outline
      ‣ Context
      - The Challenge
      - Key components
      - Available technologies
      - Our approach
      - Problems solved
      - Evaluation
      - Wrap-up
  4. 4. CERN
      - European Organization for Nuclear Research
      - Switzerland, Geneva
      - The largest laboratory for High Energy Physics
      - Home to the Large Hadron Collider
      - 40-50K HEP scientists worldwide
  12. 12. SPIRES
      - Stanford Linear Accelerator Center (SLAC)
      - High-Energy Physics Literature Database
      - Started December 1991
      - The first website outside Europe/CERN
      - The first database on the web
  16. 16. Invenio
      - Integrated digital library software behind INSPIRE
      - Used by very large institutional repositories
        - http://repositories.webometrics.info/toprep_inst.asp
      - Customizable virtual collections
      - Flexible management of metadata
        - 3 000 authors per article
      - Powerful search engine
        - Incl. citation map analysis
      - Written in Python (since 2001)
        - 290 000 lines of code
  17. 17. Outline
      - Context
      ‣ The Challenge
      - Key components
      - Available technologies
      - Our approach
      - Problems solved
      - Evaluation
      - Wrap-up
  18. 18. The Challenge
      - HEP scientific community
        - Searches are metadata-oriented
        - However, fulltexts are changing the situation
      - And we want to provide even better service
        - Bigger volumes of data
        - NLP processing
        - Semantic search
  29. 29. The Challenge (diagram)
      - Query: supersymmetry AND author:ellis
      - Diagram labels: Invenio; fulltext:supersymmetry; IDs: 1;2;3;9....; 1-6M IDs
      - 1. Only IDs, no score = no ranking
      - 2. Score merging difficult (if no score available)
      - 3. Push IDs? (e.g. faceting)
  30. 30. What is the “best” solution?
      - We love Python...
      - ...and our applications are written in Python...
      - But what if Solr is the master search engine?
        - Merge results inside Solr?
        - Typical size: 1-10 million IDs
        - Expected latency: 1-2 s
      - What we want to achieve:
        - Fast transfer of hits from Invenio to Solr
        - Leverage the power of both (no compromises)
        - Developer-friendly integration, simplicity
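The merge problem on the preceding slides can be illustrated with a minimal Python 3 sketch. It assumes, purely for illustration, that one engine returns a mapping of record ID to score while the other returns a bare set of IDs. The function name `merge_hits` and the sample data are hypothetical and are not part of Invenio, Solr, or MontySolr.

```python
def merge_hits(scored_hits, id_only_hits):
    """Intersect scored hits with an unscored ID set (boolean AND).

    scored_hits:  dict mapping record ID -> relevance score
    id_only_hits: set of record IDs with no score attached
    Returns the IDs present in both, best score first.
    """
    common = id_only_hits.intersection(scored_hits)
    # Ranking is possible only because one side carries scores;
    # with two bare ID sets the order would be arbitrary.
    return sorted(common, key=lambda rec_id: (-scored_hits[rec_id], rec_id))


# Hypothetical sample data, not real search results
fulltext = {1: 0.9, 2: 0.4, 3: 0.7, 9: 0.1}   # scored fulltext hits
metadata = {2, 3, 9, 12}                       # unscored author:ellis IDs
merged = merge_hits(fulltext, metadata)
print(merged)
```

With millions of IDs per query, the cost of moving the ID set between the two systems is likely to matter more than the intersection itself, which is why the slides ask for a fast transfer of hits.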
  31. 31. Outline
      - Context
      - The Challenge
      ‣ Key components
      - Available technologies
      - Our approach
      - Evaluation
      - Demonstration
      - Wrap-up
  32. 32. To embed Solr (in Java app)
      - Your app simulates a Java web container?
        - use EmbeddedSolrServer
      - It knows nothing about Java servlets?
        - use the DirectSolrConnection class
      - Maybe we are too lazy?
        - Embed the web container (in my case Jetty)
        - Seemed strange (webserver inside webserver)
        - ... but it worked well
  37. 37. To use Solr in non-Java app
      - Solr is already usable via HTTP requests, but we need something else here...
      - Remote objects/calls?
        - Pyro, execnet, CORBA, SOAP...
        - or simply pipes?
      - Access Python from Java?
        - Jython
        - JEPP
      - Access Java from Python?
        - JPype
        - JCC
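For reference, the plain HTTP route mentioned on this slide can be sketched in Python 3 as follows. The sketch only builds the request URL and does not contact a server. The base URL `http://localhost:8983/solr` and the field name `id` are assumptions made for illustration; `q`, `fl`, `rows` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Return a URL for Solr's /select request handler.

    base_url is an assumed Solr location, e.g. http://localhost:8983/solr
    """
    params = {
        "q": query,    # the search query
        "fl": "id",    # return only the (assumed) id field
        "rows": rows,  # number of hits to return
        "wt": "json",  # response format
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr", "fulltext:supersymmetry")
print(url)
```

This works well for ordinary queries. The slides look for something else because each request would have to carry or return millions of record IDs.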
  38. 38. Jython?
      - Implementation of Python in 100% Java
      - Both Java and Python code
      - Truly multithreaded
      - C modules will not work
        - but see http://bit.ly/iTRYbb
      - Slower than CPython
  41. 41. JEPP - Java Embedded Python
      - Python code runs inside the Python interpreter
      - Embeds the CPython interpreter in Java via the Java Native Interface (JNI)
      - http://jepp.sourceforge.net/
      - recently updated (27-Jan)
      - but JCC is more active
  42. 42. JEPP - Java Embedded Python (diagram)
  43. 43. JCC
      - Embeds JVM in Python
      - C++ code generator
      - C++ object interface wraps a Java library
      - C++ wrappers conform to Python's C type system
      - result: complete Python extension module
  44. 44. JCC (diagram)
  47. 47. To use Solr in non-Java app (comparison table; columns: Jython, JCC, JEPP)
      - Python C modules: ✓ ✓
      - Speed: ✓ ?
      - No code changes: ✓ ✓
      - Access from Python: ✓ ✓
      - Access from Java: ✓ ... ✓
  48. 48. The first try (diagram: Invenio; Solr; JCC)
  49. 49. The devil is in the details...
  50. 50. GIL - Global Interpreter Lock: Unfortunately, a Python web app is not like Java...
  51. 51. GIL - Global Interpreter Lock: We can have 200 threads, but only 4 will run at a time...
  52. 52. GIL - Global Interpreter Lock
  53. 53. Fortunately, a solution exists
      - JCC can embed Python inside Java
        - Special thanks to Andi Vajda! (JCC creator)
      - We write ‘empty’ classes in Java ...
      - ... and implement them in Python
      - Diagram labels: Python w/ Java inside; Java w/ Python inside
  54. 54. The second try (diagram: Solr w/ Invenio (backend); Invenio frontend; XML; JCC)
  55. 55. Implementing the bridge
      - Special Java class
        - With method pythonExtension()
        - Native method pythonDecRef()
        - JCC provides its implementation
      - And a number of other native methods
        - These will be implemented using Python
      - Like writing JNI Java/C code but without compilation...
  56. 56. MontySolr extension
      - JCC has great potential, but also added complexity...
      - So the MontySolr project was born
        - Modules must be built in shared mode
        - JCC dynamic library loaded and started from the main thread
        - Simple mechanism of the Python bridge and message
        - Configurable handlers on the Python side
        - Secured dereferencing of the native objects
        - Threading on the Java side
        - Multiprocessing on the Python side
        - Easy ant targets (compilation) ...
  57. 57. Hello World - Java part

      public class MontySolrBridge extends BasicBridge implements PythonBridge {

          // Handle to the Python object that implements the native methods
          private long pythonObject;

          public void pythonExtension(long pythonObject) {
              this.pythonObject = pythonObject;
          }

          public long pythonExtension() {
              return this.pythonObject;
          }

          // Release the reference to the Python object when Java collects this one
          public void finalize() throws Throwable {
              pythonDecRef();
          }

          public native void pythonDecRef();

          public void sendMessage(PythonMessage message) {
              PythonVM vm = PythonVM.get();
              vm.acquireThreadState();
              try {
                  receive_message(message);
              } finally {
                  // Release the thread state even if the Python side throws
                  vm.releaseThreadState();
              }
          }

          // Implemented on the Python side
          public native void receive_message(PythonMessage message);
      }
  58. 58. Hello World - Python part

      from montysolr import MontySolrBridge

      class SimpleBridge(MontySolrBridge):

          def __init__(self):
              super(SimpleBridge, self).__init__()

          # Implements the native method declared in the Java class
          def receive_message(self, message):
              query = message.getParam('query')
              message.setResults('Hello world!')
              print('Python received from Java: %s' % query)
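The round trip in the two Hello World slides can be tried without Java or JCC using a small stand-in, sketched here in Python 3. `FakeMessage` is a hypothetical replacement for the Java `PythonMessage` object and mimics only the three methods used on the slides. The `SimpleBridge` below is a plain class and not the real `MontySolrBridge` subclass.

```python
class FakeMessage:
    """Hypothetical stand-in for the Java PythonMessage object."""

    def __init__(self, **params):
        self._params = dict(params)
        self._results = None

    def getParam(self, name):
        return self._params.get(name)

    def setResults(self, results):
        self._results = results

    def getResults(self):
        return self._results


class SimpleBridge:
    """Plays the role of the Python class that implements receive_message."""

    def receive_message(self, message):
        query = message.getParam("query")
        # The answer travels back inside the same message object
        message.setResults("Hello world!")
        return query


msg = FakeMessage(query="supersymmetry")
received = SimpleBridge().receive_message(msg)
print("Python received from Java:", received)
print("Java reads back:", msg.getResults())
```

The point of the pattern is that the caller and the handler share one message object: parameters go in, results come out, and no return value needs to cross the language boundary.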
  59. 59. Example - running MontySolr
      - Java side
        - JRE (32/64 bit)
        - Standard Solr/Lucene jars
        - JCC dynamic library
      - Python side
        - Python interpreter (32/64 bit)
        - 4 Python modules (jcc, solr, lucene, montysolr)
      - In the main thread
        - First we load JCC
        - Then start the Python interpreter ...
        - ... load Python handlers
  60. 60. Solr as search service (diagram: Solr w/ Invenio (backend); Invenio frontend; XML; JCC)
  61. 61. Example (diagram: Solr MyCustom Handler)
  62. 62. Example (diagram: refersto:author:ellis; Solr MyCustom Handler)
  63. 63. Example - Solr custom handler

      // Build a message addressed to the Python handler "perform_search"
      PythonMessage msg = MontySolrVM.INSTANCE
          .createMessage("perform_search")
          .setSender("Invenio")
          .setParam("query", "refersto:author:ellis");

      // Hand the message over to the Python side
      MontySolrVM.INSTANCE.sendMessage(msg);

      // The Python handler stores its answer in the same message object
      Object result = msg.getResults();
      if (result != null) {
          int[] hits = (int[]) result;
      }
  64. 64. Example - JNI connection (diagram: refersto:author:ellis; Solr MyCustom Handler; Python Bridge)
  65. 65. Example - JNI connection (diagram: refersto:author:ellis; Solr MyCustom Handler; Python Bridge; Invenio wrappers)
  66. 66. Example - Python side

      # handler is made 'visible' at startup
      SolrpieTarget('Invenio:perform_search', perform_search)

      # search time - called from Java
      def perform_search(message):
          query = message.getParam("query")
          hits = call_real_search(query)
          # cast Python list into Java array
          message.setResults(JArray_ints(hits))
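The registration step on this slide amounts to a lookup table from a "sender:name" key to a Python function. The following Python 3 sketch shows that idea in isolation. `register`, `dispatch`, `HANDLERS` and the toy index are hypothetical names for illustration and do not reproduce MontySolr's actual `SolrpieTarget` mechanism. A plain dict stands in for the Java message object.

```python
HANDLERS = {}


def register(sender, name, func):
    """Make a handler visible under the key 'sender:name'."""
    HANDLERS["%s:%s" % (sender, name)] = func


def dispatch(sender, name, params):
    """Look up the handler for 'sender:name' and call it."""
    key = "%s:%s" % (sender, name)
    if key not in HANDLERS:
        raise LookupError("no handler registered for %r" % key)
    return HANDLERS[key](params)


def perform_search(params):
    # Toy index standing in for the real Invenio search call
    index = {"refersto:author:ellis": [1, 2, 3, 9]}
    return index.get(params["query"], [])


# Startup: register the handler once
register("Invenio", "perform_search", perform_search)

# Search time: the caller names the sender and handler it wants
hits = dispatch("Invenio", "perform_search", {"query": "refersto:author:ellis"})
print(hits)
```

Keying on both sender and handler name lets several applications register a handler called `perform_search` without colliding.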
  67. 67. Example (diagram: refersto:author:ellis; Solr MyCustom Handler; Python Bridge; Invenio wrappers; Invenio x4)
  68. 68. Example - Java side again

      // Build a message addressed to the Python handler "perform_search"
      PythonMessage msg = MontySolrVM.INSTANCE
          .createMessage("perform_search")
          .setSender("Invenio")
          .setParam("query", "refersto:author:ellis");

      // Hand the message over to the Python side
      MontySolrVM.INSTANCE.sendMessage(msg);

      // The Python handler stores its answer in the same message object
      Object result = msg.getResults();
      if (result != null) {
          int[] hits = (int[]) result;
      }
  69. 69. Solr as search service (diagram: Solr w/ Invenio (backend); Apache webserver; XML; Invenio x2; JCC)
  70. 70. Outline
      - Context
      - The Challenge
      - Key components
      - Available technologies
      - Our approach
      - Problems solved
      ‣ Evaluation
      - Wrap-up
  71. 71. Memory and garbage collection
  72. 72. Comparing speed and load...
  73. 73. The effect of cache
  74. 74. Robust?
      - Extensive siege tests show very good performance and stability under high load
        - 100-200 users, complex searches
        - 50 concurrent users, citation analysis
      - JCC incurs small overhead
      - We detected no memory leaks
      - The same as dbpedia.org
      - But watch out for errors in C
        - An error in a C module brings down the whole JVM
        - (errors in a pure Python module can be handled)
  75. 75. Easy to develop/maintain?
      - Added complexity
        - Java in the toolbox
        - Need to compile C++ extensions
        - Python/OS version dependencies
      - For this we get
        - Easy integration with Invenio
        - The best of two applications
        - A lot of features for free
        - And we can control Solr from Python!
  76. 76. Outline
      - Context
      - The Challenge
      - Key components
      - Available technologies
      - Our approach
      - Problems solved
      - Evaluation
      ‣ Wrap-up
  77. 77. Wrap-up
      - Our challenge was to connect two different languages/systems
        - And we wanted to get the best of the two...
        - So we had to plug Python into Solr
        - And now our Solr knows citation analysis!
      - We created the MontySolr extension
        - Robust, tested (will be used by INSPIRE)
        - Works for any Python application (e.g. Django)
        - And for any C/C++ app that Python understands!
      - Free software license
      - Try it out! Help us make it better!
        - https://github.com/romanchyla/montysolr
  78. 78. Questions?
      - MontySolr
        - https://github.com/romanchyla/montysolr
      - Roman Chyla
        - Fellow, CERN Scientific Information Service
        - roman.chyla@cern.ch
        - @rchyla
        - https://svnweb.cern.ch/trac/rcarepo
  79. 79. Additional information
  80. 80. Links
      - Invenio platform
        - http://invenio-software.org/
      - INSPIRE Digital library
        - http://inspirebeta.net/
      - Diagrams of JCC and JEPP
        - Andreas Schreiber: Mixing Java and Python
        - http://www.slideshare.net/onyame/mixing-python-and-java
      - On Jython C Extension API
        - http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython
      - Demo of a running service:
        - http://insdev01.cern.ch
  81. 81. #1 - How to embed Solr (standard)
      - solr.client.solrj.embedded.EmbeddedSolrServer
  82. 82. #2 - How to embed Solr (simplified)
      - solr.servlet.DirectSolrConnection
        - like the previous one, but simpler
        - all queries are sent as strings; everything is just a string
        - very flexible and probably suitable for quick integration
  84. 84. #3 - Example of a Solr custom handler
  85. 85. #4 - Example Python handler
