
[PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

  1. How to Integrate Python into a Scala Stack to Build Realtime Predictive Models
     Jerry Chou, Lead Research Engineer, jerry@fliptop.com
  2. Background
     • Product pivoted: data search => data analysis
     • Build on top of existing infrastructure (hosted on AWS & Azure)
     • Need tools for scientific computation:
       • Mahout (Java)
       • Weka (Java)
       • scikit-learn (Python)
  3. Agenda
     • Requirements and high-level concepts
     • Tools for calling Python from Scala
     • Decision making
  4. High-Level Concept (Before)
     [Diagram: existing business logic (in both Scala & Java) connected to modeling logic (in Python) running on Node 1, Node 2, …, Node N]
  5. Requirements
     • APIs to exploit Python's modeling power: train, predict, query model info, etc.
     • Scalability: on-demand Python serving nodes
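As a rough illustration of the API requirement above, here is a minimal Python sketch of a modeling facade exposing train, predict, and model-info calls. The class `ModelService`, its method names, and the one-feature least-squares model are hypothetical stand-ins (a real deployment would more likely wrap a scikit-learn estimator); the point is only the shape of the interface the Scala side would call.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LinearModel:
    """Fitted parameters of y = slope * x + intercept."""
    slope: float
    intercept: float
    n_samples: int


class ModelService:
    """Hypothetical facade the Scala side could call: train, predict, info."""

    def __init__(self) -> None:
        self._models: Dict[str, LinearModel] = {}

    def train(self, model_id: str, xs: List[float], ys: List[float]) -> None:
        """Fit an ordinary least-squares line and store it under model_id."""
        if len(xs) != len(ys) or len(xs) < 2:
            raise ValueError("need at least two (x, y) pairs")
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("xs must not all be equal")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        self._models[model_id] = LinearModel(slope, mean_y - slope * mean_x, n)

    def predict(self, model_id: str, x: float) -> float:
        """Score one input with a previously trained model."""
        model = self._models[model_id]
        return model.slope * x + model.intercept

    def info(self, model_id: str) -> dict:
        """Report the stored parameters (the 'model info query')."""
        model = self._models[model_id]
        return {"slope": model.slope, "intercept": model.intercept,
                "n_samples": model.n_samples}
```

After `train("demo", [1, 2, 3], [3, 5, 7])` the fitted line is y = 2x + 1, so `predict("demo", 10)` returns 21.0. Keeping model state behind string ids is one way to let any serving node answer queries about a model by name.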
  6. Tools for Scala-Python Integration
     • Reimplementation of Python: Jython (JPython)
     • Communication through JNI: Jepp
     • Communication through IPC: Thrift
     • Communication through REST API calls: Bottle
  7. Jython (JPython)
     • Reimplementation of Python in Java
     • Compiles to Java bytecode, either on demand or statically
     • Can import and use any Java class
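To make the "import and use any Java class" point concrete, here is a small sketch, assuming Jython 2.7, where `platform.python_implementation()` reports `"Jython"`. The helper names are hypothetical; under CPython the same file falls back to a plain Python list, so it runs on either interpreter.

```python
import platform


def running_on_jython():
    """True when this module is executed by Jython (Python on the JVM)."""
    return platform.python_implementation() == "Jython"


def to_java_list_if_possible(items):
    """Return a java.util.ArrayList under Jython, otherwise a plain Python list."""
    if running_on_jython():
        # Under Jython, Java classes are imported like ordinary Python modules
        from java.util import ArrayList
        java_list = ArrayList()
        for item in items:
            java_list.add(item)
        return java_list
    return list(items)
```

On Jython the returned object is a real `java.util.ArrayList` that could be handed straight to Scala or Java code in the same JVM; no serialization is involved.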
  8. Jython
     [Diagram: Scala code and Python code run side by side in the same JVM, with Jython executing the Python code]
  9. Jython
     • Lacks support for many C extensions used in scientific computing (NumPy, SciPy, etc.)
     • JyNI to the rescue?
       • Not ready yet, even for NumPy
  10. This is terrible. Redo everything.
  11. Communication through JNI
      • Jepp (Java Embedded Python)
        • Embeds CPython in Java
        • Runs Python code in CPython
        • Uses both JNI and the Python/C API for integration
  12. Jepp
      [Diagram: Scala code in the JVM calls Python code running in an embedded Python interpreter, bridged by JNI and Jepp]
  13. Jepp
      python_util.py:

          def python_add(a, b):
              return a + b

      TestJepp.scala:

          import jep.Jep

          object TestJepp extends App {
            val jep = new Jep()
            try {
              // Execute the script so python_add is defined in the embedded interpreter
              jep.runScript("python_util.py")
              // Box the Scala Ints so they can be passed to invoke(String, Object...)
              val a = Int.box(2)
              val b = Int.box(3)
              val sumByPython = jep.invoke("python_add", a, b)
              println("2 + 3 = " + sumByPython)
            } finally {
              jep.close() // release the embedded interpreter
            }
          }
  14. Communication through IPC
      • Thrift
        • Developed & open-sourced by Facebook
        • IDL-based (Interface Definition Language)
        • Generates server/client code in the specified languages
        • Takes care of protocol and transport-layer details
        • Comes with generators for Java, Python, C++, etc.
        • No Scala generator: Scrooge to the rescue!
  15. Thrift – IDL
      TestThrift.thrift:

          namespace java python_service_test
          namespace py python_service_test

          service PythonAddService {
            i32 pythonAdd(1:i32 a, 2:i32 b),
          }

      Generate the Java and Python code:

          $ thrift --gen java --gen py TestThrift.thrift
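Thrift's generated code hides the wire format. As a sketch of what that format looks like, the functions below hand-encode and decode a `pythonAdd(a, b)` CALL message the way `TBinaryProtocol` does in strict mode: a message header (version and type, method name, sequence id), then an argument struct of i32 fields, then a stop byte. The function names are hypothetical and this is not the Thrift library's API; no transport framing is added (a framed transport would additionally prefix a 4-byte length).

```python
import struct

VERSION_1 = 0x80010000     # strict-mode version marker (high 16 bits)
VERSION_MASK = 0xFFFF0000
TYPE_MASK = 0x000000FF
CALL = 1                   # TMessageType.CALL
T_STOP = 0                 # TType.STOP: ends a struct
T_I32 = 8                  # TType.I32


def encode_python_add_call(a, b, seqid=0):
    """Hand-encode a pythonAdd(a, b) CALL as strict TBinaryProtocol bytes."""
    name = b"pythonAdd"
    out = bytearray()
    out += struct.pack(">I", VERSION_1 | CALL)  # version | message type
    out += struct.pack(">i", len(name)) + name  # length-prefixed method name
    out += struct.pack(">i", seqid)             # sequence id
    # Argument struct: each field is (type byte, i16 field id, i32 value)
    for field_id, value in ((1, a), (2, b)):
        out += struct.pack(">bhi", T_I32, field_id, value)
    out += struct.pack(">b", T_STOP)            # end of argument struct
    return bytes(out)


def decode_python_add_call(data):
    """Inverse of encode_python_add_call; returns (method, seqid, a, b)."""
    header, = struct.unpack_from(">I", data, 0)
    if (header & VERSION_MASK) != VERSION_1 or (header & TYPE_MASK) != CALL:
        raise ValueError("not a strict-mode CALL message")
    name_len, = struct.unpack_from(">i", data, 4)
    pos = 8
    method = data[pos:pos + name_len].decode("utf-8")
    pos += name_len
    seqid, = struct.unpack_from(">i", data, pos)
    pos += 4
    fields = {}
    while True:
        field_type, = struct.unpack_from(">b", data, pos)
        pos += 1
        if field_type == T_STOP:
            break
        if field_type != T_I32:
            raise ValueError("this sketch only handles i32 fields")
        field_id, value = struct.unpack_from(">hi", data, pos)
        pos += 6
        fields[field_id] = value
    return method, seqid, fields[1], fields[2]
```

`encode_python_add_call(3, 5, seqid=1)` yields 36 bytes beginning with `80 01 00 01` (version 1, message type CALL). Writing and maintaining this by hand for every method is exactly the work the IDL compiler saves.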
  16. Thrift – Python Server
      PythonAddServer.py:

          # Assumes gen-py/ (the output of `thrift --gen py`) is on PYTHONPATH
          from python_service_test import PythonAddService
          from thrift.protocol import TBinaryProtocol
          from thrift.server import TServer
          from thrift.transport import TSocket, TTransport


          class ExampleHandler(PythonAddService.Iface):
              # Implements the pythonAdd method declared in TestThrift.thrift
              def pythonAdd(self, a, b):
                  return a + b


          handler = ExampleHandler()
          processor = PythonAddService.Processor(handler)
          transport = TSocket.TServerSocket(port=9090)
          tfactory = TTransport.TBufferedTransportFactory()
          pfactory = TBinaryProtocol.TBinaryProtocolFactory()
          # Serve each client connection on its own thread
          server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
          server.serve()

      PythonAddService.py (generated; excerpt):

          class Iface:
              def pythonAdd(self, a, b):
                  pass
  17. Thrift – Scala Client
      PythonAddClient.scala:

          import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
          import org.apache.thrift.transport.{TSocket, TTransport}
          import python_service_test.PythonAddService

          object PythonAddClient extends App {
            val transport: TTransport = new TSocket("localhost", 9090)
            val protocol: TProtocol = new TBinaryProtocol(transport)
            val client = new PythonAddService.Client(protocol)
            transport.open()
            try {
              // The generated method is named after the IDL: pythonAdd
              val sumByPython = client.pythonAdd(3, 5)
              println("3 + 5 = " + sumByPython)
            } finally {
              transport.close()
            }
          }
  18. Thrift
      [Diagram: Scala code in the JVM talks through Thrift to several Python interpreters (…), each running Python code behind Thrift. Callouts: auto balancing, built-in encryption]
  19. Oh, not bad!
  20. REST API Architecture
      [Diagram: Scala code in the JVM calls several Bottle servers (…), each running Python code. Open questions: auto balancer? Encoding?]
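Each Bottle box in the diagram is essentially an HTTP endpoint wrapping Python code. The sketch below swaps Bottle for the standard library's WSGI support so it has no dependencies; the route `/python_add`, the query parameters `a` and `b`, and the JSON response shape are all assumptions for illustration. It also shows the encoding question the slide raises: a REST service has to parse and serialize its own payloads.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def python_add(a, b):
    """The Python logic being exposed (same function as the Jepp example)."""
    return a + b


def _json_response(start_response, status, payload):
    """Encode payload as UTF-8 JSON and send it with the given status line."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """Hypothetical WSGI endpoint: GET /python_add?a=2&b=3 -> {"result": 5}."""
    if environ.get("PATH_INFO") != "/python_add":
        return _json_response(start_response, "404 Not Found", {"error": "not found"})
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        a = int(params["a"][0])
        b = int(params["b"][0])
    except (KeyError, ValueError):
        return _json_response(start_response, "400 Bad Request",
                              {"error": "a and b must be integers"})
    return _json_response(start_response, "200 OK", {"result": python_add(a, b)})


def serve(port=8080):
    """Serve the app with the stdlib reference server (blocks forever)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

With Bottle, the routing and response plumbing shrink to a decorated function, but decoding query parameters and encoding JSON remain the service's job. `serve()` is deliberately not called here because it blocks while serving.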
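  21. Thrift vs. REST

      |                    | Thrift | REST | Does it matter?             |
      |--------------------|--------|------|-----------------------------|
      | Load balancer      | ✔      |      | No (AWS & Azure)            |
      | Encode / decode    | ✔      |      | No (we're already doing it) |
      | Low learning curve |        | ✔    | Maybe                       |
      | No dependency      |        | ✔    | Yes                         |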
  22. Fliptop's Architecture
      [Diagram: Scala code in the JVM sends requests through a load balancer to several Bottle servers (…), each running Python code]
      • 5 Python servers
      • ~4,500 requests/sec
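The numbers on this slide imply roughly 4,500 / 5 ≈ 900 requests/sec per Python server, assuming the load balancer spreads traffic evenly. The slide does not say which policy the balancer uses; round-robin, sketched below with hypothetical backend URLs, is one common choice that produces exactly that even split.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer: hands out backends in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._rotation = cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._rotation)


def simulate(balancer, n_requests):
    """Count how many of n_requests each backend would receive."""
    return Counter(balancer.pick() for _ in range(n_requests))


# Five hypothetical Python/Bottle servers
BACKENDS = ["http://python-%d.example.internal:8080" % i for i in range(1, 6)]
```

`simulate(RoundRobinBalancer(BACKENDS), 4500)` assigns 900 requests to each of the five backends. Round-robin ignores per-request cost, so if some predictions are much slower than others, a least-connections policy may balance real load better.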
  23. Summary
      • Jython
        • (✓) Tight integration with Scala/Java
        • (✗) Lacks support for C extensions (JyNI might help in the future)
      • Jepp
        • (✓) Access to high-quality Python extensions at CPython speed
        • (✗) Two runtime environments
      • Thrift, REST
        • (✓) Language-independent development
        • (✗) Higher communication overhead
  24. Thank You
  25. Other tools
      • JyNI (Jython Native Interface)
        • A compatibility layer that enables Jython to use native CPython extensions such as NumPy or SciPy
        • Binary compatible with existing builds
      • Cython
        • A superset of Python, with a compiler written in Python, that translates Python code to C
      • JNA (Java Native Access)
        • JNI-based wrapper giving Java programs access to native shared libraries
      • JPE (Java-Python Extension)
        • JNI-based wrapper integrating Java and standard Python
        • Last updated: 2013-03-22
