[PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

1. How to Integrate Python into a Scala Stack to Build Realtime Predictive Models
   Jerry Chou, Lead Research Engineer
   jerry@fliptop.com
2. Background
   • Product pivoted: data search => data analysis
   • Build on top of existing infrastructure (hosted on AWS & Azure)
   • Need tools for scientific computation
      • Mahout (Java)
      • Weka (Java)
      • scikit-learn (Python)
3. Agenda
   • Requirements and high-level concepts
   • Tools for calling Python from Scala
   • Decision making
4. High-Level Concept: Before (diagram)
   • Existing business logic (in both Scala & Java)
   • Modeling logic (in Python) on Node 1, Node 2, …, Node N
5. Requirements
   • APIs to exploit Python’s modeling power
      • Train, predict, model info query, etc.
   • Scalability
      • On-demand Python serving nodes
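The train / predict / model-info requirement can be pictured as a small Python-side module that the Scala stack calls through whichever bridge is chosen. This is a minimal sketch under a few assumptions. The in-memory model registry is hypothetical. A one-feature least-squares fit stands in for a real scikit-learn estimator. The names `train`, `predict` and `model_info` are illustrative, not Fliptop's API. Arguments and results stay as plain lists, floats and dicts so they map cleanly onto JVM types.

```python
"""Hypothetical Python-side modeling API: train, predict, model info query."""
from statistics import mean

# Hypothetical in-memory registry of trained models, keyed by a caller-chosen id.
_MODELS = {}


def train(model_id, xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    _MODELS[model_id] = {
        "slope": slope,
        "intercept": y_bar - slope * x_bar,
        "n_samples": len(xs),
    }
    return model_id


def predict(model_id, xs):
    """Apply a trained model to a list of feature values."""
    m = _MODELS[model_id]
    return [m["slope"] * x + m["intercept"] for x in xs]


def model_info(model_id):
    """Return a copy of the stored parameters so callers cannot mutate them."""
    return dict(_MODELS[model_id])


if __name__ == "__main__":
    train("demo", [1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
    print(predict("demo", [4.0]))  # [9.0]
    print(model_info("demo"))
```

Keeping the boundary to primitive types means the same functions can sit behind Jepp, Thrift or a REST endpoint without change.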
6. Tools for Scala-Python Integration
   • Reimplementation of Python
      • Jython (JPython)
   • Communication through JNI
      • Jepp
   • Communication through IPC
      • Thrift
   • Communication through REST API calls
      • Bottle
7. Jython (JPython)
   • Reimplementation of Python in Java
   • Compiles to Java bytecode, either on demand or statically
   • Can import and use any Java class
8. Jython (diagram)
   • Scala code and Python code run side by side in the JVM; Jython executes the Python code
9. Jython
   • Lacks support for many extensions used in scientific computing
      • NumPy, SciPy, etc.
   • JyNI to the rescue?
      • Not yet ready, even for NumPy
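Jython cannot load C extensions such as NumPy. Shared Python code that may run under both Jython and CPython can therefore check the interpreter at import time and fall back to pure Python. This is a minimal sketch, and the `dot` helper is hypothetical. `platform.python_implementation()` is a standard-library call that reports the interpreter name, for example `'CPython'` or `'Jython'`. The code avoids Python 3-only syntax so it also parses on Jython 2.7.

```python
"""Pick a NumPy-backed or pure-Python code path depending on the interpreter."""
from __future__ import print_function

import platform

IMPLEMENTATION = platform.python_implementation()  # e.g. "CPython", "Jython", "PyPy"

try:
    if IMPLEMENTATION == "Jython":
        # Jython cannot load NumPy's C extension modules, so do not even try.
        raise ImportError("NumPy is not available on Jython")
    import numpy as np
except ImportError:
    np = None


def dot(xs, ys):
    """Dot product of two equal-length sequences (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("length mismatch")
    if np is not None:
        return float(np.dot(xs, ys))
    # Pure-Python fallback: slower, but runs on any interpreter.
    return float(sum(x * y for x, y in zip(xs, ys)))


if __name__ == "__main__":
    print(IMPLEMENTATION, "NumPy" if np is not None else "pure Python")
    print(dot([1, 2, 3], [4, 5, 6]))  # 32.0
```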
10. Terrible. Scrap it all and start over.
11. Communication through JNI
   • Jepp (Java Embedded Python)
      • Embeds CPython in Java
      • Runs Python code in CPython
      • Leverages both JNI and the Python/C API for integration
12. Jepp (diagram)
   • Scala code in the JVM calls Jepp through JNI; Jepp runs the Python code in an embedded CPython interpreter
13. Jepp

TestJepp.scala:

```scala
import jep.Jep

object TestJepp extends App {
  val jep = new Jep()
  // Load python_util.py into the embedded CPython interpreter
  jep.runScript("python_util.py")
  // invoke takes java.lang.Object varargs, so box the Scala Ints
  val a = (2).asInstanceOf[AnyRef]
  val b = (3).asInstanceOf[AnyRef]
  val sumByPython = jep.invoke("python_add", a, b)
  println("2 + 3 = " + sumByPython)
  jep.close() // release the embedded interpreter
}
```

python_util.py:

```python
def python_add(a, b):
    return a + b
```
14. Communication through IPC
   • Thrift
      • Developed & open-sourced by Facebook
      • IDL-based (Interface Definition Language)
      • Generates server/client code in the specified languages
      • Takes care of protocol and transport layer details
      • Comes with generators for Java, Python, C++, etc.
         • No Scala generator
         • Scrooge to the rescue!
15. Thrift – IDL

TestThrift.thrift:

```thrift
namespace java python_service_test
namespace py python_service_test

service PythonAddService
{
  i32 pythonAdd(1:i32 a, 2:i32 b),
}
```

```shell
$ thrift --gen java --gen py TestThrift.thrift
```
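To make "takes care of protocol details" concrete, this sketch shows roughly what a `pythonAdd(3, 5)` call from the IDL above looks like on the wire. It assumes TBinaryProtocol in strict mode over an unframed socket, which is the setup the server and client slides use. It is an illustrative encoder built with the standard `struct` module, not a replacement for the generated code. The sequence id of 0 is an assumption.

```python
"""Sketch: Thrift TBinaryProtocol (strict) encoding of a pythonAdd(a, b) call."""
import struct

VERSION_1 = 0x80010000  # strict-mode version marker in the message header
MSG_CALL = 1            # TMessageType.CALL
TYPE_I32 = 8            # TType.I32
TYPE_STOP = 0           # TType.STOP terminates a struct


def encode_call(method, seqid, i32_args):
    """Encode a CALL message whose argument struct holds only i32 fields.

    i32_args is a list of (field_id, value) pairs, matching the IDL's 1:a, 2:b.
    All integers are big-endian, as in TBinaryProtocol.
    """
    name = method.encode("utf-8")
    out = struct.pack(">I", VERSION_1 | MSG_CALL)  # version | message type
    out += struct.pack(">i", len(name)) + name     # method name as a length-prefixed string
    out += struct.pack(">i", seqid)                # sequence id
    for field_id, value in i32_args:
        # field header (type byte, i16 field id) followed by the i32 value
        out += struct.pack(">bhi", TYPE_I32, field_id, value)
    out += struct.pack(">b", TYPE_STOP)            # end of the argument struct
    return out


if __name__ == "__main__":
    msg = encode_call("pythonAdd", 0, [(1, 3), (2, 5)])
    print(len(msg), msg.hex())
```

The method name travels on the wire as a plain string. Both sides must therefore agree on the IDL name `pythonAdd`, and the numbered field ids are what let either side evolve its arguments safely.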
16. Thrift – Python Server

PythonAddServer.py:

```python
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer

# Generated by `thrift --gen py TestThrift.thrift`
from python_service_test import PythonAddService


class ExampleHandler(PythonAddService.Iface):
    def pythonAdd(self, a, b):
        return a + b


handler = ExampleHandler()
processor = PythonAddService.Processor(handler)
transport = TSocket.TServerSocket(port=9090)
tfactory = TTransport.TBufferedTransportFactory()
pfactory = TBinaryProtocol.TBinaryProtocolFactory()

server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
server.serve()
```

PythonAddService.py (generated interface, excerpt):

```python
class Iface:
    def pythonAdd(self, a, b):
        pass
```
17. Thrift – Scala Client

PythonAddClient.scala:

```scala
import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
import org.apache.thrift.transport.{TSocket, TTransport}
import python_service_test.PythonAddService

object PythonAddClient extends App {
  val transport: TTransport = new TSocket("localhost", 9090)
  val protocol: TProtocol = new TBinaryProtocol(transport)
  val client = new PythonAddService.Client(protocol)

  transport.open()
  // The generated client method keeps the IDL name, pythonAdd
  val sumByPython = client.pythonAdd(3, 5)
  println("3 + 5 = " + sumByPython)
  transport.close()
}
```
18. Thrift (diagram)
   • Scala code in the JVM talks over Thrift to multiple Python interpreters (…), each running the Python code behind its own Thrift server
   • Auto balancing, built-in encryption
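When several Python servers sit behind the Scala code, some policy has to spread calls across them. One simple option is client-side round robin with failover. This is a minimal sketch under stated assumptions. The endpoint list is hypothetical, and the caller-supplied `send` function stands in for a real Thrift client call. It is written in Python for brevity, although in this stack the logic would live on the Scala side.

```python
"""Hypothetical client-side round-robin balancing with failover."""
import itertools
import threading


class RoundRobinBalancer:
    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._endpoints = list(endpoints)
        self._cycle = itertools.cycle(self._endpoints)
        # Guard the shared iterator so concurrent callers each get the next endpoint.
        self._lock = threading.Lock()

    def next_endpoint(self):
        with self._lock:
            return next(self._cycle)

    def call(self, send, *args):
        """Try each endpoint at most once, starting from the next one in rotation."""
        last_error = None
        for _ in range(len(self._endpoints)):
            endpoint = self.next_endpoint()
            try:
                return send(endpoint, *args)
            except ConnectionError as err:  # endpoint down: move on to the next one
                last_error = err
        raise last_error


if __name__ == "__main__":
    def fake_send(endpoint, a, b):
        # Stand-in for a Thrift call; pretend the server on port 9091 is down.
        if endpoint[1] == 9091:
            raise ConnectionError("server down")
        return a + b, endpoint

    lb = RoundRobinBalancer([("localhost", 9090), ("localhost", 9091), ("localhost", 9092)])
    for _ in range(3):
        print(lb.call(fake_send, 3, 5))
```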
19. Oh~ not bad.
20. REST API Architecture (diagram)
   • Scala code in the JVM calls multiple Bottle servers (…), each serving the Python code
   • Open questions: auto balancer? Encoding?
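In the REST variant each Python model sits behind a small HTTP service, and the "Encoding?" question becomes a choice of payload format. This is a minimal sketch that assumes JSON payloads and a hypothetical `/add` route mirroring the `python_add` example. It swaps Bottle for the standard library's `http.server` so it runs with no third-party dependencies, and it is not the talk's actual service.

```python
"""Stand-in for a Bottle service: JSON over HTTP with only the standard library."""
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def python_add(a, b):
    return a + b


class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/add":  # hypothetical route
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = {"sum": python_add(payload["a"], payload["b"])}
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing keys, or values that cannot be added
            self.send_error(400, "expected JSON like {\"a\": 2, \"b\": 3}")
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # keep the sketch quiet
        pass


def make_server(port=8080):
    return ThreadingHTTPServer(("127.0.0.1", port), ModelHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

From the Scala side, any HTTP client can POST `{"a": 2, "b": 3}` to `/add` and parse `{"sum": 5}` back. The JSON encode/decode step is the part Thrift would otherwise generate.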
21. Thrift vs. REST

   Feature            | Thrift | REST | Does it matter?
   Load balancer      |   ✔    |      | No (AWS & Azure)
   Encode / decode    |   ✔    |      | No (we’re already doing it)
   Low learning curve |        |  ✔   | Maybe
   No dependency      |        |  ✔   | Yes
22. Fliptop’s Architecture (diagram)
   • Scala code in the JVM → load balancer → multiple Bottle servers (…), each serving the Python code
   • 5 Python servers handle ~4,500 requests/sec
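The slide's figures imply a per-server load that is easy to check. This small sketch uses the slide's numbers (5 servers, ~4,500 requests/sec) and assumes even balancing. It adds Little's law to estimate concurrency, with a 50 ms average latency that is a hypothetical value, not one from the talk.

```python
"""Back-of-the-envelope capacity check for the Bottle tier."""


def per_server_rps(total_rps, servers):
    """Requests per second each server handles under even balancing."""
    return total_rps / servers


def in_flight(rps, latency_s):
    """Little's law: average concurrent requests = arrival rate * time in system."""
    return rps * latency_s


if __name__ == "__main__":
    rps = per_server_rps(4500, 5)    # figures from the slide
    print(rps)                       # 900.0
    print(in_flight(rps, 0.050))     # hypothetical 50 ms average latency
```

At 900 requests/sec per server, a single serial worker would have about 1.1 ms (1/900 s) per request. Concurrency within each server therefore matters. The talk does not describe how each Bottle server was configured.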
23. Summary
   • Jython
      • (✓) Tight integration with Scala/Java
      • (✗) Lacks support for C extensions (JyNI might help in the future)
   • Jepp
      • (✓) Access to high-quality Python extensions at CPython speed
      • (✗) Two runtime environments
   • Thrift, REST
      • (✓) Language-independent development
      • (✗) Higher communication overhead
24. Thank You
25. Other tools
   • JyNI (Jython Native Interface)
      • A compatibility layer that enables Jython to use native CPython extensions like NumPy or SciPy
      • Binary compatible with existing builds
   • Cython
      • An optimizing compiler, written in Python, that translates Python code (and Cython's own superset of Python) to C
   • JNA (Java Native Access)
      • JNI-based wrapper providing Java programs access to native shared libraries
   • JPE (Java-Python Extension)
      • JNI-based wrapper integrating Java and standard Python
      • Last updated: 2013-03-22
