[PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models
Published in: Technology
Transcript

  • 1. How to Integrate Python into a Scala Stack to Build Realtime Predictive Models. Jerry Chou, Lead Research Engineer, jerry@fliptop.com
  • 2. Stories Beforehand • Product pivoted • Data search => data analysis • Build on top of existing infrastructure (hosted on AWS & Azure) • Need tools for scientific computation • Mahout (Java) • Weka (Java) • Scikit-learn (Python)
  • 3. Agenda • Requirements and high level concepts • Tools for calling Python from Scala • Decision making
  • 4. High Level Concept - Before (diagram) • Existing business logic (in both Scala & Java) • Modeling logic (in Python) on Node 1, Node 2, …, Node N
  • 5. Requirements • APIs to exploit Python’s modeling power • Train, predict, model info query, etc. • Scalability • On-demand Python serving nodes
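As a rough illustration of the API surface these requirements describe (train, predict, model info query), here is a minimal Python sketch. The class name `ModelService`, its method signatures, and the mean-of-targets placeholder "model" are all assumptions made for illustration; the slides do not show Fliptop's actual interface, and a real node would presumably wrap a scikit-learn estimator instead.

```python
from statistics import mean


class ModelService:
    """Hypothetical modeling API with train, predict, and info operations."""

    def __init__(self):
        # Maps a model id to its fitted state.
        self._models = {}

    def train(self, model_id, rows, targets):
        """Fit a placeholder model that simply remembers the mean target."""
        if not targets or len(rows) != len(targets):
            raise ValueError("rows and targets must be non-empty and equal in length")
        self._models[model_id] = {"mean": mean(targets), "n_samples": len(targets)}
        return model_id

    def predict(self, model_id, rows):
        """Return one prediction per input row (here, always the stored mean)."""
        model = self._models[model_id]
        return [model["mean"] for _ in rows]

    def info(self, model_id):
        """Return a copy of the stored model metadata."""
        return dict(self._models[model_id])
```

Each of the three methods maps naturally onto one remote call, whichever integration tool ends up carrying it.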
  • 6. Tools for Scala-Python Integration • Reimplementation of Python • Jython (JPython) • Communication through JNI • Jepp • Communication through IPC • Thrift • Communication through REST API calls • Bottle
  • 7. Jython (JPython) • Reimplementation of Python in Java • Compiles to Java bytecode, either on demand or statically • Can import and use any Java class
  • 8. Jython (diagram) • JVM containing both Scala code and Python code, with the Python code running on Jython
  • 9. Jython • Lacks support for lots of extensions for scientific computing • NumPy, SciPy, etc. • JyNI to the rescue? • Not ready yet, even for NumPy
  • 10. Terrible. Redo everything.
  • 11. Communication through JNI • Jepp (Java Embedded Python) • Embeds CPython in Java • Runs Python code in CPython • Leverages both JNI and the Python/C API for integration
  • 12. Jepp (diagram) • JVM running Scala code • Python interpreter running Python code • The two are connected through JNI by Jepp
  • 13. Jepp
      TestJepp.scala:
          import jep.Jep

          object TestJepp extends App {
            val jep = new Jep()
            // Load the Python file so its functions are defined in the interpreter
            jep.runScript("python_util.py")
            // invoke takes Object arguments, so box the Scala Ints first
            val a = (2).asInstanceOf[AnyRef]
            val b = (3).asInstanceOf[AnyRef]
            val sumByPython = jep.invoke("python_add", a, b)
            jep.close()
          }

      python_util.py:
          def python_add(a, b):
              return a + b
  • 14. Communication through IPC • Thrift • Developed & open sourced by Facebook • IDL-based (Interface Definition Language) • Generates server/client code in specified languages • Takes care of protocol and transport layer details • Comes with generators for Java, Python, C++, etc. • No Scala generator • Scrooge to the rescue!
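To give a feel for the "protocol and transport layer details" that IDL-generated code hides, the following Python sketch hand-rolls the encoding of a single two-integer call. The byte layout is a toy format invented for this example and is not Thrift's actual wire format; `encode_call` and `decode_call` are hypothetical helper names.

```python
import struct


def encode_call(method, a, b):
    """Pack a method name and two signed 32-bit ints into bytes (toy format)."""
    name = method.encode("utf-8")
    # 4-byte big-endian name length, the name itself, then the two i32 arguments.
    return struct.pack(">I", len(name)) + name + struct.pack(">ii", a, b)


def decode_call(payload):
    """Reverse encode_call: recover the method name and both arguments."""
    (name_len,) = struct.unpack_from(">I", payload, 0)
    name = payload[4:4 + name_len].decode("utf-8")
    a, b = struct.unpack_from(">ii", payload, 4 + name_len)
    return name, a, b
```

With a generator, both sides get this kind of code for every method in the service definition, in each target language, without anyone writing or synchronizing it by hand.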
  • 15. Thrift – IDL
      TestThrift.thrift:
          namespace java python_service_test
          namespace py python_service_test

          service PythonAddService {
            i32 pythonAdd(1:i32 a, 2:i32 b),
          }

      Generate the Java and Python code:
          $ thrift --gen java --gen py TestThrift.thrift
  • 16. Thrift – Python Server
      PythonAddServer.py:
          from thrift.transport import TSocket, TTransport
          from thrift.protocol import TBinaryProtocol
          from thrift.server import TServer

          from python_service_test import PythonAddService

          class ExampleHandler(PythonAddService.Iface):
              def pythonAdd(self, a, b):
                  return a + b

          handler = ExampleHandler()
          # The processor class comes from the generated PythonAddService module
          processor = PythonAddService.Processor(handler)
          transport = TSocket.TServerSocket(port=9090)
          tfactory = TTransport.TBufferedTransportFactory()
          pfactory = TBinaryProtocol.TBinaryProtocolFactory()
          server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
          server.serve()

      PythonAddService.py (generated by Thrift, excerpt):
          class Iface:
              def pythonAdd(self, a, b):
                  pass
  • 17. Thrift – Scala Client
      PythonAddClient.scala:
          import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
          import org.apache.thrift.transport.{TSocket, TTransport}
          import python_service_test.PythonAddService

          object PythonAddClient extends App {
            val transport: TTransport = new TSocket("localhost", 9090)
            val protocol: TProtocol = new TBinaryProtocol(transport)
            val client = new PythonAddService.Client(protocol)
            transport.open()
            // The method name matches the IDL definition: pythonAdd
            val sumByPython = client.pythonAdd(3, 5)
            println("3 + 5 = " + sumByPython)
            transport.close()
          }
  • 18. Thrift (diagram) • JVM running Scala code with Thrift • Multiple Python interpreters, each running Python code with Thrift, … • Auto balancing, built-in encryption
  • 19. Oh, not bad.
  • 20. REST API Architecture (diagram) • JVM running Scala code • Multiple Python nodes, each running Python code behind Bottle, … • Auto balancer? Encoding?
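The REST approach can be sketched as a small HTTP service that exposes `python_add` as a JSON endpoint. This is only a sketch under stated assumptions: it uses Python's standard-library `http.server` in place of Bottle so that it runs with no dependencies, the `/add` route and the JSON field names `a`, `b`, and `sum` are invented for illustration, and the client is written in Python rather than Scala to keep the example self-contained.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def python_add(a, b):
    return a + b


class AddHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the hypothetical /add route is served.
        if self.path != "/add":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        args = json.loads(self.rfile.read(length))
        body = json.dumps({"sum": python_add(args["a"], args["b"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging.


def start_server(port=0):
    """Start the service on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), AddHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_add(port, a, b):
    """Client side: POST the arguments as JSON and read back the sum."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        payload = json.dumps({"a": a, "b": b}).encode("utf-8")
        conn.request("POST", "/add", body=payload,
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())["sum"]
    finally:
        conn.close()
```

Unlike the IDL-based approach, the request and response encoding here is hand-written on both sides, which is the "Encoding?" question raised on the slide.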
  • 21. Thrift vs. REST
                              Thrift   REST   Does it matter?
      Load balancer             ✔             No (AWS & Azure)
      Encode / decode           ✔             No (we’re already doing it)
      Low learning curve                 ✔    Maybe
      No dependency                      ✔    Yes
  • 22. Fliptop’s Architecture (diagram) • JVM running Scala code • Load balancer • Multiple Python nodes, each running Python code behind Bottle, … • 5 Python servers • ~4,500 requests/sec
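The slides do not say which policy the load balancer uses to spread requests over the Python servers. As one common possibility, here is a minimal round-robin sketch in Python; the `RoundRobinBalancer` class and the node names are hypothetical, and in the architecture shown the balancing would be handled by the hosting platform rather than by application code.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend nodes in a fixed rotating order."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(nodes)

    def next_node(self):
        """Return the node that should receive the next request."""
        return next(self._rotation)
```

For example, a balancer built over five node names returns each name once before starting again from the first.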
  • 23. Summary • Jython • (✓) Tight integration with Scala/Java • (✗) Lacks support for C extensions (JyNI might help in the future) • Jepp • (✓) Access to high-quality Python extensions at CPython speed • (✗) Two runtime environments • Thrift, REST • (✓) Language-independent development • (✗) Bigger communication overhead
  • 24. Thank You
  • 25. Other tools • JyNI (Jython Native Interface) • A compatibility layer that enables Jython to use native CPython extensions like NumPy or SciPy • Binary compatible with existing builds • Cython • A compiler, written largely in Python, that translates Python code (plus optional type annotations) to C • JNA (Java Native Access) • JNI-based wrapper giving Java programs access to native shared libraries • JPE (Java-Python Extension) • JNI-based wrapper integrating Java and standard Python • Last updated: 2013-03-22
