Scale out Python realtime service using Storm

Storm is a real-time distributed computation system that also provides a distributed RPC (DRPC) service.
In these slides, we'll learn how to use Storm DRPC to build an online real-time prediction service.
Storm DRPC provides load balancing and real-time responses.

Presentation Transcript

  • Scale out Python realtime service using Storm
    http://storm-project.net/
    Jimmy Lai, 2013/01/24
    r97922028 [at] ntu.edu.tw
  • Outline
      • Set up a Storm cluster
      • Storm DRPC
      • Example: Build a real-time SVM prediction service with Storm DRPC
          – Steps 1-5
          – Live Demo
  • Set up a Storm cluster
      • https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster
      • Configs:
          – ZooKeeper
          – Storm
      • Commands:
          – ZooKeeper
              • bin/zkServer.sh start
          – Storm
              • storm nimbus
              • storm supervisor
              • storm drpc
              • storm ui
  • Storm DRPC
      • https://github.com/nathanmarz/storm/wiki/Distributed-RPC
      • The DRPC daemon receives requests and distributes them to a user-defined Bolt/Topology.
      • We follow the examples in https://github.com/nathanmarz/storm-starter to build a Python DRPC service.
      • The benefits provided by Storm:
          – Load balancing and resource allocation
          – Real-time service
          – Fault tolerance
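To make the request flow above concrete, here is a toy, in-process model of a DRPC-style dispatcher. It is only a sketch under stated assumptions: `ToyDRPCServer`, `make_worker`, and the round-robin policy are hypothetical and are not Storm's implementation. It shows the two ideas the slide relies on: each request gets an id that travels with it, and requests for one function name are spread over several bolt instances.

```python
import itertools
import json


class ToyDRPCServer:
    """Toy stand-in for a DRPC daemon (hypothetical, NOT Storm's code)."""

    def __init__(self):
        self._workers = {}   # function name -> list of worker callables
        self._cursors = {}   # function name -> round-robin index iterator
        self._next_id = itertools.count(1)

    def register(self, func_name, workers):
        self._workers[func_name] = list(workers)
        self._cursors[func_name] = itertools.cycle(range(len(workers)))

    def execute(self, func_name, args):
        # Tag the request with an id, pick the next worker, return its result.
        request_id = next(self._next_id)
        worker = self._workers[func_name][next(self._cursors[func_name])]
        returned_id, result = worker(request_id, args)
        assert returned_id == request_id  # the reply must carry the same id
        return result


def make_worker(name):
    """Build a worker that sums a JSON list and reports which worker ran."""
    def worker(request_id, args):
        numbers = json.loads(args)
        return request_id, json.dumps({"worker": name, "sum": sum(numbers)})
    return worker


server = ToyDRPCServer()
server.register("sum", [make_worker("bolt-%d" % i) for i in range(3)])
replies = [json.loads(server.execute("sum", "[1, 2, 3]")) for _ in range(4)]
```

With three registered workers, the fourth request wraps around to the first worker again. Round-robin is used here only to make the load spreading visible.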
  • Example: Build a real-time SVM prediction service with Storm DRPC
      • Goal: We have a trained SVM model and plan to provide a real-time prediction service.
          – Steps:
              1. Train the SVM model.
              2. Build the Storm DRPC topology with Python Bolt.
              3. Deploy the topology to Storm.
              4. Build the Storm DRPC Client.
              5. Prediction on the fly.
      • Code repository: storm_demo directory in https://bitbucket.org/noahsark/slideshare
  • Step 1. Train the SVM model.
      • Note: the following code is in the storm_demo dir.

        $ ./train_model.py

      • We use the 20 newsgroups dataset from sklearn to build an SVM classification model.
      • The output model is a pickle file (svm_model.pkl) in storm-starter/multilang/resources/
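The hand-off between training and serving is just a pickle file, and a minimal round trip looks like the sketch below. This is not the contents of train_model.py: a plain dict of weights and the `predict` helper are hypothetical stand-ins for the scikit-learn estimator, so that the sketch runs with only the standard library, and the file is written to a temporary directory rather than storm-starter/multilang/resources/.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the trained model: weights of a linear rule.
model = {"weights": [1.0, -1.0], "bias": 0.0}


def predict(params, rows):
    """Return 1 when the linear score of a row is positive, else 0."""
    labels = []
    for row in rows:
        score = sum(w * x for w, x in zip(params["weights"], row))
        labels.append(1 if score + params["bias"] > 0 else 0)
    return labels


# "Training" side: dump the model to svm_model.pkl.
model_path = os.path.join(tempfile.mkdtemp(), "svm_model.pkl")
with open(model_path, "wb") as handle:
    pickle.dump(model, handle)

# "Serving" side: load it once, then reuse it for every prediction.
with open(model_path, "rb") as handle:
    restored = pickle.load(handle)

predictions = predict(restored, [[2.0, 1.0], [0.0, 3.0]])
```

Loading the file once and keeping the object in memory is what lets each prediction request avoid the cost of reading the model again.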
  • Step 2. Build the Storm DRPC topology with Python Bolt.
      • The storm-starter dir comes from https://github.com/nathanmarz/storm-starter. It contains many topology examples; we'll build our DRPC topology in storm-starter/src/jvm/storm/jimmy: SVMDRPCTopology.java
      • We build a DRPC topology with LinearDRPCTopologyBuilder and write a Bolt class that extends ShellBolt and implements IRichBolt. After that, we can write the Bolt logic in Python.
      • Note: the numbers 3 and 6 in the program are adjustable parameters: 3 is the bolt parallelism and 6 is the number of workers.
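  • SVMDRPCTopology.java

        package storm.jimmy;

        import java.util.Map;

        import backtype.storm.Config;
        import backtype.storm.StormSubmitter;
        import backtype.storm.drpc.LinearDRPCTopologyBuilder;
        import backtype.storm.task.ShellBolt;
        import backtype.storm.topology.IRichBolt;
        import backtype.storm.topology.OutputFieldsDeclarer;
        import backtype.storm.tuple.Fields;

        public class SVMDRPCTopology {
            // The bolt logic lives in svm_bolt.py; ShellBolt runs it as a subprocess.
            public static class SVMBolt extends ShellBolt implements IRichBolt {
                public SVMBolt() {
                    super("python", "svm_bolt.py");
                }

                @Override
                public void declareOutputFields(OutputFieldsDeclarer declarer) {
                    declarer.declare(new Fields("id", "result"));
                }

                @Override
                public Map<String, Object> getComponentConfiguration() {
                    return null;
                }
            }

            public static void main(String[] args) throws Exception {
                // "svm" is the DRPC function name that clients call.
                LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("svm");
                builder.addBolt(new SVMBolt(), 3);  // parallelism of the bolt
                Config conf = new Config();
                conf.setNumWorkers(6);              // number of workers
                StormSubmitter.submitTopology("svm", conf, builder.createRemoteTopology());
            }
        }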
  • Step 2. Build the Storm DRPC topology with Python Bolt.
      • We write svm_bolt.py in storm-starter/multilang/resources
      • Note that all files in this dir will be packed into a jar file, so the SVM model file is also put in this dir.
      • Bolt in Python:
          – Extend storm.BasicBolt
          – Implement initialize() and process()
          – Dump exception messages to a file for debugging.
  • svm_bolt.py

        import json
        import pickle as pkl
        import traceback

        from numpy import array

        import storm


        class SVMBolt(storm.BasicBolt):
            def initialize(self, stormconf, context):
                """Initialize your members here."""
                try:
                    self.model = pkl.load(open('svm_model.pkl', 'rb'))
                except Exception:
                    traceback.print_exc(file=open('/tmp/trace_svm_bolt.txt', 'a'))

            def process(self, tup):
                """We serialize the input and output by json for convenience."""
                try:
                    # tup.values[0] is the request id, tup.values[1] the JSON input.
                    data = array(json.loads(tup.values[1]))
                    result = self.model.predict(data)
                    storm.emit([tup.values[0], json.dumps(result.tolist())])
                except Exception:
                    traceback.print_exc(file=open('/tmp/trace_svm_bolt.txt', 'a'))


        if __name__ == '__main__':
            try:
                SVMBolt().run()
            except Exception:
                traceback.print_exc(file=open('/tmp/trace_svm_bolt.txt', 'a'))
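The contract that process() follows can be tried without a Storm cluster by pulling it into a plain function. The sketch below is an illustration, not part of svm_bolt.py: `handle_request` and `fake_predict` are hypothetical names, and `fake_predict` merely stands in for `self.model.predict`. The point is the shape of the data: a two-element input (request id, JSON arguments) becomes a two-element output (same request id, JSON result).

```python
import json


def handle_request(values, predict):
    """Mirror the bolt's process(): values is [request id, JSON arguments]."""
    request_id = values[0]
    rows = json.loads(values[1])
    labels = predict(rows)
    # The reply must carry the request id so DRPC can match it to the caller.
    return [request_id, json.dumps(list(labels))]


def fake_predict(rows):
    # Hypothetical stand-in for self.model.predict:
    # the label is the index of the largest feature in each row.
    return [row.index(max(row)) for row in rows]


reply = handle_request(["42", "[[0.1, 0.9], [0.7, 0.3]]"], fake_predict)
```

Keeping the logic in a function like this also makes it possible to unit test the serialization separately from the Storm runtime.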
  • Step 3. Deploy the topology to Storm.
      • Commands:

        /storm-starter $ mvn -f m2-pom.xml package
          – This will generate jar files in the target dir.

        /storm-starter $ storm jar target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.jimmy.SVMDRPCTopology
          – Submit the topology.

        $ storm list
          – Check whether the topology is running.
  • Step 4. Build the Storm DRPC Client.
      • We'll use the Python API generated by Thrift to connect to the DRPC server. The required files are in the storm dir, which comes from https://github.com/nathanmarz/storm
      • For background knowledge of Thrift, refer to http://thrift.apache.org/tutorial/
      • The client (predict_model.py):
          1. Construct the connection.
          2. Call the service by execute('svm', data_to_predict).
  • predict_model.py

        from thrift import Thrift
        from thrift.protocol import TBinaryProtocol
        from thrift.transport import TSocket, TTransport

        # Thrift-generated modules from the storm dir (module paths assumed).
        from storm import DistributedRPC
        from storm.ttypes import DRPCExecutionException


        class Client(DistributedRPC.Iface):
            def __init__(self, host='localhost', port=3772, timeout=6000):
                try:
                    socket = TSocket.TSocket(host, port)
                    socket.setTimeout(timeout)
                    self.conn = TTransport.TFramedTransport(socket)
                    self.client = DistributedRPC.Client(
                        TBinaryProtocol.TBinaryProtocol(self.conn))
                    self.conn.open()
                except Thrift.TException as exc:
                    print(exc)

            def close(self):
                self.conn.close()

            def execute(self, func, args):
                try:
                    return self.client.execute(func, args)
                except DRPCExecutionException as exc:
                    print(exc)
                except Thrift.TException as exc:
                    # message is an attribute, not a method.
                    print(exc.message)
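On the calling side, the data to predict has to be JSON-encoded before execute() and the reply decoded afterwards. The sketch below shows that wrapper under stated assumptions: `StubDRPCClient` is a hypothetical in-process replacement for the Thrift client so the example runs without a DRPC server, and `predict_remote` is an illustrative helper name that does not come from predict_model.py.

```python
import json


class StubDRPCClient:
    """Hypothetical replacement for the Thrift client; no network involved."""

    def execute(self, func, args):
        if func != "svm":
            raise ValueError("unknown DRPC function: %s" % func)
        rows = json.loads(args)
        # Pretend the topology predicted class 0 for every row.
        return json.dumps([0 for _ in rows])


def predict_remote(client, rows):
    """Encode rows as JSON, call the 'svm' DRPC function, decode the reply."""
    reply = client.execute("svm", json.dumps(rows))
    return json.loads(reply)


labels = predict_remote(StubDRPCClient(), [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
```

Any object with an execute(func, args) method can be passed as the client, so the same helper would work with the real Client class once a DRPC server is reachable.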
  • Step 5. Prediction on the fly. (Live Demo)

        $ ./predict_model.py
        data prepared
        data predicted
                   precision    recall  f1-score   support

                  0      1.00      1.00      1.00         1
                  1      0.50      0.67      0.57         3
                  2      1.00      0.50      0.67         2
                  3      1.00      0.75      0.86         4
                  4      0.50      0.50      0.50         2
                  5      1.00      0.50      0.67         2
                  6      1.00      1.00      1.00         4
                  7      0.50      1.00      0.67         1
                  8      1.00      1.00      1.00         2
                  9      0.80      1.00      0.89         4
                 10      1.00      0.50      0.67         4
                 11      1.00      1.00      1.00         1
                 13      1.00      0.50      0.67         2
                 14      1.00      1.00      1.00         1
                 15      1.00      1.00      1.00         2
                 16      0.33      0.33      0.33         3
                 17      0.33      1.00      0.50         1
                 18      0.00      0.00      0.00         1
                 19      0.00      0.00      0.00         0

        avg / total      0.81      0.72      0.74        40

      • We can run many clients and get the prediction results on the fly.
      • The clients can be written in many different languages with Thrift.
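For readers unfamiliar with the columns in the report above, the per-class numbers can be recomputed by hand. The sketch below is a plain Python 3 illustration, not the code used in the demo. The labels are made up, and were chosen so that class 1 has support 3 and two of its four predictions are correct, which gives the same precision, recall, and f1-score as the class 1 row.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Compute precision, recall, f1 and support for a single class."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)   # times the class was predicted
    support = sum(1 for t in y_true if t == label)     # times the class truly occurs
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / support if support else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1, support


# Made-up labels for illustration only (not the demo's data).
y_true = [1, 1, 1, 0, 0, 2]
y_pred = [1, 1, 0, 1, 1, 2]

precision, recall, f1, support = precision_recall_f1(y_true, y_pred, 1)
print("%.2f %.2f %.2f %d" % (precision, recall, f1, support))  # prints "0.50 0.67 0.57 3"
```

Support is the number of true samples of the class, which is why the support column of the report sums to the 40 samples sent for prediction.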