PredictionIO - Scalable Machine Learning Architecture
PredictionIO's presentation slides for Data Science London on April 24, 2013, during Big Data Week.


Transcript of "PredictionIO - Scalable Machine Learning Architecture"

  1. Simon Chan (simon@prediction.io), Data Science London - April 24, 2013, Big Data Week
  2. Machine Learning is... computers learning to predict from data
  3. Putting Machine Learning into practice
  4. Challenge #1: Scalability
  5. Big Data Bottlenecks: Machine Learning Processing
  6. PredictionIO has a horizontally scalable architecture
  7. Async SDK

          Client client = new Client(appkey);
          // Adding user behaviors
          req = client.getUserRateItemRequestBuilder(uid, iid, rate);
          client.userRateItemAsFuture(req);
  8. Play Framework
      ‣ stateless: no server session
      ‣ non-blocking web requests
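  9. Play: A Non-blocking Example

          // Play 2.1: Future { ... } and map need an implicit ExecutionContext
          import play.api.libs.concurrent.Execution.Implicits._

          def index = Action {
            val futureInt = scala.concurrent.Future { slowDataProcess() }
            Async {
              futureInt.map(i => Ok(views.html.result.render(i)))
            }
          }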
  10. MongoDB
      ‣ Read scaling: Replica Sets
      ‣ Write scaling: Sharding
      ‣ Indexes (e.g. geospatial)

          { geoSearch : "places", near : [33, 33],
            maxDistance : 6, search : { uid : "user1" } }
  11. Hadoop: Hadoop, Cascading (Java), Scalding (Scala)
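  12. MapReduce - Native Java

          import java.io.IOException;
          import java.util.StringTokenizer;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Job;
          import org.apache.hadoop.mapreduce.Mapper;
          import org.apache.hadoop.mapreduce.Reducer;
          import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
          import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
          import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
          import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

          public class WordCount {
            // Mapper: emits (word, 1) for every token in an input line
            public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
              private final static IntWritable one = new IntWritable(1);
              private Text word = new Text();

              public void map(LongWritable key, Text value, Context context) throws ..... {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  context.write(word, one);
                }
              }
            }

            // Reducer: sums the 1s emitted for each word
            public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
              public void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) { sum += val.get(); }
                context.write(key, new IntWritable(sum));
              }
            }

            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Job job = new Job(conf, "wordcount");
              // Lets Hadoop locate the jar containing these classes on the cluster
              job.setJarByClass(WordCount.class);
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);
              job.setMapperClass(Map.class);
              job.setReducerClass(Reduce.class);
              job.setInputFormatClass(TextInputFormat.class);
              job.setOutputFormatClass(TextOutputFormat.class);
              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));
              job.waitForCompletion(true);
            }
          }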
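  13. MapReduce - Scalding

          import com.twitter.scalding._

          class ScaldingTestJob(args: Args) extends Job(args) {
            // Read lines into a 'text field, split them into words, count per word
            Tsv(args(0), 'text)
              .flatMap('text -> 'word) { text : String => text.split("\\s+") }
              .groupBy('word) { _.size }
              .write(Tsv(args(1)))
          }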
  14. 14. Sample Code
  15. Sample PredictionIO Python SDK Code

          import predictionio

          client = predictionio.Client(appkey="<your app key>")

          # Add Data
          client.create_user(uid="user123")
          client.create_item(iid="itemXYZ", itypes=(1,))
          client.user_view_item(uid="user123", iid="itemXYZ")

          # Get Prediction
          rec = client.get_itemrec(engine="<engine name>", uid="user123", n=5)
  16. Getting Involved!
      - @PredictionIO
      - prediction.io - Newsletter
      - github.com/predictionio
  17. Q&A

      Q: Selecting the right features is a big problem. Can PredictionIO solve this problem?
      A: Not at this moment. That's why we focus on collaborative filtering algorithms right now, which don't require the use of features. We also believe that the involvement of data scientists is needed for many specific problems. PredictionIO is positioned as a tool to make their work easier, not as a replacement.

      Q: How is PredictionIO different from Weka?
      A: Weka, like Mahout, is an ML algorithm library. You can see PredictionIO as a layer on top of it, which helps you implement algorithms in a production environment by providing a complete infrastructure.

      Q: How does PredictionIO compare with RapidMiner?
      A: RapidMiner is a great product for defining data engineering workflows visually. PredictionIO focuses on a different problem: deploying ML solutions into a production environment.

      Q: How do the algorithm evaluation metrics work in PredictionIO?
      A: At this moment, you can evaluate algorithms with offline metrics, such as Mean Average Precision, based on your existing data.

      Q: What's the business model?
      A: We focus on making PredictionIO a useful open source product at this moment.
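Slide 6 claims a horizontally scalable architecture, and slide 10 lists sharding as MongoDB's write-scaling mechanism. A minimal sketch of the underlying idea, assuming simple hash-mod-N routing of user IDs to shards (`shard_for` and `partition` are hypothetical names; this is not MongoDB's or PredictionIO's actual routing, which relies on shard-key ranges/chunks):

```python
import hashlib
from typing import Dict, Iterable, List


def shard_for(uid: str, num_shards: int) -> int:
    """Return the shard index (0..num_shards-1) that owns this user ID.

    Uses MD5 rather than Python's built-in hash(), because hash() of a str
    is randomized per process and would route the same user differently
    on different servers.
    """
    digest = hashlib.md5(uid.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(uids: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group user IDs by the shard that would store them."""
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for uid in uids:
        shards[shard_for(uid, num_shards)].append(uid)
    return shards


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10)]
    for shard, members in partition(users, 3).items():
        print(shard, members)
```

Adding shards spreads writes across more machines. The catch with plain mod-N is that changing N remaps most keys, which is why production systems use chunk ranges or consistent hashing instead.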
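The async SDK on slide 7 sends a user behavior and gets a future back instead of waiting for the server's reply. The sketch below mimics that pattern with `concurrent.futures`, assuming a hypothetical `AsyncClientSketch` class whose `user_rate_item_as_future` method loosely mirrors `userRateItemAsFuture`. It does not use the real PredictionIO SDK, and the "server" is just an in-memory list.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncClientSketch:
    """Hypothetical async client: each call returns a Future immediately."""

    def __init__(self, max_workers: int = 4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self.recorded = []  # stands in for the remote event store

    def _send_rate(self, uid: str, iid: str, rate: int) -> bool:
        time.sleep(0.01)  # simulate network latency to the API server
        self.recorded.append((uid, iid, rate))
        return True

    def user_rate_item_as_future(self, uid: str, iid: str, rate: int) -> Future:
        # Submit the request to a worker thread and return without blocking
        return self._pool.submit(self._send_rate, uid, iid, rate)

    def close(self) -> None:
        # Wait for outstanding requests, then release the worker threads
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    client = AsyncClientSketch()
    futures = [client.user_rate_item_as_future("u1", f"i{n}", 5) for n in range(3)]
    # The caller can keep serving its own users here while requests are in flight
    print(all(f.result() for f in futures))
    client.close()
```

The design point is that logging behavior data should not add latency to the application's own request path; the caller only blocks if and when it calls `result()`.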
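Slide 9's Play action moves slow work into a `Future` and maps the eventual value to a response, so the request thread is not blocked. A rough Python analogue using `asyncio`, assuming a hypothetical blocking `slow_data_process` and a plain string in place of the rendered view:

```python
import asyncio
import time


def slow_data_process() -> int:
    """Hypothetical blocking job standing in for slowDataProcess() on slide 9."""
    time.sleep(0.1)  # simulate a slow computation or I/O call
    return 42


async def index() -> str:
    """Handle one request without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # Like scala.concurrent.Future { ... }: run the blocking call on the
    # loop's default thread pool and get something awaitable back.
    result = await loop.run_in_executor(None, slow_data_process)
    # Like futureInt.map(i => Ok(...)): build the response once the value is ready.
    return f"200 OK: result={result}"


async def main() -> None:
    # Three "requests" wait on the slow job concurrently, so the total time
    # is roughly one job's duration rather than three.
    start = time.perf_counter()
    responses = await asyncio.gather(index(), index(), index())
    print(responses, f"{time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```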
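The `geoSearch` command on slide 10 returns documents near a point that also match an exact-field filter (there, `uid : "user1"`). The linear-scan sketch below only illustrates those semantics on hypothetical sample documents. It does not model the bucketed geoHaystack index that makes the real command fast, and later MongoDB releases deprecated and then removed `geoSearch` and geoHaystack indexes.

```python
import math
from typing import Any, Dict, List, Sequence


def geo_search(docs: List[Dict[str, Any]], near: Sequence[float],
               max_distance: float, search: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return docs matching every field in `search` whose `pos` lies within
    `max_distance` of `near`, nearest first (linear scan, no index)."""
    hits = []
    for doc in docs:
        # Exact-match filter, like the `search : { uid : "user1" }` clause
        if any(doc.get(field) != value for field, value in search.items()):
            continue
        dist = math.dist(doc["pos"], near)  # flat Euclidean distance
        if dist <= max_distance:
            hits.append((dist, doc))
    hits.sort(key=lambda hit: hit[0])  # sort by distance only
    return [doc for _, doc in hits]


places = [  # hypothetical sample documents
    {"name": "cafe", "uid": "user1", "pos": [35, 36]},
    {"name": "park", "uid": "user1", "pos": [33, 34]},
    {"name": "museum", "uid": "user1", "pos": [40, 40]},
    {"name": "gym", "uid": "user2", "pos": [33, 33]},
]

if __name__ == "__main__":
    for doc in geo_search(places, near=[33, 33], max_distance=6, search={"uid": "user1"}):
        print(doc["name"])
```

Here "museum" is excluded because it lies about 9.9 units away, and "gym" is excluded by the `uid` filter even though it sits exactly at the query point.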
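Slides 12 and 13 implement the same word count twice, once against the raw Hadoop API and once in Scalding. The three MapReduce phases both versions rely on can be sketched in-process. This is a single-machine illustration with hypothetical function names, not Hadoop itself, where the shuffle happens across the network between mapper and reducer nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(word_count(["to be or not", "to be"]))
```

Scalding's `groupBy('word) { _.size }` collapses the shuffle and reduce steps into one expression, which is much of why slide 13 is so much shorter than slide 12.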
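On slide 15, `get_itemrec(..., uid="user123", n=5)` asks an engine for a user's top items, and slide 17 says the engines currently focus on collaborative filtering. One simple item-based CF scheme is to score unseen items by cosine similarity between items' sets of viewers. The sketch below runs it on hypothetical view events; `recommend_items` is a hypothetical name, and this is an illustration, not PredictionIO's engine.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def recommend_items(views: Iterable[Tuple[str, str]], uid: str, n: int) -> List[str]:
    """Top-n unseen items for `uid`, scored by item-item cosine similarity
    over binary (viewed / not viewed) vectors."""
    users_by_item: Dict[str, Set[str]] = defaultdict(set)
    items_by_user: Dict[str, Set[str]] = defaultdict(set)
    for u, i in views:
        users_by_item[i].add(u)
        items_by_user[u].add(i)

    seen = items_by_user.get(uid, set())
    scores: Dict[str, float] = defaultdict(float)
    for i in seen:
        for j, viewers_j in users_by_item.items():
            if j in seen:
                continue  # never recommend something already viewed
            overlap = len(users_by_item[i] & viewers_j)
            if overlap:
                # Cosine similarity between the two items' viewer sets
                scores[j] += overlap / math.sqrt(len(users_by_item[i]) * len(viewers_j))

    # Highest score first; ties broken by item ID for a stable order
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [iid for iid, _ in ranked[:n]]


views = [  # hypothetical (uid, iid) view events
    ("user123", "itemXYZ"),
    ("u2", "itemXYZ"), ("u2", "itemA"),
    ("u3", "itemXYZ"), ("u3", "itemA"), ("u3", "itemB"),
    ("u4", "itemC"),
]

if __name__ == "__main__":
    print(recommend_items(views, "user123", 5))
```

No item features are involved, only who viewed what, which matches the point in the Q&A that these algorithms sidestep feature selection.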
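The Q&A on slide 17 mentions Mean Average Precision as an offline metric. A minimal MAP@k sketch follows, assuming one common convention that normalizes each user's score by min(|relevant|, k). Conventions differ between tools, so this is not necessarily how PredictionIO computes it.

```python
from typing import Iterable, List, Sequence


def average_precision_at_k(recommended: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """AP@k: average of precision@i over the ranks i where a relevant item appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        # Count each relevant item once, even if it is recommended twice
        if item in relevant and item not in recommended[:rank - 1]:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def mean_average_precision_at_k(all_recommended: List[Sequence[str]],
                                all_relevant: List[Iterable[str]], k: int) -> float:
    """MAP@k: mean of AP@k over users."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    print(mean_average_precision_at_k([["a", "b", "c"], ["x"]], [{"a", "c"}, {"y"}], k=3))
```

In this example the first user gets hits at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6. The second user gets no hits, so the mean over both users is 5/12.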