Practical Machine Learning and Rails Part2
 

Our machine learning talk, Ryan's part.
Part 1: http://www.slideshare.net/ryanstout/practical-machine-learning-and-rails-part1

Speaker notes:

  • Having an example makes it easier to understand the process.
  • Could also use movie/product review data.
  • Bag of words is a way of generating features from text that only looks at which words occur in the text; it doesn't look at word order, syntax, grammar, punctuation, etc.
  • Words in the dictionary array are replaced with their counts in the text.
  • Word vectors/labels.
  • Generated using RARFF.
  • Load the ARFF; load the model (a serialized Java object); load a dataset.
  • Create a sparse instance and set the dataset; get the distribution (predicted values for each class).
  • "the cat ran out the door" -> [the cat] [cat ran] [ran out]...
  • Assume a max of three words; each feature holds three words, padded with 0s if there are fewer.
  • Clustering: similar documents, related terms.
  • Vowpal: good for large datasets; contains different algorithms (matrix factorization, collaborative filtering, LDA, etc.).
  • Hopefully this helped you learn the tools and techniques; you can teach yourself. Feel free to contact us.

Practical Machine Learning and Rails Part2: Presentation Transcript

  • SENTIMENT CLASSIFICATION
  • TRAINING DATA:
    - tweets
    - positive/negative: use emoticons from twitter, :-) or :-(
  • BUILDING TRAINING DATA:
    NEGATIVE
    - is upset that he cant update his Facebook by texting it... and might cry as a result School today also. Blah!
    - I couldnt bear to watch it. And I thought the UA loss was embarrassing
    - I hate when I have to call and wake people up
    POSITIVE
    - Just woke up. Having no school is the best feeling ever
    - Im enjoying a beautiful morning here in Phoenix
    - dropping molly off getting ice cream with Aaron
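A minimal Ruby sketch of this labeling trick (my own illustration, not code from the talk): a tweet containing :-) becomes a positive example and one containing :-( a negative example, with the emoticon stripped so the classifier can't simply key on it.

    # Label a tweet by its emoticon, then remove the emoticon from the text.
    def label_tweet(tweet)
      case tweet
      when /:-?\)/ then [tweet.gsub(/:-?\)/, "").strip, :positive]
      when /:-?\(/ then [tweet.gsub(/:-?\(/, "").strip, :negative]
      end # tweets with no emoticon return nil and can be skipped
    end

    p label_tweet("Just woke up. Having no school is the best feeling ever :-)")
    # => ["Just woke up. Having no school is the best feeling ever", :positive]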
  • FEATURES: BAG OF WORDS MODEL
    split the text into words, create a dictionary, and replace text with word counts
  • BAG OF WORDS
    tweets:         word vectors:
    I ran fast      [1 1 1 0 0 0]
    Bob ran far     [0 1 0 1 1 0]
    I ran to Bob    [1 1 0 1 0 1]
    dictionary = %w{I ran fast Bob far to}
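As a minimal Ruby sketch of this vectorization (my own code, not from the slides):

    dictionary = %w{I ran fast Bob far to}
    tweets = ["I ran fast", "Bob ran far", "I ran to Bob"]

    # Count each tweet's words, then read the counts off in dictionary order.
    word_vectors = tweets.map do |tweet|
      counts = Hash.new(0)
      tweet.split.each { |word| counts[word] += 1 }
      dictionary.map { |word| counts[word] }
    end

    word_vectors.each { |v| p v }
    # => [1, 1, 1, 0, 0, 0]
    #    [0, 1, 0, 1, 1, 0]
    #    [1, 1, 0, 1, 0, 1]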
  • CLASSIFIER:
    training examples (word vector -> labels) + classification algorithm -> model
  • WEKA
    - open source Java app
    - contains common ML algorithms
    - GUI interface
    - can access it from JRuby
    - helps with:
      - converting words into vectors
      - training/test, cross-validation, metrics
  • ARFF FILE
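For reference, a small hand-written ARFF file for this kind of dataset might look like the following (the attribute names are illustrative, not from the talk). The header declares one numeric attribute per dictionary word plus a nominal class attribute; each @data row is one labeled word vector:

    @relation sentiment

    @attribute i     numeric
    @attribute ran   numeric
    @attribute fast  numeric
    @attribute bob   numeric
    @attribute far   numeric
    @attribute to    numeric
    @attribute class {negative, positive}

    @data
    1, 1, 1, 0, 0, 0, positive
    0, 1, 0, 1, 1, 0, negative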
  • TRAINING IN WEKA
    [SHOW EXAMPLE HERE]
  • EVALUATION
    - correctly classified
    - mean squared error
    - false negatives/positives
  • SENTIMENT CLASSIFICATION EXAMPLE
    https://github.com/ryanstout/mlexample
  • QUERYING

    # Load the ARFF file that defines the attribute layout.
    arff_path = Rails.root.join("data/sentiment.arff").to_s
    arff = FileReader.new(arff_path)

    # Load the trained classifier (a serialized Java object).
    model_path = Rails.root.join("models/sentiment.model").to_s
    classifier = SerializationHelper.read(model_path)

    # Build the dataset and mark the last attribute as the class if needed.
    data = begin
      Instances.new(arff, 1).tap do |instance|
        if instance.class_index == -1
          instance.set_class_index(instance.num_attributes - 1)
        end
      end
    end
  • QUERYING

    # Build a sparse instance tied to the dataset and set the message text
    # as the value of the first attribute.
    instance = SparseInstance.new(data.num_attributes)
    instance.set_dataset(data)
    instance.set_value(data.attribute(0), params[:sentiment][:message])

    # distribution_for_instance returns a predicted value per class;
    # the first entry corresponds to the negative class here.
    result = classifier.distribution_for_instance(instance).first
    percent_positive = 1 - result.to_f
    @message = "The text is #{(percent_positive * 100.0).round}% positive"
  • HOW DO WE IMPROVE?
    - bigger dictionary
    - bi-grams/tri-grams (sketched below)
    - part of speech tagging
    - more data
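A sketch of the bi-gram idea (my own minimal Ruby): pairs of adjacent words become dictionary entries of their own, which lets the features capture a little word order.

    # "the cat ran out the door" -> ["the cat", "cat ran", "ran out", ...]
    def ngrams(text, n)
      text.split.each_cons(n).map { |words| words.join(" ") }
    end

    p ngrams("the cat ran out the door", 2)
    # => ["the cat", "cat ran", "ran out", "out the", "the door"]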
  • Feature Generation
    - think about what information is valuable to an expert
    - remove data that isn't useful (attribute selection)
  • ATTRIBUTE SELECTION
    [SHOW ATTRIBUTE SELECTION EXAMPLE]
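The live demo isn't in the transcript; as a stand-in, here is a hedged JRuby sketch of one way to rank attributes in Weka (information gain scored by a Ranker search; assumes weka.jar is on the classpath and `data` is a weka.core.Instances with its class index set):

    require 'java'
    java_import 'weka.attributeSelection.AttributeSelection'
    java_import 'weka.attributeSelection.InfoGainAttributeEval'
    java_import 'weka.attributeSelection.Ranker'

    selection = AttributeSelection.new
    selection.set_evaluator(InfoGainAttributeEval.new)
    selection.set_search(Ranker.new)
    selection.SelectAttributes(data)

    # Indices of the attributes ranked worth keeping.
    p selection.selected_attributes.to_a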
  • DOMAIN PRICE PREDICTION
    - predict how much a domain would sell for
  • TRAINING DATA
    - domains
    - historical sale prices for domains
  • FEATURES
    - split domain by words
    - generate features for each word (see the sketch below):
      - how common the word is
      - number of google results for each word
      - cpc for the word
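A hedged Ruby sketch of this featurization; the word stats and the greedy splitter below are stand-ins I made up (real values would come from a corpus, a search API, and an ads API):

    # Hypothetical per-word statistics.
    WORD_FREQUENCY = { "ice" => 0.8,   "cream" => 0.7 }   # how common the word is
    GOOGLE_RESULTS = { "ice" => 9.1e8, "cream" => 6.4e8 } # result counts
    CPC            = { "ice" => 0.35,  "cream" => 0.42 }  # cost per click, USD

    # Greedily split a domain into known words, e.g. "icecream" -> ["ice", "cream"].
    def split_words(name, words)
      found = []
      until name.empty?
        w = words.keys.find { |k| name.start_with?(k) } or break
        found << w
        name = name[w.length..-1]
      end
      found
    end

    words = split_words("icecream", WORD_FREQUENCY)
    features = words.flat_map { |w| [WORD_FREQUENCY[w], GOOGLE_RESULTS[w], CPC[w]] }
    p features # => [0.8, 910000000.0, 0.35, 0.7, 640000000.0, 0.42]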
  • ALGORITHM
    support vector regression (functions > SMOreg in Weka)
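Training isn't shown in the transcript; a minimal JRuby sketch of building an SMOreg model from an ARFF file might look like this (file paths are illustrative; assumes weka.jar is on the classpath):

    require 'java'
    java_import 'java.io.FileReader'
    java_import 'weka.core.Instances'
    java_import 'weka.core.SerializationHelper'
    java_import 'weka.classifiers.functions.SMOreg'

    # Load the training data and mark the sale price (last attribute) as the target.
    data = Instances.new(FileReader.new("data/domains.arff"))
    data.set_class_index(data.num_attributes - 1)

    # Train support vector regression and serialize the model for later querying.
    model = SMOreg.new
    model.build_classifier(data)
    SerializationHelper.write("models/domains.model", model)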
  • WHAT WE DIDN’T COVER
    - collaborative filtering
    - clustering
    - theorem proving (classical AI)
  • ADDITIONAL RESOURCES
    Stanford machine learning class: ml-class.org
  • TOOLS
    - weka
    - libsvm, liblinear
    - vowpal wabbit (big dictionaries)
    - recommendify: https://github.com/paulasmuth/recommendify
  • QUESTIONS
    Contact us on twitter at @tectonic and @ryanstout