Practical Machine Learning and Rails, Part 2

Ryan's part of our machine learning talk.

Part 1: http://www.slideshare.net/ryanstout/practical-machine-learning-and-rails-part1

Speaker notes:
  • having an example makes it easier to understand the process
  • could also use movie/product review data
  • bag of words: a way of generating features from text that only looks at which words occur in it; it doesn't look at word order, syntax, grammar, punctuation, etc.
  • words in the dictionary array are replaced with their counts in the text
  • word vectors/labels
  • generated using RARFF
  • load the arff; load the model (a serialized java object); load a dataset
  • create a sparse instance and set the dataset; get the distribution (predicted values for each class)
  • "the cat ran out the door" -> [the cat] [cat ran] [ran out] ...
  • assume a max of three words; each feature is three words, padded with 0's if there are fewer
  • clustering: similar documents, related terms
  • vowpal: good for large datasets; contains different algorithms (matrix factorization, collaborative filtering, LDA, etc.)
  • hopefully this helped you learn the tools and techniques; you can teach yourself; feel free to contact us

    1. SENTIMENT CLASSIFICATION
    2. TRAINING DATA:
    3. TRAINING DATA: - tweets
    4. TRAINING DATA: - tweets - positive/negative
    5. TRAINING DATA: - tweets - positive/negative - use emoticons from twitter
    6. TRAINING DATA: - tweets - positive/negative - use emoticons from twitter :-) or :-(
    7. BUILDING TRAINING DATA:
       NEGATIVE:
       - is upset that he cant update his Facebook by texting it... and might cry as a result School today also. Blah!
       - I couldnt bear to watch it. And I thought the UA loss was embarrassing
       - I hate when I have to call and wake people up
       POSITIVE:
       - Just woke up. Having no school is the best feeling ever
       - Im enjoying a beautiful morning here in Phoenix
       - dropping molly off getting ice cream with Aaron
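The labeling trick from slides 2-7 — use the emoticon as a noisy label, then strip it so it can't leak into the features — can be sketched in a few lines of Ruby. The method name and emoticon list here are our own; the deck doesn't show this code:

```ruby
# Use the emoticon as a (noisy) sentiment label, then remove it from
# the text so the classifier can't just memorize the emoticon itself.
def label_tweet(text)
  label = if text.include?(":-)") || text.include?(":)")
            :positive
          elsif text.include?(":-(") || text.include?(":(")
            :negative
          end
  return nil unless label  # tweets without an emoticon are unlabeled

  cleaned = text.gsub(/:-?[()]/, "").strip  # strip the emoticon
  [cleaned, label]
end

label_tweet("Just woke up. Having no school is the best feeling ever :-)")
# => ["Just woke up. Having no school is the best feeling ever", :positive]
```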
    8. FEATURES:
    9. FEATURES: BAG OF WORDS MODEL
    10. FEATURES: BAG OF WORDS MODEL - split the text into words, create a dictionary, and replace text with word counts
    11. BAG OF WORDS
    12. BAG OF WORDS - tweets: I ran fast / Bob ran far / I ran to Bob
    13. BAG OF WORDS - tweets: I ran fast / Bob ran far / I ran to Bob - dictionary = %w{I ran fast Bob far to}
    15. BAG OF WORDS - tweets and word vectors: I ran fast [1 1 1 0 0 0] / Bob ran far [0 1 0 1 1 0] / I ran to Bob [1 1 0 1 0 1] - dictionary = %w{I ran fast Bob far to}
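The bag-of-words slides above can be reproduced in plain Ruby. This is a sketch, not the deck's code: build the dictionary from the tweets, then map each tweet to its word counts in dictionary order:

```ruby
# Bag of words: dictionary = every distinct word, in first-seen order;
# each tweet becomes a vector of per-word counts in dictionary order.
tweets = ["I ran fast", "Bob ran far", "I ran to Bob"]

dictionary = tweets.flat_map(&:split).uniq
# => ["I", "ran", "fast", "Bob", "far", "to"]

vectors = tweets.map do |tweet|
  counts = Hash.new(0)
  tweet.split.each { |word| counts[word] += 1 }
  dictionary.map { |word| counts[word] }
end
# => [[1, 1, 1, 0, 0, 0], [0, 1, 0, 1, 1, 0], [1, 1, 0, 1, 0, 1]]
```

Note that the vectors match slide 15 exactly: word order within a tweet is discarded, only counts survive.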
    16. CLASSIFIER:
    17. CLASSIFIER: training examples: word vector -> labels
    19. CLASSIFIER: training examples: word vector -> labels - classification algorithm
    21. CLASSIFIER: training examples: word vector -> labels - classification algorithm - model
    22. WEKA
    23. WEKA • open source java app
    24. WEKA • open source java app • contains common ML algorithms
    25. WEKA • open source java app • contains common ML algorithms • gui interface
    26. WEKA • open source java app • contains common ML algorithms • gui interface • can access it from jruby
    27. WEKA • open source java app • contains common ML algorithms • gui interface • can access it from jruby • helps with:
    28. WEKA • open source java app • contains common ML algorithms • gui interface • can access it from jruby • helps with: • converting words into vectors
    29. WEKA • open source java app • contains common ML algorithms • gui interface • can access it from jruby • helps with: • converting words into vectors • training/test, cross-validation, metrics
    30. ARFF FILE
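Slide 30 introduces the ARFF file without showing one. For reference, a minimal ARFF for this sentiment task might look like the following; the relation and attribute names are our guess, not the deck's actual file:

```
@relation sentiment

@attribute text string
@attribute class {negative, positive}

@data
'I hate when I have to call and wake people up', negative
'Just woke up. Having no school is the best feeling ever', positive
```

Weka's StringToWordVector filter can then turn the string attribute into the bag-of-words vectors described earlier.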
    31. TRAINING IN WEKA [SHOW EXAMPLE HERE]
    32. EVALUATION • correctly classified • mean squared error
    33. EVALUATION - false negatives/positives
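The metrics on the two evaluation slides (correctly classified, false positives/negatives) can be sketched in plain Ruby. The label arrays here are invented purely for illustration:

```ruby
# Compare gold labels against predictions (made-up data) to get the
# metrics the slides name: accuracy, false positives, false negatives.
actual    = [:pos, :pos, :neg, :neg, :neg, :pos]
predicted = [:pos, :neg, :neg, :pos, :neg, :pos]

pairs = actual.zip(predicted)

correct  = pairs.count { |a, p| a == p }          # "correctly classified"
accuracy = correct.to_f / actual.size

false_positives = pairs.count { |a, p| a == :neg && p == :pos }
false_negatives = pairs.count { |a, p| a == :pos && p == :neg }
# correct == 4, false_positives == 1, false_negatives == 1
```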
    34. SENTIMENT CLASSIFICATION EXAMPLE - https://github.com/ryanstout/mlexample
    35. QUERYING
        arff_path = Rails.root.join("data/sentiment.arff").to_s
        arff = FileReader.new(arff_path)
        model_path = Rails.root.join("models/sentiment.model").to_s
        classifier = SerializationHelper.read(model_path)
        data = begin
          Instances.new(arff, 1).tap do |instance|
            if instance.class_index == -1
              instance.set_class_index(instance.num_attributes - 1)
            end
          end
        end
    36. QUERYING
        instance = SparseInstance.new(data.num_attributes)
        instance.set_dataset(data)
        instance.set_value(data.attribute(0), params[:sentiment][:message])
        result = classifier.distribution_for_instance(instance).first
        percent_positive = 1 - result.to_f
        @message = "The text is #{(percent_positive * 100.0).round}% positive"
    37. HOW DO WE IMPROVE?
    38. HOW DO WE IMPROVE? • bigger dictionary
    39. HOW DO WE IMPROVE? • bigger dictionary • bi-grams/tri-grams
    40. HOW DO WE IMPROVE? • bigger dictionary • bi-grams/tri-grams • part of speech tagging
    41. HOW DO WE IMPROVE? • bigger dictionary • bi-grams/tri-grams • part of speech tagging • more data
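One of the improvements above, bi-grams, is easy to sketch: slide a two-word window over the text so a little word order survives, at the cost of a much bigger dictionary. The speaker notes' "the cat ran out the door" example, in plain Ruby:

```ruby
# Bi-grams: each adjacent pair of words becomes one dictionary entry.
def bigrams(text)
  text.split.each_cons(2).map { |pair| pair.join(" ") }
end

bigrams("the cat ran out the door")
# => ["the cat", "cat ran", "ran out", "out the", "the door"]
```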
    42. Feature Generation
    43. Feature Generation - think about what information is valuable to an expert
    44. Feature Generation - think about what information is valuable to an expert - remove data that isn't useful (attribute selection)
    45. ATTRIBUTE SELECTION [SHOW ATTRIBUTE SELECTION EXAMPLE]
    46. ATTRIBUTE SELECTION
    47. DOMAIN PRICE PREDICTION • predict how much a domain would sell for
    48. TRAINING DATA
    49. TRAINING DATA • domains
    50. TRAINING DATA • domains • historical sale prices for domains
    51. FEATURES
    52. FEATURES • split domain by words
    53. FEATURES • split domain by words • generate features for each word
    54. FEATURES • split domain by words • generate features for each word • how common the word is
    55. FEATURES • split domain by words • generate features for each word • how common the word is • number of google results for each word
    56. FEATURES • split domain by words • generate features for each word • how common the word is • number of google results for each word • cpc for the word
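The first step in that feature list, splitting a domain into words, can be sketched in Ruby. The per-word signals (how common the word is, google results, cpc) depend on external data sources, so only the splitting is shown; the greedy longest-match approach here is our assumption, not necessarily what the deck used:

```ruby
# Hypothetical sketch: split a domain name into dictionary words by
# greedily taking the longest matching word at the front each time.
def split_domain(domain, dictionary)
  name = domain.sub(/\.\w+\z/, "")  # drop the TLD
  by_length = dictionary.sort_by { |w| -w.length }
  words = []
  until name.empty?
    word = by_length.find { |w| name.start_with?(w) }
    break unless word  # give up if no dictionary word fits
    words << word
    name = name[word.length..-1]
  end
  words
end

split_domain("icecream.com", %w{ice cream iced})
# => ["ice", "cream"]
```

Greedy matching is simple but can fail on ambiguous names; a real splitter would score alternative segmentations (e.g. by word frequency).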
    57. ALGORITHM - support vector regression (functions > SMOreg in weka)
    58. WHAT WE DIDN’T COVER
    59. WHAT WE DIDN’T COVER • collaborative filtering
    60. WHAT WE DIDN’T COVER • collaborative filtering • clustering
    61. WHAT WE DIDN’T COVER • collaborative filtering • clustering • theorem proving (classical AI)
    62. ADDITIONAL RESOURCES - stanford machine learning class: ml-class.org
    63. TOOLS • weka • libsvm, liblinear • vowpal wabbit (big dictionaries) • recommendify (https://github.com/paulasmuth/recommendify)
    64. QUESTIONS - contact us on twitter at @tectonic and @ryanstout
