Topic Tagging with Watson by Ken Goldberg, UC Berkeley

Presentation by Ken Goldberg, artist and UC Berkeley professor, in the Cognitive Systems Institute Speaker Series, February 4, 2016.

Published in: Technology

  1. M-CAFE Topic Tagging With Watson
  2. Dataset
     • M-CAFE for IEOR 115: 16 weeks, Aug - Dec 2015
       • Student count: 115
       • Idea count: 106
     • The 106 tagged ideas are split randomly into a training set (86 ideas) and a test set (20 ideas).
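The random 86/20 split described on this slide could be reproduced along the following lines. This is a minimal sketch, not the authors' code: the function name `split_ideas`, the seed, and the placeholder idea strings are assumptions made for illustration.

```python
import random

def split_ideas(ideas, n_test=20, seed=0):
    """Randomly split a list of ideas into (train, test).

    n_test and seed are illustrative assumptions; the slides only state
    that the split is random, with 86 training and 20 test ideas.
    """
    shuffled = list(ideas)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder data standing in for the 106 tagged M-CAFE ideas.
ideas = [f"idea {i}" for i in range(106)]
train, test = split_ideas(ideas)
print(len(train), len(test))  # 86 20
```

Fixing the seed keeps the same 20 ideas in the test set across runs, which matters when a single accuracy figure is reported.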
  3. Watson Natural Language Classifier
  4. Train & Test Sets
     • Train: 86 ideas with topics tagged.
     • Test: 20 ideas without topics tagged.
     [Screen capture of the .csv file for the training set]
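The training file is only shown as a screen capture. As a sketch, assuming a layout of one example per row with the idea text in the first column, its topic in the second, and no header row, such a file could be written as follows. The column layout and the helper name `write_training_csv` are assumptions, not taken from the slides; the two sample rows reuse ideas quoted elsewhere in the deck purely to illustrate the format.

```python
import csv
import io

def write_training_csv(rows, fileobj):
    """Write (idea_text, topic) pairs as CSV, one example per row."""
    writer = csv.writer(fileobj)  # quotes fields containing commas or quotes
    for text, topic in rows:
        writer.writerow([text, topic])

# Illustrative rows only; the real training set has 86 tagged ideas.
rows = [
    ("Slower pace.", "Lectures"),
    ("Please try and post the labs earlier so that we can get a head start "
     "reading and understanding them.", "Labs"),
]
buffer = io.StringIO()  # in-memory stand-in for mcafe_watson_train.csv
write_training_csv(rows, buffer)
print(buffer.getvalue())
```

Using the `csv` module rather than joining strings by hand keeps ideas that contain commas or quotation marks in a single field.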
  5. Code
     • Train the classifier:
       curl -i -u "896090f0-631f-4745-b02a-47b6417140d6":"xuDyj6lD9USr" -F training_data=@/Users/apple/Desktop/mcafe_watson_train.csv -F training_metadata='{"language":"en","name":"McafeClassifier"}' "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers"
     • Classify a test idea:
       curl -G -u "896090f0-631f-4745-b02a-47b6417140d6":"xuDyj6lD9USr" "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/3AE103x13-nlc-1276/classify" --data-urlencode "text=testData"
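The second command uses `curl -G` together with `--data-urlencode`, which sends the text as a URL-encoded query parameter of a GET request. The sketch below only builds such a URL, using the endpoint and classifier ID from the slide; it does not contact the service or handle authentication, and the function name `classify_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Endpoint as it appears on the slide.
BASE_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api/v1"

def classify_url(classifier_id, text):
    """Build the GET URL that classifies one piece of text."""
    # Percent-encode the text so spaces and punctuation are safe in a query string.
    query = urlencode({"text": text}, quote_via=quote)
    return f"{BASE_URL}/classifiers/{classifier_id}/classify?{query}"

url = classify_url("3AE103x13-nlc-1276", "slow down a little bit")
print(url)
```

Encoding matters here because the student ideas are free text; an unencoded `&` or `?` in an idea would otherwise be read as part of the query syntax.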
  6. Test Result: 80% Accuracy! Out of the 20 test samples, 16 were correctly classified.
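The 80% figure is simply the fraction of test ideas whose predicted tag matches the true tag. A minimal sketch of that calculation follows. The four mismatched pairs are the true and predicted tags reported for the misclassified ideas in the deck; the 16 matching pairs are placeholders, since the slides do not list all of them.

```python
def accuracy(true_tags, predicted_tags):
    """Fraction of positions where the predicted tag equals the true tag."""
    if len(true_tags) != len(predicted_tags):
        raise ValueError("tag lists must have the same length")
    correct = sum(t == p for t, p in zip(true_tags, predicted_tags))
    return correct / len(true_tags)

# 16 placeholder matches plus the 4 reported mismatches.
true_tags = ["Lectures"] * 16 + ["Lectures", "Resources", "Homework", "Homework"]
pred_tags = ["Lectures"] * 16 + ["Resources", "Lectures", "New Topics", "New Topics"]
print(accuracy(true_tags, pred_tags))  # 0.8
```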
  7. Examples of correctly classified ideas:
     • Idea: Slower pace.
       Topic: Lectures
     • Idea: Add Lecture overview
       Topic: Resources
     • Idea: I want more practice with Relational Algebra and eventually SQL.
       Topic: Homework
     • Idea: The last few lectures have been very mathematically precise in notation which can make it a bit tricky to wrap your head around. Specific questions/examples (like what might be on hw) would be great to help us make sure we understand it moving forward.
       Topic: Lectures
     • Idea: The project seems a little stop and go. We haven't been able to work on it for a week or so but I feel like we'll soon be expected to do a bunch of work for DP2. It would be helpful if we could have the tools to have a more constant level of work on the project.
       Topic: Projects
     • Idea: Please try and post the labs earlier so that we can get a head start reading and understanding them.
       Topic: Labs
     • Idea: Homework 2 only has database questions, maybe put some connectives?
       Topic: Homework
     • Idea: Incorporate a short question and answer period midway of lecture to assess participating students' understanding of the lecture/topics being presented.
       Topic: Lectures
  8. Misclassifications
     • The true tag is among the top two tags suggested by the classifier.
     • Misclassification occurs when an idea is tagged arbitrarily or lacks context.
     • Idea 1: slow down a little bit
       True tag: Lectures. Predicted tag: Resources.
       Confidence: Resources: 0.288; Lectures: 0.224
     • Idea 2: It would be great if you could provide outside resources on rules and guidelines for things like ER diagrams that you think are worth our time.
       True tag: Resources. Predicted tag: Lectures.
       Confidence: Lectures: 0.879; Resources: 0.130
  9. Misclassifications (contd.)
     • The true tag is among the top two tags suggested by the classifier.
     • Misclassification occurs when an idea is tagged arbitrarily or lacks context.
     • Idea 3: I would like have some implantation problems using SQL
       True tag: Homework. Predicted tag: New Topics.
       Confidence: New Topics: 0.803; Homework: 0.076
     • Idea 4: More hands on experiences on Databases
       True tag: Homework. Predicted tag: New Topics.
       Confidence: New Topics: 0.786; Homework: 0.117
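The observation that the true tag is among the top two suggestions can be checked mechanically by ranking the tags by confidence. The sketch below uses the confidence values reported for the four misclassified ideas. Because the slides list only two scores per idea, it relies on the slides' statement that these are the two highest; the helper name `in_top_k` is an assumption.

```python
def in_top_k(true_tag, confidences, k=2):
    """Return True if true_tag is among the k highest-confidence tags."""
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    return true_tag in ranked[:k]

# (true tag, reported confidences) for misclassified ideas 1-4.
misclassified = [
    ("Lectures", {"Resources": 0.288, "Lectures": 0.224}),
    ("Resources", {"Lectures": 0.879, "Resources": 0.130}),
    ("Homework", {"New Topics": 0.803, "Homework": 0.076}),
    ("Homework", {"New Topics": 0.786, "Homework": 0.117}),
]

top1 = [in_top_k(tag, conf, k=1) for tag, conf in misclassified]
top2 = [in_top_k(tag, conf, k=2) for tag, conf in misclassified]
print(top1)  # [False, False, False, False]
print(top2)  # [True, True, True, True]
```

Counting top-two hits alongside plain accuracy gives a rough sense of how often the classifier is close, which is relevant when tags are suggested to a human rather than applied automatically.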
  10. Questions for IBM
     1. How is the classifier trained? What is the classification method?
     2. Is there a version of the classifier that can return the predicted topic for the test set?
     3. This is essentially a supervised classification problem. Does Watson have an unsupervised version, where one provides only raw text and it assigns tags?
