Domain Scoping for Subject Matter Experts by Elham Khabiri

Presentation for Cognitive Systems Institute Group Speaker Series call on October 15, 2015. Elham Khabiri is a Researcher at the IBM TJ Watson Research Center.

Published in: Technology

  1. Domain Scoping for Subject Matter Experts. Elham Khabiri, Matthew Riemer, Fenno F. Heath III, Richard Hull. Oct 15
  2. Introduction
     •  Exploring social media is essential for SMEs
        –  To discover relevant content around a subject
        –  To analyze sentiment about a subject
        –  Example: possible changes to the Common Core by the government, and how they are reflected in social media
        –  What vocabulary do articles and media use to address the relevant discussions?
     •  Provide a tool to define the scope of the vocabulary
        –  Suggest to SMEs which vocabulary to search for in the news and social media
        –  Construct a “domain model”: a family of vocabularies and extractors
        –  The tool offers different algorithms and datasets
           •  Datasets: Common Crawl, BoardReader News and Forums, Google News
           •  Methods: Collocation, TFIDF, NN (Word2Vec and GloVe)
  3. Demo: http://10.122.124.55:8880/#initiate
  4. Different Methods and Datasets. Generates terms that occur frequently with one or more of the seed terms, with a frequency that is relatively high compared with how often they occur in “all” documents.
  5. Different Methods and Datasets. Generates terms that are “similar” to the seed terms. The similarity metric is based on an analysis of a large family of news articles from 2013 that were gathered by Google. 1M unigrams, 1M bigrams, 1M trigrams.
  6. Different Methods and Datasets. GLOVE (Pennington, Socher, Manning): Generates terms that are “similar” to the seed terms. The similarity metric is based on an analysis of a large family of web documents from the last 7 years that were gathered by Common Crawl. 3M unigrams, 10K bigrams.
  7. Different Methods and Datasets. Generates term pairs that occur with one or more of the seed terms, with a frequency that is relatively high compared with how often they occur in “all” documents.
  8. Finding Relevant Entities: Using Wikipedia. Phase 1: Disambiguation. Phase 2: Find Synonyms. Wikipedia Categories (C): Youth; Education issues; History of Education; Education reform. Wikipedia Entities (E): World Education Services; Shlomo Dovrat; Education Reform; Common Core State Init. Disambiguation: Atlas V is called Common Core Booster.
  9. Using Wikipedia. Shows the categories that the seed terms belong to.
  10. Using Wikipedia. American Educator, proponent of homeschooling. C: Youth; C: Education issues; C: History of Education; C: Education reform. E: World Education Services; E: Shlomo Dovrat; E: School-to-work transition; E: Education reform.
  11. Thanks for Listening!
  12. TFIDF, Word2vec, GLOVE
  13. Seed Term: Education reform. Collocation, WikiPedia
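
The frequency-based method on slide 4 (terms that occur with the seed terms relatively more often than in "all" documents) can be illustrated with a minimal sketch. The slides do not give the actual scoring formula, so the ratio used here, the function name `collocation_scores`, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def collocation_scores(docs, seeds):
    """Score each term by how much more often it appears in documents
    that contain a seed term than in the collection as a whole.

    docs  : list of tokenized documents (lists of strings)
    seeds : iterable of seed terms
    The ratio below is an assumed scoring rule, not the tool's formula.
    """
    seeds = set(seeds)
    all_df = Counter()    # document frequency over all documents
    seed_df = Counter()   # document frequency over seed-containing documents
    n_seed_docs = 0

    for doc in docs:
        terms = set(doc)
        all_df.update(terms)
        if terms & seeds:
            n_seed_docs += 1
            seed_df.update(terms - seeds)

    if n_seed_docs == 0:
        return {}

    n_docs = len(docs)
    scores = {}
    for term, count in seed_df.items():
        p_with_seed = count / n_seed_docs   # share of seed documents with the term
        p_overall = all_df[term] / n_docs   # share of all documents with the term
        scores[term] = p_with_seed / p_overall
    return scores


# Hypothetical toy corpus, not data from the presentation.
docs = [
    ["education", "reform", "curriculum"],
    ["education", "curriculum", "teachers"],
    ["weather", "rain"],
    ["sports", "teachers"],
]
scores = collocation_scores(docs, ["education"])
for term, score in sorted(scores.items(), key=lambda p: -p[1]):
    print(term, round(score, 2))
```

A score above 1 means the term is over-represented in documents that mention a seed term. The term-pair variant on slide 7 could reuse the same ratio with bigrams in place of single tokens.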

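The embedding-based methods on slides 5 and 6 return terms that are "similar" to the seed terms. The slides do not state the similarity metric, so the sketch below assumes cosine similarity against the average of the seed vectors, a common choice for word embeddings. The two-dimensional vectors are made up. In the tool they would come from models trained on the Google News or Common Crawl data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, seeds, top_n=5):
    """Rank non-seed terms by cosine similarity to the seed centroid.

    vectors : dict mapping term -> embedding (list of floats)
    seeds   : list of seed terms; seeds missing from `vectors` are skipped
    Averaging the seed vectors is an assumption made for illustration.
    """
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        return []
    dim = len(seed_vecs[0])
    centroid = [sum(v[i] for v in seed_vecs) / len(seed_vecs) for i in range(dim)]
    scored = [
        (term, cosine(centroid, vec))
        for term, vec in vectors.items()
        if term not in seeds
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy embeddings, not values from any trained model.
vectors = {
    "education": [1.0, 0.0],
    "schooling": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
print(most_similar(vectors, ["education"], top_n=2))
```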