
EKAW 2014 - Ziqi Zhang


Learning with Partial Data for Semantic Table Interpretation


  1. Learning with Partial Data for Semantic Table Interpretation. Ziqi Zhang, Department of Computer Science, University of Sheffield
  2. Semantic Table Interpretation
     • Input: an ontology and a relational table
     • Goals/tasks:
       • Column – classes/concepts
       • Cell – named entities
       • Column pair – relation
     • Example: a table of video games (PC) with columns Name, Publisher, Year (rows such as "1 | Gears of War | Microsoft | 2006", "2 | Civilization IV | 2k Games | 2006", "3 | Titan Quest | THQ | 2006", ..., "99 | Civilization V | 2k Games | 2010"). Against an ontology fragment containing Thing, Company, Work, Time Period and others, the Publisher column maps to the concept VideoGameCompany, cells such as "2k Games" and "THQ" map to the entities Ent:2kGames and Ent:THQ, and the Name–Publisher column pair maps to the relation Rel:publishedBy. A minimal data sketch follows below.
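To make the three annotation tasks concrete, here is a minimal data sketch of the video-game example above; the Python structures and identifiers are illustrative assumptions, not part of the original system.

```python
# Illustrative sketch of semantic table interpretation inputs/outputs (hypothetical structures).

# Input: a relational table (header plus rows), to be interpreted against an ontology.
header = ["Name", "Publisher", "Year"]
rows = [
    ["Gears of War", "Microsoft", "2006"],
    ["Civilization IV", "2k Games", "2006"],
    ["Titan Quest", "THQ", "2006"],
    ["Civilization V", "2k Games", "2010"],
]

# Task 1: column -> ontology class/concept.
column_classes = {"Publisher": "VideoGameCompany"}

# Task 2: cell -> named entity.
cell_entities = {(1, "Publisher"): "Ent:2kGames", (2, "Publisher"): "Ent:THQ"}

# Task 3: column pair -> relation.
column_relations = {("Name", "Publisher"): "Rel:publishedBy"}
```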
  3. Motivation
     • State-of-the-art (SoA) semantic table interpretation methods, e.g. [Limaye2010, Venetis2011, Mulwad2013]
     • Limitation: the algorithms are 'exhaustive', but this is unnecessary
     • Example goal: assign a concept to the Publisher column; hint: content in the column gives useful clues
     • How much of the column do we need for inference (99 rows in this example)?
       • Human: SOME (learn by example)
       • SoA: ALL
  4. Research Questions
     • Can machines 'learn by example'?
       • inference using only partial data (a sample)
       • while achieving good accuracy
     • How to choose a sample?
       • does it matter (e.g., in terms of accuracy)?
       • how to optimize it?
     • Approach: TableMiner [Zhang, Z. (2014). Towards efficient and effective semantic table interpretation. In Proceedings of the 13th International Semantic Web Conference, 487-502] plus sample selection (the contribution of this work)
  5. Method
  6. TableMiner (modified)
     • Incremental inference (I-Inf) to address two tasks:
       • Column classification, using some of the data in the column
       • Cell disambiguation, using the column label to constrain disambiguation
  7. TableMiner (modified)
     • Incremental inference (I-Inf)
     • Notation: Tj is a column; Cj is the set of candidate concepts for the column; Ei,j is the set of candidate entities for a cell
  8. TableMiner (modified)
     • I-Inf processes the cells of a column one by one (1, 2, 3, ...) until Cj changes little between iterations (convergence); a sketch of this loop follows below
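A minimal sketch of the incremental inference loop as I read it from this slide; `candidate_concepts` (the knowledge-base lookup) and the convergence threshold are assumptions, not the author's actual implementation.

```python
from collections import defaultdict

def candidate_concepts(cell_text):
    """Hypothetical lookup: return {concept: score} for one cell.
    In TableMiner this would come from a knowledge base such as Freebase."""
    return {}

def i_inf_column_classification(column_cells, threshold=0.01):
    """Score candidate concepts Cj for a column cell by cell, stopping once the
    normalised scores change little between iterations (convergence)."""
    cj = defaultdict(float)   # concept -> accumulated score
    previous = {}
    used = 0
    for cell in column_cells:
        used += 1
        for concept, score in candidate_concepts(cell).items():
            cj[concept] += score
        total = sum(cj.values()) or 1.0
        current = {c: s / total for c, s in cj.items()}
        change = sum(abs(current.get(c, 0.0) - previous.get(c, 0.0))
                     for c in set(current) | set(previous))
        previous = current
        if used > 1 and change < threshold:
            break   # converged: the remaining cells are never processed
    return dict(cj), used   # candidate concepts and the number of cells actually used
```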
  9. TableMiner (modified)
     • After convergence, Cj = {<c1, s1'>, <c2, s2'>, <c3, s3'>, ..., <c11, s11'>}
     • The winning column label (class) is used as a constraint when selecting candidate entities for disambiguation; see the sketch below
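A sketch of that constraint, assuming a hypothetical `candidate_entities` lookup that returns (entity, types) pairs; the fallback behaviour is my assumption.

```python
def constrained_candidates(cell_text, winning_classes, candidate_entities):
    """Keep only candidate entities whose types overlap the winning column classes,
    so fewer candidates need to be scored during cell disambiguation."""
    candidates = list(candidate_entities(cell_text))
    filtered = [(entity, types) for entity, types in candidates if types & winning_classes]
    # If the constraint removes every candidate, fall back to the unconstrained list.
    return filtered or candidates
```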
  10. Sample Selection – the Principle
     • 'Order matters'
       • TableMiner processes data in order until convergence
       • Changing the order means (possibly) a different convergence speed and different data being processed
     • Change the order of cells in a column (and the corresponding rows) so that cells that are 'easier' to disambiguate come to the top, because the class chosen for a column depends on the cells in that column
  11. Sample Selection – the 'name length' hypothesis
     • Longer names are easier to disambiguate than shorter names, e.g. "Manchester" vs. "Manchester United F.C."
     • Method name length (nl):
       • nl(Ti,j) = the number of tokens in cell Ti,j
       • re-order table rows by sorting column Tj on nl(Ti,j); a sketch follows below
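A sketch of the nl re-ordering under stated assumptions (rows as lists of strings, whitespace tokenisation, descending sort so longer names come first):

```python
def nl(cell_text):
    """Name length: number of whitespace-separated tokens in the cell text."""
    return len(cell_text.split())

def reorder_by_name_length(rows, j):
    """Re-order table rows so that cells of column j with longer names come to the top."""
    return sorted(rows, key=lambda row: nl(row[j]), reverse=True)
```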
  12. Sample Selection – the 'feature density' hypothesis
     • Names that have a richer feature representation are easier to disambiguate
       • bag-of-words (B.O.W.) representation built from row context
       • 'one-sense-per-discourse' (ospd) assumption, applied in non-subject columns
  13. Sample Selection – the 'feature density' hypothesis
     • Method 'duplicate content cell' (dup):
       • re-arrange the target column and table following ospd
       • dup(Ti,j) = the number of times the text of Ti,j is duplicated in column Tj
       • re-order table rows by sorting column Tj on dup(Ti,j); see the sketch below
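A sketch of the dup re-ordering, again assuming rows as lists of strings; the ospd re-arrangement itself is not shown.

```python
from collections import Counter

def reorder_by_duplicates(rows, j):
    """Re-order table rows so that cells whose text occurs most often in column j come to the top."""
    counts = Counter(row[j] for row in rows)
    return sorted(rows, key=lambda row: counts[row[j]], reverse=True)
```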
  14. Sample Selection – the 'feature density' hypothesis
     • Method 'feature representation size' (rep):
       • re-arrange the target column and table following ospd
       • rep(Ti,j) = the number of tokens in the B.O.W. representation of Ti,j
       • re-order table rows by sorting column Tj on rep(Ti,j); see the sketch below
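A sketch of the rep re-ordering; the bag-of-words builder `bow` is a hypothetical stand-in for TableMiner's row-context representation.

```python
def bow(row, j):
    """Hypothetical row-context bag of words for cell (row, j): tokens from the other cells in the row."""
    return [tok for k, cell in enumerate(row) if k != j for tok in cell.split()]

def reorder_by_representation_size(rows, j):
    """Re-order table rows so that cells of column j with the largest B.O.W. representation come to the top."""
    return sorted(rows, key=lambda row: len(bow(row, j)), reverse=True)
```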
  15. (figure-only slide; no transcribed text)
  16. Evaluation
  17. Data
     • Freebase as the reference ontology / background knowledge
     • Limaye200 – 200 Web tables from Limaye2010, originally annotated with Wikipedia; column classes are manually annotated
     • LimayeAll – 6310 Web tables from Limaye2010; names in content cells are automatically mapped to Freebase
  18. Settings
     • Baseline: TM_bs – the modified TableMiner using all cells in a column for column classification (everything else unchanged)
     • Comparison*:
       • TM_mod(nl) – TableMiner using the name length sample selection method
       • TM_mod(dup) – TableMiner using the duplicate content cell sample selection method
       • TM_mod(rep) – TableMiner using the feature representation size sample selection method
     * The original TableMiner is modified; for details and other settings, see the paper.
  19. Results
     • Results in F1:
                                       TM_bs   TM_mod(nl)   TM_mod(dup)   TM_mod(rep)
       Classification (Limaye200)      72.1    72.3         72.0          72.1
       Disambiguation (LimayeAll)      80.9    81.3         81.22         81.24
     • Convergence speed in column classification:
                                       TM_bs   TM_mod(nl)   TM_mod(dup)   TM_mod(rep)
       Limaye200                       100%    36.3%        36.1%         35.3%
     • Reduction in candidate named entities for disambiguation:
                                       TM_bs   TM_mod(nl)   TM_mod(dup)   TM_mod(rep)
       Limaye200                       0       32.4%        48.1%         46.8%
  20. Results (the same three tables as slide 19, annotated with the takeaways)
     • Comparable or better accuracy
     • ... while using only partial data for column classification
     • ... and processing much less data for disambiguation
  21. Conclusion
     • Learning with partial data for semantic table interpretation can be both effective and efficient
     • The choice of sample selection method makes limited difference in terms of accuracy and efficiency
  22. Thank you. @ziqizhang_zz http://staffwww.dcs.shef.ac.uk/people/Z.Zhang
