ACL-IJCNLP 2015
Beijing, China
Published in: Technology
  1. ACL-IJCNLP 2015, Beijing, China
  2. Stats
     • 173 long papers
       • 105 as oral
       • 68 as poster presentations
       • 692 total long paper submissions
     • 145 short papers
       • 50 as oral
       • 95 as poster presentations
       • 648 total short paper submissions
     • 13 TACL papers
     • 7 Student Research Workshop papers
     • 25 system demonstrations
     • 8 tutorials
     • 15 workshops
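The counts on the Stats slide imply acceptance rates for long and short papers. The short sketch below works them out, assuming that "173 long papers" and "145 short papers" are the accepted papers out of the stated submission totals (an interpretation of the slide, not something it states). The function name `acceptance_rate` is illustrative.

```python
def acceptance_rate(accepted, submitted):
    """Return accepted / submitted as a fraction between 0 and 1."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return accepted / submitted


# Figures taken from the Stats slide.
long_rate = acceptance_rate(173, 692)
short_rate = acceptance_rate(145, 648)

print(f"long papers:  {long_rate:.1%}")   # 25.0%
print(f"short papers: {short_rate:.1%}")  # 22.4%
```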
  3. Many Minions
  4. A Computational Approach to Automatic Prediction of Drunk-Texting
     Alcohol abuse may lead to unsociable behavior such as crime, drunk driving, or privacy leaks. We introduce automatic drunk-texting prediction as the task of identifying whether a text was written when under the influence of alcohol. We experiment with tweets labeled using hashtags as distant supervision. Our classifiers use a set of N-gram and stylistic features to detect drunk tweets. Our observations present the first quantitative evidence that text contains signals that can be exploited to detect drunk-texting.
     • Dataset 1 (2435 drunk, 762 sober)
       • #drunk, #drank, #imdrunk
       • #notdrunk, #imnotdrunk, #sober
     • Dataset 2 (2435 drunk, 5644 sober)
     • Dataset H (193 drunk, 317 sober)
     http://ej.uz/Drunk-Texting
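Labeling tweets by hashtag, as the drunk-texting slide describes, could look roughly like the sketch below. The six hashtags come from the slide. Everything else is an assumption made for illustration: the function name `label_tweet`, discarding tweets that carry hashtags from both groups or from neither, and stripping the label hashtags from the text so that a classifier cannot simply read the label.

```python
import re

# Hashtags listed on the slide for Dataset 1.
DRUNK_TAGS = {"#drunk", "#drank", "#imdrunk"}
SOBER_TAGS = {"#notdrunk", "#imnotdrunk", "#sober"}
LABEL_TAGS = DRUNK_TAGS | SOBER_TAGS
HASHTAG = re.compile(r"#\w+")


def label_tweet(text):
    """Return (label, cleaned_text), or None if the tweet is unusable.

    Assumption: tweets with hashtags from both groups, or from neither,
    are discarded rather than labeled.
    """
    tags = {tag.lower() for tag in HASHTAG.findall(text)}
    is_drunk = bool(tags & DRUNK_TAGS)
    is_sober = bool(tags & SOBER_TAGS)
    if is_drunk == is_sober:
        return None
    label = "drunk" if is_drunk else "sober"

    # Assumption: drop the label hashtags, keep all other hashtags.
    def keep_or_drop(match):
        return "" if match.group().lower() in LABEL_TAGS else match.group()

    cleaned = " ".join(HASHTAG.sub(keep_or_drop, text).split())
    return label, cleaned


print(label_tweet("Best night ever #drunk #party"))
print(label_tweet("Morning run done #sober"))
```

Matching whole hashtags rather than substrings matters here: `#notdrunk` contains the string `drunk` but belongs to the sober group.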
  5. Prediction of Drunk-Texting
     http://ej.uz/Drunk-Texting
     Feature Set for Drunk-texting Prediction
     Drunk-Poster?
  6. Modeling Argument Strength in Student Essays
     While recent years have seen a surge of interest in automated essay grading, including work on grading essays with respect to particular dimensions such as prompt adherence, coherence, and technical quality, there has been relatively little work on grading the essay dimension of argument strength, which is arguably the most important aspect of argumentative essays. We introduce a new corpus of argumentative student essays annotated with argument strength scores and propose a supervised, feature-rich approach to automatically scoring the essays along this dimension. Our approach significantly outperforms a baseline that relies solely on heuristically applied sentence argument function labels by up to 16.1%.
     http://ej.uz/ArgStrInStudEssays
  7. Modeling Argument Strength in Student Essays
     http://ej.uz/ArgStrInStudEssays
     Novel features:
     • POS N-grams
     • Semantic Frames
     • Transitional Phrases
     • Coreference
     • Prompt Agreement
     • Argument Component Predictions
     • Argument Errors
  8. Driving ROVER with Segment-based ASR Quality Estimation
     ROVER is a widely used method to combine the output of multiple automatic speech recognition (ASR) systems. Though effective, the basic approach and its variants suffer from potential drawbacks: i) their results depend on the order in which the hypotheses are used to feed the combination process, ii) when applied to combine long hypotheses, they disregard possible differences in transcription quality at local level, iii) they often rely on word confidence information. We address these issues by proposing a segment-based ROVER in which hypothesis ranking is obtained from a confidence-independent ASR quality estimation method. Our results on English data from the IWSLT2012 and IWSLT2013 evaluation campaigns significantly outperform standard ROVER and approximate two strong oracles.
     http://ej.uz/ROVER-SegASR-QEst
  9. Driving ROVER with Segment-based ASR Quality Estimation
     1. Split the utterance into segments (ideally at sentence level);
     2. For each segment, automatically estimate the quality (e.g. in terms of WER) of the corresponding M (segment-level) hypotheses;
     3. Use the estimates to rank the hypotheses and feed ROVER based on the ranking;
     4. Reconstruct the entire utterance transcription by concatenating the combined segment level transcriptions produced by ROVER;
     5. Measure the overall WER differences against standard ROVER and other oracles.
     http://ej.uz/ROVER-SegASR-QEst
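Two pieces of the segment-based procedure above are easy to sketch: the word error rate (WER) used for evaluation, and the ranking of a segment's hypotheses by estimated quality before they are fed to ROVER. The code below is a minimal illustration, not the paper's implementation. ROVER itself is not implemented, the function names are hypothetical, and the estimated WER values are assumed to come from some external quality estimation model.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def rank_hypotheses(hypotheses, estimated_wers):
    """Order one segment's hypotheses from best to worst estimated WER."""
    if len(hypotheses) != len(estimated_wers):
        raise ValueError("need one estimate per hypothesis")
    order = sorted(range(len(hypotheses)), key=lambda k: estimated_wers[k])
    return [hypotheses[k] for k in order]


# Illustrative values only: three hypotheses for one segment.
ranked = rank_hypotheses(["hyp A", "hyp B", "hyp C"], [0.30, 0.10, 0.20])
print(ranked)
print(wer("the cat sat on the mat", "the cat sat mat"))
```

Ranking per segment rather than per utterance is what lets the ordering change wherever a system's local transcription quality changes.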
  10. Multi-level Translation Quality Prediction with QUEST++
      This paper presents QUEST++, an open source tool for quality estimation which can predict quality for texts at word, sentence and document level. It also provides pipelined processing, whereby predictions made at a lower level (e.g. for words) can be used as input to build models for predictions at a higher level (e.g. sentences). QUEST++ allows the extraction of a variety of features, and provides machine learning algorithms to build and test quality estimation models. Results on recent datasets show that QUEST++ achieves state-of-the-art performance.
      http://ej.uz/QUESTpp
      • 148 sentence level features
      • 40 word level features
      • 67 document level features
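The pipelined idea in the QUEST++ abstract, where lower-level predictions become input for a higher-level model, can be illustrated with a toy example. This is not the QUEST++ API. The function name, the "OK"/"BAD" word labels, and the three derived features are all assumptions chosen for illustration.

```python
def sentence_features_from_words(word_labels):
    """Turn per-word quality predictions into sentence-level features.

    word_labels: one predicted label per word, "OK" or "BAD"
    (a hypothetical labeling scheme).
    """
    num_words = len(word_labels)
    if num_words == 0:
        return {"num_words": 0, "bad_ratio": 0.0, "longest_bad_run": 0}

    num_bad = sum(1 for label in word_labels if label == "BAD")

    # Length of the longest consecutive run of "BAD" words.
    longest = run = 0
    for label in word_labels:
        run = run + 1 if label == "BAD" else 0
        longest = max(longest, run)

    return {
        "num_words": num_words,
        "bad_ratio": num_bad / num_words,
        "longest_bad_run": longest,
    }


features = sentence_features_from_words(["OK", "BAD", "BAD", "OK"])
print(features)
```

A sentence-level model would then take a dictionary like this, alongside its other features, as part of its input.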
  11. QUEST++
      http://ej.uz/QUESTpp
  12. QUEST++
  13. Unsupervised Decomposition of a Multi-Author Document Based on Naive-Bayesian Model
      This paper proposes a new unsupervised method for decomposing a multi-author document into authorial components. We assume that we do not know anything about the document and the authors, except the number of the authors of that document. The key idea is to exploit the difference in the posterior probability of the Naive-Bayesian model to increase the precision of the clustering assignment and the accuracy of the classification process of our method. Experimental results show that the proposed method outperforms two state-of-the-art methods.
      http://ej.uz/Mult-AuthDocDecomposition
  14. Unsupervised Decomposition of a Multi-Author Document Based on Naive-Bayesian Model
      http://ej.uz/Mult-AuthDocDecomposition
  15. Decomposition of a Multi-Author Document
      • Step 1 Divide the document into segments of fixed length.
      • Step 2 Represent the resulting segments as vectors using an appropriate feature set which can differentiate the writing styles among authors.
      • Step 3 Cluster the resulting vectors into l clusters using an appropriate clustering algorithm that targets high recall rates.
      • Step 4 Re-vectorize the segments using a different feature set to more accurately discriminate the segments in each cluster.
      • Step 5 Apply the “Segment Elicitation Procedure” to select the best segments from each cluster to increase the precision rates.
      • Step 6 Re-vectorize all selected segments using another feature set that can capture the differences among the writing styles of all sentences in a document.
      • Step 7 Train the classifier using the Naive-Bayesian model.
      • Step 8 Classify each sentence using the learned classifier.
      • Step 9 Apply the “Probability Indication Procedure” to increase the accuracy of the classification results using five criteria.
      http://ej.uz/Mult-AuthDocDecomposition
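Steps 1, 7 and 8 of the procedure above can be sketched in a few lines. The code below is a simplified illustration under several assumptions: features are plain lowercase word counts (the slide does not name the feature sets), the classifier is a multinomial Naive Bayes with add-one smoothing, and the clustering, "Segment Elicitation" and "Probability Indication" steps are omitted, so the training labels are supplied by hand. All function names are hypothetical.

```python
import math
from collections import Counter


def fixed_length_segments(sentences, size):
    """Step 1: split a list of sentences into chunks of `size` sentences."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def train_naive_bayes(labeled_texts):
    """Step 7: collect per-label word counts from (label, text) pairs."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for label, text in labeled_texts:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def log_posteriors(model, text):
    """Unnormalized log posterior of each label for `text`."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(count / total)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are skipped
                score += math.log((counts[token] + 1) / denom)
        scores[label] = score
    return scores


def classify(model, sentence):
    """Step 8: pick the label with the highest posterior."""
    scores = log_posteriors(model, sentence)
    return max(scores, key=scores.get)


# Toy data standing in for segments selected in the earlier steps.
model = train_naive_bayes([
    ("author_1", "whale ship sea whale"),
    ("author_2", "garden tea letter garden"),
])
print(fixed_length_segments(["s1", "s2", "s3", "s4", "s5"], 2))
print(classify(model, "the whale and the sea"))
```

The gap between the two highest scores returned by `log_posteriors` is one way to express how confident an assignment is, which is the kind of posterior difference the abstract refers to.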
  16. Automatic Identification of Age-Appropriate Ratings of Song Lyrics
      This paper presents a novel task, namely the automatic identification of age-appropriate ratings of a musical track, or album, based on its lyrics. Details are provided regarding the construction of a dataset of lyrics from 12,242 tracks across 1,798 albums along with age-appropriate ratings obtained from various web resources, along with results from various text classification experiments. The best accuracy of 71.02% for classifying albums by age groups is achieved by combining vector space model and psycholinguistic features.
      http://ej.uz/IDofSongAgeRatings
      Statistics of the dataset:
  17. Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game
      Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an on-line strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal. We reveal that subtle signs of imminent betrayal are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship’s fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through language. In contrast, sudden changes in the balance of certain conversational attributes (such as positive sentiment, politeness, or focus on future planning) signal impending betrayal.
      http://ej.uz/LinguisticBetrayal
  18. Linguistic Harbingers of Betrayal
      http://ej.uz/LinguisticBetrayal
      Features for recognizing imminent betrayal: in decreasing order
  19. An analysis of the user occupational class through Twitter content
      Social media content can be used as a complementary source to the traditional methods for extracting and studying collective social attributes. This study focuses on the prediction of the occupational class for a public user profile. Our analysis is conducted on a new annotated corpus of Twitter users, their respective job titles, posted textual content and platform-related attributes. We frame our task as classification using latent feature representations such as word clusters and embeddings. The employed linear and, especially, non-linear methods can predict a user’s occupational class with strong accuracy for the coarsest level of a standard occupation taxonomy which includes nine classes. Combined with a qualitative assessment, the derived results confirm the feasibility of our approach in inferring a new user attribute that can be embedded in a multitude of downstream applications.
      http://ej.uz/occupationalClass-Twitter
  20. Occupational class through Twitter content
      User level attributes for a Twitter user:
      Topics, represented by their most central and most frequent 10 words:
  21. Beijing
  22. Food
  23. More photos & blog post
      www.lielakeda.lv
