Marina Santini

63 SlideShares 283 Followers 34 Followings

I am a computational linguist with a strong interest in textual and linguistic features, machine learning and intensive textual data processing. My personal challenge is to extract "contextualized" information from large volumes of unstructured textual data by leveraging the concept of "genre". The word "genre" means "type of text". Nowadays all kinds of businesses, enterprises and customer care services produce huge amounts of data in the form of many different "genres", e.g. emails, memos, notes from call-centers, news, user groups, chats, reports, tweets, Facebook pages, blogs, forums, marketing material and so on. All these textual genres contain valuable but unstructured data. The exploitation of ...

machine learning language technology supervised classification computational semantics weka decision trees nlp sentiment analysis supervised machine learning semantic analysis in language technology uppsala university entropy noise svm perceptron gain ratio information gain divide and conquer logistic regression text analytics genre marina santini corpus evaluation web corpora domain-specific semantics in language technology text mining mesh wordnet pointwise mutual information naive bayes baseline algorithm selectional restrictions evaluation crossvalidation induction rules formal languages automata wordle tag clouds word clouds description logics rdf owl semantic web thematic roles semantic roles predicate-argument structure unification sampling smoothing independence statistical inference flipped classroom conditional probability axioms of probability margin training set inductive bias structured data opinion mining unstructured data semantics formal semantics lexical semantics semantic analysis semi-supervised learning dependency parsing mira best split similarity nearest neighbors pruning supervised learning clustering emotion query log analysis automatic genre identification events kendall correlation coefficient mann-whitney-wilcoxon test kullback–leibler divergence log-likelihood burstiness domainhood terminology extraction corpus quality lay-specialized sublanguage web corpus ecare domain ward’s linkage agglomerative hierarchical clustering unsupervised machine learning swedish-umeå corpus (suc) readability distortion star forest cycle cover automatic folksonomy construction social tagging folksonomy aspect ratio area utilization realized adjacencies inflate and push cpewcv context-preserving word cloud visualisation running time compactness quantitative metrics seam carving iri sparql ontology learning classes relations webprotege tree of porphyry ontology shared understanding web 3.0 tags dls shared semantic annotation single vs.
multiple documents unsupervised content selection query-focused summarization abstractive summarization summarization in question answering snippets extractive summarization recall oriented understudy for gisting evaluation rouge topic signature-based content selection abstracting bootstrapping ace hand-written patterns databases of relations freebase unsupervised learning from the web dbpedia relation extractors knowledge graph distant supervision narrative questions passage retrieval hybrid approaches wolframalpha mean reciprocal rank ir-based question answering factoid questions mrr apple's siri answer type taxonomy complex questions ibm's watson knowledge-based approaches ir-based approaches word shapes standard evaluation per entity calendaring e-discovery sequence labeling information extraction sequence classifier standard evaluation per token named entity recognition ner cosine metric ppmi zellig harris distributional models positive pointwise mutual information vectors joint probability cosine similarity measure pmi john rupert firth marginals dot product term-context matrix information content simplified lesk extended lesk word sense disambiguation elesk michael lesk path-based similarity lesk algorithm supervised methods surprisal dictionary-based methods semcor lin method resnik method thesaurus-based methods wsd graph-based methods word relatedness word similarity corpus lesk hyponymy zeugma test meronymy metonymy wordform synonymy part-whole meronymy babelnet antonymy polysemy lemma membership meronymy senseval hypernymy homonymy word senses manually-built sentiment lexicons general inquirer learning sentiment lexicons semi-supervised methods sentiment lexicons sentiment mining likelihood sentiwordnet connotational aspects emotion classification affective meaning turney algorithm mutual information scherer typology sentiment lexica semantic role labeling shallow semantic representation shallow semantics propbank framenet semantic role labelling
propositional logic connotation computational semantics predicate logic first-order logic meaning representation logic formal theories denotation logic and language leave-one-out bootstrap theoretical modelling unbalanced data representation holdout estimation real-world implementations multiclass classification counting the cost t-test recall-precision curves loss function cost-sensitive measures lift charts k-statistic occam's razor roc curves confidence interval for the mean confidence interval for proportion z critical value confidence level interval estimation multiplier inferential statistics standard error confidence interval attribute selection machine learning constructing decision trees surprisal loss function inductive bias of the decision tree greediness empirical error induction expected loss development set test set precision accuracy hyperparameters confusion matrix stratification recall parameters leave one out f-measure induction pipeline measures of central tendency sparse data mode arff format instances data measures of dispersion median mean outliers population features normal distribution sample attributes missing data concepts test data elements of machine learning inference algorithms overfitting learning algorithms training data underfitting generalization machine learning models deduction plagiarism hybrid teaching/learning model cheating scalable platform cooperation examination multiplication rule marginal probability bayes law probability theorems probability theory addition rule terminals backus-naur form cfgs phrase structure grammars context-free grammars non-terminals finite state machines regular expressions pumping lemma regular languages deterministic non-deterministic fsa finite state automata meaningful adjacencies semantically-related words layout evaluation criteria dissimilarity quantitative evaluation semantic word clouds ontologies roles semantic role labelling lambda calculus topic models latent semantic analysis formal and
computational representations the semantics of first-order logic description logics & the web ontology language distributional semantics event representations corpus-based approaches compositionality minimum error max log-likelihood max margin support vector machines margin infused relaxed algorithm maximizing margin the norm margin and separability feature representation main theorem k-nearest neighbors distance metric variance hypothesis testing maximum likelihood estimation (mle) expectations z-test conditional probabilities estimation joint probabilities frequency functions stochastic variables problems for hmms markov assumptions smoothing for pos tagging algorithms for hmms pos tagging with hmms hidden markov models (hmms) em for naive bayes hidden and latent variables maximum likelihood estimation expectation-maximization naive bayes classifiers bayesian classification naive bayes in nlp instance attributes spam filtering probabilities statistics learning outcomes lab sessions flip teaching video lectures notion of probability independence and incompatibility sample spaces theorems of probability statistical methods and natural language processing generalization model assessment unsupervised learning types of classification cross-validation classification in nlp empirical error classification definition of machine learning reinforcement learning supervised learning type of machine learning hypothesis class regression affect natural language processing affective states semantic-oriented applications professional profile job title job peer reviewing argumentation critical thinking academic writing topic sentence representation of meaning computational lexical semantics structured prediction multilinguality partial supervision named-entity recognition meetups ambiguous supervision indirect supervision latent-variable model incomplete supervision linguistic structure prediction multilingual learning part-of-speech tagging cross-lingual learning gavagai recorded
future ensemble cascading bootstrap resampling ensemble learner stacking base learner adaboost bagging boosting voting structured mira sequence tagging structured perceptron conditional random fields structured svms support vector machine classifiers k-nn machine learning workbench statistical software svms logistic regression/maximum entropy eager learning lazy learning modified value difference metric distance overlap measure unsupervised classification big data strata business intelligence hadoop information discovery r actionable intelligence customer analytics crisis analysis stefan th. gries big textual data query logs cyberemotions sentistrength swedish italian findwise information architecture search query log actionable information contextualized information agi news products venues geographical information

Presentations

Uppsala uni 4march2011

CityTimes

Towards Contextualized Information: How Automatic Genre Identification Can Help

SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence

How Emotional Are Users' Needs? Emotion in Query Logs

Text analytics and R - Open Question: is it a good match?

Lecture 01: Machine Learning for Language Technology - Introduction

Lecture 02: Machine Learning for Language Technology - Decision Trees and Nearest Neighbors

Lecture 03: Machine Learning for Language Technology - Linear Classifiers

Lecture 4: The Weka Package

Lecture 5: Structured Prediction

Lecture 6: Ensemble Methods

Lecture 7: Learning from Massive Datasets

Lecture 1: Semantic Analysis in Language Technology

Lecture 2: Introduction to the Essay Assignment

Lecture 2: Job Opportunities

Lecture 2: From Semantics To Semantic-Oriented Applications

Lecture 3: Structuring Unstructured Texts Through Sentiment Analysis

Lecture 2: Basic Concepts in Machine Learning for Language Technology

Lecture 3: Probability Theory

Lecture 1: Introduction to the Course: The Flipped Classroom

Lecture 4: Statistical Inference

Lecture 5: Bayesian Classification

Lecture 6: Hidden Variables and Expectation-Maximization

Lecture 7: Hidden Markov Models (HMMs)

Lecture 8: Decision Trees & k-Nearest Neighbors

Lecture 9: Perceptron

Lecture 10: SVM and MIRA

Lecture 11: Logistic Regression

Lecture 2: Computational Semantics

Documents

Towards a Quality Assessment of Web Corpora for Language Technology Applications

Likes

Il Booktrailer

Analytics Education in the era of Big Data

Evaluating Search Engines