Stockholm, Sweden

Computational Linguist, PhD

Technology / Software / Internet

www.forum.santini.se/
I am a computational linguist with a strong interest in textual and linguistic features, machine learning and intensive textual data processing. My personal challenge is to extract "contextualized" information from big unstructured textual data by leveraging the concept of "genre". Here, "genre" means "type of text". Nowadays all kinds of businesses, enterprises and customer care services produce huge amounts of data in the form of many different "genres", e.g. emails, memos, notes from call centers, news, user groups, chats, reports, tweets, Facebook pages, blogs, forums, marketing material and so on. All these textual genres contain valuable but unstructured data. The exploitation of ...

machine learning
language technology
supervised classification
computational semantics
weka
decision trees
nlp
sentiment analysis
supervised machine learning
semantic analysis in language technology
uppsala university
entropy
noise
svm
perceptron
gain ratio
information gain
divide and conquer
logistic regression
text analytics
genre
marina santini
corpus evaluation
web corpora
domain-specific
semantics in language technology
text mining
mesh
wordnet
pointwise mutual information
naive bayes baseline algorithm
selectional restrictions
evaluation
crossvalidation
induction
rules
formal languages
automata
wordle
tag clouds
word clouds
description logics
rdf
owl
semantic web
thematic roles
semantic roles
predicate-argument structure
unification
sampling
smoothing
independence
statistical inference
flipped classroom
conditional probability
axioms of probability
margin
training set
inductive bias
structured data
opinion mining
unstructured data
semantics
formal semantics
lexical semantics
semantic analysis
semi-supervised learning
dependency parsing
mira
best split
similarity
nearest neighbors
pruning
supervised learning
clustering
emotion
query log analysis
automatic genre identification
events
kendall correlation coefficient
mann-whitney-wilcoxon test
kullback-leibler divergence
log-likelihood
burstiness
domainhood
terminology extraction
corpus quality
lay-specialized sublanguage
web corpus
ecare
domain
ward’s linkage
agglomerative hierarchical clustering
unsupervised machine learning
stockholm-umeå corpus (suc)
readability
distortion
star forest
cycle cover
automatic folksonomy construction
social tagging
folksonomy
aspect ratio
area utilization
realized adjacencies
inflate and push
cpewcv
context-preserving word cloud visualisation
running time
compactness
quantitative metrics
seam carving
iri
sparql
ontology learning
classes
relations
webprotege
tree of porphyry
ontology
shared understanding
web 3.0
tags
dls
shared semantic annotation
single vs. multiple documents
unsupervised content selection
query-focused summarization
abstractive summarization
summarization in question answering
snippets
extractive summarization
recall oriented understudy for gisting evaluation
rouge
topic signature-based content selection
abstracting
bootstrapping
ace
hand-written patterns
databases of relations
freebase
unsupervised learning from the web
dbpedia
relation extractors
knowledge graph
distant supervision
narrative questions
passage retrieval
hybrid approaches
wolframalpha
mean reciprocal rank
ir-based question answering
factoid questions
mrr
apple's siri
answer type taxonomy
complex questions
ibm's watson
knowledge-based approaches
ir-based approaches
word shapes
standard evaluation per entity
calendaring
e-discovery
sequence labeling
information extraction
sequence classifier
standard evaluation per token
named entity recognition
ner
cosine metric
ppmi
zellig harris
distributional models
positive pointwise mutual information
vectors
joint probability
cosine similarity measure
pmi
john rupert firth
marginals
dot product
term-context matrix
information content
simplified lesk
extended lesk
word sense disambiguation
elesk
michael lesk
path-based similarity
lesk algorithm
supervised methods
surprisal
dictionary-based methods
semcor
lin method
resnik method
thesaurus-based methods
wsd
graph-based methods
word relatedness
word similarity
corpus lesk
hyponymy
zeugma test
meronymy
metonymy
wordform
synonymy
part-whole meronymy
babelnet
antonymy
polysemy
lemma
membership meronymy
senseval
hypernymy
homonymy
word senses
manually-built sentiment lexicons
general inquirer
learning sentiment lexicons
semi-supervised methods
sentiment lexicons
sentiment mining
likelihood
sentiwordnet
connotational aspects
emotion classification
affective meaning
turney algorithm
mutual information
scherer typology
sentiment lexica
semantic role labeling
shallow semantic representation
shallow semantics
propbank
framenet
semantic role labelling
propositional logic
connotation
computational semantics
predicate logic
first-order logic
meaning representation
logic
formal theories
denotation
logic and language
leave-one-out
bootstrap
theoretical modelling
unbalanced data
representation
holdout estimation
real-world implementations
multiclass classification
counting the cost
t-test
recall-precision curves
loss function
cost-sensitive measures
lift charts
k-statistic
occam's razor
roc curves
confidence interval for the mean
confidence interval for proportion
z critical value
confidence level
interval estimation
multiplier
inferential statistics
standard error
confidence interval
attribute selection
machine learning
constructing decision trees
surprisal
loss function
inductive bias of the decision tree
greediness
empirical error induction
expected loss
development set
test set
precision
accuracy
hyperparameters
confusion matrix
stratification
recall
parameters
leave one out
f-measure
induction pipeline
measures of central tendency
sparse data
mode
arff format
instances
data
measures of dispersion
median
mean
outliers
population
features
normal distribution
sample
attributes
missing data
concepts
test data
elements of machine learning
inference algorithms
overfitting
learning algorithms
training data
underfitting
generalization
machine learning models
deduction
plagiarism
hybrid teaching/learning model
cheating
scalable platform
cooperation
examination
multiplication rule
marginal probability
bayes law
probability theorems
probability theory
addition rule
terminals
backus-naur form
cfgs
phrase structure grammars
context-free grammars
non-terminals
finite state machines
regular expressions
pumping lemma
regular languages
deterministic
non-deterministic
fsa
finite state automata
meaningful adjacencies
semantically-related words
layout
evaluation criteria
dissimilarity
quantitative evaluation
semantic word clouds
ontologies
roles semantic role labelling
lambda calculus
topic models
latent semantic analysis
formal and computational representations
the semantics of first-order logic
description logics & the web ontology language
distributional semantics
event representations
corpus-based approaches
compositionality
minimum error
max log-likelihood
max margin
support vector machines
margin infused relaxed algorithm
maximizing margin
the norm
margin and separability
feature representation
main theorem
k-nearest neighbors
distance metric
variance
hypothesis testing
maximum likelihood estimation (mle)
expectations
z-test
conditional probabilities
estimation
joint probabilities
frequency functions
stochastic variables
problems for hmms
markov assumptions
smoothing for pos tagging
algorithms for hmms
pos tagging with hmms
hidden markov models (hmms)
em for naive bayes
hidden and latent variables
maximum likelihood estimation
expectation-maximization
naive bayes classifiers
bayesian classification
naive bayes in nlp
instance attributes
spam filtering
probabilities
statistics
learning outcomes
lab sessions
flip teaching
video lectures
notion of probability
independence and incompatibility
sample spaces
theorems of probability
statistical methods and natural language processing
generalization model assessment
unsupervised learning
types of classification
cross-validation
classification in nlp
empirical error
classification
definition of machine learning
reinforcement learning supervised learning
type of machine learning
hypothesis class
regression
affect
natural language processing
affective states
semantic-oriented applications
professional profile
job title
job
peer reviewing
argumentation
critical thinking
academic writing
topic sentence
representation of meaning
computational lexical semantics
structured prediction
multilinguality
partial supervision
named-entity recognition
meetups
ambiguous supervision
indirect supervision
latent-variable model
incomplete supervision
linguistic structure prediction
multilingual learning
part-of-speech tagging
cross-lingual learning
gavagai
recorded future
ensemble
cascading
bootstrap resampling
ensemble learner
stacking
base learner
adaboost
bagging
boosting
voting
structured mira
sequence tagging
structured perceptron
conditional random fields
structured svms
support vector machine
classifiers
k-nn
machine learning workbench
statistical software
svms
logistic regression/maximum entropy
eager learning
lazy learning
modified value difference metric
distance
overlap measure
unsupervised classification
big data
strata
business intelligence
hadoop
information discovery
r
actionable intelligence
customer analytics
crisis analysis
stefan th. gries
big textual data
query logs
cyberemotions
sentistrength
swedish
italian
findwise
information architecture
search
query log
actionable information
contextualized information
agi
news
products
venues
geographical information

## Presentations

### Il Booktrailer

ludam • 15 years ago

### Analytics Education in the era of Big Data

Gregory Piatetsky-Shapiro • 11 years ago

### Evaluating Search Engines

Ramzi Alqrainy • 12 years ago

