A sample (ESA)
The development of T-cell leukaemia following the otherwise successful treatment of three patients with X-linked severe combined immune deficiency (X-SCID) in gene-therapy trials using haematopoietic stem cells has led to a re-evaluation of this approach. Using a mouse model for gene therapy of X-SCID, we find that the corrective therapeutic gene IL2RG itself can act as a contributor to the genesis of T-cell lymphomas, with one-third of animals being affected. Gene-therapy trials for X-SCID, which have been based on the assumption that IL2RG is minimally oncogenic, may therefore pose some risk to patients.
- Leukemia
- Severe combined immunodeficiency
- Cancer
- Non-Hodgkin lymphoma
- AIDS
- ICD-10 Chapter II: Neoplasms; Chapter III: Diseases of the blood and blood-forming organs, and certain disorders involving the immune mechanism
- Bone marrow transplant
- Immunosuppressive drug
- Acute lymphoblastic leukemia
- Multiple sclerosis
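Assuming ESA stands for Explicit Semantic Analysis, which represents a text as a weighted vector of Wikipedia concepts, a ranking like the one above can be sketched as follows. The three-article corpus and the names `ARTICLES`, `build_index` and `esa_vector` are hypothetical stand-ins for illustration; a real run would index the whole of Wikipedia and would usually also normalise and prune the resulting vectors.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy "Wikipedia": concept title -> article text.
ARTICLES = {
    "Leukemia": "leukemia is a cancer of the blood cells in the bone marrow",
    "Swimming": "swimming is a sport practised in pools and in the sea",
    "Venice": "venice is a city of canals and bridges in italy",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(articles):
    """Build an inverted index word -> {concept: tf-idf weight}."""
    n = len(articles)
    tfs = {concept: Counter(tokenize(text)) for concept, text in articles.items()}
    df = Counter(word for tf in tfs.values() for word in tf)
    index = defaultdict(dict)
    for concept, tf in tfs.items():
        for word, count in tf.items():
            idf = math.log(n / df[word])
            if idf > 0:  # words found in every article carry no information
                index[word][concept] = (1 + math.log(count)) * idf
    return index


def esa_vector(text, index):
    """Sum the concept weights of every word in the text; return concepts ranked by score."""
    scores = Counter()
    for word, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            scores[concept] += count * weight
    return scores.most_common()


if __name__ == "__main__":
    text = "patients with leukemia received gene therapy in the bone marrow"
    for concept, score in esa_vector(text, build_index(ARTICLES)):
        print(concept, round(score, 3))
```

In this toy run "Leukemia" ranks first, while "Venice" gets no score because the only word it shares with the text ("in") occurs in every article and so has zero idf.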
A sample (ESA)
Being so tightly packed, Venice doesn't make an ideal place to come to practise your favourite sport, although you'll get a decent workout just walking around and up and down bridges! If you've got any energy left for some extra exercise, try a spot of swimming (although pools are rare) or even a jog. Venice is a bit of a desert for swimmers. You can go in off the Lido (if you're game) or at one of Venice's two public swimming pools (handily, they close in summer).
Lonely Planet Tourist Guide
1-Glossary_of_cue_sports_terms
2-Swimming
3-Ian_Thorpe
4-NCAA_football_bowl_games,_2005-06
5-Swimming_machine
6-American_football_strategy
7-Contract_bridge_glossary
8-Olympic_Games
9-Pingu_episodes_series_6
10-Venice
…
15-Corruption_in_Ghana
…
27-Legislative_system_of_the_People's_Republic_of_China
Clustering
Wikipedia is hyperlinked
Swimming is clustered with Olympic Games
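One way to read "Wikipedia is hyperlinked" is that the concepts returned by ESA are grouped when their articles link to each other. The sketch below assumes exactly that: it takes a hypothetical set of article-to-article links, forms connected components with union-find, and names each cluster after its most-linked member. Both the link data and the naming rule are assumptions for illustration, not the authors' documented procedure.

```python
from collections import Counter, defaultdict


def cluster_concepts(concepts, links):
    """Group concepts whose Wikipedia articles are connected by hyperlinks.

    concepts: list of article titles returned by ESA.
    links: iterable of (source, target) hyperlinks between articles.
    Returns (name, sorted members) pairs, largest cluster first.
    """
    parent = {c: c for c in concepts}

    def find(c):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    degree = Counter()
    for a, b in links:
        if a in parent and b in parent:  # ignore links leaving the concept set
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1

    groups = defaultdict(list)
    for c in concepts:
        groups[find(c)].append(c)

    clusters = []
    for members in groups.values():
        # Hypothetical naming rule: the member with the most links inside the set.
        name = max(members, key=lambda c: degree[c])
        clusters.append((name, sorted(members)))
    clusters.sort(key=lambda cluster: -len(cluster[1]))
    return clusters


# Hypothetical concepts and links, loosely based on the Venice sample.
CONCEPTS = ["Swimming", "Ian_Thorpe", "Olympic_Games", "Swimming_machine",
            "Venice", "Corruption_in_Ghana"]
LINKS = {("Ian_Thorpe", "Swimming"), ("Ian_Thorpe", "Olympic_Games"),
         ("Swimming", "Olympic_Games"), ("Swimming_machine", "Swimming")}

if __name__ == "__main__":
    for name, members in cluster_concepts(CONCEPTS, LINKS):
        print(name, len(members), members)
```

With these made-up links, Swimming, Ian_Thorpe, Olympic_Games and Swimming_machine form one cluster named Swimming, while the unlinked concepts stay as singletons.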
Throw away:
Large aggregators
Category links
Numbers
Pages with more than N = 100 links
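The four discard rules above can be sketched as a single filter. The slides do not say how "large aggregators" and "numbers" are detected, so this sketch assumes aggregators arrive as a given set of titles, treats a title as a number when it contains only digits and punctuation, and takes per-page link counts as input. All titles and counts in the example are made up.

```python
import re

# Assumption: a "number" title consists only of digits and simple punctuation.
NUMBER_TITLE = re.compile(r"[0-9_.,\-]+")


def filter_concepts(concepts, link_counts, aggregators, max_links=100):
    """Drop aggregators, category pages, number pages and pages with more than max_links links."""
    kept = []
    for title in concepts:
        if title in aggregators:
            continue
        if title.startswith("Category:"):
            continue
        if NUMBER_TITLE.fullmatch(title):
            continue
        if link_counts.get(title, 0) > max_links:
            continue
        kept.append(title)
    return kept


# Hypothetical input.
CONCEPTS = ["Swimming", "Category:Sports", "2005", "List_of_Olympic_sports",
            "Glossary_of_cue_sports_terms", "Olympic_Games"]
LINK_COUNTS = {"Swimming": 80, "Olympic_Games": 95, "Glossary_of_cue_sports_terms": 400}
AGGREGATORS = {"List_of_Olympic_sports"}

if __name__ == "__main__":
    print(filter_concepts(CONCEPTS, LINK_COUNTS, AGGREGATORS))
```

Keeping the threshold as the `max_links` parameter makes it easy to try values other than the N = 100 on the slide.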
After clustering, only 3 clusters have cardinality larger than 1.
The first cluster, with cardinality 21, was automatically named Swimming.
The second and the third both have cardinality 2; they are named Training and Venice-bucentaur.
Which one is machine-generated?
Validation: Turing test
Classification
Text Classification
20 texts, ranging in length from 60 to 200 words, collected from various sources such as newspaper articles, textbooks, random web pages and MSN Encarta.
[Figure: Outcome]
Further improvements
Using only nouns
Use a POS tagger to identify the syntactic role of each word in the document to be classified
Keep only nouns (throw away the rest)
No degradation in the results!
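Keeping only nouns is a one-line filter once a POS tagger has labelled the tokens. This sketch assumes Penn Treebank tags (NN, NNS, NNP, NNPS for nouns), the tag set returned by default by, for example, NLTK's `nltk.pos_tag`; the example sentence is tagged by hand so that it runs without a tagger.

```python
def keep_nouns(tagged_tokens):
    """Return only the words whose Penn Treebank tag marks a noun (NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged_tokens if tag.startswith("NN")]


# Hand-tagged example sentence (no tagger needed to run this).
SENTENCE = [("Venice", "NNP"), ("is", "VBZ"), ("a", "DT"), ("bit", "NN"),
            ("of", "IN"), ("a", "DT"), ("desert", "NN"), ("for", "IN"),
            ("swimmers", "NNS")]

if __name__ == "__main__":
    print(keep_nouns(SENTENCE))  # ['Venice', 'bit', 'desert', 'swimmers']
```

The filtered word list would then be passed to the concept-extraction step in place of the full text.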
Define Multiwords
Lexical multiword identification approach:
The following generative pattern is considered:
((Adj | Noun)+ | ((Adj | Noun)* (Noun Prep)?) (Adj | Noun)*) Noun
+: one or more; *: zero or more; ?: zero or one; |: or
Validation: a candidate multiword is valid if there is a Wikipedia entry related to it.
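The pattern can be run as an ordinary regular expression over a string with one letter per token. In this sketch the letters A, N, P and O (adjective, noun, preposition, other) are an encoding chosen here, tags are assumed to be Penn Treebank, and `WIKIPEDIA_TITLES` is a tiny hypothetical stand-in for the set of Wikipedia entry titles; "related to" is simplified to an exact, case-insensitive title match.

```python
import re

# ((Adj|Noun)+ | ((Adj|Noun)* (Noun Prep)?) (Adj|Noun)*) Noun, one letter per token.
MULTIWORD = re.compile(r"((A|N)+|((A|N)*(NP)?)(A|N)*)N")

# Hypothetical stand-in for the set of Wikipedia entry titles (lower-cased).
WIKIPEDIA_TITLES = {"acute lymphoblastic leukemia", "degrees of freedom"}


def simplify(tag):
    """Map a Penn Treebank tag to A (adjective), N (noun), P (preposition) or O (other)."""
    if tag.startswith("JJ"):
        return "A"
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":
        return "P"
    return "O"


def candidate_multiwords(tagged_tokens):
    """Yield (start, end, phrase) for every pattern match spanning at least two tokens."""
    tags = "".join(simplify(tag) for _, tag in tagged_tokens)
    for match in MULTIWORD.finditer(tags):
        start, end = match.span()  # character offsets equal token offsets
        if end - start >= 2:       # a single noun is not a multiword
            yield start, end, " ".join(w for w, _ in tagged_tokens[start:end])


def valid_multiwords(tagged_tokens, titles):
    """Keep only candidates that match a Wikipedia title."""
    return [(s, e, phrase) for s, e, phrase in candidate_multiwords(tagged_tokens)
            if phrase.lower() in titles]


if __name__ == "__main__":
    tokens = [("the", "DT"), ("acute", "JJ"), ("lymphoblastic", "JJ"),
              ("leukemia", "NN"), ("of", "IN"), ("the", "DT"), ("patient", "NN")]
    print(valid_multiwords(tokens, WIKIPEDIA_TITLES))
```

Python's `re` returns the leftmost match found by the first alternative that succeeds, which is not always the longest possible multiword; a fuller version might enumerate overlapping spans and prefer the longest validated one.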
Text with multiwords:
Keep all nouns
Keep all adjectives that are part of a multiword
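The two rules above can be sketched as follows, assuming the validated multiwords are already available as (start, end) token spans with `end` exclusive, and that tags are Penn Treebank. The slide does not say what happens to prepositions inside a multiword such as "degrees of freedom", so this sketch simply drops them like any other non-noun.

```python
def filter_with_multiwords(tagged_tokens, multiword_spans):
    """Keep all nouns, plus adjectives that lie inside a validated multiword span."""
    in_multiword = set()
    for start, end in multiword_spans:
        in_multiword.update(range(start, end))
    kept = []
    for i, (word, tag) in enumerate(tagged_tokens):
        is_noun = tag.startswith("NN")
        is_multiword_adj = tag.startswith("JJ") and i in in_multiword
        if is_noun or is_multiword_adj:
            kept.append(word)
    return kept


if __name__ == "__main__":
    tokens = [("the", "DT"), ("acute", "JJ"), ("lymphoblastic", "JJ"),
              ("leukemia", "NN"), ("of", "IN"), ("the", "DT"),
              ("young", "JJ"), ("patient", "NN")]
    # Hypothetical span for "acute lymphoblastic leukemia".
    print(filter_with_multiwords(tokens, [(1, 4)]))
```

Here "young" is discarded because it is not part of any multiword, while "acute" and "lymphoblastic" survive.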
Evaluation (human inspection of results)
100 samples (50 technical, 50 generic)
Multiwords improved the results significantly: 7 samples (5 technical)
Improved them marginally: 13 samples
Worsened them marginally: 6 samples
Overall improvement: 10% on technical texts
Work in progress
Concept-mediated mapping among documents
How similar are two docs?
Jaccard Index
[Figure: Doc 1 linked to Concepts 1, 2, 3; Doc 3 linked to Concepts 2, 3, 4]
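The Jaccard index of two concept sets A and B is |A ∩ B| / |A ∪ B|. The sketch below computes it, reading the diagram as Doc 1 = {Concept 1, Concept 2, Concept 3} and Doc 3 = {Concept 2, Concept 3, Concept 4}; that reading of the figure, and the value returned for two empty documents, are assumptions made here.

```python
def jaccard(concepts_a, concepts_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two documents' concept sets."""
    a, b = set(concepts_a), set(concepts_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty documents share nothing
    return len(a & b) / len(union)


DOC_1 = {"Concept 1", "Concept 2", "Concept 3"}
DOC_3 = {"Concept 2", "Concept 3", "Concept 4"}

if __name__ == "__main__":
    print(jaccard(DOC_1, DOC_3))  # 0.5
```

Two shared concepts out of four distinct ones give 2/4 = 0.5.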
Syllabi comparison
Interlinks
Mapping documents in different languages
Deploying Wikipedia Interlinks
Jaccard Index
[Figure: Doc 1 and Doc 3 with their concepts (Concepts 1 to 4), matched across languages through INTERLINKS]
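Interlanguage links connect a Wikipedia article to its counterpart in another language edition, so a document's concepts can be translated before the Jaccard index is computed. The sketch below assumes a ready-made interlink table (here a tiny hypothetical Italian-to-English dictionary) and drops concepts that have no interlink; both choices are illustrative and not taken from the slides.

```python
def map_concepts(concepts, interlinks):
    """Translate concept titles into the target language; titles without an interlink are dropped."""
    return {interlinks[c] for c in concepts if c in interlinks}


def cross_language_jaccard(doc_a, doc_b, interlinks):
    """Jaccard index between doc_a's concepts and doc_b's concepts mapped through interlinks."""
    a = set(doc_a)
    b = map_concepts(doc_b, interlinks)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data: an English document, an Italian document and a hypothetical
# Italian-to-English interlink table.
DOC_EN = {"Swimming", "Olympic_Games", "Venice"}
DOC_IT = {"Nuoto", "Giochi_olimpici", "Gondola"}
INTERLINKS_IT_EN = {"Nuoto": "Swimming", "Giochi_olimpici": "Olympic_Games",
                    "Venezia": "Venice"}

if __name__ == "__main__":
    print(cross_language_jaccard(DOC_EN, DOC_IT, INTERLINKS_IT_EN))
```

Dropping unmapped concepts (here "Gondola") shrinks the union; keeping them as unmatched entries instead would lower the score, so the choice should match how the comparison is meant to treat untranslatable concepts.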
In this paper we present an algorithm that, using Wikipedia as a reference, extracts semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines all the text contained in the whole of Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks, and also introduces an analysis based on multiwords. Our algorithm outperforms current methods in that its output contains far fewer false positives. We were also able to understand which (structural) part of the texts provides most of the semantic information extracted by the algorithm.