Text-mining practical
Lars Juhl Jensen
1.
Text-mining practical, Lars Juhl Jensen
2.
unix primer
3.
the command line
4.
some useful commands
5.
cat
6.
less
7.
head -10
8.
tail -10
9.
grep 'needle'
10.
cut -f 2
11.
sort
12.
sort -nr
13.
uniq -c
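The commands above can be tried on a small file before moving to the real data. The sketch below assumes a made-up three-column, tab-delimited file called `sample.tsv` and a made-up list `names.txt`; neither is part of the practical's data.

```shell
# Create a scratch directory and a tiny tab-delimited sample file (made-up data).
workdir=$(mktemp -d)
printf 'doc1\tAKT1\tcancer\n'  > "$workdir/sample.tsv"
printf 'doc2\tTP53\tcancer\n' >> "$workdir/sample.tsv"
printf 'doc3\tAKT1\tcancer\n' >> "$workdir/sample.tsv"

cat "$workdir/sample.tsv"            # print the whole file
head -2 "$workdir/sample.tsv"        # first two lines
tail -1 "$workdir/sample.tsv"        # last line
grep 'TP53' "$workdir/sample.tsv"    # only lines containing TP53
cut -f 2 "$workdir/sample.tsv"       # only the second column
sort "$workdir/sample.tsv"           # lines in sorted order

# uniq -c counts runs of identical adjacent lines,
# so its input is normally sorted first.
printf 'AKT1\nAKT1\nTP53\n' > "$workdir/names.txt"
uniq -c "$workdir/names.txt"
```

`less` is left out here because it is interactive; run `less "$workdir/sample.tsv"` by hand and press `q` to quit.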
14.
redirecting output
15.
write to file
16.
command > filename
17.
using pipes
18.
command1 | command2
19.
putting it all together
20.
cut -f 4 infile | sort | uniq -c | sort -nr | head -100 > outfile
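To see what each stage of this pipeline contributes, it can be run on a small made-up file. In the sketch below, the four-column layout and the values in `infile` are assumptions for illustration only.

```shell
# Scratch directory with a made-up four-column, tab-delimited input file.
workdir=$(mktemp -d)
infile="$workdir/infile"
outfile="$workdir/outfile"
printf '1\t10\t14\tAKT1\n2\t5\t9\tTP53\n3\t0\t4\tAKT1\n' > "$infile"
printf '4\t7\t11\tAKT1\n5\t3\t7\tTP53\n6\t1\t4\tAR\n'   >> "$infile"

# cut     : keep column 4
# sort    : bring identical values next to each other
# uniq -c : count each distinct value
# sort -nr: order by count, largest first
# head    : keep at most the top 100
# >       : write the result to a file instead of the screen
cut -f 4 "$infile" | sort | uniq -c | sort -nr | head -100 > "$outfile"

cat "$outfile"
```

Removing stages from the right, one at a time, and rerunning is a quick way to inspect the intermediate output.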
21.
the task
22.
disease gene finding
23.
named entity recognition
24.
human genes
25.
gene prioritization
26.
what I have done
27.
information retrieval
28.
two diseases
29.
prostate cancer
30.
schizophrenia
31.
two sets of documents
32.
82,373 abstracts
33.
89,904 abstracts
34.
one file with each set
35.
one line per abstract
36.
dictionary
37.
tab-delimited file
38.
human genes
39.
21,929 entities
40.
synonyms
41.
from many databases
42.
orthographic variation
43.
prefixes and suffixes
44.
automatically generated
45.
2,920,042 names
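A tab-delimited dictionary of this kind can be explored with the same basic commands. The sketch below assumes a two-column layout (entity identifier, then one name per line) and uses invented identifiers; the real dictionary file may be organised differently.

```shell
# Made-up dictionary: column 1 = entity identifier, column 2 = one name.
workdir=$(mktemp -d)
dict="$workdir/dictionary.tsv"
printf 'GENE_A\tAKT1\nGENE_A\tAkt\nGENE_A\tPKB\nGENE_A\tProtein kinase B\n' > "$dict"
printf 'GENE_B\tTP53\nGENE_B\tp53\n' >> "$dict"

# Number of distinct entities (first column).
n_entities=$(cut -f 1 "$dict" | sort -u | wc -l | tr -d ' ')

# Number of names (one per line).
n_names=$(wc -l < "$dict" | tr -d ' ')

echo "$n_entities entities, $n_names names"

# Names per entity, entity with the most names first.
cut -f 1 "$dict" | sort | uniq -c | sort -nr
```

Run against the real file, the two counts would be expected to correspond to the entity and name totals quoted above, provided the layout assumption holds.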
46.
tagcorpus program
47.
flexible matching
48.
upper- and lower-case
49.
spaces and hyphens
50.
tab-delimited output
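The idea of flexible matching can be illustrated with `grep`. This is not how the tagcorpus program works internally, and its command-line options are not shown here; the sketch only shows what ignoring case and treating spaces and hyphens as optional means for one name, on made-up text.

```shell
# Made-up abstracts, one per line.
workdir=$(mktemp -d)
printf 'Protein kinase B is activated.\n'  > "$workdir/abstracts.txt"
printf 'protein-kinase-B was studied.\n'  >> "$workdir/abstracts.txt"
printf 'PROTEIN KINASE B signalling.\n'   >> "$workdir/abstracts.txt"
printf 'An unrelated kinase.\n'           >> "$workdir/abstracts.txt"

# -i ignores upper/lower case; [ -]? allows a space, a hyphen, or nothing
# between the words; -c prints the number of matching lines.
grep -i -E -c 'protein[ -]?kinase[ -]?b' "$workdir/abstracts.txt"
```

A dictionary-based tagger applies this kind of tolerance to every name in the dictionary at once, which is impractical to do with one regular expression per name.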
51.
what you will do
52.
named entity recognition
53.
find unfortunate names
54.
create “black list”
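One possible way to approach these two steps is sketched below. It assumes the tagger output has been saved as a tab-delimited file with the document identifier in column 1 and the matched name in column 2; that layout, the file names, and the data are hypothetical. The filtering step uses `awk`, which is not among the commands introduced earlier.

```shell
# Made-up tagger output: document id, matched name, entity identifier.
workdir=$(mktemp -d)
matches="$workdir/matches.tsv"
printf 'd1\tAKT1\tGENE_A\nd2\tWAS\tGENE_C\nd3\tWAS\tGENE_C\n' > "$matches"
printf 'd4\tWAS\tGENE_C\nd5\tPKB\tGENE_A\n' >> "$matches"

# Most frequently matched names first. Names that are also common
# English words tend to float to the top and are candidates for blocking.
cut -f 2 "$matches" | sort | uniq -c | sort -nr | head -100

# After inspecting the list by hand, write the names to block, one per line.
printf 'WAS\n' > "$workdir/blacklist.txt"

# Keep only the matches whose name (column 2) is not on the black list.
awk -F '\t' 'NR == FNR { blocked[$1]; next } !($2 in blocked)' \
    "$workdir/blacklist.txt" "$matches" > "$workdir/filtered.tsv"

cat "$workdir/filtered.tsv"
```

Deciding which names are unfortunate remains a manual judgement; the counts only show where to look first.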
55.
information extraction
56.
co-mentioning
57.
within abstracts
58.
rank genes for each disease
59.
find shared gene
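The ranking and comparison steps could look roughly like the sketch below. It assumes one filtered, tab-delimited match file per disease with the document identifier in column 1 and the gene name in column 2; file names, layout, and data are all made up, and counting abstracts per gene is just one simple way to rank.

```shell
# Made-up match files, one per disease: document id, gene name.
workdir=$(mktemp -d)
printf 'p1\tAR\np2\tAR\np3\tAKT1\np3\tAR\n' > "$workdir/prostate.tsv"
printf 's1\tDISC1\ns2\tAKT1\ns3\tDISC1\n'   > "$workdir/schizophrenia.tsv"

for disease in prostate schizophrenia; do
    # sort -u drops repeated mentions within one abstract, so each gene
    # is counted once per abstract; the rest ranks genes by that count.
    sort -u "$workdir/$disease.tsv" | cut -f 2 | sort | uniq -c | sort -nr \
        > "$workdir/$disease.ranked"

    # Sorted list of distinct gene names, as required by comm.
    cut -f 2 "$workdir/$disease.tsv" | sort -u > "$workdir/$disease.genes"
done

# comm -12 prints only the lines present in both sorted files.
comm -12 "$workdir/prostate.genes" "$workdir/schizophrenia.genes" \
    > "$workdir/shared.txt"

cat "$workdir/prostate.ranked"
cat "$workdir/shared.txt"
```

With real data the shared list is likely to be long, so looking at genes that rank highly in both `.ranked` files is a more useful next step than the raw intersection.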
60.
61.
wrap up
62.
Protein kinase B
63.
PKB
64.
Akt
65.
AKT1
66.
same protein
67.
synonyms matter
68.
“black list” is crucial
69.
text mining is useful
70.
not black magic
71.
Thanks for your attention