2014 ucl
1. C. Titus Brown
Assistant Professor
MMG, CSE, BEACON
Michigan State University
May 2014
ctb@msu.edu
Large-scale transcriptome sequencing of non-model organisms: coping mechanisms
2. We practice open science!
Everything discussed here:
Code: github.com/ged-lab/ ; BSD license
Blog: http://ivory.idyll.org/blog (‘titus brown blog’)
Twitter: @ctitusbrown
Grants on LabWeb site: http://ged.msu.edu/research.html
Preprints available.
Everything is > 80% reproducible.
3. We practice open science!
Everything discussed here:
Code: github.com/ged-lab/ ; BSD license
Blog: http://ivory.idyll.org/blog (‘titus brown blog’)
Twitter: @ctitusbrown
Grants on LabWeb site: http://ged.msu.edu/research.html
Preprints available.
Everything is > 80% reproducible by you.
4. The challenges of non-model transcriptomics
Missing or low quality genome reference.
Evolutionarily distant.
Most extant computational tools focus on model organisms –
Assume low polymorphism (internal variation)
Assume reference genome
Assume somewhat reliable functional annotation
More significant compute infrastructure
…and cannot easily or directly be used on critters of interest.
5. Outline
1. Challenges of non-model transcriptomics.
2. Lamprey: too much data, not enough genome
3. Digital normalization as a coping mechanism
4. …applied to Molgulid ascidians…
5. …and back to lamprey.
6. More transcriptome challenges
7. What’s next? (Implications of free data + free data analysis.)
6. Sea lamprey in the Great Lakes
Non-native
Parasite of medium to large fishes
Caused populations of host fishes to crash
Li Lab / Y-W C-D
7. The problem of lamprey:
Diverged at base of vertebrates; evolutionarily distant from model organisms.
Large, complicated genome (~2 Gb)
Relatively little existing sequence.
We sequenced the liver genome…
8. Lamprey has incomplete genomic sequence
J. Smith et al., PNAS 2009
Evidence of somatic recombination; 100s of Mb of sequence eliminated from genome during development.
More recent evidence (unpub., J. Smith et al.) suggests that this loss is developmentally regulated, results in changes in gene expression (due to loss of genes!), and is tissue specific.
Liver genome is not the entire genome.
9. Lamprey tissues for which we have mRNAseq
embryo stages (late blastula, gastrula, neurula, 22b, neural-crest migration, 24c1, 24c2)
metamorphosis 1 (intestine, kidney)
metamorphosis 2 (liver, intestine,
metamorphosis 3 (intestine, kidney)
metamorphosis 4 (intestine, kidney)
metamorphosis 5 (liver, intestine, kidney)
metamorphosis 6 (intestine, kidney)
metamorphosis 7 (intestine, kidney)
larval (gill, kidney, liver, intestine)
juvenile (intestine, liver, kidney)
small parasite distal intestine, kidney, proximal intestine
freshwater (gill, intestine, kidney)
salt water (gill, intestine)
adult intestine
adult kidney
ovulatory female head skin
preovulatory female eye
preovulatory female tail skin
prespermiating male gill
spermiating male gill
spermiating male head skin
spermiating male muscle
mature adult male rope tissue
brain paired
brain (0, 3, 21 dpi)
spinal cord (0, 3, 21 dpi)
monocytes
lips
supraneural tissue
10. Assembly
It was the best of times, it was the wor
, it was the worst of times, it was the
isdom, it was the age of foolishness
mes, it was the age of wisdom, it was th
It was the best of times, it was the worst of times, it was the age of
wisdom, it was the age of foolishness
…but for lots and lots of fragments!
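As a rough illustration of the idea on this slide, the sketch below greedily merges the two fragments with the longest suffix/prefix overlap until nothing more can be joined. It is a toy under stated assumptions, not how production assemblers work (the tools discussed later in the talk build De Bruijn graphs). The names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical choices made for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave the pieces as separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# The four fragments shown on the slide.
fragments = [
    "It was the best of times, it was the wor",
    ", it was the worst of times, it was the",
    "isdom, it was the age of foolishness",
    "mes, it was the age of wisdom, it was th",
]
contigs = greedy_assemble(fragments)
print(contigs)
```

With these four fragments the merges happen to recover the full sentence as a single contig. Note that the repeated phrase "it was the" creates a shorter, misleading overlap between the last and second fragments; greedy merging avoids it here only because longer overlaps exist, which hints at why repeats make real assembly hard.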
12. Main problem (4 years ago):
We have a massive amount of data that challenges existing computers when we try to assemble it all together.
13. Solution: Digital normalization
(a computational version of library normalization)
Suppose you have a dilution factor of A (10) to B (1). To get 10x of B you need to get 100x of A! Overkill!!
This 100x will consume disk space and, because of errors, memory.
We can discard it for you…
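The arithmetic behind the "overkill" can be written out directly. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes that 10x is also all the coverage that is useful for the abundant molecule A.

```python
def coverage_of_abundant(abundance_ratio, target_rare_coverage):
    """Coverage the abundant molecule reaches once the rare one hits its target."""
    return abundance_ratio * target_rare_coverage


def redundant_fraction(abundant_coverage, useful_coverage):
    """Fraction of the abundant molecule's reads beyond the useful coverage."""
    return 1 - useful_coverage / abundant_coverage


# A is 10 times more abundant than B, and we want 10x coverage of B.
cov_a = coverage_of_abundant(abundance_ratio=10, target_rare_coverage=10)
print(cov_a)                          # 100
print(redundant_fraction(cov_a, 10))  # 0.9
```

Under these assumptions, nine out of ten reads from A add no new information, yet their sequencing errors still introduce new k-mers that cost memory.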
20. Digital normalization approach
A digital analog to cDNA library normalization, diginorm:
Is single pass: looks at each read only once;
Does not “collect” the majority of errors;
Keeps all low-coverage reads;
Smooths out coverage of sequencing.
=> Enables analyses that are otherwise completely impossible.
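The bullets above can be sketched in a few lines. The version below assumes the published diginorm rule (keep a read only if the median count of its k-mers seen so far is below a cutoff) and uses an exact `Counter` in place of khmer's memory-efficient probabilistic counting structure. The name `digital_normalization` and the values `k=4` and `cutoff=3` are illustrative, not khmer's API or defaults, and reverse-complement handling is omitted.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=4, cutoff=3):
    """Single pass over reads; keep a read only if its estimated coverage is low."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate coverage from
        # The median k-mer count is a coverage estimate that tolerates a few
        # erroneous (rare) k-mers inside an otherwise well-covered read.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add k-mers to the table
    return kept


# Ten copies of an abundant read and a single copy of a rare one.
reads = ["ACGTTGCA"] * 10 + ["GGATCCAA"]
kept = digital_normalization(reads, k=4, cutoff=3)
print(len(reads), "reads in,", len(kept), "reads kept")
```

Because discarded reads never touch the counting table, the errors they contain are never stored, which is the sense in which the approach does not "collect" most errors. The low-coverage read is always retained.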
21. Evaluating diginorm – how?
Can’t assemble lamprey w/o diginorm; are results any good & how would we know?
Need comparative data set
…ascidians!
22. Looking at the Molgula…
Putnam et al., 2008, Nature. Modified from Swalla 2001
24. Tail loss and notochord genes
a) M. oculata b) hybrid (occulta egg x oculata sperm) c) M. occulta
Notochord cells in orange
Swalla, B. et al. Science, Vol 274, Issue 5290, 1205-1208, 15 November 1996
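27. Question: does it matter what assembly pipeline you use? (No)
[Figure: four-way Venn diagram comparing the Diginorm V/O, Raw V/O, Diginorm trinity, and Raw trinity assemblies. Region counts as extracted, order not recoverable: 3, 70, 25, 1, 36, 13563, 35, 13, 7, 4, 23, 8, 1, 6, 5.]
Numbers are putative orthologs (reciprocal best hits) w/ Ciona intestinalis, calculated for each assembly.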
Elijah Lowe
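For readers unfamiliar with the reciprocal best hit criterion mentioned on this slide, a minimal sketch follows. It assumes similarity search results are already available as (query, subject, score) tuples, for example parsed from BLAST tabular output. The function names and toy identifiers are made up for illustration and are not taken from the analysis shown here.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy search results: assembled transcripts vs. reference proteins, both ways.
assembly_vs_ciona = [
    ("asm1", "ciona1", 250.0),
    ("asm1", "ciona2", 90.0),
    ("asm2", "ciona2", 120.0),
    ("asm3", "ciona1", 80.0),
]
ciona_vs_assembly = [
    ("ciona1", "asm1", 245.0),
    ("ciona2", "asm1", 95.0),
    ("ciona2", "asm2", 60.0),
]
print(reciprocal_best_hits(assembly_vs_ciona, ciona_vs_assembly))
```

Only the asm1/ciona1 pair survives in this toy data: asm2 and asm3 each have a best hit, but that hit prefers a different transcript. Counting such pairs per assembly gives one number per assembly that can then be compared across pipelines.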
28. Why Trinity vs Oases?
Trinity is slightly better at picking out isoforms.
Elijah Lowe
30. Transcriptome assembly thoughts
We can (now) assemble really big data sets, and get pretty good results.
We have lots of evidence (some presented here :) that some assemblies are not strongly affected by digital normalization.
(Note: normalization algorithm is now standard part of Trinity mRNAseq pipeline.)
31. Transcriptome results - lamprey
Started with 5.1 billion reads from 50 different tissues.
(4 years of computational research, and about 1 month of compute time, GO HERE)
Ended with:
32. Lamprey transcriptome basic stats
616,000 transcripts (!)
263,000 transcript families (!)
(This seems like a lot.)
33. Lamprey transcriptome basic stats
616,000 transcripts
263,000 transcript families
Only 20436 transcript families have transcripts > 1kb
(compare with mouse: 17331 of 29769 genes are > 1kb)
So, estimation by thumb ~ not that off, for long transcripts.
34. Common vs rare genes
[Figure: # transcripts vs. # samples]
Camille Scott
35. Can look at transcripts by tissue --
Camille Scott
40. Pathway predictions vary dramatically depending on data set, annotation
[Figure: KEGG pathway comparison across several different gene annotation sets for chicken]
Likit Preeyanon
41. The problem of lopsided gene characterization is pervasive: e.g., the brain "ignorome"
"...ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression
networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains.
The major distinguishing characteristic between these sets of genes is date of discovery, early
discovery being associated with greater research momentum—a genomic bandwagon effect."
Ref.: Pandey et al. (2014), PLoS One 11, e88889.
Slide courtesy Erich Schwarz
42. Practical implications of diginorm
Data is (essentially) free;
For some problems, analysis is now cheaper than data gathering (i.e. essentially free);
…plus, we can run most of our approaches in the cloud (per-hour rental compute resources).
43. 1. khmer-protocols
Effort to provide standard “cheap” assembly protocols for the cloud.
Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis.
Open, versioned, forkable, citable.
(“Don’t bother me unless it doesn’t work.”)
Steps:
Read cleaning
Diginorm
Assembly
Annotation
RSEM differential expression
45. A few thoughts on our approach…
Explicitly a “protocol” – explicit steps, copy-paste, customizable.
No requirement for computational expertise or significant computational hardware.
~1-5 days to teach a bench biologist to use.
$100-150 of rental compute (“cloud computing”)…
…for $1000 data set.
Adding in quality control and internal validation steps.
46. Can we crowdsource bioinformatics?
We already are! Bioinformatics is already a tremendously open and
collaborative endeavor. (Let’s take advantage of it!)
“It’s as if somewhere, out there, is a collection of totally free software
that can do a far better job than ours can, with open, published
methods, great support networks and fantastic tutorials. But that’s
madness – who on Earth would create such an amazing resource?”
- http://thescienceweb.wordpress.com/2014/02/21/bioinformatics-software-companies-have-no-clue-why-no-one-buys-their-products/
47. 2. Data availability is important for annotating distant sequences
[Figure legend: Anything else; Mollusc; Cephalopod; no similarity]
48. Can we incentivize data sharing?
~$100-$150/transcriptome in the cloud
Offer to analyze people’s existing data for free, IFF they open it up within a year.
See:
• CephSeq white paper.
• “Dead Sea Scrolls & Open Marine Transcriptome Project” blog post.
50. “Research singularity”
The data a researcher generates in their lab constitutes an increasingly small component of the data used to reach a conclusion.
Corollary: The true value of the data an individual investigator generates should be considered in the context of aggregate data.
Even if we overcome the social barriers and incentivize sharing, we are, needless to say, not remotely prepared for sharing all the data.
51.
52. Acknowledgements
Lab members involved
Adina Howe (w/Tiedje)
Jason Pell
Arend Hintze
Qingpeng Zhang
Elijah Lowe
Likit Preeyanon
Jiarong Guo
Tim Brom
Kanchan Pavangadkar
Eric McDonald
Camille Scott
Jordan Fish
Michael Crusoe
Leigh Sheneman
Collaborators
Billie Swalla (UW)
Josh Rosenthal (UPR)
Weiming Li (MSU)
Ona Bloom (Feinstein), Jen Morgan (MBL), Joe Buxbaum (MSSM)
Funding
USDA NIFA; NSF IOS; NIH; BEACON.
53. Current research (khmer software)
Efficient online counting of k-mers
Trimming reads on abundance
Efficient De Bruijn graph representations
Read abundance normalization
Streaming algorithms for assembly, variant calling, and error correction
Cloud assembly protocols
Efficient graph labeling & exploration
Data set partitioning approaches
Assembly-free comparison of data sets
HMM-guided assembly
Efficient search for target genes