Check out what rockstar intern Sean La (https://www.linkedin.com/in/sean-la-718b60103/?ppe=1) made after only 10 days at NCBI (with a little help from our friends)!
Bioinformatics is the application of statistics and computer science to biology. It is used for DNA sequencing and comparing genes and genomes between different or similar organisms. The document describes a bioinformatics workshop where students sequenced chicken and plant DNA using an NIH website to compare their genomes and answer questions. Bioinformatics is widely used in biology and genetics for sequencing DNA and comparing samples, as well as sequencing proteins and modeling protein structures.
Sazid Ibna Zaman successfully completed The Data Scientist's Toolbox course from Johns Hopkins University on Coursera with distinction in July 2015. The course provided an overview of the conceptual ideas and practical tools used by data analysts and scientists, including version control, markdown, git, GitHub, R, and RStudio. The course was offered by Jeffrey Leek, Roger Peng, and Brian Caffo of Johns Hopkins Bloomberg School of Public Health.
Bio 410 bio410 bio 410 education for service uopstudy.com (UOPCourseHelp)
This document outlines the coursework for BIO 410 Genetics over 5 weeks. It includes assignments such as a worksheet on cell division and chromosomes, developing an informational brochure about genetics for a museum opening, a paper on gene expression and cancer formation, a group presentation on a genetic disorder, writing a speech for the museum opening about genetics and behavior, and a team assignment using bioinformatics tools to identify gene sequences. Students will learn about topics like DNA, RNA, replication, transcription, translation, epigenetics, genetic analysis techniques, and results from the Human Genome Project.
This document contains definitions and information about genetic engineering, DNA fingerprinting, clones, and the human genome from various websites. It defines genetic engineering as changes made to DNA to change an organism, DNA fingerprinting as the analysis of DNA samples to identify individuals, clones as identical copies of an organism, and the human genome as the total number of genes found in humans, which is estimated to be between 20,000 and 25,000 genes. The document provides short explanations and cites multiple online sources for each term.
The document summarizes the Encyclopedia of DNA Elements (ENCODE) project. It describes ENCODE as a follow-up to the Human Genome Project that aims to identify all functional elements in the human genome, including regions that regulate genes. The document outlines the phases of the project and some of the high-throughput techniques used, such as ChIP-seq, DNase-seq, and MNase-seq. It also discusses how the data from ENCODE is being utilized and the future plans to expand the project.
This document provides a summary of Aaron M Bender's background and experience. It includes his contact information, educational background which includes a PhD in Molecular Biology from the University of Wyoming, and extensive research experience including positions at ArcherDX, the University of Kansas Molecular Probes Core Laboratory, the University of Kansas, and the Mayo Clinic where he conducted research in areas such as cancer genetics, chemical biology, next generation sequencing, and the use of model organisms like C. elegans. He has over 15 publications in peer-reviewed journals.
This document provides an outline for a library research session on finding scholarly sources for a geology course. It discusses how to find articles and books using the library catalog and research databases like GeoRef and Environment Complete. It also covers peer review, evaluating authoritative sources, developing effective search strategies using keywords and Boolean operators, and citing sources using EndNote. The document is intended to help students learn skills for conducting academic research in geology.
Andre Dewanto is seeking a full-time position applying his knowledge and skills in academic research. He has a Bachelor of Science degree in Environmental Systems from the University of California San Diego with a 3.52 GPA. His experience includes research at UCLA Pathology detecting biomarkers for early cancer detection and at UCSD isolating marine bacteria. He managed personnel, supplies, and records as a research associate and coordinated outreach events as an honor society officer.
The document discusses recombinant DNA technology and the Human Genome Project. Some key points:
- The Human Genome Project, started in 1990 and completed in 2003, mapped the entire human genome sequence to further understand human genetics and hereditary disease.
- Recombinant DNA technology allows DNA from one organism to be cut and combined with DNA from another, and is used to produce medicines like human insulin in bacteria.
- Understanding an individual's genetic makeup through projects like the Human Genome Project could help develop personalized medicines tailored to each person's genetics.
- Nutrigenomics studies how different foods may affect gene expression and disease risk depending on a person's genetics. Understanding these interactions could help prevent chronic diseases.
Phylogenetic and Phylogenomic Approaches to the Study of Microbes and Microbi... (Jonathan Eisen)
The document discusses Jonathan Eisen's work as a microbiology professor at UC Davis. It provides an overview of his research topics, which include microbial phylogenomics and evolvability, phylogenetic methods and tools, and using phylogenomics to study microbial communities and interactions between microbes and hosts under stress. The document also acknowledges collaborators and funding sources for Eisen's research over the years.
The document discusses the Human Genome Project, which was a research program where scientists around the world tried to map the entire human genome and locate genes. Scientists would isolate DNA samples and use gene mapping techniques like DNA markers to determine where certain genes, like disease genes, were located on the genome. The goals of the project were to complete a detailed genetic map of the human genome, obtain the full genome as clones, determine the complete DNA sequence, and find all human genes.
The Human Genome Project was a 15-year scientific effort that mapped the entire human genome. It was primarily funded by governments in the US, UK, Japan, and other countries and cost $3 billion total. The project successfully identified the locations of all genes within human DNA and provided insights that enable genetically modifying crops, locating cancer cells, and diagnosing genetic diseases prenatally. Key techniques included genetic mapping to locate gene pairs on chromosomes and linkage analysis to determine the distance between disease-causing genes. The project's outcomes include further enabling gene therapy and precisely locating genes responsible for diseases.
The document discusses applications of DNA technology including the Human Genome Project. The Human Genome Project was a 13-year international project completed in 2003 that mapped and sequenced the entire human genome. Its goals were to identify all human genes, determine the sequence of DNA's 3 billion base pairs, store this information in databases, improve analysis tools, and address ethical issues arising from the research. The project used genetic mapping, physical mapping, and DNA sequencing approaches.
This proposal requests $1.45 million from the NSF to develop a Curator Assistant to help communities annotate the rapidly increasing number of sequenced genomes. As sequencing costs decrease from $1 million to $10,000 per genome, the bottleneck has shifted to functional annotation, which currently relies on human curators. The proposed software will use natural language processing to extract gene functions from literature and suggest annotations to assist community curators with databases for non-model organisms lacking professional curation resources. It will initially focus on arthropod genomes through collaboration with the Arthropod Base Consortium.
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap... (Surya Saha)
The document discusses efforts to improve the genome assembly of the Asian citrus psyllid (Diaphorina citri), the insect vector of citrus greening disease. It describes using long read sequencing data from PacBio to generate a new assembly with an N50 of 83kb, a significant improvement over the previous N50 of 34kb. It further discusses additional efforts using technologies like Dovetail scaffolding, 10X Genomics, and optical mapping to further improve scaffolding and resolve haplotypes, with the goal of generating a high-quality reference genome for D. citri.
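For readers unfamiliar with the N50 figures quoted above, here is a minimal sketch of how the statistic is commonly computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not data from the D. citri assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kb), for illustration only.
toy_contigs = [100, 80, 50, 30, 20, 10]
print(n50(toy_contigs))
```

A higher N50 means that more of the assembly sits in long contigs, which is why the increase from 34kb to 83kb is reported as an improvement.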
Computer science plays an important role in biotechnology by enabling the analysis and management of vast amounts of biological and genetic data. Bioinformatics tools allow researchers to gather, store, analyze and integrate various data sources to make new discoveries about gene and protein sequences, structures and functions. These tools include biological databases and software for tasks like sequence alignment, analysis and interpretation of data, and development of algorithms and statistics. The Human Genome Project was a landmark international scientific research project that mapped the human genome with the help of computational analysis and over 3300 billion lines of code.
Three's a crowd-source: Observations on Collaborative Genome Annotation (Monica Munoz-Torres)
It is impossible for a single individual to fully curate a genome with precise biological fidelity. Beyond the problem of scale, curators need second opinions and insights from colleagues with domain and gene family expertise, but the communications constraints imposed in earlier applications made this inherently collaborative task difficult. Apollo, a client-side JavaScript application allowing extensive changes to be rapidly made without server round-trips, placed us in a position to assess the difference this real-time interactivity would make to researchers’ productivity and the quality of downstream scientific analysis. To evaluate this, we trained and supported geographically dispersed scientific communities (hundreds of scientists and agreed-upon gatekeepers, in ~100 institutions around the world) to perform biologically supported manual annotations, and monitored their findings. We observed that: 1) Previously disconnected researchers were more productive when obtaining immediate feedback in dialogs with collaborators. 2) Unlike earlier genome projects, which had the advantage of more highly polished genomes, recent projects usually have lower coverage. Therefore curators now face additional work correcting for more frequent assembly errors and annotating genes that are split across multiple contigs. 3) Automated annotations were improved, as exemplified by discoveries made based on revised annotations; for example, ~2800 manually annotated genes from three species of ants granted further insight into the evolution of sociality in this group, and ~3600 manual annotations contributed to a better understanding of immune function, reproduction, lactation and metabolism in cattle. 4) There is a notable trend shifting from whole-genome annotation to annotation of specific gene families or other gene groups linked by ecological and evolutionary significance. 5) The distributed nature of these efforts still demands strong, goal-oriented (i.e. publication of findings) leadership and coordination, as these are crucial to the success of each project. Here we detail these and other observations on collaborative genome annotation efforts.
The rapid expansion of global trade and travel has increased the introduction of non-native pathogens. Climate change also influences pathogens directly and indirectly. Sudden Oak Death, caused by the oomycete Phytophthora ramorum, is provided as an example. Accurately identifying pathogen species and populations is critical for risk assessment and disease management, but this presents challenges. There is also limited understanding of global pathogen diversity and limited cooperation on knowledge sharing. The Phytophthora Database was created to address these issues through genetic characterization of isolates and providing analysis tools.
Johns Hopkins University - The Data Scientist's Toolbox - Certificate with Di... (Jeff Capaldo)
Jeff Capaldo successfully completed The Data Scientist's Toolbox course offered by Johns Hopkins University on Coursera with distinction in April 2015. The course provided an overview of the conceptual ideas and practical tools used by data analysts and scientists, including version control, markdown, git, GitHub, R, and RStudio. The course was instructed by Jeffrey Leek, Roger Peng, and Brian Caffo of the Johns Hopkins Bloomberg School of Public Health.
Marine Host-Microbiome Interactions: Challenges and Opportunities (Jonathan Eisen)
This document summarizes a talk given by Jonathan Eisen on marine host-microbiome interactions. It discusses various topics researched in Eisen's lab, including phylogenomic methods and tools, microbial phylogenomics and evolvability, reference data resources, communication in science, and model systems. Specific projects are mentioned, such as automated genome trees, phylogenetic marker genes, the GEBA project, and dark matter microbes. The document then introduces the concept of the host-microbiome stress triangle and gives examples of stress types including nutrient acquisition, pathogens, and environmental change. It concludes by discussing a potential project on seagrass microbiomes in collaboration with Jay Stachowicz's lab.
Beniamin Zahiri-Coursera Data Scientist Toolbox 2015 (BZahiri)
Beniamin Zahirisabzevar successfully completed The Data Scientist's Toolbox course from Johns Hopkins University on Coursera with distinction in February 2015. The course provided an overview of the conceptual ideas and practical tools used by data analysts and scientists, including version control, markdown, git, GitHub, R, and RStudio. The course was offered online through Coursera and did not reflect the entire Johns Hopkins University curriculum.
The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks (Jonathan Eisen)
This document discusses the need for a phylogeny-driven genomic encyclopedia of eukaryotes. It notes that current sources of information on sequenced eukaryotic genomes, such as Wikipedia and GOLD, are disorganized and lack phylogenetic context. The document then analyzes genomic data from the poorly studied protist Collodictyon using 18S and 28S ribosomal DNA sequencing and phylogenomic trees inferred from 124 genes. The analysis finds that Collodictyon is closely related to Diphylleia and occupies a very early divergence in the eukaryote tree of life, either alone or as the sister group to Malawimonas. This suggests Collodictyon represents an important new lineage for
Containerized attribute indexing and graph genomes for federated data access (Ben Busby)
This document discusses federating genomic sequencing data across cloud platforms for discovery. It presents four topic areas as examples: 1) virus characterization and discovery through building an index of viral signatures, 2) generating systems to analyze genome graphs to compare individuals to communities, 3) annotating haplotypes and graphs to query complex disease, and 4) indexing data flexibly for federated discovery anywhere through APIs. It emphasizes that metadata is needed to contextualize data and maximize its utility for answering biological questions.
Talk given at this conference covering prototypes involving ML, as well as data indexing hackathons capturing community consensus and where we are going with this.
Ben Busby is the founder of the Department of Bioinformatics and Data Science at the National Institutes of Health. He discusses hosting numerous data hackathons in 2019 focused on topics like RNA sequencing, human pan-genomics, variant datasets, computational medicine, prokaryotic virulence factors, single cell genomics, and structural variants. The hackathons bring researchers together to collaborate on bioinformatics tools and resources. Busby emphasizes the importance of open communication and creating a community to advance genomic and biomedical research.
Ben Busby is the founder of the Department of Bioinformatics and Data Science at NIH. He discusses several hackathon projects focused on analyzing RNAseq data and developing machine learning applications for genomics. These include tools for polygenic SNP search, genomic robots, and phenogenomics. Upcoming hackathons are planned on topics like RNAseq, human pan-genomics, variant datasets, and virus discovery. The goal is to create an open community around bioinformatics and data science.
Ben Busby is the founder of the Department of Bioinformatics and Data Science at NIH. He discusses several bioinformatics tools and resources developed during NIH hackathons, including tools for polygenic SNP search, RNAseq analysis, and virus discovery. Upcoming hackathon events focus on RNAseq, human pan-genomics, variant datasets, and more. The goal is to create an open bioinformatics community.
This document contains information about various bioinformatics tools and resources developed by Ben Busby at NCBI, including links to BLAST cloud software, Docker images, and GitHub repositories. It also lists upcoming hackathons focused on data analysis and encourages participation in the bioinformatics community through communication on LinkedIn and contributing to existing events.
This document discusses NCBI health resources and hackathons. It provides information about new BLAST features like CloudBLAST and Docker images. It also discusses using BLAST Docker images and magicBLAST. The document describes how to find data when metadata is insufficient by using the taxonomically indexed SRA. It lists several hackathon projects related to virus discovery, gene expression aging, and variant calling. It promotes creating a bioinformatics community and announces upcoming data analysis hackathons.
This document summarizes new features in NCBI BLAST that allow for more targeted searches based on taxonomy. It provides examples of using BLAST to search for Argonaute proteins in S. pombe, Ascomycota fungi, and excluding Viridiplantae. Execution times for each search are included. The document also demonstrates using BLAST docker images to search for Shiga toxin proteins in E. coli without needing the entire nr database locally. It concludes by advertising upcoming NCBI hackathons.
This document appears to be a slide deck presentation given by Ben Busby on making the transition from sharing data to sharing knowledge. Some of the key points covered in the presentation include EUtilities command line and EDirect tools for accessing NCBI resources, PubMed and PMC open FTP, tools like Pubrunner.org as an alternative to PubMed FTP, examples of bioinformatics resources and tools developed through NCBI hackathons posted on GitHub, and an opportunity to work at NCBI for 4-6 weeks through their bioinformatics training program.
This document discusses making the transition from simply sharing genomic data to sharing knowledge and insights gained from the data. It provides information on various tools from NCBI for searching, accessing, and analyzing genomic data including EUtils, EDirect, magicblast, and genomic analysis tools on GitHub. It also encourages creating an open community around sharing knowledge gained from genomic data analysis.
This document contains contact information for Ben Busby, who works as a Genomics Outreach Coordinator and Bioinformatics Training Lead at NCBI. It also lists several GitHub repositories created by NCBI Hackathons and provides links to resources for bioinformaticians, including datasets, educational materials, and computing resources.
The document appears to be from a presentation given by Ben Busby on making the transition from sharing data to sharing knowledge. It discusses various tools and resources for accessing genomic data from the National Center for Biotechnology Information including EUtils, EDirect, PubMed FTP, and more. The presentation also highlights several genomic analysis tools and hackathon projects developed by the NCBI community.
This document summarizes Ben Busby's presentation on making the transition from sharing data to sharing knowledge at NCBI. The presentation discusses NCBI tools and resources like EUtils, EDirect, PubMed FTP, and SRA. It also highlights several genomic analysis tools developed through NCBI hackathons and available on GitHub. The document lists Busby's contact information and encourages attendees to check his SlideShare profile and NCBI's bioinformatics training program for more details.
Contamination Detection and Taxonomic confirmation with magicBLAST
1. Sean La
Intern
Simon Fraser University
laseanl@sfu.ca
Cheryl Ames, Ph.D.
Research Fellow
Smithsonian National Museum
of Natural History
amesc@si.edu
Ben Busby, Ph.D.
Genomics Outreach
Coordinator
NCBI
ben.busby@nih.gov
2. Motivation
The scientific community wants to detect viruses in SRA (image 1)
SIDEARM
Pipeline diagram (image 2): SRR; BLASTDB of viruses; Magic-BLAST (optimized version of BLAST); BAM alignments to viruses; statistics; viral contigs
1. Image taken from https://media1.britannica.com/eb-media/82/126182-004-A23C1423.jpg
2. Image taken from https://github.com/NCBI-Hackathons/Virus_Detection_SRA
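The flow named on this slide (an SRA run searched with Magic-BLAST against a virus BLAST database, yielding BAM alignments and per-reference statistics) might be sketched roughly as below. This is not SIDEARM's actual code: the function name `sidearm_sketch`, the database name `viral_db`, the accession `SRR0000000`, and the output file names are all hypothetical placeholders. The sketch is a dry run that only prints the commands, so it can be read without Magic-BLAST or samtools installed.

```shell
#!/bin/sh
# Dry-run sketch (assumption: NOT the real SIDEARM implementation).
# Prints the commands for one SRA run instead of executing them.
#   $1 = SRA run accession (hypothetical placeholder below)
#   $2 = name of a nucleotide BLAST database (hypothetical placeholder below)
sidearm_sketch() {
    srr=$1
    db=$2
    # Align reads fetched from SRA against the database; SAM is Magic-BLAST's default output.
    echo "magicblast -sra $srr -db $db -no_unaligned -out $srr.sam"
    # Convert to a coordinate-sorted BAM file and index it.
    echo "samtools sort -o $srr.sorted.bam $srr.sam"
    echo "samtools index $srr.sorted.bam"
    # Per-reference counts of mapped reads, a simple form of "statistics".
    echo "samtools idxstats $srr.sorted.bam"
}

sidearm_sketch SRR0000000 viral_db
```

Removing the `echo` wrappers would turn the dry run into real invocations, assuming both tools are on the PATH and the database has been built with `makeblastdb`.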
3. Detect bacteria in metagenomics samples (image 1)
Identify proteins (image 2)
Detect plasmid sequences in bacterial reads (image 3)
Detect mitochondrial DNA (image 4)
1 Image taken from http://www.newhealthguide.org/images/19999893/image001.jpg
2 Image taken from https://3c1703fe8d.site.internapcdn.net/newman/gfx/news/hires/2014/auroraakinase.png
3 Image taken from https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Plasmid_%28english%29.svg/300px-Plasmid_%28english%29.svg.png
4 Image taken from http://www.penrules.com/_Media/art_mito_300.png
4. Step 1: Generate A. alata NGS data sets (n=15)
Illumina.1.fasta = short reads (forward)
Illumina.2.fasta = short reads (reverse)
Pacbio.fasta = long reads
Step 2: Trim adaptors from NGS data sets
-- trimmomatic Illumina.fasta > trimmed.Illumina.fasta
-- removesmartbell.sh Pacbio.fasta > trimmed.pacbio.fasta
Step 3: Generate A. alata mtDNA database
makeblastdb -in alatina_mitochondria.fasta -out ala_mito_db -dbtype nucl
Step 4: Create Magic-BLAST report (.sam) of mapped and unmapped reads
magicblast -query trimmed.read.fasta -db ala_mito_db -splice F -perc_identity 90 -paired > trimmed.read.sam &
Step 5: Extract reads that don't map to the mtDNA database
awk '$4 == 0 {print $0}' trimmed.read.sam >> trimmed.read.nomtDNA.sam
Step 6: Convert mitochondria-free files from SAM to FASTA format
samtools fasta trimmed.read.nomtDNA.sam > trimmed.read.nomtDNA.fasta
Step 7: Pipe mitochondria-free reads (n=15) into downstream pipelines
trimmed.read.nomtDNA.fasta, e.g., genome assembly
A. alata 8 mitochondrial chromosomes (GenBank)
box jellyfish A. alata
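Step 5 keeps SAM records whose POS field (column 4) is 0. As a hedged alternative sketch, unmapped reads can also be selected by the 0x4 bit of the FLAG field (column 2), which the SAM specification defines as "segment unmapped". The toy SAM file below is made up for illustration (the read names, sequences, and the reference name `mito_chr1` are hypothetical), and the helper `unmapped_to_fasta` is not part of the original pipeline. It folds Steps 5 and 6 into a single awk pass and needs neither Magic-BLAST nor samtools.

```shell
#!/bin/sh
# Build a toy SAM file (hypothetical data): two header lines,
# one mapped read (FLAG=0, POS=101) and one unmapped read (FLAG=4, POS=0).
tmpdir=$(mktemp -d)
sam="$tmpdir/toy.sam"
printf '@HD\tVN:1.6\tSO:unsorted\n' > "$sam"
printf '@SQ\tSN:mito_chr1\tLN:5000\n' >> "$sam"
printf 'read1\t0\tmito_chr1\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$sam"
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII\n' >> "$sam"

# Skip header lines (they start with "@"), keep records whose FLAG has
# the 0x4 bit set, and print each one as FASTA: name from column 1,
# sequence from column 10.
unmapped_to_fasta() {
    awk -F '\t' '!/^@/ && int($2 / 4) % 2 == 1 { print ">" $1; print $10 }' "$1"
}

unmapped_to_fasta "$sam"
```

On the toy input only `read2` is written out. Whether the FLAG test and the POS test select the same reads on real paired-end output depends on how the aligner fills POS for an unmapped read whose mate is mapped, so the two approaches are worth comparing on a small sample before swapping one for the other.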
5. Neisseria meningitidis
genome (ERR1865236)
BLAST DB of known
bacterial plasmids
SIDEARM
1 Image taken from https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Plasmid_%28english%29.svg/300px-Plasmid_%28english%29.svg.png
6. Viral Metagenome
(ERR1301508 a la
Chris O’Sullivan)
BLAST DB of complete
bacterial genomes
SIDEARM
On SRA…
Using SIDEARM…
7. Greg Boratyn
Mike Muchow
Paul Cantalupo
Alex Goncearenco
Unix.systems
7
Editor's Notes
Alatina alata has one of the most unusual mtDNA organizations in Metazoa, where genes are distributed on eight linear chromosomes with long terminal inverted repeats.