Introductory slides for the Python hands-on session of the Research Data Visualisation Workshop run by the Software Sustainability Institute, University of Manchester 28th July 2016.
Materials for the session are available at https://github.com/widdowquinn/Teaching-Data-Visualisation
Presentation delivered 8th August 2016, at the European Association for Potato Research (EAPR) meeting, Dundee - outlining classification of bacterial plant pathogens with
Little Rotters: Adventures With Plant-Pathogenic BacteriaLeighton Pritchard
This document discusses a presentation on plant-pathogenic bacteria by Leighton Pritchard from The James Hutton Institute. The presentation covers several topics:
1) An introduction to The James Hutton Institute and its work on plant-pathogen interactions like soft-rot enterobacteria.
2) Genomics research including the first sequenced genome of a soft-rot enterobacterium in 2003 and subsequent sequencing of 25 Dickeya genomes in 2013.
3) Classification and diagnostics challenges around legislating by bacterial taxonomy when taxonomy is uncertain and mapping taxonomy to phenotypes is complex. Problems with existing classification and consequences of misclassification are discussed.
Whole genome taxonomic classication for prokaryotic plant pathogensLeighton Pritchard
This document summarizes a presentation on using whole genome sequencing and average nucleotide identity (ANI) for taxonomic classification of prokaryotic plant pathogens. It discusses how classification is critical for legislation and diagnostics but current bacterial taxonomy is problematic. It presents results of ANI analyses of 34 Dickeya species and 55 Pectobacterium species which identified nine Dickeya groups including three misclassified species, and ten Pectobacterium groups with P. carotovorum and P. wasabiae each splitting into multiple potential species. Whole genome analyses are cracking the taxonomy which will have real-world consequences.
Building bioinformatics resources for the global communityExternalEvents
1. The document evaluates different methods for inferring relationships between Salmonella samples based on whole genome sequencing data from large databases. It compares k-mer based methods and site-based methods using 18,997 Salmonella isolates from public databases.
2. Site-based methods like NUCmer and MLST produced more accurate results, but require more computing resources when dealing with large databases. K-mer based methods are faster but more sensitive to assembly and contamination issues.
3. While k-mer methods may be useful for initial filtering, site-based methods are superior for accuracy, though challenges remain in applying them to databases containing tens of thousands of samples. Quality control and computing resources are important considerations.
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsChristopher Mason
This document outlines plans for multi-site sequencing studies to generate standardized human and bacterial genome sequencing datasets. Samples include a human trio, bacterial isolates, and mixtures, which will be sequenced in triplicate across three sites on various platforms including Illumina HiSeq X Ten, HiSeq 4000, HiSeq 2500, NextSeq 500, Life Tech Ion Proton, Ion S5, Pacific Biosciences, Oxford Nanopore, and others. The goals are to measure intra- and inter-lab variation, sequencing performance at GC extremes, and establish molecular standards for assessing sequencing methods in DNA, RNA, and metagenomics. Data will be analyzed by a team to benchmark tools and published by October 2017.
Presentation delivered 8th August 2016, at the European Association for Potato Research (EAPR) meeting, Dundee - outlining classification of bacterial plant pathogens with
Little Rotters: Adventures With Plant-Pathogenic BacteriaLeighton Pritchard
This document discusses a presentation on plant-pathogenic bacteria by Leighton Pritchard from The James Hutton Institute. The presentation covers several topics:
1) An introduction to The James Hutton Institute and its work on plant-pathogen interactions like soft-rot enterobacteria.
2) Genomics research including the first sequenced genome of a soft-rot enterobacterium in 2003 and subsequent sequencing of 25 Dickeya genomes in 2013.
3) Classification and diagnostics challenges around legislating by bacterial taxonomy when taxonomy is uncertain and mapping taxonomy to phenotypes is complex. Problems with existing classification and consequences of misclassification are discussed.
Whole genome taxonomic classication for prokaryotic plant pathogensLeighton Pritchard
This document summarizes a presentation on using whole genome sequencing and average nucleotide identity (ANI) for taxonomic classification of prokaryotic plant pathogens. It discusses how classification is critical for legislation and diagnostics but current bacterial taxonomy is problematic. It presents results of ANI analyses of 34 Dickeya species and 55 Pectobacterium species which identified nine Dickeya groups including three misclassified species, and ten Pectobacterium groups with P. carotovorum and P. wasabiae each splitting into multiple potential species. Whole genome analyses are cracking the taxonomy which will have real-world consequences.
Building bioinformatics resources for the global communityExternalEvents
1. The document evaluates different methods for inferring relationships between Salmonella samples based on whole genome sequencing data from large databases. It compares k-mer based methods and site-based methods using 18,997 Salmonella isolates from public databases.
2. Site-based methods like NUCmer and MLST produced more accurate results, but require more computing resources when dealing with large databases. K-mer based methods are faster but more sensitive to assembly and contamination issues.
3. While k-mer methods may be useful for initial filtering, site-based methods are superior for accuracy, though challenges remain in applying them to databases containing tens of thousands of samples. Quality control and computing resources are important considerations.
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsChristopher Mason
This document outlines plans for multi-site sequencing studies to generate standardized human and bacterial genome sequencing datasets. Samples include a human trio, bacterial isolates, and mixtures, which will be sequenced in triplicate across three sites on various platforms including Illumina HiSeq X Ten, HiSeq 4000, HiSeq 2500, NextSeq 500, Life Tech Ion Proton, Ion S5, Pacific Biosciences, Oxford Nanopore, and others. The goals are to measure intra- and inter-lab variation, sequencing performance at GC extremes, and establish molecular standards for assessing sequencing methods in DNA, RNA, and metagenomics. Data will be analyzed by a team to benchmark tools and published by October 2017.
The presentation includes preliminary information about the big data mainly metagenomic data and discussions related to the hurdles in analyzing using conventional approaches. In the later part, brief introduction about machine learning approaches using biological example for each. In the last, work done with special focus on implementation of a machine learning approach Random Forest for the functional annotation and taxonomic classification of metagenomic data.
Polyketide Synthase type III Isolated from Uncultured Deep-Sea Proteobacteriu...Hadeel El Bardisy
Screening and Isolation of possible bacterial PKS type III in Atlantis II deep brine pool using a metagenomic approach
and gaining a deeper insights into the evolutionary origin of PKS type III among Prokaryotes and Eukaryotes
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Larry Smarr
06.09.15
Invited Talk
2006 Synthetic Biology Symposium
Aliso Creek Inn
Title: Building a Community Cyberinfrastructure to Support Marine Microbial Ecology Metagenomics
Laguna Beach, CA
How Can We Make Genomic Epidemiology a Widespread Reality? - William HsiaoWilliam Hsiao
The document discusses genomic epidemiology and the requirements to bring genomic sequencing into routine public health practice. It outlines two parts: (1) what genomic epidemiology is and why it is important; and (2) the requirements for genomic sequencing to be used routinely in public health. Whole genome sequencing is seen as a way to generate high quality pathogen genomes quickly and allow for more detailed tracking of disease spread compared to traditional methods. However, bringing genomic sequencing into public health practice requires overcoming barriers such as the need for user-friendly analysis platforms, training public health personnel in genomics, and improving information sharing between organizations.
[2013.12.02] Mads Albertsen: Extracting Genomes from MetagenomesMads Albertsen
This document summarizes the process of extracting genomes from metagenomes. It discusses how metagenomics involves sequencing the collective DNA from an environmental sample to determine the community composition and functional potential. Full genomes cannot typically be assembled from metagenomic data due to high microbial diversity within samples and limitations in separating individual genomes (binning). Methods described to improve binning include reducing diversity through short-term enrichments and using multiple related samples. Validation of binned genomes involves checking for essential single copy genes and confirming bins with in situ techniques like fluorescence in situ hybridization.
This document provides an overview of bioinformatics and highlights several key points:
- Bioinformatics has emerged as a field to help analyze the vast amounts of biological data being generated through high-throughput technologies. It integrates biology, computer science, and information technology.
- The size of the human genome and rate of data generation has grown exponentially, necessitating computational approaches. International efforts like the Human Genome Project helped sequence the entire human genome.
- Bioinformatics tools and databases are used to study genomics, transcriptomics, proteomics and more to better understand living systems at the molecular level and enable applications in medicine, agriculture, forensics and more. This work also raises ethical, legal and social considerations.
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...Arvinder Singh
‘NATIONAL CONFERENCE ON MAN AND ENVIRONMENT’October 15 – 16, 2012
Organized by
Department of Zoology and Environmental Sciences, Punjabi University, Patiala (Pb.) – 147 002, India
Matthew W. McNatt has extensive experience in biomedical research and leadership. He co-founded Celleritas Bioscience where he served as Chief Science Officer and helped secure intellectual property protection. As a postdoctoral fellow, he designed bispecific antibodies and collaborated on elucidating protein structures. He has authored seven research papers and mentored students. McNatt has strong expertise in molecular biology, biochemistry, virology, and microscopy techniques. He holds a PhD from the University of Colorado and seeks new opportunities in science leadership.
Rajalaxmi M is seeking a position as a biotechnologist. She has research experience in identifying anti-virulent agents from marine microbiota through culture-dependent and metagenomic approaches. She has published papers in peer-reviewed journals and has presented posters at several international conferences. Rajalaxmi has an M.Phil. in Biotechnology and has developed skills in molecular biology techniques, bioinformatics, and microbiology.
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Elia Brodsky
This workshop will address critical issues related to Transcriptomics data:
Processing raw Next Generation Sequencing (NGS) data:
1. Next Generation Sequencing data preprocessing:
Trimming technical sequences
Removing PCR duplicates
2. RNA-seq based quantification of expression levels:
Conventional pipelines (looking at known transcripts)
Identification of novel isoforms
Analysis of Expression Data Using Machine Learning:
3. Unsupervised analysis of expression data:
Principal Component Analysis
Clustering
4. Supervised analysis:
Differential expression analysis
Classification, gene signature construction
5. Gene set enrichment analysis
The workshop will include hands-on exercises utilizing public domain datasets:
breast cancer cell lines transcriptomic profiles (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110),
patient-derived xenograft (PDX) mouse model of tumor and stroma transcriptomic profiles (http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=8014&path[]=23533), and
processed data from The Cancer Genome Atlas samples (https://cancergenome.nih.gov/).
Team: The workshops are designed by the researchers at the Tauber Bioinformatics Research Center at University of Haifa, Israel in collaboration with academic centers across the US. Technical support for the workshops is provided by the Pine Biotech team. https://edu.t-bio.info/a-critical-approach-to-transcriptomic-data-analysis/
Building an Information Infrastructure to Support Microbial Metagenomic SciencesLarry Smarr
06.01.14
Presentation for the Microbe Project Interagency Team
Title: Building an Information Infrastructure to Support Microbial Metagenomic Sciences
La Jolla, CA
This candidate has 12 years of experience in drug discovery, primarily focused on oncology therapeutic targets involving cell biology assays and general laboratory skills. They currently serve as a lead scientist and lab group head presenting data to project teams. They are seeking a senior scientist role and have experience in areas like cell culture, proliferation assays, microscopy, and data analysis software. They have worked at companies like Novartis and Piramed Pharma.
Web applications for rapid microbial taxonomy identification ExternalEvents
http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/
Web applications for rapid microbial taxonomy identification. Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management -23-25 May 2016, Rome, Italy.
James J. Collins
Howard Hughes Medical Institute
Dept of Biomedical Engineering & Center of Synthetic Biology
Boston University
Wyss Institute for Biologically Inspired Engineering
Harvard University
Synthetic Biology & Global Health - Claire MarrisSTEPS Centre
The document summarizes the development of synthetic biology and efforts to synthesize bacterial genomes. It describes how researchers at the JCVI synthesized the Mycoplasma genitalium genome starting in pieces of 5-7 kilobases that were joined together, and how they later synthesized the larger 1.08 megabase Mycoplasma mycoides genome and transplanted it into a recipient cell to create a new self-replicating cell controlled by the synthetic chromosome. This was the first self-replicating species on Earth whose parent was a computer. The work demonstrated that genomes can be designed, chemically synthesized, and used to produce new cells controlled only by the synthetic genome.
This HIBB presentation provides background information on bases, amino acids, proteins, nucleotides and DNA. The presentation then explains what bioinformatics is, lists some examples, and demonstrates some tools. It demonstrates tools which compare parts of human and chimp genes, and illustrate drug resistance analysis and HIV subtype analysis. It then discusses some ethical and clinical aspects to bioinformatics.
Pran is a friendly and helpful colleague who always maintains a professional attitude. He provides updated information on security and risk management, and is flexible and accessible in helping colleagues with tasks and fieldwork guidance. Feedback from colleagues commends Pran's leadership, strong communication skills, ability to identify problems and solutions, and good people skills of being easy to work with and respecting others.
The presentation includes preliminary information about the big data mainly metagenomic data and discussions related to the hurdles in analyzing using conventional approaches. In the later part, brief introduction about machine learning approaches using biological example for each. In the last, work done with special focus on implementation of a machine learning approach Random Forest for the functional annotation and taxonomic classification of metagenomic data.
Polyketide Synthase type III Isolated from Uncultured Deep-Sea Proteobacteriu...Hadeel El Bardisy
Screening and Isolation of possible bacterial PKS type III in Atlantis II deep brine pool using a metagenomic approach
and gaining a deeper insights into the evolutionary origin of PKS type III among Prokaryotes and Eukaryotes
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Larry Smarr
06.09.15
Invited Talk
2006 Synthetic Biology Symposium
Aliso Creek Inn
Title: Building a Community Cyberinfrastructure to Support Marine Microbial Ecology Metagenomics
Laguna Beach, CA
How Can We Make Genomic Epidemiology a Widespread Reality? - William HsiaoWilliam Hsiao
The document discusses genomic epidemiology and the requirements to bring genomic sequencing into routine public health practice. It outlines two parts: (1) what genomic epidemiology is and why it is important; and (2) the requirements for genomic sequencing to be used routinely in public health. Whole genome sequencing is seen as a way to generate high quality pathogen genomes quickly and allow for more detailed tracking of disease spread compared to traditional methods. However, bringing genomic sequencing into public health practice requires overcoming barriers such as the need for user-friendly analysis platforms, training public health personnel in genomics, and improving information sharing between organizations.
[2013.12.02] Mads Albertsen: Extracting Genomes from MetagenomesMads Albertsen
This document summarizes the process of extracting genomes from metagenomes. It discusses how metagenomics involves sequencing the collective DNA from an environmental sample to determine the community composition and functional potential. Full genomes cannot typically be assembled from metagenomic data due to high microbial diversity within samples and limitations in separating individual genomes (binning). Methods described to improve binning include reducing diversity through short-term enrichments and using multiple related samples. Validation of binned genomes involves checking for essential single copy genes and confirming bins with in situ techniques like fluorescence in situ hybridization.
This document provides an overview of bioinformatics and highlights several key points:
- Bioinformatics has emerged as a field to help analyze the vast amounts of biological data being generated through high-throughput technologies. It integrates biology, computer science, and information technology.
- The size of the human genome and rate of data generation has grown exponentially, necessitating computational approaches. International efforts like the Human Genome Project helped sequence the entire human genome.
- Bioinformatics tools and databases are used to study genomics, transcriptomics, proteomics and more to better understand living systems at the molecular level and enable applications in medicine, agriculture, forensics and more. This work also raises ethical, legal and social considerations.
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...Arvinder Singh
‘NATIONAL CONFERENCE ON MAN AND ENVIRONMENT’October 15 – 16, 2012
Organized by
Department of Zoology and Environmental Sciences, Punjabi University, Patiala (Pb.) – 147 002, India
Matthew W. McNatt has extensive experience in biomedical research and leadership. He co-founded Celleritas Bioscience where he served as Chief Science Officer and helped secure intellectual property protection. As a postdoctoral fellow, he designed bispecific antibodies and collaborated on elucidating protein structures. He has authored seven research papers and mentored students. McNatt has strong expertise in molecular biology, biochemistry, virology, and microscopy techniques. He holds a PhD from the University of Colorado and seeks new opportunities in science leadership.
Rajalaxmi M is seeking a position as a biotechnologist. She has research experience in identifying anti-virulent agents from marine microbiota through culture-dependent and metagenomic approaches. She has published papers in peer-reviewed journals and has presented posters at several international conferences. Rajalaxmi has an M.Phil. in Biotechnology and has developed skills in molecular biology techniques, bioinformatics, and microbiology.
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Elia Brodsky
This workshop will address critical issues related to Transcriptomics data:
Processing raw Next Generation Sequencing (NGS) data:
1. Next Generation Sequencing data preprocessing:
Trimming technical sequences
Removing PCR duplicates
2. RNA-seq based quantification of expression levels:
Conventional pipelines (looking at known transcripts)
Identification of novel isoforms
Analysis of Expression Data Using Machine Learning:
3. Unsupervised analysis of expression data:
Principal Component Analysis
Clustering
4. Supervised analysis:
Differential expression analysis
Classification, gene signature construction
5. Gene set enrichment analysis
The workshop will include hands-on exercises utilizing public domain datasets:
breast cancer cell lines transcriptomic profiles (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110),
patient-derived xenograft (PDX) mouse model of tumor and stroma transcriptomic profiles (http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=8014&path[]=23533), and
processed data from The Cancer Genome Atlas samples (https://cancergenome.nih.gov/).
Team: The workshops are designed by the researchers at the Tauber Bioinformatics Research Center at University of Haifa, Israel in collaboration with academic centers across the US. Technical support for the workshops is provided by the Pine Biotech team. https://edu.t-bio.info/a-critical-approach-to-transcriptomic-data-analysis/
Building an Information Infrastructure to Support Microbial Metagenomic SciencesLarry Smarr
06.01.14
Presentation for the Microbe Project Interagency Team
Title: Building an Information Infrastructure to Support Microbial Metagenomic Sciences
La Jolla, CA
This candidate has 12 years of experience in drug discovery, primarily focused on oncology therapeutic targets involving cell biology assays and general laboratory skills. They currently serve as a lead scientist and lab group head presenting data to project teams. They are seeking a senior scientist role and have experience in areas like cell culture, proliferation assays, microscopy, and data analysis software. They have worked at companies like Novartis and Piramed Pharma.
Web applications for rapid microbial taxonomy identification ExternalEvents
http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/
Web applications for rapid microbial taxonomy identification. Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management -23-25 May 2016, Rome, Italy.
James J. Collins
Howard Hughes Medical Institute
Dept of Biomedical Engineering & Center of Synthetic Biology
Boston University
Wyss Institute for Biologically Inspired Engineering
Harvard University
Synthetic Biology & Global Health - Claire MarrisSTEPS Centre
The document summarizes the development of synthetic biology and efforts to synthesize bacterial genomes. It describes how researchers at the JCVI synthesized the Mycoplasma genitalium genome starting in pieces of 5-7 kilobases that were joined together, and how they later synthesized the larger 1.08 megabase Mycoplasma mycoides genome and transplanted it into a recipient cell to create a new self-replicating cell controlled by the synthetic chromosome. This was the first self-replicating species on Earth whose parent was a computer. The work demonstrated that genomes can be designed, chemically synthesized, and used to produce new cells controlled only by the synthetic genome.
This HIBB presentation provides background information on bases, amino acids, proteins, nucleotides and DNA. The presentation then explains what bioinformatics is, lists some examples, and demonstrates some tools. It demonstrates tools which compare parts of human and chimp genes, and illustrate drug resistance analysis and HIV subtype analysis. It then discusses some ethical and clinical aspects to bioinformatics.
Pran is a friendly and helpful colleague who always maintains a professional attitude. He provides updated information on security and risk management, and is flexible and accessible in helping colleagues with tasks and fieldwork guidance. Feedback from colleagues commends Pran's leadership, strong communication skills, ability to identify problems and solutions, and good people skills of being easy to work with and respecting others.
Scholarly research involves completing a research project, publishing the findings in a scholarly journal, and having other scholars in the field respond by citing the original research in their own new studies.
Carlos Julio Perdomo Hernández, auxiliar de bodega de EMCOSALUD, solicita a Abel Fernely Sepúlveda Ramos, gerente general, sus vacaciones correspondientes al periodo laborado entre el 28 de julio de 2012 y el 28 de julio de 2014.
La Asociación de Baloncesto del Estado de Aguascalientes convoca al 1er Torneo Sabatino 2014 para equipos femeniles y varoniles de baloncesto. El torneo tendrá lugar en gimnasios de la ciudad y se jugará en categorías mayores y de ascenso. Los equipos interesados deben registrarse pagando una cuota e inscribiendo a sus jugadores en la liga estatal. Habrá premios para los tres primeros lugares de cada categoría. Los árbitros serán proporcionados por la asociación de árbitros locales con
Dokumen tersebut membahas tentang pengelolaan obat dan keamanan lingkungan di puskesmas yang dilakukan secara efektif dan sesuai standar dengan adanya kebijakan dan prosedur yang jelas serta dilakukan pemantauan dan evaluasi secara berkala.
Dr. Jean N. Koster is a Professor Emeritus of Aerospace Engineering Sciences at the University of Colorado Boulder. He has over 35 years of experience in research and teaching in fields related to fluid mechanics, materials science, and alternative energy systems. Some of his notable achievements include developing experimental technologies in fields like particle image velocimetry, leading experiments aboard the Space Shuttle Columbia, and receiving over $3.2 million in research funding. He has also founded two startup companies focused on hybrid propulsion systems and has received several awards for his research and teaching work.
This document discusses re-entry horizontal drilling for enhanced oil recovery in Indonesia. It begins by outlining drivers for enhancing oil recovery through re-entry drilling such as declining production from existing wells. It then provides background on Indonesia's oil production history and challenges in meeting production targets. The document describes Geoglide's services for directional drilling, well planning, and risk reduction for re-entry horizontal wells. It discusses factors to consider such as well selection, drilling unit selection, horizontal drilling technology options, and information needed to plan a re-entry horizontal drilling project. The conclusion emphasizes that EOR projects require cost-effective and low-risk solutions.
Free webinar-introduction to bioinformatics - biologist-1Elia Brodsky
The Omics Logic Introduction to Bioinformatics program is a one-month online training program that provides an introduction to the field of bioinformatics for beginners. The program consists of six sessions taught by an international team of experts, covering topics like genomics, transcriptomics, statistical analysis, machine learning, and a final bioinformatics project. Participants will learn data analysis skills in Python and R and how to extract insights from multi-omics datasets with applications in biomedicine. The goal is to prepare students for data-driven research in life sciences through interactive lessons, coding exercises, and independent projects.
Curators are necessarily detail oriented -- a trait born of, and reinforced by, our efforts to describe biological data accurately and precisely. To ensure comprehensive coverage and meaningful integration of new and existing knowledge, however, it is important to periodically step back from this fine-grained view and assess emergent features in accumulated curation. I will explore how PomBase has used the global "big picture" view of curated data to provide biological summaries, modularise content, and improve data display and access for our users. The global perspective can also be used to detect annotation errors and identify knowledge gaps, thereby improving overall annotation quality. I will also describe the progress we have made in engaging fission yeast researchers in community curation. Finally, I will show that the global curation perspective and community engagement share a common theme: both improve overall understanding, accessibility and reuse of accumulated knowledge by our user community.
Computational methods to analyze biological data. It is a way to introduce some of the many resources available for analyzing sequence data with bioinformatics software. This paper will cover the theoretical approaches to data resources and we will get knowledge about some sequential alignments with its databases. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics, and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques. Databases are essential for bioinformatics research and applications. Many databases exist, covering various information types for example, DNA and protein sequences, molecular structures, phenotypes, and biodiversity. Databases may contain empirical data. Conceptualizing biology in terms of molecules and then applying informatics techniques from math, computer science, and statistics to understand and organize the information associated with these molecules on a large scale. In this materialistic world, People are studying bioinformatics in different ways. Some people are devoted to developing new computational tools, both from software and hardware viewpoints, for the better handling and processing of biological data. They develop new models and new algorithms for existing questions and propose and tackle new questions when new experimental techniques bring in new data. Other people take the study of bioinformatics as the study of biology with the viewpoint of informatics and systems. Durgesh Raghuvanshi | Vivek Solanki | Neha Arora | Faiz Hashmi "Computational of Bioinformatics" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4 , June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd30891.pdf Paper Url :https://www.ijtsrd.com/engineering/computer-engineering/30891/computational-of-bioinformatics/durgesh-raghuvanshi
The Path to Open Science with Illustrations from Computational Biology - A presentation made at the Microsoft 2011 Latin America Faculty Summit Cartagena, Columbia, May 18, 2011.
Taverna is a free and open-source workflow management system that allows researchers to design and execute scientific workflows. It was developed by the University of Manchester to support in silico experiments in biology. Taverna provides a graphical user interface for designing workflows using a variety of distributed data sources and web services without having to learn complex programming. It has been widely adopted by researchers in fields such as biology, healthcare, astronomy, and cheminformatics to automate analysis pipelines and share workflows.
This document provides an overview and syllabus for a course on bioinformatics. It discusses the goals of learning about available bioinformatics programs and tools, and interpreting their outputs. The course will cover topics like sequence alignment, phylogenetics, genome comparison and using databases. Assessment will include homework, exams, a report, and participation. The document contrasts the "old" and "new" biology, noting how the new biology generates large datasets that require computational analysis to make sense of the data. It emphasizes that bioinformatics uses algorithms and databases to organize, analyze and interpret biological data at large scales.
1. The document discusses how a biologist, Marco Roos, became interested in e-science through his work in molecular and cellular biology, bioinformatics, and data integration projects.
2. Roos describes how e-science allows for collaboration between different experts and disciplines through technologies like workflows, semantic web, and virtual laboratories.
3. Roos emphasizes that e-science should empower scientists by making tools and resources easy to use, share, and build upon so that scientists can focus on scientific problems rather than technical challenges.
Data Visualization And Annotation Workshop at Biocuration 2015Monica Munoz-Torres
8th International Biocuration Conference. Beijing, China. April 23-26, 2015.
Workshop 2: Data Visualization and Annotation.
Chairs: Rama Balakrishnan, Stanford University, USA and Monica Munoz-Torres, Lawrence Berkeley National Laboratory, USA
Explaining the most intricate biological processes often requires a degree of detail beyond the scope of equations and algorithms; in fact, most biological knowledge is represented visually as illustrations, graphs, and diagrams. Genomics data in particular require specialized forms of visualization to improve our understanding and increase our chances of extracting meaningful conclusions from our analyses. Furthermore, the heterogeneity and abundance of genomic data include widely varied sources, techniques for their obtention, and intrinsic experimental error. And even data obtained under similar conditions from two or more individuals are loaded with biological variation. So what is the best way to interpret the stories the data are telling us? Given the questions we wish to answer and the data we are generating, which tools would be most useful and effective? In this workshop we will explore the tools available for human interpretation of genomic data, specifically in the context of annotation.
Presentations and perspectives, panelists/presenters:
- Lorna Richardson, IGMM, University of Edinburgh, United Kingdom
- Justyna Szostak, PMI Research & Development, Switzerland
The workshop included a brief introduction to a landscape of tools available - as updated as the constantly changing field allows-, brief presentations chosen from abstract submissions and invited speakers, as well as ample discussion to capture the contributions and questions from attendants. In the end, we hope participants walked away with a toolset in hand that may benefit the progress of their own research.
Tutorial on the DisGeNET Discovery Platform, with especial focus on its exploitation in the Semantic Web showing how to retrieve and integrate DisGeNET data with other RDF linked datasets.
This document introduces Marco Roos and discusses his transition from traditional molecular biology and bioinformatics work to e-science. It describes how e-science approaches can help address challenges in biology by enabling greater data and knowledge sharing, reuse of tools and workflows, and integrated analysis across multiple data types and sources. Examples discussed include semantic web technologies, workflow systems, and proposed e-laboratory platforms to empower scientists with virtual collaborative environments and intelligent assistance. The goal is to help biologists better exploit computational resources and expertise through enhanced and standardized e-science frameworks.
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
This document summarizes a talk titled "AWager for 2016: How SoftwareWill Beat Hardware in Biological Data Analysis". The talk discusses how software approaches can outpace hardware for analyzing large biological datasets. It notes that current variant calling approaches have limitations due to being I/O intensive and requiring multiple passes over data. The talk introduces approaches using lossy compression and streaming algorithms that can perform analysis more efficiently using less memory and in a single pass. This could enable analyzing a human genome on a desktop computer by 2016 as wagered. The talk argues that with better algorithmic tools, biological data analysis need not require large computers and can scale with the information content of data rather than just data size.
The literature contains a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data reuse. It can be overwhelming. Based on recent empirical work (analyzing data reuse proxies at scale, understanding data sensemaking and looking at how researchers search for data), I talk about what practices are a good place to start for helping others to reuse your data.
Acting as Advocate? Seven steps for libraries in the data decadeLizLyon
UKOLN advocates that libraries take seven steps to support data management and open science in the data decade:
1) Provide briefings on cloud data services in partnership with IT services.
2) Build usable data management tools in partnership with researchers.
3) Develop data sustainability strategies and articulate the costs and benefits.
4) Publish case studies on open science to show benefits of universal data sharing.
5) Present at university ethics committees to highlight open data issues.
6) Raise awareness of citizen science opportunities and guidelines for good practice.
7) Promote data citation and attribution to embed in publication practice.
Slides contain information about why bioinformatics appeared,
who bioinformaticians are, what they do, what kind of cool applications and challenges in bioinformatics there are.
Slides were prepared for the Bioinformatics seminar 2016, Institute of Computer Science, University of Tartu.
- Biology is generating vast amounts of data from techniques like DNA sequencing that is growing exponentially. However, biology is unprepared to effectively analyze and utilize this "big data".
- There are few researchers trained in both data analysis and biology. Data is often not shared between researchers, and the current publishing system discourages open sharing of knowledge. Most computational research also lacks reproducibility.
- The presenter advocates for open science, data sharing, and improved training for the next generation of researchers in both biology and data analysis skills to help address these challenges and better leverage the growing stores of biological data.
The poster was made during a 2.5 day NSF sponsored workshop on August 5th – 7th on the University of Washington, Seattle campus, which brought together 100 graduate students from diverse domain sciences and engineering with Data Scientists from industry and academia to discuss and collaborate on Big Data / Data Science challenges. In addition to keynote presentations from high profile speakers, the participants presented posters covering their own research and worked collaboratively to begin to solve some of the Grand Challenge problems facing Data Enabled Science & Engineering disciplines.
This presentation was made by 10 graduate students, whose united them was high-dimensional biological data.
For more info, see http://depts.washington.edu/dswkshp/
Knowledge Science for AI-based biomedical and clinical applicationsCatia Pesquita
The great barrier to AI adoption in healthcare and biomedical research is lack of trust.
Assessing trustworthiness requires data, domain and user context, which can be supported by ontologies, knowledge graphs and FAIR data.
The document discusses several "sins" or bad practices that are commonly seen in bioinformatics, including reinvention, lack of reuse, inconsistent naming schemes, and lack of collaboration and data sharing. It provides examples of these issues and argues that greater emphasis should be placed on standards, collaboration, and leveraging existing tools and data. The document also acknowledges that some reinvention may be necessary due to evolving technologies and unmet needs in the field.
Reverse-and forward-engineering specificity of carbohydrate-processing enzymesLeighton Pritchard
This document summarizes a project to engineer plant cell wall degrading enzymes from Dickeya bacteria for enhanced biofuel production. The project aims to compile a library of diverse enzyme specificities by exploiting natural diversity found in Dickeya genomes and sequences. The project plan involves identifying candidate enzyme families from literature and sequence analysis, determining additional enzyme structures, and applying techniques like gene shuffling and saturated mutagenesis to generate novel diversity. The goal is to understand structure-function relationships and epistasis to enable rational engineering of enzyme specificity for new waste processing applications and biofuel production.
Guest lecture on comparative genomics for University of Dundee BS32010, delivered 21/3/2016
Workshop/other materials available at DOI:10.5281/zenodo.49447
The document summarizes a talk on microbial agrogenomics. It discusses how genomics has advanced the understanding of plant-microbe interactions in several ways:
1) Genomes allow cataloging of genes and features in organisms and comparing common features between organisms that associate with phenotypes.
2) Multiple pathogen and plant genomes have enabled studying epidemiology through tracing historical origins, emergence in new areas, and host jumps.
3) Genomics provides diagnostic tools for precise pathogen identification needed for quarantine and legislation.
The document discusses the strengths, weaknesses, opportunities, and threats (SWOT) of using whole genome sequencing (WGS) for surveillance and diagnostics of zoonotic bacteria. It provides a case study of using WGS to track the nosocomial transmission of Pseudomonas aeruginosa between patients and the hospital water supply. WGS was able to identify transmission routes and microevolution of the bacteria with single nucleotide resolution. However, challenges include the need for robust and standardized analysis methods as well as experimental design considerations. Overall, WGS provides opportunities for improved outbreak tracking, classification, and diagnostics if its strengths are leveraged and weaknesses addressed.
Highly Discriminatory Diagnostic Primer Design From Whole Genome DataLeighton Pritchard
Presented at the GMI (Global Microbial Identifier) satellite meeting, sponsored by the UK Department for Environment, Food and Rural Affairs (DEFRA), organised by the Food and Environment Research Agency (FERA), Bedern Hall, York, 10th September 2014.
Presentation summarising the 2013 ICSB conference in Copenhagen, a requirement of James Hutton Institute Visits Abroad funding. Presented at the Cellular and Molecular Sciences seminar series.
Golden Rules of Bioinformatics.
Presented as part of a full-day introductory bioinformatics course - the example data and source for the slides can be found at https://github.com/widdowquinn/Teaching-Intro-to-Bioinf
Keynote presentation from Plant and Pathogen Bioinformatics workshop at EMBL-EBI, 8-11 July 2014
Slides and teaching material are available at https://github.com/widdowquinn/Teaching-EMBL-Plant-Path-Genomics
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataLeighton Pritchard
Presentation on use of Galaxy for plant pathology bioinformatics, presented by Peter Cock, at the Genomics for Non-Model Organisms workshop, ISMB/ECCB, Vienna, Austria, 19 July 2011
Presentation delivered 29th October 2012, at the CoZee workshop in Dundee (see CoZee zoonosis network site for more information: http://www.cozee-zoonosis.net/).
[For clarity: our diagnostics work did not at the time form part of the excellent E.coli O104:H4 genome analysis crowd-sourcing consortium work, which can be found at https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki - we talked about it here because it was good work, and without their efforts we couldn't have done what we did]
Slides for the afternoon session on "Introduction to Bioinformatics", delivered at the James Hutton Institute, 29th, 20th May and 5th June 2014, by Leighton Pritchard and Peter Cock.
Slides cover introductory guidance and links to resources, theory and use of BLAST tools, and a workshop featuring some common tools and tasks.
Presentation given as part of the EMBO Workshop on Plant-Microbe Interactions, at The Sainsbury Laboratory, Norwich, 20th June 2012. This presentation describes bioinformatic and statistical considerations for the prediction of plant pathogen effectors from genome sequences and annotation, with several literature examples.
Slides from a Comparative Genomics and Visualisation course (part 2) presented at the University of Dundee, 11th March 2014. Other materials are available at GitHub (https://github.com/widdowquinn/Teaching)
This document provides an overview of comparative genomics. It defines comparative genomics as combining genomic data and evolutionary biology to study genome structure, evolution and function. It discusses three levels of genome comparison: bulk properties like chromosome size and number, whole genome sequence similarity and organization, and functional genome features. The history of experimental comparative genomics is reviewed, noting that practical comparisons predated widespread genome sequencing.
A Systems Biology Perspective on Plant-Pathogen Interactions 2012-05-08, TurinLeighton Pritchard
My presentation from 8th May 2012, at a workshop on Plant-Microbe Interactions, held at the Turin Botanical Gardens, University of Turin. The talk expands on concepts from this paper: Pritchard L, Birch P (2011) A systems biology perspective on plant-microbe interactions: Biochemical and structural targets of pathogen effectors. Plant Science 180: 584–603. doi:10.1016/j.plantsci.2010.12.008.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
hematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 104 M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10−8 photons cm−2
s
−1
. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field
equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational
field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin
spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling
concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect
light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is
mitigated, at least in part.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxMAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poorquality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larva. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Or: Beyond linear.
Abstract: Equivariant neural networks are neural networks that incorporate symmetries. The nonlinear activation functions in these networks result in interesting nonlinear equivariant maps between simple representations, and motivate the key player of this talk: piecewise linear representation theory.
Disclaimer: No one is perfect, so please mind that there might be mistakes and typos.
dtubbenhauer@gmail.com
Corrected slides: dtubbenhauer.com/talks.html
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
Current Ms word generated power point presentation covers major details about the micronuclei test. It's significance and assays to conduct it. It is used to detect the micronuclei formation inside the cells of nearly every multicellular organism. It's formation takes place during chromosomal sepration at metaphase.
1. Hands-on session: Python
Research Data Visualisation Workshop
Leighton Pritchard1,2,3
1
Information and Computational Sciences,
2
Centre for Human and Animal Pathogens in the Environment,
3
Dundee Effector Consortium,
The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA
2. Acceptable Use Policy
Recording of this talk, taking photos, discussing the content using
email, Twitter, blogs, etc. is permitted (and encouraged),
providing distraction to others is minimised.
These slides will be made available at
http://www.slideshare.net/leightonp
3. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
4. What I do
Computational biologist (1996-present)
protein sequence-structure-function (1996-1999)
yeast metabolism (1999-2003)
plant pathology (2003-present)
Large datasets
sequence/genomic
metabolomics
statistics
geographical
Visualisation as communication to (wet) biologists
protein structures
metabolic flux
comparative genomics/evolution
statistical plots
6. GenomeDiagram a b c
a
Pritchard et al. (2006) Bioinformatics doi:10.1093/bioinformatics/btk021
b
Toth et al. (2006) Ann. Rev. Phytopath. doi:10.1146/annurev.phyto.44.070505.143444
c
http://biopython.org
7. Functional adaptation in Pba a b
a
Toth et al. (2006) Ann. Rev. Phytopath. doi:10.1146/annurev.phyto.44.070505.143444
b
http://biopython.org
8. GenomeDiagram/SciArt a b c
a
Pritchard et al. (2006) Bioinformatics doi:10.1093/bioinformatics/btk021
b
Shemilt (2009) in“Digital Visual Culture: Theory and Practice” ISBN 978-1-84150-248-9
c
Shemilt (2010) in “Art Practice in a Digital Culture”, ISBN 978-0-7546-7623-2
Influence
Free open-source comparative
genomics visualisation library
Impact
Artwork (prints, audio-visual
installation) exhibited in UK and
internationally
9. Comparative metabolism a b c
a
Biopython KGML/KEGG visualisation module
b
https://github.com/widdowquinn/Notebooks-Bioinformatics
c
http://biopython.org
10. PyANI: prokaryote classification a b
a
http://widdowquinn.github.io/pyani/
b
Pritchard et al. (2016) Anal. Methods doi:10.1039/C5AY02550H
Pectobacterium_atrosepticum_SCRI1043_uid57957Pectobacterium_atrosepticum_NCPPB8549Pectobacterium_atrosepticum_NCPPB3404Pectobacterium_atrosepticum_21APectobacterium_atrosepticum_JG10-08Pectobacterium_carotovorum_PC1_uid59295Pectobacterium_carotovorum_subsp_carotovorum_NCPPB312Pectobacterium_carotovorum_subsp_oderiferum_NCPPB3841Pectobacterium_carotovorum_subsp_oderiferum_NCPPB3839Pectobacterium_carotovorum_subsp_carotovorum_NCPPB3395Pectobacterium_carotovorum_PCC21_uid174335Pectobacterium_carotovorum_subsp_brasiliensis_B5Pectobacterium_carotovorum_subsp_brasiliensis_B4Pectobacterium_betavasculorum_NCPPB2293Pectobacterium_betavasculorum_NCPPB2795Pectobacterium_wasabiae_NCPPB3702Pectobacterium_wasabiae_NCPPB3701Pectobacterium_wasabiae_WPP163_uid41297Pectobacterium_SCC3193_uid193707Dickeya_solani_AMYI01Dickeya_solani_AMWE01Dickeya_solani_GBBC2040Dickeya_solani_IPO2222Dickeya_solani_MK16Dickeya_solani_MK10Dickeya_dianthicola_NCPPB_3534Dickeya_dianthicola_GBBC2039Dickeya_dianthicola_NCPPB_453Dickeya_dianthicola_IPO980Dickeya_spp_NCPPB_3274Dickeya_spp_MK7Dickeya_dadantii_NCPPB_2976Dickeya_dadantii_NCPPB_898Dickeya_dadantii_NCPPB_3537Dickeya_dadantii_3937_uid52537Pantoea_ananatis_AJ13355_uid162073Pantoea_ananatis_LMG_20103_uid46807Pantoea_ananatis_PA13_uid162181Pantoea_ananatis_uid86861Erwinia_amylovora_CFBP1430_uid46839Erwinia_amylovora_ATCC_49946_uid46943Erwinia_Ejp617_uid159955Erwinia_pyrifoliae_Ep1_96_uid40659Erwinia_pyrifoliae_DSM_12163_uid159693Dickeya_dadantii_Ech703_uid59363Dickeya_paradisiaca_NCPPB_2511Dickeya_aquatica_DW_0440Dickeya_aquatica_CSL_RW240Erwinia_tasmaniensis_Et1_99_uid59029Pantoea_At_9b_uid55845Pantoea_vagans_C9_1_uid49871Erwinia_billingiae_Eb661_uid50547Dickeya_zeae_APMV01Dickeya_zeae_AJVN01Dickeya_zeae_CSL_RW192Dickeya_zeae_NCPPB_3531Dickeya_dadantii_Ech586_uid42519Dickeya_zeae_APWM01Dickeya_zeae_NCPPB_2538Dickeya_zeae_MK19Dickeya_zeae_NCPPB_3532Dickeya_spp_NCPPB_569Dickeya_chrysanthami_NCPPB_402Dickeya_chrysanthami_NCPPB_516Dickeya_zeae_Ech1591_uid59297Dickeya_chrysanthami_NCPPB_3533
Pectobacterium_atrosepticum_SCRI1043_uid57957Pectobacterium_atrosepticum_NCPPB8549Pectobacterium_atrosepticum_NCPPB3404Pectobacterium_atrosepticum_21APectobacterium_atrosepticum_JG10-08Pectobacterium_carotovorum_PC1_uid59295Pectobacterium_carotovorum_subsp_carotovorum_NCPPB312Pectobacterium_carotovorum_subsp_oderiferum_NCPPB3841Pectobacterium_carotovorum_subsp_oderiferum_NCPPB3839Pectobacterium_carotovorum_subsp_carotovorum_NCPPB3395Pectobacterium_carotovorum_PCC21_uid174335Pectobacterium_carotovorum_subsp_brasiliensis_B5Pectobacterium_carotovorum_subsp_brasiliensis_B4Pectobacterium_betavasculorum_NCPPB2293Pectobacterium_betavasculorum_NCPPB2795Pectobacterium_wasabiae_NCPPB3702Pectobacterium_wasabiae_NCPPB3701Pectobacterium_wasabiae_WPP163_uid41297Pectobacterium_SCC3193_uid193707Dickeya_solani_AMYI01Dickeya_solani_AMWE01Dickeya_solani_GBBC2040Dickeya_solani_IPO2222Dickeya_solani_MK16Dickeya_solani_MK10Dickeya_dianthicola_NCPPB_3534Dickeya_dianthicola_GBBC2039Dickeya_dianthicola_NCPPB_453Dickeya_dianthicola_IPO980Dickeya_spp_NCPPB_3274Dickeya_spp_MK7Dickeya_dadantii_NCPPB_2976Dickeya_dadantii_NCPPB_898Dickeya_dadantii_NCPPB_3537Dickeya_dadantii_3937_uid52537Pantoea_ananatis_AJ13355_uid162073Pantoea_ananatis_LMG_20103_uid46807Pantoea_ananatis_PA13_uid162181Pantoea_ananatis_uid86861Erwinia_amylovora_CFBP1430_uid46839Erwinia_amylovora_ATCC_49946_uid46943Erwinia_Ejp617_uid159955Erwinia_pyrifoliae_Ep1_96_uid40659Erwinia_pyrifoliae_DSM_12163_uid159693Dickeya_dadantii_Ech703_uid59363Dickeya_paradisiaca_NCPPB_2511Dickeya_aquatica_DW_0440Dickeya_aquatica_CSL_RW240Erwinia_tasmaniensis_Et1_99_uid59029Pantoea_At_9b_uid55845Pantoea_vagans_C9_1_uid49871Erwinia_billingiae_Eb661_uid50547Dickeya_zeae_APMV01Dickeya_zeae_AJVN01Dickeya_zeae_CSL_RW192Dickeya_zeae_NCPPB_3531Dickeya_dadantii_Ech586_uid42519Dickeya_zeae_APWM01Dickeya_zeae_NCPPB_2538Dickeya_zeae_MK19Dickeya_zeae_NCPPB_3532Dickeya_spp_NCPPB_569Dickeya_chrysanthami_NCPPB_402Dickeya_chrysanthami_NCPPB_516Dickeya_zeae_Ech1591_uid59297Dickeya_chrysanthami_NCPPB_3533
0.00
0.25
0.50
0.75
1.00
ANIm_percentage_identity
11. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
12. Data visualisation is art and science
. . .storytelling in pictorial or graphical format
Stories to yourself (sense-making)
Stories to others (communication)
Cautionary tales
13. Data visualisation is art and science
. . .storytelling in pictorial or graphical format
Stories to yourself (sense-making)
summarise big stories quickly
data exploration and mining
identify areas/items of importance
find relationships and patterns
Stories to others (communication)
Cautionary tales
14. Data visualisation is art and science
. . .storytelling in pictorial or graphical format
Stories to yourself (sense-making)
summarise big stories quickly
data exploration and mining
identify areas/items of importance
find relationships and patterns
Stories to others (communication)
present your interpretation of data
make a specific point
assert a relationship or pattern
demonstrate significance
Cautionary tales
15. Data visualisation is art and science
. . .storytelling in pictorial or graphical format
Stories to yourself (sense-making)
summarise big stories quickly
data exploration and mining
identify areas/items of importance
find relationships and patterns
Stories to others (communication)
present your interpretation of data
make a specific point
assert a relationship or pattern
demonstrate significance
Cautionary tales
avoid distortion
make the reader think about the data, not the presentation
avoid chartjunk (excessive decoration)
aim for high data:ink ratio
16. The point of data visualisation a
a
https://en.wikipedia.org/wiki/Data visualization
Where does visualisation belong?
19. Communicating effectively
Understand the data
size
cardinality
meaning
relationships
Know (or be receptive to) the message
what does pictorial representation mean?
match graphical relationships to data relationships
Know your audience
20. Communicating effectively
Understand the data
size
cardinality
meaning
relationships
Know (or be receptive to) the message
what does pictorial representation mean?
match graphical relationships to data relationships
Know your audience
how do people process pictorial information
how does your audience process information
domain-specific representations
21. A model of communication a
a
Randy Olson (2009) Don’t Be Such a Scientist
22. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
25. Gestalt principles a
a
https://emeeks.github.io/gestaltdataviz
proximity: close objects perceived as
groups
enclosure: bounded objects perceived
as groups
continuity: aligned objects perceived
as continuous
similarity: similar attributes perceived
as groups
closure: open objects perceived as
complete
connection: connected items
perceived as groups
26. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
27. What works best? Experiment a b
a
Cleveland & McGill (1984) J. Am. Stat. Ass.
b
Heer & Bostock (2010) CHI 2010
Empirical measurements of interpretation
Subjects shown graphs representing same data
(log2) error in subjects’ accuracy compared by graph type
Judgement types
1-3: Position on a common scale (bar chart, stacked bar
chart)
4-5: Length encoding (stacked bar chart)
6: Angle (pie chart)
7-9: Area (bubble chart, aligned rectangles, treemap)
28. What works best? Result a b
a
Cleveland & McGill (1984) J. Am. Stat. Ass.
b
Heer & Bostock (2010) CHI 2010
We have inherent biases
that can distort
information recovered
Position > Angle ≈
Length > Area
Accuracy plateaus as
charts increase in size
Gridlines improve
accuracy
Aspect ratios affect area
judgements (squares
worst)
29. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
30. People hate pie charts
http://www.storytellingwithdata.com/blog/2011/07/death-to-pie-charts
especially Edward Tufte
A table is nearly always better than a dumb pie chart; the only worse design than a pie
chart is several of them[...] pie charts should never be used. - ”The Visual Display of
Quantitative Information”
31. ”E pur si muove. . .” a b
a
Eells (1926) J Am. Stat. Ass.
b
Simkin & Hastie (1987) J Am. Stat. Ass.
For proportions of a whole:
Pie charts read as accurately as bar charts
As number of components in the chart increases, bars are less efficient than pie
charts
32. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
33. Bar charts are bad. . .mmmkay?
There is an ongoing backlash against bar charts
(and I’m not picking on Nick, he just tweets a lot. . .)
But are they really that bad?
34. Interpretation of bars and lines a
a
Zacks & Tversky (1999) Mem. Cognit.
People interpret bars and lines differently
Experiment 1: In absence of context (arbitrary X, Y )
bars: discrete comparison (24:0)
lines: trend assessment (0:35)
35. Interpretation of bars and lines a
a
Zacks & Tversky (1999) Mem. Cognit.
People interpret bars and lines differently
Experiment 2: With context (discrete or continuous data)
36. Bars vs. lines
People naturally interpret bar charts as categorical data
People naturally interpret line graphs as trends
Using bars for trend data or lines for categorical data can
mislead the reader
37. Bar charts can mislead a
a
Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Do these bars differ in value?
38. Bar charts can mislead a
a
Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Do these bars differ in value?
Bar charts represent data as a
single point: lossy compression.
Could different datasets give the
same bar chart?
39. Bars are lossy compression a
a
Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Bars hide detail:
Number of data points
Variance of data points
Distribution of data points (outliers, etc.)
40. Bars may mislead on statistics a
a
Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Bars may imply incorrect test statistics:
Overlaps, outliers, covariates, sample sizes masked
41. Bars for paired data a
a
Weissgerber et al. (2015) PLoS Biol. doi:10.1371/journal.pbio.1002128
Bars imply independence of data:
42. Better than bar charts?
Bar chart with SE bars suggests group 2 is highest
43. Better than bar charts?
Bar chart with SD bars suggests there is overlap
44. Better than bar charts?
Univariate scatterplots show sample sizes, outliers, variance
48. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
50. Framing affects interpretation a
a
Cleveland et al. (1982) Science doi:10.1126/science.216.4550.1138
Point cloud size affects interpretation of correlation
(more diffuse interpreted as lower correlation coefficient)
51. Interpreting correlation is difficult a
a
Fisher et al. (2014) PeerJ doi:10.7717/peerj.589
People don’t judge significance well
47.4% of significant relationships correctly classified
74.6% of non-significant relationships correctly classified
52. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
53. Latency affects usage
Increasing latency to 0.5s:
decreases user activity
decreases dataset
coverage
reduces rate of hypothesis
generation
changes data exploration
strategy
reduces future interaction
with other graphics
54. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
56. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
58. Table of Contents
1 Introduction
Why listen to me?
What is visualisation?
Elementary perceptual tasks
2 Evidence-based representation
What representations work best?
Pie charts
Bars and lines
Scatterplots
Interactive plots
3 Hands-on session
Python libraries
Exercises
Let’s get started
60. Licence: CC-BY-SA
By: Leighton Pritchard
This presentation is licensed under the Creative Commons
Attribution ShareAlike license
https://creativecommons.org/licenses/by-sa/4.0/