Event: Plant and Animal Genomes Conference
Speaker: Bert Overduin
The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high-quality, integrated annotation on chordate and selected eukaryotic genomes. All supported species include comprehensive, evidence-based gene annotations, and a selected set of genomes includes additional variation, comparative, evolutionary, functional and regulatory annotation. As of Ensembl release 65 (December 2011), 56 species are fully supported. Ensembl data are accessible through an interactive web site, flat files, the data mining tool BioMart, direct database querying and a set of Perl APIs. Moreover, Ensembl is not just a data visualisation tool, but also a suite of programs for data production (e.g. gene calling and comparative genomics) that can be deployed individually according to the needs of an individual community.
Ensembl Genomes (http://www.ensemblgenomes.org) consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the genomes available in Ensembl. It currently contains data for over 300 species. Many of the databases that support Ensembl Genomes have been built by, or in close collaboration with, groups that maintain specialist data resources for individual species, and we are actively seeking to extend the range of these collaborations. Together, Ensembl and Ensembl Genomes offer a single unified interface across the taxonomic space.
This presentation will consist of a short introduction to Ensembl and Ensembl Genomes, followed by a demonstration of the respective websites and the BioMart data retrieval tool. Special attention will be given to recently developed functionality such as the Variant Effect Predictor, which predicts the consequences of substitutions, insertions and deletions on transcripts and protein sequences, and the option to visualise your own data by attaching files in formats such as BAM and VCF.
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
1. Genome Resources at the EBI: Ensembl and Ensembl Genomes
Bert Overduin, Ph.D.
PAG XX, January 15th 2012, San Diego
EBI is an Outstation of the European Molecular Biology Laboratory.
EBI Database Workshop
2. Outline
• Introduction to Ensembl / Ensembl Genomes
• Highlights in 2011
• Demo 1: Browser basics
• Demo 2: Variant Effect Predictor
• Demo 3: Adding custom tracks
• Demo 4: BioMart
• Future plans for 2012
• Help & Workshops
• Acknowledgements
3. Goal
To provide access to genome-scale data from completely sequenced species of scientific interest from across the taxonomy
5. Species Ensembl
Primates
Rodents etc.
Laurasiatheria
Afrotheria
Xenarthra
Other mammals
Birds & reptiles
Amphibians
Fish
Other chordates
Other eukaryotes
On Pre! Ensembl
7. Annotation
• Inclusion of species depends on various criteria (model organism? community interest / demand? funding? completeness / quality of genome assembly?)
• A broad taxonomic coverage is aimed for
• Annotation in-house by the Ensembl team
• Annotation preferably by or in collaboration with the scientific community for the species in question
8. Ensembl genebuild
[Diagram: genome assembly + experimental evidence (cDNAs & proteins) → Genebuild pipeline → Ensembl genes]
9. Data
• Genomic sequence
• Gene/transcript/protein models
• External references
• Mapped cDNAs, proteins, microarray probes, BACs, cytogenetic bands, markers, repeats etc.
• Comparative data: orthologs and paralogs, protein families, whole genome alignments, syntenic regions
• Variation data: sequence variants, structural variants
• Regulatory data: “best guess” set of regulatory elements
10. Access to data
• Web browser: http://www.ensembl.org (with US West, US East and Asia mirrors and Pre! and Archive! sites); http://www.ensemblgenomes.org
• BioMart: http://www.biomart.org
• FTP: ftp.ensembl.org/pub; ftp.ensemblgenomes.org/pub
• Public MySQL server: ensembldb.ensembl.org:5306:anonymous; mysql.ebi.ac.uk:4157:anonymous
• Ensembl API: http://www.ensembl.org/info/docs/api
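The public MySQL entries above appear to follow a host:port:user pattern. As a minimal sketch under that assumption, the snippet below splits such an entry and builds an argument list for the standard `mysql` command-line client (`-h`, `-P`, `-u`). The helper names `parse_endpoint` and `mysql_command` are hypothetical, and nothing here opens a network connection.

```python
from typing import List, NamedTuple


class MySQLEndpoint(NamedTuple):
    host: str
    port: int
    user: str


def parse_endpoint(spec: str) -> MySQLEndpoint:
    """Split a 'host:port:user' string (the layout assumed from the slide)."""
    host, port, user = spec.split(":")
    return MySQLEndpoint(host, int(port), user)


def mysql_command(endpoint: MySQLEndpoint) -> List[str]:
    """Build arguments for the mysql client: -h host, -P port, -u user."""
    return ["mysql", "-h", endpoint.host, "-P", str(endpoint.port), "-u", endpoint.user]


# The two server entries exactly as listed on the slide.
ENDPOINTS = [
    parse_endpoint("ensembldb.ensembl.org:5306:anonymous"),
    parse_endpoint("mysql.ebi.ac.uk:4157:anonymous"),
]

if __name__ == "__main__":
    for endpoint in ENDPOINTS:
        print(" ".join(mysql_command(endpoint)))
```

The printed lines can be pasted into a shell where a MySQL client is installed; whether the ports listed in 2012 are still served is not something this sketch checks.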
11. Highlights in 2011
• Genebuilds for turkey and cod
• Genebuild on new cow assembly (UMD 3.1)
• Added rabbit to whole-genome multiple alignments
• 3-way avian whole-genome alignment and constrained elements (chicken, turkey, zebra finch)
• Variation db for cat (dbSNP127)
• Updated variation data for cow (dbSNP133), dog (DGVa), pig (Illumina PorcineSNP60 Bead Chip, DGVa)
• Improved Variant Effect Predictor (VEP) and failed variation pipeline
• Sortable tracks, saving of configurations and configuration sets
• Support for large file formats (BAM, BigWig, VCF)
12. Highlights in 2011
• 31 new species
• Plants: Chlamydomonas reinhardtii, Cyanidioschyzon merolae, Glycine max, Oryza glaberrima, Selaginella moellendorffii
• Fungal plant pathogens: Ashbya gossypii, Fusarium oxysporum, Gibberella moniliformis, Gibberella zeae, Mycosphaerella graminicola, Nectria haematococca, Phaeosphaeria nodorum, Puccinia triticina, Ustilago maydis
• Oomycete plant pathogens: Phytophthora infestans, Phytophthora ramorum, Phytophthora sojae, Pythium ultimum
• Active collaborations within PhytoPath (http://www.phytopathdb.org/) and PomBase projects
• Variation db for Arabidopsis thaliana contains over 14 million variants from over 1600 strains
13. Demo 1 - Browser basics
Background:
The CYN gene encodes cyanate hydratase,
an enzyme found in bacteria and plants that
catalyses the reaction of cyanate with
bicarbonate to produce ammonia and
carbon dioxide:
NCO⁻ + HCO₃⁻ + 2 H⁺ → NH₃ + 2 CO₂
Task:
Explore the CYN gene of Vitis vinifera
(grape).
14. Variant Effect Predictor (VEP)
• Predicts functional consequences of variants on Ensembl genes
• Web interface, standalone Perl script and Perl API
• Accepts tab-delimited, VCF and pileup format as input
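To make the variant classes concrete, here is a minimal Python sketch that reads one VCF data line and labels each alternate allele as a substitution, insertion or deletion by comparing REF and ALT lengths. This is only an illustration of the input side and is not the VEP's own logic. The function names and the example record in the usage section are invented for illustration.

```python
from typing import List, Tuple


def classify_alleles(ref: str, alt: str) -> str:
    """Label a REF/ALT pair by comparing allele lengths.

    Symbolic alleles such as <DEL> and missing values (".") are not handled.
    """
    if len(ref) == len(alt):
        return "substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


def classify_vcf_line(line: str) -> List[Tuple[str, int, str, str, str]]:
    """Return (chrom, pos, ref, alt, kind) for every ALT allele on a VCF data line.

    Relies on the fixed VCF column order: CHROM, POS, ID, REF, ALT, ...
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alts = fields[:5]
    return [(chrom, int(pos), ref, alt, classify_alleles(ref, alt))
            for alt in alts.split(",")]


if __name__ == "__main__":
    # Made-up record, for illustration only.
    record = "1\t100\t.\tA\tAT,G\t.\t.\t."
    for result in classify_vcf_line(record):
        print(result)
```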
16. Adding custom tracks
• Upload data to Ensembl (5 MB size limit) or attach a file on a
web-accessible server (http or ftp) to Ensembl (no size limit)
• Possible formats:
  Format     Content
  BAM        sequence alignments (no upload)
  BED        genes / features
  BedGraph   continuous-valued data
  BigWig     continuous-valued data (no upload)
  GBrowse    genes / features
  GFF        genes / features
  GTF        genes / features
  PSL        sequence alignments
  VCF        variants (no upload)
  WIG        continuous-valued data
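The upload and attach rules above can be sketched as a small Python lookup. It assumes that every listed format can be attached by URL, that formats marked "no upload" can only be attached, and that "5 MB" means 5 × 1024 × 1024 bytes. None of these readings is confirmed by the slide, and `track_options` is a hypothetical helper, not an Ensembl function.

```python
from typing import List

# Assumption: the slide's "5 MB" limit is interpreted as binary megabytes.
UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024

# Formats flagged "(no upload)" in the list.
ATTACH_ONLY = {"BAM", "BIGWIG", "VCF"}
SUPPORTED = ATTACH_ONLY | {"BED", "BEDGRAPH", "GBROWSE", "GFF", "GTF", "PSL", "WIG"}


def track_options(fmt: str, size_bytes: int) -> List[str]:
    """Return the ways a custom track file could be added, per the rules above."""
    name = fmt.upper()
    if name not in SUPPORTED:
        raise ValueError(f"format not in the list: {fmt}")
    options = ["attach"]  # attaching by URL has no size limit
    if name not in ATTACH_ONLY and size_bytes <= UPLOAD_LIMIT_BYTES:
        options.append("upload")
    return options


if __name__ == "__main__":
    print(track_options("BAM", 2_000_000_000))
    print(track_options("BED", 120_000))
```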
17. Demo 3 - Adding custom tracks
Background:
The file SRR070570.bam contains alignments
of Illumina RNAseq reads from a wildtype
Arabidopsis thaliana strain.
The bam file and its bam.bai index file are
located at http://www.ebi.ac.uk/~bert/.
Task:
Attach SRR070570.bam to Ensembl Genomes.
Check the expression of a constitutive and a
non-constitutive Arabidopsis gene, e.g.
RBCS1A (ribulose bisphosphate carboxylase
small chain 1A) and PR1 (pathogenesis-related
protein 1).
18. BioMart
• Data retrieval tool
• Originally developed for Ensembl (EnsMart)
• Now used by many large data resources
• Integrated with several widely used software packages
• Joint project between the European Bioinformatics Institute (EBI)
and the Ontario Institute for Cancer Research (OICR)
• Website : http://www.biomart.org
19. Principle
• Step 1 – Dataset
Choose your dataset
• Step 2 – Filters
Limit your dataset
• Step 3 – Attributes
Specify what information you want to output
• Step 4 – Results
Preview and output your results
20. Demo 4 - BioMart
Background:
“Lactation” (GO:0007595) is the
Gene Ontology (GO) term for the
biological process of “the secretion
of milk by the mammary gland”.
Task:
Retrieve all cow genes that are
annotated with the GO term
“lactation”.
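The same dataset / filters / attributes pattern can also be written as a BioMart XML query. The Python sketch below builds such a document for this task. The filter name `go_id` and the attribute names are assumptions that should be checked against the filter and attribute lists of the mart being queried, and `build_biomart_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_biomart_query(dataset: str,
                        filters: Dict[str, str],
                        attributes: List[str]) -> str:
    """Build a BioMart XML query: one dataset, its filters, its output attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # tab-separated results
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # Step 1: choose the dataset.
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Step 2: limit the dataset with filters.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Step 3: specify the attributes to output.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n'
    return header + ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # Filter and attribute names here are assumed, not taken from the slides.
    print(build_biomart_query(
        dataset="btaurus_gene_ensembl",
        filters={"go_id": "GO:0007595"},
        attributes=["ensembl_gene_id", "external_gene_name"],
    ))
```

Step 4 (results) would consist of submitting this document to a BioMart server, which is left out here so that the sketch runs offline.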
21. Future plans for 2012
• Genebuilds for duck (?), salmon (?), sheep (?), tilapia
• Genebuilds on new assemblies for cat (Felis_catus-6.2), chicken
(Gallus_gallus-4.0), dog (CanFam3.1), pig (Sscrofa10.2)
• Include RNAseq data in genebuild
• VEP support for structural variants
• New BLAST/BLAT interface
• http://www.ensembl.info/roadmap
22. Future plans for 2012
• New species: barley, Brassica (from BrassEnsembl), foxtail millet,
Oryza brachyantha, potato, tomato, Gaeumannomyces graminis,
Magnaporthe oryzae, Magnaporthe poae, tsetse fly
• New assemblies: maize (B73_RefGen_v3), Oryza sativa ssp.
japonica cv. Nipponbare (Os-Nipponbare-Reference-IRGSP-1.0;
IRGSP1.0), poplar
• Variation db and new gene annotation for wheat stem rust pathogen
• New query interface for data on plant-fungal pathogen interactions
(PhytoPath; http://www.phytopathdb.org/)
• Widened development of community annotation pipelines
23. Help
• Helpdesk:
helpdesk@ensembl.org
helpdesk@ensemblgenomes.org
• Mailing lists:
http://www.ensembl.org/info/about/contact/mailing.html
http://plants.ensembl.org/info/about/contact/mailing.html
• Ensembl YouTube and YouKu channels:
http://www.youtube.com/user/EnsemblHelpdesk
http://u.youku.com/user_show/uid_Ensemblhelpdesk
24. EBI Train online
http://www.ebi.ac.uk/training/online/course/ensembl-browsing-chordate-genomes
26. Workshops
• Browser (0.5-2 days) and API (1-3 days) workshops
• Combination of lectures and hands-on exercises
• Advertised on http://www.ensembl.info/workshops/calendar/
• You can host your own workshop!
• No fee for academic institutions, apart from the instructor’s
expenses
• You only need a computer room and participants
• You can get more info from me (bert@ebi.ac.uk) or at the EBI booth
(302)
27. Stay in touch
• Blog:
http://www.ensembl.info
• Facebook:
http://www.facebook.com/Ensembl.org
• Twitter:
http://twitter.com/Ensembl
28. Acknowledgements
• WTSI • CADRE • OICR
• Gramene
• VectorBase
• WormBase
• PomBase
• Wellcome Trust • EMBL
• NIH-NHGRI • BBSRC
• EMBL • Wellcome Trust
• EU • Bill and Melinda Gates Foundation
• EU
29. Acknowledgements
Paul Flicek, Ridwan Amode, Daniel Barrell, Kathryn Beal, Simon Brent, Denise Carvalho-Silva,
Peter Clapham, Guy Coates, Susan Fairley, Stephen Fitzgerald, Laurent Gil, Leo Gordon, Maurice
Hendrix, Thibaut Hourlier, Nathan Johnson, Andreas Kähäri, Damian Keefe, Stephen Keenan,
Rhoda Kinsella, Monika Komorowska, Gautier Koscielny, Eugene Kulesha, Pontus Larsson,
Ian Longden, Will McLaren, Matthieu Muffato, Bert Overduin, Miguel Pignatelli, Bethan
Pritchard, Harpreet Riat, Graham Ritchie, Magali Ruffier, Michael Schuster, Daniel Sobral,
Amy Tang, Kieron Taylor, Stephen Trevanion, Jana Vandrovcova, Simon White, Mark Wilson,
Steven Wilder, Bronwen Aken, Ewan Birney, Fiona Cunningham, Ian Dunham, Richard Durbin,
Xosé Fernández-Suarez, Jennifer Harrow, Javier Herrero, Tim Hubbard, Anne Parker, Glenn
Proctor, Giulietta Spudich, Jan Vogel, Andy Yates, Amonida Zadissa, Steve Searle
Paul Kersey, Dan Staines, Dan Lawson, Eugene Kulesha, Paul Derwent, Jay Humphrey,
Daniel Hughes, Stephen Keenan, Arnaud Kerhornou, Gautier Koscielny, Nick Langridge, Mark
McDowall, Karyn Megy, Uma Maheswari, Michael Nuhn, Michael Paulini, Helder Pedro, Iliana
Toneva, Derek Wilson, Andy Yates, Ewan Birney
30. Posters
• P941: Genome Annotation in Ensembl (Susan Fairley)
• P942: Ensembl Plants: An Integrating Resource for Plant Genomics and Variation (Paul Kersey)
31. Come and see us at booth 302!
• Training courses
• Careers
• Meet the experts
• Brochures and factsheets
• PhD and post doc opportunities
• Industry programme
• Research and services
• Visitor’s programme
EBI is an Outstation of the European Molecular Biology Laboratory.
32. PDF of this presentation
http://www.ebi.ac.uk/~bert/past_workshops.html