This document discusses making phylogenetic data more accessible to non-specialists. It describes current barriers, such as technical obstacles in data standards and the social obstacle of data hoarding. The National Evolutionary Synthesis Center (NESCent) aims to address these issues through various initiatives, including the development of ontologies, databases, and software to integrate phylogenetic and phenotypic data, as well as the promotion of open development practices.
This document discusses linking data on the semantic web. It questions when data providers will value links enough to consistently create and maintain them between resources. It also questions how to link data in the absence of persistent identifiers. Specifically, it raises challenges around making persistent links without identifiers that remain constant over time.
Working with Trees in the Phyloinformatic Age. WH Piel, Roderic Page
The document discusses various methods for querying and searching phylogenetic tree databases, including:
- Storing tree data in different formats like path enumerations, nested sets, and adjacency lists
- Using techniques like transitive closure and shortest path algorithms to find relationships between nodes
- Implementing tree queries using SQL against a BioSQL database with a phylo extension
- Developing more complex queries to find trees or subtrees that match certain criteria based on included and excluded nodes
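The storage schemes listed above can be sketched in a few lines. Below is an illustrative Python example (not the actual BioSQL phylo schema): an adjacency list for a toy tree, and nested-set labels derived from it, which turn "find all descendants" queries into simple interval tests.

```python
# Minimal sketch of two of the tree encodings mentioned above
# (illustrative only, not the BioSQL phylo extension itself).

def nested_sets(children, root):
    """Assign (left, right) interval labels via a depth-first walk."""
    labels, counter = {}, [0]
    def walk(node):
        counter[0] += 1
        left = counter[0]
        for child in children.get(node, []):
            walk(child)
        counter[0] += 1
        labels[node] = (left, counter[0])
    walk(root)
    return labels

# Adjacency list for a toy tree:  root -> (A, B), B -> (C, D)
children = {"root": ["A", "B"], "B": ["C", "D"]}
labels = nested_sets(children, "root")

def descendants(labels, node):
    """X descends from Y iff Y.left < X.left and X.right < Y.right."""
    l, r = labels[node]
    return {n for n, (nl, nr) in labels.items() if l < nl and nr < r}

print(descendants(labels, "B"))  # {'C', 'D'}
```

In SQL, the same descendant query becomes a single range predicate over the (left, right) columns, which is why nested sets are popular for read-heavy tree databases despite being expensive to update.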
Phyloinformatics is the linking of biodiversity data together, such as information on Apomys specimens, to integrate various data sources and learn more about organisms than what is known from any single source. Key aspects of phyloinformatics include linking taxonomy, geography, phylogeny, and extracting information through data and text mining to build a more complete understanding of organisms.
The document discusses a Russian man named Nikolay who wants to be rich and healthy. He buys an iPad and installs an app called "VHS-machine for iPhone&iPad" that allows him to watch physical and spiritual exercise videos by Dr. Volz from a VHS tape. By watching the videos on his iPad, Nikolay becomes both rich and healthy.
Phyloinformatics in the age of Wikipedia (warning, do not view if easily offe... - Roderic Page
Isthmohyla rivularis is a rare species of frog found in Costa Rica and Panama. It lives along fast-moving streams in rainforests. The species was thought to be extinct in the 1980s but was rediscovered in 2007 in Costa Rica and spotted again in 2008.
This document discusses the importance of making data "sticky" by using shared identifiers that create links between data sources. It notes that identifiers should be globally unique, resolvable by both humans and machines, and widely used. Examples of shared identifiers that enable discovery and metrics include DOIs, specimen identifiers from databases like GBIF, and citations. The value, it argues, comes from the links between nodes, not just the nodes themselves.
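The "sticky data" idea above can be made concrete with a tiny join: two datasets can only be linked because both use the same globally unique identifier. This is a hedged sketch with invented records and an invented DOI.

```python
# Hedged sketch: two toy datasets become "sticky" once they share an
# identifier (here a DOI). All records and DOIs below are invented.

papers = {
    "10.1000/xyz123": {"title": "A revision of Apomys"},
}
citations = [
    {"doi": "10.1000/xyz123", "cited_by": "10.1000/abc456"},
]

# The join is only possible because both sources use the same key.
linked = [
    {"title": papers[c["doi"]]["title"], "cited_by": c["cited_by"]}
    for c in citations
    if c["doi"] in papers
]
print(linked[0]["title"])  # A revision of Apomys
```

Without a shared, stable identifier the two sources would have to be matched on fuzzy fields like titles, which is exactly the fragility the talk argues against.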
A traction map lets you find the bottlenecks in your business where intervention yields the maximum effect for minimal effort, and bring your startup to the point where it is ready to scale.
Traction maps, HADI cycles, bottlenecks and the business improvement cycle, problem interviews, and customer segmentation. Video of this lecture: http://startupmagic.ru
This document discusses the need for annotation of genomic data given the deluge of information from next generation sequencing. It outlines that clinical-grade annotation is important for application. Many sources of annotation are discussed, including databases, literature, testing labs, and crowdsourcing. However, it emphasizes that specialized human curation remains essential for high quality annotation.
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database - Hilmar Lapp
Jane's lab uses a freely available open source software package called PhyloDOM to create a phylogenetic database of their molecular data, resolving the phylogeny of endangered frog species. This allows their results to be easily shared and integrated by other researchers through web interfaces, data aggregators, and visualization tools that take advantage of standardized metadata.
Keynote at the AI in Medicine Conference (AIME 2005), giving an overview of the work in Ontology Mapping to people in Medical Informatics (which includes explaining the what and why of ontologies in general).
This document discusses teaching biology at the Institute of Biology at the University of the Philippines, Diliman. It covers several topics:
1. The challenges of teaching the broad topic of biology and helping students retain key concepts rather than just facts.
2. Examples of how biological concepts like bioinformatics and biogeography are taught, including using databases, molecular phylogeny, and geographic patterns of species distribution.
3. Different student reactions to concepts like evolution and how the teacher addresses these concepts by laying out facts and asking students to critically analyze the evidence.
Biological databases are collections of experimental and theoretical biological data that are organized so their contents can be easily accessed, managed, updated, and retrieved. The activity of preparing a database can be divided into collecting data in an accessible form and making it available to a multi-user system. Two important biological databases are GenBank, which contains publicly available nucleotide and protein sequences, and the Protein Data Bank, which houses 3D structures of proteins, nucleic acids, and carbohydrates.
The document discusses DataONE, a project aimed at improving data repository interoperability and advancing best practices in data lifecycle management. It focuses on enabling access to multiple external data repositories from within a HUB environment. This would allow users to aggregate and integrate disparate datasets for new analyses, and enable reproducible workflows. The goal is to address issues around scattered and dispersed data by improving discovery, integration and long-term preservation of datasets.
This document discusses opportunities and constraints related to DNA sequencing and analysis. It describes how DNA sequencing is used in academic research, oncology, gene therapy, developing genetically modified organisms, clinical diagnosis, forensics, and pedigree analysis. It also outlines some of the agencies and databases involved and how the capability and cost of sequencing has grown exponentially over time. Finally, it discusses some of the practical constraints in analyzing large DNA sequence data, including reading frames, exons/introns, errors, and the significance of non-coding DNA.
The Sanger Mouse Resources Portal - A Testbed for Collaborative Data Integration - Darren Oakley
The document describes the Sanger Mouse Resources Portal, an attempt at a federated approach to creating a collaborative data portal for mouse genomic data. The portal aggregates data from 5 sources using a search engine and data services that allow each group to host their own data and expose it via defined interfaces. This avoids any single group having total control while allowing new data to be easily added. However, it also risks redundancy and lacks centralized curation of the whole collection.
Global Biodiversity Information Facility (GBIF) - 2012 - Dag Endresen
Presentation of the Global Biodiversity Information Facility (GBIF) and GBIF Norway for the Department of Technical and Scientific Conservation (CONSERV) at the Natural History Museum, University of Oslo. Tøyen, Oslo, 7 November 2012.
This document outlines the goals and activities for a classroom unit on DNA, heredity, and genetics. The unit includes lectures, discussions, and hands-on labs. Students will build DNA models, analyze family trees, and conduct virtual genetics labs. They will also run plasmid and gel electrophoresis labs to visualize and analyze DNA. The unit aims to help students understand how genes influence traits and to think critically about biotechnology and its impact on society. A final project challenges students to design a GMO and debate the ethics of genetic engineering.
Ontologies for the Real World by Deborah L. McGuinness. Invited talk for the 2011 Future Worlds Microsoft Faculty Summit in the Semantic Knowledge for Commodity Computing.
The BioMANTA Network project aims to develop a Semantic Web infrastructure for computational modelling and analysis of large-scale protein-protein interaction and compound activity networks. This will involve creating a Semantic Interactome Model using an OWL-based ontology to represent public interaction data. The goals are to perform network inference and knowledge discovery through network meta-analysis and global network inference. Results will be visualized using COBALT software and high quality ontologies and data sets will be produced.
The document discusses gene tree reconciliation, which involves projecting gene trees onto a species tree to account for evolutionary events like gene duplications, losses, and horizontal transfer. It outlines existing cyberinfrastructure for generating and visualizing reconciliations, and proposes ways to extend this, such as allowing users to submit their own gene trees and alignments for reconciliation, integrating visualization tools, and storing multiple reconciliations per gene tree. A goal is to "make tree reconciliation phylotastic" by building components to allow users more flexibility in generating reconciliations from their own data.
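The projection of gene trees onto a species tree described above is usually done by LCA mapping: each gene-tree node is mapped to the lowest common ancestor (in the species tree) of its children's mappings, and a duplication is inferred when a node maps to the same species-tree node as one of its children. A minimal sketch with toy trees (all names invented):

```python
# LCA-based gene tree reconciliation, one classic technique behind the
# reconciliations described above. Trees and names are toy data.

species_parent = {"human": "HC", "chimp": "HC", "HC": "rootS", "mouse": "rootS"}

def ancestors(node, parent):
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(a, b, parent):
    anc_a = ancestors(a, parent)
    for n in ancestors(b, parent):
        if n in anc_a:
            return n
    raise ValueError("no common ancestor")

# Gene tree ((g_human, g_chimp)n1, g_mouse)n2 as child lists + leaf map.
gene_children = {"n1": ["g_human", "g_chimp"], "n2": ["n1", "g_mouse"]}
leaf_species = {"g_human": "human", "g_chimp": "chimp", "g_mouse": "mouse"}

mapping = {}
duplications = set()

def reconcile(node):
    """Map a gene-tree node to a species-tree node; record duplications."""
    if node in leaf_species:
        mapping[node] = leaf_species[node]
        return mapping[node]
    kids = [reconcile(c) for c in gene_children[node]]
    m = kids[0]
    for k in kids[1:]:
        m = lca(m, k, species_parent)
    mapping[node] = m
    if m in kids:  # maps to the same node as a child: a duplication
        duplications.add(node)
    return m

reconcile("n2")
print(mapping["n1"], mapping["n2"], sorted(duplications))  # HC rootS []
```

Here both internal nodes map to distinct species-tree ancestors, so no duplications are inferred; a gene tree with two human copies under one node would flag that node as a duplication.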
Unison: Enabling easy, rapid, and comprehensive proteomic mining - Reece Hart
Unison is an online database and data integration platform that aggregates proteomic and genomic data from multiple sources and provides over 200 million precomputed predictions on protein sequences, domains, structures, and more. It aims to enable easy, rapid, and comprehensive proteomic mining through semantic integration of distinct data types and automated querying of predictions. Custom data mining projects using Unison have led to discoveries about proteins like Bcl-2 that regulate apoptosis.
ESI Supplemental Webinar 2 - DataONE presentation slides - DuraSpace
This document provides an overview of a webinar on DataONE, a project that aims to provide tools and approaches for supporting the data life cycle. The webinar covered three key challenges in data management: preservation and planning, discovery, and innovation. It discussed how DataONE is working to address these challenges through its coordinated network of member nodes that allow for data preservation, sharing and discovery. The webinar also demonstrated some of DataONE's tools like the DMPTool for data management planning and the Investigator Toolkit for data analysis and visualization.
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes - Monica Munoz-Torres
Precise elucidation of the many different biological features encoded in a genome requires a careful curation process that involves reviewing all available evidence to allow researchers to resolve discrepancies and validate automated gene models, protein alignments, and other biological elements. Genome annotation is an inherently collaborative task; researchers only rarely work in isolation, turning to colleagues for second opinions and insights from those with expertise in particular domains and gene families.
The i5k initiative seeks to sequence the genomes of 5,000 insect and related arthropod species. The selected species are known to be important to worldwide agriculture, food safety, medicine, and energy production as well as many used as models in biology, those most abundant in world ecosystems, and representatives in every branch of the insect phylogeny in an effort to better understand arthropod evolution and phylogeny. Because computational genome analysis remains an imperfect art, each of these new genomes sequenced will require visualization and curation.
Apollo is an instantaneous, collaborative, genome annotation editor, and the new JavaScript based version allows researchers real-time interactivity, breaking down large amounts of data into manageable portions to mobilize groups of researchers with shared interests. The i5K is a broad and inclusive effort that seeks to involve scientists from around the world in their genome curation process and Apollo is serving as the platform to empower this community. Here we offer details about this collaboration.
The hippocampus receives input from the entorhinal cortex and sends projections to multiple targets in the brain. Its main outputs are to the subiculum, which projects to regions like the nucleus accumbens, amygdala, and medial prefrontal cortex. The hippocampus plays an important role in memory formation and spatial navigation.
DNA profiling uses regions of non-coding DNA that vary between individuals, called short tandem repeats (STRs), to identify a person from their DNA. STR analysis has replaced older DNA fingerprinting techniques. It can analyze small amounts of DNA quickly and is used in forensics, parentage testing, and other applications. Parentage testing involves collecting samples from a child and putative parents, analyzing STR regions through PCR and electrophoresis, and using statistical analysis to calculate probability of paternity.
Presented during the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'12). Part of the workshop 'New Models and Modes for Data Sharing: Experiences from Neuroscience'. Presented by Jeffrey S. Grethe, Ph.D. from the Center for Research in Biological Systems at the University of California, San Diego.
This workshop featured several large scale efforts to establish data sharing platforms, standards and tools to promote data intensive analysis in the neurosciences. As we head into the second decade of the 21st century, many scientists realize that current methods for publishing and accessing data are outmoded and inefficient. Neuroscience, with its large diverse and highly competitive community, has been slow to adopt more open sharing of data and has lacked effective tools to do so. There has been a significant investment in databases and tools for biological science, and frequent calls for more of them, but few calls to the biological community to adopt practices and frameworks for making their resources more easily discoverable and data more accessible. Data are contained within diverse sources, from web pages, databases, literature to personal lab systems, making for a haphazard mechanism for data and tool discovery. Although these mechanisms are effective for small communities, they are parochial for the totality of resources available, leading to fragmentation in the resource ecosystem. Neuroscience, with its diverse subdisciplines, complex data types and broad domain, presents the perfect exemplar of the current practices, bottlenecks and issues surrounding open access to data. This situation is changing, however, as groups have started to work together to define new models and tools for sharing and analyzing neuroscience data on an international scale. In this workshop, we bring together experts from national and international projects to discuss issues of data access and progress towards establishing platforms and best practices for effective sharing of neuroscience data in support of basic and clinical neuroscience.
Similar to Data Mining GenBank for Phylogenetic inference - T. Vision
This document describes ALEC (A List of Everything Cool), a project that aims to create a "Bibliography of Life" containing information on every taxonomic paper, taxonomist, and species in Wikidata. It provides an overview of the current status and capabilities of ALEC, including live queries to Wikidata and examples of information that can be viewed. It also discusses the major tools used, challenges faced populating Wikidata with bibliographic data, and plans for future improvements to the interface and expanding the data in Wikidata.
Wikidata and the Biodiversity Knowledge Graph - Roderic Page
Wikidata could serve several roles in building a biodiversity knowledge graph:
1) Wikidata could provide identifiers, labels, and a way to query relationships (nodes and edges) between biodiversity items to form the basis of a knowledge graph.
2) Individual biodiversity databases could link their data to relevant Wikidata items while maintaining their own specialized knowledge graphs.
3) Wikidata could serve as a centralized source of both core biodiversity information and related contextual information like people, organizations, and locations, essentially becoming the biodiversity knowledge graph itself due to its large community and scope.
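The "nodes and edges" role sketched above boils down to querying triples. The following is a hedged, in-memory miniature of that idea, loosely in the spirit of SPARQL pattern matching against Wikidata; the item and property names are invented, not real Wikidata identifiers.

```python
# Tiny in-memory triple store queried by pattern matching, illustrating
# the nodes-and-edges querying described above. All IDs are invented.

triples = [
    ("Q_frog", "instance_of", "taxon"),
    ("Q_frog", "taxon_name", "Isthmohyla rivularis"),
    ("Q_frog", "endemic_to", "Q_costa_rica"),
    ("Q_costa_rica", "label", "Costa Rica"),
]

def match(pattern):
    """Return triples matching a (subject, predicate, object) pattern,
    where None acts as a wildcard (like a SPARQL variable)."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which item is endemic to Q_costa_rica?"
hits = match((None, "endemic_to", "Q_costa_rica"))
print(hits[0][0])  # Q_frog
```

A real query would go through the Wikidata Query Service with SPARQL variables in place of the `None` wildcards, but the matching semantics are the same.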
This document discusses challenges with accessing and processing biodiversity literature and describes efforts to make literature from Biodiversity Heritage Library (BHL) more accessible and machine-readable. BioStor extracts text and figures from over 200,000 BHL articles and makes them searchable online. Efforts are underway to link BHL content to databases and Wikidata to connect literature to species and other entities. The goal is to extract more articles from BHL, enable geographic searches of content, break articles into smaller pieces, and embed literature within a biodiversity knowledge graph.
Ozymandias - from an atlas to a knowledge graph of living Australia - Roderic Page
This document discusses building a knowledge graph called Ozymandias to link biodiversity data for Australian fauna. It describes linking specimens, publications, researchers and more using identifiers. Demos are proposed to link the Australian Fauna Database to literature to add images and information about taxonomists. Knowledge graphs integrate diverse data sources and make more connections and information visible than traditional databases. They rely on unique identifiers for people, publications and other objects.
SLiDInG6 talk on biodiversity knowledge graph - Roderic Page
This document discusses building knowledge graphs by connecting various data sources using identifiers and vocabularies. It identifies technical and social obstacles to doing so, including the need for globally unique identifiers, agreed-upon vocabularies, and standards for transmitting the graph. It notes that some of these obstacles are being overcome with solutions like DOIs, JSON-LD, and Wikidata, but that measuring progress on connectivity is more difficult than linear growth. It concludes that knowledge graphs already exist but are not evenly distributed, and that any combined graph should be free, open, and used for good.
Wild idea for TDWG17: Bitcoins, biodiversity and micropayments - Roderic Page
Open data is freely available, with attribution as its currency, but funding the work and demonstrating its value can be difficult. Closed data requires paid access through subscriptions; development is funded and the value is financially obvious. The document proposes bitcoin micropayments for access to raw or cleaned biodiversity data, with the community mining bitcoins to fund the processing of raw data into cleaned datasets.
Towards a biodiversity knowledge graph - Roderic Page
The document discusses obstacles to and progress in building knowledge graphs. It outlines technical obstacles like needing globally unique identifiers and agreed-upon vocabularies. Social obstacles include economic issues. Identifiers are key to connecting knowledge graphs. Metrics are needed to measure graph connectivity rather than just growth. Network effects are important to make graphs truly useful. Wikidata and Google's knowledge graph are examples of existing large knowledge graphs.
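One way to make "measure connectivity rather than just growth" concrete is the fraction of nodes in the largest connected component, which rewards linking over mere accumulation. This is an assumed metric for illustration, not one proposed in the talk; the toy graph is invented.

```python
# Sketch of a possible connectivity metric for a knowledge graph
# (an assumption for illustration): fraction of nodes in the largest
# connected component, computed with a small union-find.

from collections import Counter

def largest_component_fraction(edges, nodes):
    """Union-find over undirected edges; |largest component| / |nodes|."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    sizes = Counter(find(n) for n in nodes)
    return max(sizes.values()) / len(nodes)

nodes = ["paper", "specimen", "sequence", "taxon", "orphan"]
edges = [("paper", "specimen"), ("specimen", "sequence"), ("sequence", "taxon")]
print(largest_component_fraction(edges, nodes))  # 0.8
```

Adding nodes without links leaves this score flat or lowers it, whereas adding cross-dataset links raises it, which matches the talk's point that linear growth is not the same as progress.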
The document discusses several topics related to taxonomy and biodiversity data. It summarizes that sequencing may be focused more on phylogeny than taxonomy. It also discusses visualizing data using tools like OneZoom and web maps. Another topic is rethinking how publications are linked and referenced, comparing current one-way links to the potential of two-way links. It proposes building a Biodiversity Knowledge Graph by linking data, descriptions, specimens, and publications. Finally, it suggests treating biodiversity data like source code by allowing edits and tracking changes.
In praise of grumpy old men: Open versus closed data and the challenge of cre... - Roderic Page
The document discusses the history and evolution of linking in documents from printed floras to Xanadu, the World Wide Web, and modern annotations. It advocates for open data and open access to scientific literature to enable the creation of a 21st century digital flora that can embed multimedia content like descriptions, specimens, and data. Making scientific names and literature openly accessible remains a challenge.
This document summarizes the capabilities and potential of the Biodiversity Heritage Library (BHL) and BioStor resources. It discusses how BHL provides open access to digitized biodiversity literature but does not easily surface the journal articles. However, the data is downloadable and has APIs, allowing the articles within to be programmatically extracted. It proposes linking specimens to articles for improved findability and creating a "PubMed Central for biodiversity" to better organize and surface taxonomic facts and documents.
Digitisation involves making physical specimens, literature, and metadata available in digital form, including DNA barcodes. This allows joining up different types of biological data that were previously separate, such as specimens, published literature, and genetic sequences. Digitising collections allows broader access and use of biodiversity data through databases like GBIF, which has over 500 million occurrence records available online. However, different intellectual property policies can apply to literature, specimens, and genetic data. Efforts are underway to better link related biological records and publications through identifiers like ORCID for people, DOIs for publications, and LSIDs for plant names. The goal is to build a comprehensive knowledge graph that integrates all these different types of digital biological data.
Built in the 19th century, rebuilt for the 21st - Roderic Page
The document discusses several topics related to digitization of biological data collections:
1) It describes how databases like GenBank rely on both experimentalist and natural history traditions by collecting and comparing natural facts from experiments.
2) Debates around creating GenBank in 1982 illuminated different moral economies regarding collecting/sharing data and attributing credit.
Data Mining GenBank for Phylogenetic inference - T. Vision
1. Prospects for enabling phylogenetically informed comparative biology on the web

Todd Vision (1,2) & Hilmar Lapp (1)
1 U.S. National Evolutionary Synthesis Center
2 Dept. of Biology, University of North Carolina at Chapel Hill

Suppose you have the sequence of a protein-coding gene, and are interested in its function. What is the first thing you would do?
• If it were me, I would search for conserved domains that match records in Pfam and other protein domain databases.
• Are these databases complete?
• Are they infallible?
• Are they still useful?

Why are these data useful?
• You needn't have mastery of the specialist literature before the search
• A match connects you to a vast interconnected world of information
• Why not worry about completeness?
  ! A negative result is not expensive
  ! Many broadly useful records are already present
• Why not worry about fallibility?
  ! The user can weigh the evidence once a match is found
  ! Assertions should be exposed to scrutiny
2. Some observations
• This infrastructure is designed to disseminate data to non-specialists
• The relevant data may be derived from multiple "studies", not all of which are published
• Data is hoarded neither by the researcher nor by the domain database
• The search service is as widely disseminated as the data
• Semantic-level machine-to-machine communication facilitates human comprehension

The case of phylogenetic data
• There is a broad audience for phylogenetic data
  ! Organismal phylogeny (e.g. Encyclopedia of Life)
  ! Gene/protein trees
• Many of the available resources are geared toward specialist researchers & students
• Non-specialists turn to taxonomic classifications when they need organismal phylogenetic information
• Few know where to find gene/protein trees at all

TreeBase / Tree of Life Web Project
[screenshots]
3. The NCBI taxonomy
• Provides
  ! A hierarchy for all species represented by DNA sequences in Genbank
  ! Names and IDs for internal nodes
  ! An FTP dump
• But does NOT
  ! Include unsequenced species
  ! Report confidence in topology or monophyly
  ! Offer taxonomic nuance (though it has synonyms & common names)

What if the NCBI taxonomy…
• Listed all taxa, including fossils?
• Allowed one to assess where there are conflicting topologies?
• Reported support values for clades?
• Reported divergence time estimates for nodes (e.g. from TimeTree)?
• Reported the provenance of the data?

Node-oriented web services from the Tree of Life Web Project
• Name
• Description
• Authority
• Date
• Other names
• Completeness of children
• Extinction status
• Confidence of position
• Monophyly
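The FTP dump mentioned above is a set of flat files whose rows are separated by "\t|\t" and terminated by "\t|". A minimal sketch of walking the hierarchy from those files (the sample rows are a tiny stand-in, not real dump contents beyond the well-known tax IDs):

```python
def parse_dmp_line(line):
    """Split one row of an NCBI taxonomy dump file.

    Rows are '\t|\t'-separated and terminated by '\t|'.
    """
    return line.rstrip("\n").rstrip("\t|").split("\t|\t")

def load_nodes(lines):
    """Build a child -> parent map from nodes.dmp rows.

    Assumes the first two fields are tax_id and parent tax_id.
    """
    return {int(f[0]): int(f[1]) for f in map(parse_dmp_line, lines)}

def lineage(tax_id, parent):
    """Walk from a node up to the root, which is its own parent."""
    path = [tax_id]
    while parent.get(tax_id, tax_id) != tax_id:
        tax_id = parent[tax_id]
        path.append(tax_id)
    return path

# tiny stand-in for a few rows of nodes.dmp
sample = [
    "1\t|\t1\t|\tno rank\t|\n",
    "9605\t|\t1\t|\tgenus\t|\n",
    "9606\t|\t9605\t|\tspecies\t|\n",
]
parent = load_nodes(sample)
```

In practice one would pass `open("nodes.dmp")` instead of `sample`; names and synonyms live in the parallel names.dmp file with the same row format.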
4. Outline
• Informatics @ NESCent
• An example of a phylogenetically-informed semantic web application for phenotype data
• Promoting interoperability and closing technical gaps in phyloinformatics through open development

Further barriers to dissemination of phylogenetic information
• Technical obstacles
  ! Technology for storing and querying trees
  ! Difficulties with exchange standards
  ! Inference of consensus trees and supertrees
  ! Taxonomic intelligence
  ! Globally unique identifiers
• Social obstacles
  ! Reluctance to provide incomplete or fallible information

NESCent sponsored science
• Catalysis Meetings (large, one-time events)
  ! To foster new collaborations and synthetic research
• Working Groups
  ! Smaller, focused, multiple meetings
• Sabbatical Scholars
• Postdoctoral fellows
• Short-term visitor program
  ! 2 weeks to 3 months
  ! Encourage collaborative projects
• Application info: http://www.nescent.org
5. NESCent Informatics
• Support for sponsored science and scientists
  ! Facilitating electronic collaboration
  ! Software/database development
  ! Providing HPC and other IT infrastructure
• Cyberinfrastructure for synthetic science
  ! Data sharing
  ! Software interoperability
  ! Training
  ! In partnership with major national and international efforts

Evolutionary Informatics WG
• Organizers: Arlin Stoltzfus and Rutger Vos
• Selected goals:
  ! XML serialization of NEXUS
  ! Formal grammar for validation and interconversion of NEXUS & other formats
  ! A transition model language for evolutionary models used in statistical inference
  ! An ontology for evolutionary comparative data analysis
• http://www.nescent.org/wg_evoinfo

GeoPhyloBuilder: "Putting the geography into phylogeography"
David Kidd & Xianhua Liu
• Extension for ArcGIS software that creates a spatiotemporal GIS network model from a tree with georeferenced nodes.
• 3D visualizations are possible through ArcSCENE.
• http://www.nescent.org/informatics/software.php

Phylogenetic cyberinfrastructure to enable comparative biology
• Two traditions in the recording of phenotype data
  ! Natural language descriptions and character matrices
  ! Statements made using anatomical and trait ontologies, designed to capitalize on the semantic web
• NESCent WG on morphological evolution in fish
  ! Organized by Paula Mabee and Monte Westerfield
  ! Led to a larger project
• Aim is to integrate
  ! Mutant phenotype data for zebrafish
  ! Comparative morphology data for the Ostariophysi
6. Ontologies
• Defined terms with defined relationships
  ! e.g. Gene Ontology, Cell Ontology
[diagram: a fragment of the Cell Ontology, showing e.g. "cell membrane" and "cell projection" as part_of "cell", "axon" is_a "cell projection", and "axolemma" part_of "axon"]

Describing phenotypes using ontologies
• Entity-Quality system (EQ)
• Entity term from an anatomy ontology
  ! zebrafish anatomy, cell ontology, etc.
• Quality term from Phenotype and Trait Ontology (PATO)
• e.g. Entity=dorsal fin, Shape=round

Phenotype and Trait Ontology (PATO)
[diagram: a fragment of the PATO hierarchy, e.g. physical quality > optical quality > chromatic property > color > blue > bright blue / dark blue, alongside qualities such as buoyancy and amplitude]

Evolutionary character matrices
• Common phenotypic data format in evolutionary biology (e.g. NEXUS)
• Characters + character states, similar to EQ
[example matrix: character "dorsal fin shape" with states round (Species one), pointed (Species two), undulate (Species three)]
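The parallel between character matrices and EQ statements can be made concrete. A minimal sketch, in which the `EQ` class and the toy matrix are illustrative only (the real project would use TAO and PATO term IDs, not free-text labels):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EQ:
    entity: str   # term from an anatomy ontology (e.g. TAO)
    quality: str  # term from PATO

# Toy matrix keyed by (entity, attribute) pairs rather than free-text
# character names, so each cell maps directly onto an EQ statement.
matrix = {
    "Species one":   {("dorsal fin", "shape"): "round"},
    "Species two":   {("dorsal fin", "shape"): "pointed"},
    "Species three": {("dorsal fin", "shape"): "undulate"},
}

def matrix_to_eq(matrix):
    """Convert character/state cells into per-taxon EQ annotations."""
    return {
        taxon: [EQ(entity, state) for (entity, _attr), state in cells.items()]
        for taxon, cells in matrix.items()
    }
```

The point of the keying convention is exactly the slide's claim: a character is an entity plus an attribute, and a character state is a quality, so the conversion is mechanical once both sides use ontology terms.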
7. Character Matrix vs. EQ
[diagram: a character ("dorsal fin shape") with a character state ("round") maps onto an Entity/Attribute/Value triple, with the Entity ("dorsal fin") drawn from an anatomy ontology (AO) and the Attribute ("shape") and Value ("round") from PATO; together these form an Entity + Quality statement]

A scenario
• A geneticist observes a reduction in the number of a particular bone type (e.g. branchiostegal ray) in a zebrafish mutant of her favorite gene.
• She asks: is this bone variable in number among species in nature?
• She could query the evolutionary phenotype database using:
  ! Entity = Branchiostegal ray (from TAO)
  ! Qualities pertaining to attribute 'count' (from PATO)
• She could examine a visualization of the phylogenetic relationships of the taxa with the relevant character changes mapped.
• She would see that most Ostariophysi have 3 rays, but that reduction has occurred multiple times:
  ! solenostomids and syngnathids (ghost pipefishes and pipefishes)
  ! giganturids
  ! saccopharyngoid (gulper and swallower) eels
• By examining additional changes on these same branches, she sees several parallelisms:
  ! loss of the swimbladder, pelvic fins, and scales
  ! elongation of the mandibular or hyoid arches
  ! reduction or loss of the opercle in syngnathids and saccopharyngoids
  ! a variety of other bones and soft tissues are lost or greatly modified
• She might hypothesize that these trait correlations are all due to alterations in the expression of the same suite of morphogens.
• She can select appropriate species from these lineages to follow up experimentally.
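The query step in this scenario can be sketched against a toy annotation store. All taxon names and values below are illustrative placeholders; the real system would resolve TAO and PATO term IDs and reason over ontology subsumption rather than string equality:

```python
# taxon -> list of (entity, attribute, value) annotations (illustrative data)
annotations = {
    "most Ostariophysi": [("branchiostegal ray", "count", "3")],
    "solenostomids":     [("branchiostegal ray", "count", "reduced")],
    "giganturids":       [("branchiostegal ray", "count", "reduced")],
    "saccopharyngoids":  [("branchiostegal ray", "count", "reduced")],
}

def query(annotations, entity, attribute):
    """Return taxa annotated with any quality for the given entity/attribute."""
    return {
        taxon: value
        for taxon, anns in annotations.items()
        for e, a, value in anns
        if e == entity and a == attribute
    }
```

The result of `query(annotations, "branchiostegal ray", "count")` is exactly the set of taxa the geneticist would then map onto the phylogeny.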
8. What data are needed to enable this scenario?
• Anatomy and trait ontologies
• Phenotypes in EQ syntax for
  ! Zebrafish mutants (already exist)
  ! Species/clades of Ostariophysi
• Phylogenetic relationships among the Ostariophysi
  ! Taxonomy ontology

Some anatomical ontologies
• Amphibia
• C. elegans
• Fish (zebrafish, medaka, teleosts)
• Insects (Drosophila, Mosquito, Hymenoptera)
• Mammals (mouse, human)
• Plants (Arabidopsis, cereals, maize, all plants)

Phenotype Ontologies for Evolutionary Biology
[organizational diagram of the project: NESCent (Vision, Lapp, software developers) hosts the EQSYTE database, its curator and public interfaces, usability testing, working groups, and liaisons to ZFIN and NCBO; U. Oregon (Westerfield); USD (Mabee, data curator) curates the EQSYTE contents (zebrafish phenotypic & genetic data, Ostariophysan phenotypic data) with morphology collaborators (Arratia, Coburn, Hilton, Lundberg, Mayden); NCBO provides applications (Phenote, OBO-Edit) and ontologies (taxonomy, TAO, PATO, homology); OBO hosts TAO, PATO, and the taxonomy ontology; Tulane U. (Rios, ontology curator); the ichthyology community (DeepFin, Fishbase) is engaged via liaison to CToL workshops]

Preserving published data for future integration efforts
• Sequence alignments (e.g. Treebase)
• Long-term population records (e.g. pedigrees)
• 2D and 3D images
• Collection and locality information
• Behavioral observations
• Numerical tables
• Etc.
• Most of these data are lost upon publication
• These are the stuff of comparative biology
9. Dryad: A digital repository for published data
NCSU Digital Library Initiative

Journals and societies involved in evolutionary biology so far
• American Naturalist (ASN)
• Evolution (SSE)
• Journal of Evolutionary Biology (ESEB)
• Integrative and Comparative Biology (SICB)
• Molecular Biology and Evolution (SMBE)
• Molecular Ecology
• Molecular Phylogenetics and Evolution
• Systematic Biology (SSB)

2006 Phyloinformatics Hackathon
[diagram of participating software, with NESCent as the hub: ATV, NCL, HyPhy, PAUP*, CIPRES, GARLI, TreeBase; toolkits Bio::CDAT, Biojava, BioSQL, JEBL, Bioruby, BioPerl, Biopython]

Open development
• Open source refers only to the licensing of the software code
• At NESCent, we have been experimenting with practices in open development
  ! Community contributes to a shared code base
  ! Higher barrier to entry
  ! Can be a substantial payoff in terms of interoperability, functionality, usability, maintenance
  ! Surprisingly rare in academia
10. Hackathon mechanics
• Before the meeting
  ! Participants and users suggested integrative workflows
• At the meeting
  ! Gaps in existing toolkits were identified
  ! Subgroups collaborated on high priority targets
  ! Followed a "use case" model
  ! Subgroups and targets were allowed to be fluid
  ! Users were on hand to provide datasets, test code, provide their perspective
  ! Dedicated participants tasked with documentation
• All code is open-source and deposited in established repositories

Accomplishments
• Sequence family evolution
  ! BioPerl: Support for TribeMCL, QuickTree, ClustalW, Phylip, PAML
  ! BioPerl & Biopython: Support for dN/dS-based tests for selection in HyPhy
  ! Biojava: Parser for Phylip alignment format
  ! BioRuby: Support for T-Coffee, MAFFT, and Phylip
• Reconciling trees
  ! BioPerl: Support for NJTree
  ! Biopython: Wrapper for Softparsmap
  ! BioRuby: Model for phylogenetic trees and networks with graph algorithms
  ! BioSQL: Model for phylogenetic trees and networks with optimization methods and topological queries
11. Accomplishments (continued)
• Phylogenetic inference on non-molecular characters
  ! BioPerl: Interoperability between Bio::Phylo and BioPerl APIs
  ! BioRuby: NEXUS-compliant data model and parser for PAUP and TNT results
• Phylogenetic footprinting
  ! BioPerl: Support for Footprinter, PhastCons, and ClustalW over a sliding window
• Estimation of divergence times
  ! BioPerl: Draft design of r8s wrapper
• NEXUS compliance
  ! Biojava: Interoperability between Biojava and JEBL
  ! Biojava & BioRuby: Level II-compliant NEXUS parsers
  ! All:
    Evaluated major APIs
    Proposed compliance levels
    Gathered test files exposing common errors
    Fixed compliance issues in NCL and Bio::NEXUS reference implementations
    Worked on integrating those into GARLI and BioPerl, respectively

Next hackathon
• Comparative Phylogenetic Methods in R
• December 10-14, 2007
• Organizers: S. Kembel, H. Lapp, B. O'Meara, S. Price, T. Vision, A. Zanne
• http://hackathon.nescent.org/R_Hackathon_1
• Have an idea for a future event? Submit a whitepaper!

Google Summer of Code
• Student internships in open-source software development
  ! Students work with any of a large number of established OS projects
  ! Students and mentors work & communicate remotely
• NESCent recruited mentors and oversaw student progress
  ! Eleven students worked on projects in visualization, usability, interoperability & implementation of new methods
12. NEXML
• Student: Jason Caravas
• Mentor: Rutger Vos
• Flexible serialization of phylogenetic objects
• Perl Bio::Phylo module tools for NEXML parsing and serialization

Command-line BioSQL
• Student: Jamie Estill
• Mentor: Hilmar Lapp
• Commands for
  ! Database initialization
  ! Bio::TreeIO import
  ! Bio::TreeIO export
  ! Tree query
  ! Tree optimization
  ! Tree manipulation
Conservation of phylogenetic diversity
• Student: Klaas Hartmann
• Mentor: Tobias Thierer
• Implementation of algorithm and GUI for optimal allocation of a finite budget to individual species to maximize phylogenetic diversity.
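Budgeted conservation of phylogenetic diversity is a variant of the "Noah's Ark" problem: choose species so that the total branch length retained on the tree is maximized without exceeding the budget. The slide does not say which algorithm was implemented, so the sketch below is only a greedy illustration of the objective, with each species represented by the set of branches on its path to the root:

```python
def greedy_pd(species_branches, branch_lengths, costs, budget):
    """Greedily add affordable species with the best marginal
    phylogenetic-diversity (PD) gain per unit cost."""
    chosen, covered, spent = [], set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for sp, branches in species_branches.items():
            if sp in chosen or spent + costs[sp] > budget:
                continue
            # PD gain = total length of branches not yet retained
            gain = sum(branch_lengths[b] for b in branches - covered)
            if costs[sp] > 0 and gain / costs[sp] > best_ratio:
                best, best_ratio = sp, gain / costs[sp]
        if best is None:
            break
        chosen.append(best)
        covered |= species_branches[best]
        spent += costs[best]
    return chosen, sum(branch_lengths[b] for b in covered)

# Toy tree: root -> A via branch "a" (length 2);
# root -> internal node via "b" (1); internal -> B via "c" (1), -> C via "d" (1)
species = {"A": {"a"}, "B": {"b", "c"}, "C": {"b", "d"}}
lengths = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}
costs = {"A": 1.0, "B": 1.0, "C": 1.0}
```

Note that greedy selection is a heuristic, not the optimal allocation the project aimed for; it simply makes the objective concrete (here, a budget of 2 retains branches a, b, and c for a PD of 4).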
13. Bayesian calibration of divergence times
• Student: Michael Nowak
• Mentor: Derrick Zwickl
• Fossil occurrence data is used to construct informative priors on divergence times for Bayesian analysis in, e.g., BEAST

Phyloinformatics Summer Course
• Teaching advanced programming skills to phylogenetic methods developers
• Focus is on software technologies rather than methodology
• First year
  ! 10 days in July 2007
  ! Organized by Bill Piel of TreeBASE
  ! 8 co-instructors
  ! 23 students (11 female)
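The idea of a fossil-calibrated prior can be illustrated with a lognormal density offset by the fossil's minimum age, so that no prior mass falls on node ages younger than the oldest known fossil. The parameter values here are arbitrary placeholders for illustration, not BEAST defaults:

```python
import math

def fossil_calibrated_prior(age, min_age, mu=1.0, sigma=0.5):
    """Lognormal prior density on a node age (e.g. in Myr), offset so
    that ages at or below the fossil minimum age have zero density."""
    if age <= min_age:
        return 0.0
    x = age - min_age  # time elapsed beyond the fossil minimum
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )
```

The hard lower bound encodes the fossil's logical constraint (the clade must be at least that old), while the lognormal tail expresses uncertainty about how much older the divergence actually is.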
Conclusions
• The future of web-enabled comparative biology is beginning to become clearer.
  ! For a preview, see genomics!
• The facile exchange of phylogenetic data is what will enable it.
• Expect to be using technologies such as ontologies and web services, which are now largely foreign to phylogenetic researchers.
• Also expect a shift toward open development.
  ! This will necessitate new modes of training for academic phyloinformaticists.

Additional acknowledgements
• Hackathon participants
• GSoC mentors and students
• Summer course instructors
• Phenotype evolution project
  ! Jim Balhoff, Wasila Dahdul, John Lundberg, Paula Mabee, Peter Midford, Monte Westerfield
• Data depository:
  ! Ryan Scherle, Jane Greenberg