This document provides information about Eishi Co., Ltd. and their lightweight steel villa construction system. It describes the advantages of the system such as high load bearing capacity, insulation, sustainability, and faster construction. The document then outlines the building method from foundation to finishes. Finally, it provides a quotation for a sample 120 square meter villa, including materials and a total estimated cost.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms for those who already suffer from conditions like anxiety and depression.
This document is a resume for Schelby Sweeney, providing contact information, a career objective, and descriptions of professional experience in content creation, writing, on-air work, technical skills, and employment history in radio broadcasting. Sweeney has extensive experience creating social media and multimedia content, writing for various publications, hosting radio programs, and working in programming and on-air roles for radio stations in multiple major markets.
First Oslo Solr community meetup lightning talk janhoy (Cominvent AS)
The document discusses setting up a Solr cluster using SolrCloud. It describes distributing an index across multiple shards, each with replicas for redundancy. ZooKeeper is used to manage the cluster configuration and to route queries to shards. An example 4-node cluster is outlined, with 2 shards each containing a replica, spread across 4 Jetty instances to demonstrate a basic SolrCloud setup.
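To make the 4-node, 2-shard layout concrete, here is a minimal Python sketch of how cores might be placed on nodes and how a document could be routed to a shard. The node names, the sequential placement rule, and the CRC32-modulo routing are illustrative assumptions: SolrCloud itself handles placement (for example through the Collections API `CREATE` action with its `numShards` and `replicationFactor` parameters), keeps the cluster state in ZooKeeper, and routes documents by hash ranges rather than a simple modulo.

```python
import zlib

# Hypothetical cluster from the summary: 4 Jetty nodes, 2 shards, each shard
# stored as a leader plus one replica (replication factor 2).
NODES = ["node1:8983", "node2:8983", "node3:8983", "node4:8983"]
NUM_SHARDS = 2
REPLICATION_FACTOR = 2


def place_cores(nodes, num_shards, replication_factor):
    """Assign every (shard, copy) core to its own node, filling nodes in order."""
    needed = num_shards * replication_factor
    if needed > len(nodes):
        raise ValueError(f"need {needed} nodes, have {len(nodes)}")
    layout = {}
    node_iter = iter(nodes)
    for shard in range(1, num_shards + 1):
        # In this sketch the first node listed for a shard plays the leader role.
        layout[f"shard{shard}"] = [next(node_iter) for _ in range(replication_factor)]
    return layout


def route(doc_id, num_shards):
    """Pick a shard by hashing the document ID (simplified stand-in for Solr's routing)."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    return f"shard{h % num_shards + 1}"


if __name__ == "__main__":
    layout = place_cores(NODES, NUM_SHARDS, REPLICATION_FACTOR)
    for shard, cores in layout.items():
        print(shard, "leader:", cores[0], "replicas:", cores[1:])
    print("doc-42 ->", route("doc-42", NUM_SHARDS))
```

Because each shard has a copy on two different nodes, losing any single node still leaves every shard searchable, which is the redundancy the summary describes.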
Gene Wiki at Phenotype RCN annual meeting (Benjamin Good)
The document discusses knowledge synthesis and annotation of human genes. It introduces the Gene Wiki project, which aims to harness the "Long Tail" of scientists to participate directly in gene annotation by enabling the creation of a collaboratively written, continuously updated, high-quality review article for every human gene. The Gene Wiki has generated over 10,000 gene articles with structured annotations; however, many genes still lack complete annotation. The document explores text mining of Gene Wiki articles to generate additional structured Gene Ontology and Disease Ontology annotations, though precision remains a challenge. It discusses potential applications of the mined annotations and the Gene Wiki+ platform for integrative queries across genes, diseases, and other biomedical data.
This document discusses biohackathons, which are events where participants collaborate to create new tools or applications in bioscience over a weekend. It provides examples of two biohackathons held in 2014 at Stanford University and University of California San Diego. At these events, around 30 participants formed teams and worked on projects like tools for visualizing and editing bio data or finding similarities between biomedical concepts. The goals were to foster collaborative creativity, learning, and networking. The best projects received small cash prizes.
The National Society For The Protection Of Hmmm (guest0233e9d0)
The document appears to be celebrating the first birthday of the National Society for the Protection of HMMM™. It lists various statistics about the group's social media presence and number of admins. It also includes fake quotes from public figures wishing the group a happy birthday.
This document announces a Gene Wiki Jamboree to improve Wikipedia pages for genes relevant to FaceBase. The goal is to create continually updated, collaboratively written reviews for every human gene and FaceBase concept. Such articles provide quick summaries and reference compilations, communicating knowledge to a wide audience. In contrast to exemplary gene pages such as Reelin, the pages for important FaceBase genes such as IRF6 are brief. The jamboree will teach wiki editing basics and start improving FaceBase gene pages through boisterous collaboration. Participants can join online discussions, and a link to the main coordination page is provided. Feedback after the jamboree should be sent to the organizers.
Microtask crowdsourcing for disease mention annotation in PubMed abstracts (Benjamin Good)
Microtask crowdsourcing for disease mention annotation in PubMed abstracts
Benjamin M. Good, Max Nanis, Andrew I. Su
Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses that would otherwise be impossible. As a result, many biological natural language processing (BioNLP) projects attempt to address this challenge. However, the state of the art in BioNLP still leaves much room for improvement in terms of precision, recall and the complexity of knowledge structures that can be extracted automatically. Expert curators are vital to the process of knowledge extraction but are always in short supply. Recent studies have shown that workers on microtasking platforms such as Amazon’s Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text.
Here, we investigated the use of AMT for capturing disease mentions in PubMed abstracts. We used the recently published NCBI Disease corpus as a gold standard for refining and benchmarking the crowdsourcing protocol. After merging the responses from 5 AMT workers per abstract with a simple voting scheme, we achieved a maximum F-measure of 0.815 (precision 0.823, recall 0.807) over 593 abstracts, compared against the NCBI annotations of the same abstracts. Comparisons were based on exact matches to annotation spans. The results can also be tuned to optimize for precision (max = 0.98 when recall = 0.23) or recall (max = 0.89 when precision = 0.45). It took 7 days and cost $192.90 to complete all 593 abstracts considered here (at $0.06/abstract, with 50 additional abstracts used for spam detection).
This experiment demonstrated that microtask-based crowdsourcing can be applied to the disease mention recognition problem in the text of biomedical research articles. The F-measure of 0.815 indicates that there is room for improvement in the crowdsourcing protocol but that, overall, AMT workers are clearly capable of performing this annotation task.
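The merging and scoring steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: spans are modeled as hypothetical `(start, end)` character offsets, "simple voting" is taken to mean keeping a span when at least `min_votes` of the 5 workers marked it, and scoring uses exact span matches as in the abstract. The helpers also check the reported numbers: precision 0.823 and recall 0.807 give an F-measure of about 0.815, and the $192.90 total is consistent with paying $0.06 per worker per abstract for 643 abstracts (593 plus 50 for spam detection) with 5 workers each.

```python
from collections import Counter


def majority_vote(worker_spans, min_votes):
    """Merge workers' (start, end) spans, keeping those marked by >= min_votes workers."""
    votes = Counter()
    for spans in worker_spans:
        votes.update(set(spans))  # a worker votes at most once per span
    return {span for span, n in votes.items() if n >= min_votes}


def exact_match_scores(predicted, gold):
    """Precision, recall and F-measure where only identical spans count as hits."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def total_cost_cents(abstracts, workers_per_abstract, cents_per_assignment):
    """Total payment in integer cents, avoiding floating-point rounding."""
    return abstracts * workers_per_abstract * cents_per_assignment


if __name__ == "__main__":
    print(round(f_measure(0.823, 0.807), 3))       # 0.815
    print(total_cost_cents(593 + 50, 5, 6) / 100)  # 192.9
```

Raising `min_votes` toward 5 favors precision and lowering it toward 1 favors recall, which is one way the tunable precision/recall trade-off reported above could arise.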
This document discusses using Wikidata as a platform for integrating and curating biomedical knowledge. Wikidata allows distributed curation of information by many contributors. It currently contains data on human and mouse genes, proteins, diseases, Gene Ontology terms and FDA-approved drugs, linking these biomedical concepts. Applications being developed include integrating gene data into Wikipedia pages and loading microbial genome data. Wikidata aims to centralize biomedical content while distributing the work of curation so that it scales with the large volume of literature.
The document proposes developing a library and tools to make iptables rules easier to manage. It would include a C library to parse and modify rules, Python bindings for the library, and a CLI tool. The tool would allow viewing and modifying rules and reverting changes, and would determine whether a packet with given attributes can be transmitted or received. The project would be developed in 6 phases from April to August as part of Google Summer of Code to improve the user experience of managing iptables rules on Fedora systems.
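Deciding whether a packet can pass reduces, in the simplest case, to walking a chain in order, applying the first matching rule, and falling back to the chain's default policy. The Python sketch below models only that case with hypothetical `Packet` and `Rule` types; it is not the proposed C library or its Python bindings, and it ignores jumps to user-defined chains, `RETURN`, non-terminating targets such as `LOG`, and connection tracking.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    protocol: str  # e.g. "tcp" or "udp"
    src: str       # source IP address
    dport: int     # destination port


@dataclass(frozen=True)
class Rule:
    target: str                     # terminating target: "ACCEPT" or "DROP"
    protocol: Optional[str] = None  # None matches any protocol
    src: Optional[str] = None       # source network in CIDR form; None matches any
    dport: Optional[int] = None     # None matches any port


def rule_matches(rule: Rule, pkt: Packet) -> bool:
    """A rule matches when every criterion it specifies matches the packet."""
    if rule.protocol is not None and rule.protocol != pkt.protocol:
        return False
    if rule.src is not None and ipaddress.ip_address(pkt.src) not in ipaddress.ip_network(rule.src):
        return False
    if rule.dport is not None and rule.dport != pkt.dport:
        return False
    return True


def evaluate(chain: List[Rule], policy: str, pkt: Packet) -> str:
    """Return the verdict: the first matching rule wins, otherwise the chain policy."""
    for rule in chain:
        if rule_matches(rule, pkt):
            return rule.target
    return policy
```

Rule order matters in this model: placing an `ACCEPT` for SSH from `10.0.0.0/8` before a blanket SSH `DROP` admits internal hosts while still blocking everyone else.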
This presentation describes two modes of web-based knowledge acquisition in the domain of bioinformatics: "pull" models, such as social tagging systems, that engage passive altruism, and "push" models, such as Mechanical Turk, that actively guide and incentivize the knowledge acquisition process.
Update on the gene wiki project, introduction to knowledge.bio semantic search application, introduction to biobranch.org collaborative decision tree creator
The document discusses a proposed software product called IM Safer that aims to protect children from online predators by monitoring instant messaging conversations in real time. It analyzes text for predatory patterns and alerts parents via email or SMS. The founders believe this focused approach can be more effective than existing parental control software, and that the system can build online reputations for screen names to track potentially harmful users. Market research suggests parents are concerned and willing to pay for effective solutions.
The document discusses building accurate machine learning predictors for biological problems that use only a few variables drawn from noisy, high-dimensional datasets. It proposes a Human Guided Random Forest approach in which each decision tree is constructed from a gene set selected by domain experts, to leverage biological knowledge not captured by networks alone. A game called Combo is presented that connects biological training data to machine learning algorithms through a card game interface to generate annotated decision trees.
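The difference from a standard random forest lies in how each tree's features are chosen: by an expert rather than by random sampling. The Python sketch below illustrates only that idea, under heavy simplifying assumptions: each "tree" is cut down to a single-split decision stump, samples are hypothetical dicts of gene expression values, and predictions are combined by majority vote. Gene names, values, and labels are made up for illustration.

```python
from collections import Counter


def _majority(labels, fallback):
    """Most common label (first seen wins ties), or `fallback` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else fallback


def fit_stump(rows, labels, features):
    """Best single split (feature, threshold, left_label, right_label) using only `features`."""
    overall = _majority(labels, None)
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label = _majority(left, overall)
            right_label = _majority(right, overall)
            correct = sum(y == left_label for y in left) + sum(y == right_label for y in right)
            if best is None or correct > best[0]:
                best = (correct, feature, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_guided_forest(rows, labels, expert_gene_sets):
    """One stump per expert-chosen gene set, instead of random feature subsets."""
    return [fit_stump(rows, labels, gene_set) for gene_set in expert_gene_sets]


def predict_forest(forest, row):
    """Majority vote over the stumps."""
    return Counter(predict_stump(stump, row) for stump in forest).most_common(1)[0][0]
```

A full implementation would grow deeper trees and would likely also bootstrap the training samples; the stump keeps the sketch short while preserving the point that each member only sees the genes an expert picked.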
Oslo Solr MeetUp March 2012 - Solr4 alpha (Cominvent AS)
Jan Høydahl presented what is new in Solr 4.0 including near real-time search capabilities, SolrCloud for distributed search across multiple cores, an improved spellchecker, smaller indexes using Flex, pluggable ranking, new sorting functions, and an updated admin GUI. Some key features being added in Solr 4.0 are support for Apache ZooKeeper, auto load balancing of queries across collections, and fault tolerant indexing.
The document discusses porting Python 2 code to Python 3. It notes there are many small differences between Python 2 and 3 like print becoming a function and some common data types being changed. The 2to3 tool can automatically port some code but may require manual fixes. Maintaining a single codebase with compatibility for both versions can become complicated over time. The best approach depends on factors like the expected development schedule. Porting large libraries requires testing and fixing failures, and the community can help with porting efforts.
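One common way to keep a single codebase running on both Python 2 and 3 is to opt into Python 3 behavior with `__future__` imports and isolate the remaining type differences behind small aliases. The sketch below shows that pattern; the names `PY2`, `text_type`, `binary_type`, and `ensure_text` are illustrative (the `six` compatibility library offers similar helpers), not something the summarized document prescribes.

```python
# A __future__ import must be the first statement in the module (comments may precede it).
from __future__ import absolute_import, division, print_function, unicode_literals

import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (builtin that exists only on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def ensure_text(value, encoding="utf-8"):
    """Return `value` as a text string on both Python 2 and Python 3."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected bytes or text, got %r" % type(value))


def mean(values):
    """With `division` imported, / is true division on Python 2 as well."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # print is a function on both versions thanks to print_function.
    print("mean:", mean([1, 2]), "text:", ensure_text(b"caf\xc3\xa9"))
```

Keeping such shims in one small module limits how far the "complicated over time" cost spreads through the rest of the codebase.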
Mark Hopper Product And Marketing Exec 2010 (Mark Hopper)
This document is an executive summary for Mark Hopper, a business leader with over 15 years of experience in technology industries including mobile, internet, and media. It outlines his experience as a seasoned senior executive, business "decathlete" with skills in marketing, product management, and business development. It also notes his passions for mobility, the internet, media, and strategy leadership.
This document discusses expanded polystyrene (EPS) and expanded polypropylene (EPP) foam materials. It describes their properties, such as EPS being a lightweight closed-cell plastic used for insulation, packaging, and construction applications. EPP is used in automotive parts, industrial packaging, furniture, and food containers due to its durability, energy absorption, and thermal insulation. Pictures show machinery for EPS and EPP production, including batch pre-expanders, molding machines, cutters, and recycling systems.
This document discusses learning through games and scientific discovery games. It covers how games can be used for learning, such as allowing exploration and feedback. Games have also been shown to improve skills like speed of processing, multitasking, and vision. Games can teach useful skills and concepts in areas like biology, medicine, geography and more. Scientific discovery games engage players in real scientific work, such as protein folding games like Foldit and gene selection games like The Cure that have yielded real scientific results. Citizen science games like Phylo have players align gene sequences to help scientists.
The Cure: Making a game of gene selection for breast cancer survival prediction (Benjamin Good)
The document describes a study that developed an online game called "The Cure" to capture knowledge from over 1,000 players regarding genes that could be used to predict breast cancer survival. Gene sets assembled from the game data showed significant enrichment for cancer-related genes and provided prediction accuracy comparable to other methods. The game successfully tapped into the collective knowledge and reasoning of many players to identify predictive gene signatures.
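Enrichment of this kind is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The summary does not say which test the study used, so the Python sketch below is an assumption about one reasonable choice; all counts passed to it are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, set_size, annotated, universe):
    """One-sided hypergeometric p-value, P(X >= overlap).

    X counts annotated genes in a random gene set of `set_size` genes drawn
    without replacement from `universe` genes, `annotated` of which carry
    the annotation of interest (e.g. "cancer-related").
    """
    denominator = comb(universe, set_size)
    upper = min(set_size, annotated)
    numerator = sum(
        comb(annotated, i) * comb(universe - annotated, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / denominator
```

For example, `enrichment_p_value(8, 20, 500, 20000)` would ask how surprising it is for 8 of 20 selected genes to be cancer-related when 500 of 20,000 genes are; a small value indicates enrichment beyond chance.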
On the relation between learning, teaching, science and games. Presentation for the course on simulation in medical pedagogy at Paris Descartes university.
Games can be used for learning in several ways:
1) People can learn from playing games through trial and error feedback loops. Games allow exploration without real-world consequences.
2) Games have been used successfully to teach a variety of subjects in classrooms from elementary school through university levels. They increase student motivation and information retention.
3) Scientific discovery games engage many participants in solving research problems through game mechanics. Players of the protein folding game Foldit have contributed to scientific papers by developing strategies to solve protein structures better than computer algorithms.
This document discusses the challenges of analyzing large genomic datasets and summarizes some of the speaker's research applying new techniques to address these challenges. The speaker develops tools to help biologists efficiently analyze vast amounts of sequencing data. One key technique is "digital normalization" which can speed up analysis by 20-200x while reducing data volume. The speaker has applied this to assemble genomes like the parasitic nematode H. contortus and soil metagenomes. The goal is to make downstream data analysis faster, better, and help answer biological questions.
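Digital normalization, as published with the khmer software, keeps a read only if the median abundance of its k-mers, counted over the reads kept so far, is below a coverage cutoff C; redundant high-coverage reads are dropped in a single pass. The Python sketch below follows that rule but makes simplifying assumptions: it counts exact k-mers in a dictionary rather than a memory-efficient probabilistic counter, ignores reverse complements, skips reads shorter than k, and uses illustrative defaults k = 20 and C = 20.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=20, cutoff=20):
    """Single pass: keep a read only if its median k-mer count so far is below `cutoff`."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: skipped in this sketch
        estimated_coverage = median([counts[km] for km in read_kmers])
        if estimated_coverage < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only accepted reads contribute to counts
    return kept
```

With ten identical reads, k = 4 and C = 3, only the first two copies are kept, while a read carrying new k-mers is still accepted; this discarding of redundant coverage is where the reduction in data volume comes from.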
This document discusses the need for open science and data sharing to advance genomics research and drug development. It outlines the vision and mission of Sage Bionetworks to create a "Commons" where integrative models of disease are built and evolved through data sharing and collaborative research. Some key barriers to open science are addressed, such as standards, tools, recognition of contributors, and changing incentives around data hoarding. The opportunity of networks to identify disease drivers and therapies from large genomic datasets is highlighted.
This lesson introduces genetic research and bioinformatics. It explains that genetic researchers study inherited traits by analyzing DNA sequences to understand how organisms are similar and different. It then provides an overview of the scientific method and practices used in genetic research like developing hypotheses, collecting and analyzing data, and drawing conclusions. Finally, it defines bioinformatics as using computer science to analyze large biological data sets and provides some examples of how bioinformatics tools can help answer scientific questions.
The document discusses the rise of big data in microbiology due to decreasing costs of DNA sequencing and computational resources. It describes how high-throughput sequencing is generating vast amounts of microbial genomic and metagenomic data. However, analyzing these large, complex datasets presents numerous technical and social challenges for microbiologists, including handling data volume, integrating diverse data types, accessing resources, and incentivizing data sharing. Overcoming these bottlenecks will be key to unlocking the scientific insights contained within the microbial "big data" tidal wave.
Rapidly decreasing costs of DNA sequencing and increases in computational power have led to an era of "big data" in microbiology. The collection and analysis of massive datasets from metagenomic studies presents both opportunities and challenges. Key opportunities include understanding microbial community dynamics and interactions at an unprecedented scale. However, challenges include developing computational methods to efficiently analyze large, diverse datasets and training the next generation of microbiologists to work in this new "big data" environment. Overcoming these challenges will require collaborative efforts across disciplines as well as a culture change toward open data sharing and reproducible research.
Why Life is Difficult, and What We Might Do About It (Anita de Waard)
This document discusses connecting biological knowledge through claim-evidence networks. It outlines some of the challenges in biology like variability between specimens and gene expression changes. It then proposes that claim-evidence networks can be used to connect biological knowledge by linking experimental evidence to claims. Steps to build these networks include identifying claims in documents, structuring the evidence in databases, and automatically connecting the claims and evidence. Examples of efforts that link drug interactions to evidence and predict protein interactions across species are provided. However, it notes that more still needs to be done to fully realize this approach.
Bioinformatics issues and challenges presentation at S P College (SKUASTKashmir)
This document provides an overview of bioinformatics and some key concepts:
- It discusses the exponential growth of biological data from technologies like PCR and microarrays, and how bioinformatics is needed to analyze this data.
- Bioinformatics is defined as integrating biology and computer science to collect, analyze, and interpret large amounts of molecular-level information. It uses databases and tools to study genomes, proteins, and biological processes.
- Major databases like GenBank, EMBL, and SwissProt store DNA, RNA, protein sequences and provide access to researchers. Tools like BLAST are used to search databases and analyze sequences.
- Benefits of bioinformatics include advances in medicine, agriculture, and forensics.
Bringing scientists to data to accelerate discoveries and improve human healt... (Sri Ambati)
Presented at #H2OWorld 2017 in Mountain View, CA.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Data sharing is fraught with privacy concerns in the biomedical domain. How do we develop insights if data silos are our reality? Stanford is undertaking a “data commons on steroids” approach with a goal to “free the scientist” and make insight sharing possible across the data silos.
Somalee is a computational physicist by training, a biotechnologist by profession and a data analyst by passion. She believes that with the explosion of data in healthcare and with new methods to analyze such large amounts of data, we will see massive changes in how human diseases are addressed via novel drugs, large-scale genomics, wearable sensors, and software to tie it all together. She wants to drive part of this revolution.
"Hacking the Software for Life" - Brad Perkins (Chief Medical Officer, Human ...Hyper Wellbeing
"Hacking the Software for Life" - Brad Perkins (Chief Medical Officer, Human Logevity, Inc.)
Delivered at the inaugural Hyper Wellbeing Summit, 14th November 2016, Mountain View, California.
For more information including details of subsequent events, please visit http://hyperwellbeing.com
The summit was created to foster a community around an emerging industry - Wellness as a Service (WaaS). Consumer technologies, in particular wearables and mobile, are powering a consumer revolution. A revolution to turn health and wellness into platform delivered services. A revolution enabling consumer data-driven disease risk reduction. A revolution extending health care past sick care towards consumer-led lifelong health, wellness and lifestyle optimization.
WaaS newsletter sign-up http://eepurl.com/b71fdr
@hyperwellbeing
1. Whole genome sequencing is becoming more affordable and widespread, allowing for large datasets and personalized medicine applications.
2. However, genomic data is extremely sensitive and can be used to identify individuals and their relatives, even when anonymized. Once a genome is leaked, it cannot be revoked.
3. Computer scientists are exploring techniques to protect genomic privacy, such as differential privacy and secure computation, but enabling privacy-preserving genomic research remains a challenge.
This document provides a brief biography of the author and outlines their perspective on the complexity of biological systems and gene expression. It notes that a single specimen or species can show significant variability, and that gene expression varies based on factors like age, environmental stimuli, nutritional state, and interactions with other organisms like gut microbes. It argues that fully understanding biological systems requires considering all of these sources of variability and the interactions between different elements. The author's new role focuses on facilitating collaborations to better represent scientific knowledge by connecting experimental data across studies in a way that can help disentangle some of this complexity.
Dr. Charalampos Pitsalidis (Χαράλαμπος Πιτσαλίδης), 3rd Health Innovation Conference (Starttech Ventures)
Talk/Presentation: Dr. Charalampos Pitsalidis, Researcher, Bioelectronic Systems & Biotechnology Laboratory, Department of Chemical Engineering and Biotechnology, University of Cambridge, UK; Scientific Advisor on Biotechnology
Presentation title: “Human Organs on a Chip: Innovative technologies for discovering new drugs and personalized therapies”
How can Big Data help upgrade brain care? (SharpBrains)
Current standards of brain and mental care often rely on trials of insufficient scale, which not only limits our ability to diagnose, prevent, treat and personalize care but often leads to incorrect conclusions and undesirable results. What tools and data are becoming available via large-scale web-based and mobile applications, and how can researchers, innovators and practitioners connect with these initiatives?
- Chair: Alvaro Fernandez, CEO of SharpBrains, YGL Class of 2012
- Daniel Sternberg, Data Scientist at Lumosity
- Joan Severson, President of Digital Artefacts
- Robert Bilder, Chief of Medical Psychology-Neuropsychology at UCLA Semel Institute for Neuroscience
Knowledge Will Propel Machine Understanding of Big Data (Amit Sheth)
1) Amit Sheth presented on how knowledge can help machines better understand big data.
2) He discussed challenges like understanding implicit entities, analyzing drug abuse forums, and understanding city traffic using sensors and text.
3) Sheth argued that knowledge graphs and ontologies can help interpret diverse data types and provide contextual understanding to help solve real-world problems.
Integrating Pathway Databases with Gene Ontology Causal Activity Models (Benjamin Good)
The Gene Ontology (GO) Consortium (GOC) is developing a new knowledge representation approach called ‘causal activity models’ (GO-CAM). A GO-CAM describes how one or several gene products contribute to the execution of a biological process. In these models (implemented as OWL instance graphs anchored in Open Biological Ontology (OBO) classes and relations), gene products are linked to molecular activities via semantic relationships like ‘enables’, molecular activities are linked to each other via causal relationships such as ‘positively regulates’, and sets of molecular activities are defined as ‘parts’ of larger biological processes. This approach provides the GOC with a more complete and extensible structure for capturing knowledge of gene function. It also allows for the representation of knowledge typically seen in pathway databases.
Here, we present details and results of a rule-based transformation of pathways represented using the BioPAX exchange format into GO-CAMs. We have automatically converted all Reactome pathways into GO-CAMs and are currently working on the conversion of additional resources available through Pathway Commons. By converting pathways into GO-CAMs, we can leverage OWL description logic reasoning over OBO ontologies to infer new biological relationships and detect logical inconsistencies. Further, the conversion helps to increase standardization for the representation of biological entities and processes. The products of this work can be used to improve source databases, for example by inferring new GO annotations for pathways and reactions and can help with the formation of meta-knowledge bases that integrate content from multiple sources.
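To make the GO-CAM structure concrete, here is a minimal Python sketch. It applies two simple rules to a toy pathway and emits GO-CAM-style triples using the relations named above ('enables', 'positively regulates', 'part of'). Several parts are hypothetical simplifications: the dictionary format, the gene and activity names, and the rule that an "upstream" reaction positively regulates its downstream reaction. The actual conversion works on BioPAX OWL and OBO classes, and it is not reproduced here. No OWL reasoning is shown either.

```python
# Hypothetical, simplified stand-in for a BioPAX-style pathway. Real BioPAX
# is an OWL vocabulary; this dict only mimics what the rules below need.
pathway = {
    "process": "example_process",  # hypothetical biological process
    "reactions": [
        {"id": "rxn1", "activity": "kinase_activity",
         "controllers": ["geneA"], "upstream": []},
        {"id": "rxn2", "activity": "phosphatase_activity",
         "controllers": ["geneB", "geneC"], "upstream": ["rxn1"]},
    ],
}

def pathway_to_gocam(pw):
    """Apply simple rules to emit (subject, relation, object) triples."""
    triples = []
    # One activity node per reaction: GO-CAMs are instance graphs, so the
    # same activity class in two reactions must stay two distinct nodes.
    node = {r["id"]: f'{r["activity"]}#{r["id"]}' for r in pw["reactions"]}
    for r in pw["reactions"]:
        act = node[r["id"]]
        for gp in r["controllers"]:
            triples.append((gp, "enables", act))  # gene product -> activity
        for up in r["upstream"]:
            # Simplifying assumption: upstream activity positively regulates.
            triples.append((node[up], "positively regulates", act))
        triples.append((act, "part of", pw["process"]))
    return triples

def contributors(triples, process):
    """Gene products that enable any activity that is part of `process`."""
    acts = {s for s, p, o in triples if p == "part of" and o == process}
    return sorted({s for s, p, o in triples if p == "enables" and o in acts})
```

Because every relation is a plain triple, a question such as "which gene products contribute to this process?" reduces to a graph lookup: `contributors(pathway_to_gocam(pathway), "example_process")` returns `['geneA', 'geneB', 'geneC']`.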
Pathways2GO: Converting BioPax pathways to GO-CAMs (Benjamin Good)
Presentation at the Gene Ontology Consortium Annual Meeting describing the automatic conversion of biochemical pathways in the Reactome Knowledge Base into the Gene Ontology 'Causal Activity Model' representation.
Building a Biomedical Knowledge Garden (Benjamin Good)
Describes the tribulations of building a large biomedical knowledge graph. Provides a comparison between the UMLS and Wikidata in terms of content and structure. Concludes with the idea of anchoring the knowledge graph in Wikidata items and properties.
When the Heart BD2K grant was originally written, we proposed to build something called “Big Data World” to help advance citizen science, scientific crowdsourcing, and science education, especially in bioinformatics. This past year, this idea has become Science Game Lab (https://sciencegamelab.org), a collaboration between the Su laboratory at Scripps Research, Playmatics LLC, and, more recently, the creators of WikiPathways.
Gene Wiki and Wikimedia Foundation SPARQL workshop (Benjamin Good)
This document summarizes a presentation about curating biomedical knowledge on Wikidata and Wikipedia through the Gene Wiki project. The Gene Wiki project develops tools and resources to automatically generate gene pages on Wikipedia using structured data from Wikidata. This centralizes biomedical knowledge on open platforms and allows the data to be queried through SPARQL, powering new applications for biomedical research.
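As an illustration of querying Gene Wiki data through SPARQL, the sketch below builds a GET request URL for the public Wikidata Query Service. It asks for human genes and their Entrez Gene IDs. The property and item IDs are included as we understand them, and you should verify them before relying on them: P31 = instance of, Q7187 = gene, P703 = found in taxon, Q15978631 = Homo sapiens, P351 = Entrez Gene ID. The code only constructs the URL and performs no network call.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def human_gene_query(limit=10):
    """SPARQL for human genes with Entrez IDs (IDs are assumptions to verify)."""
    # P31 instance of, Q7187 gene, P703 found in taxon,
    # Q15978631 Homo sapiens, P351 Entrez Gene ID.
    return f"""
SELECT ?gene ?geneLabel ?entrez WHERE {{
  ?gene wdt:P31 wd:Q7187 ;
        wdt:P703 wd:Q15978631 ;
        wdt:P351 ?entrez .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
"""

def build_request_url(query):
    """GET URL asking the endpoint to return SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

To run the query, pass the URL to an HTTP client such as `urllib.request`. The service expects clients to send a descriptive User-Agent header.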
Opportunities and challenges presented by Wikidata in the context of biocuration (Benjamin Good)
Abstract: Wikidata is a world-readable and world-writable knowledge base maintained by the Wikimedia Foundation. It offers the opportunity to collaboratively construct a fully open access knowledge graph spanning biology, medicine, and all other domains of knowledge. To meet this potential, social and technical challenges must be overcome, many of which are familiar to the biocuration community. These include community ontology building, high-precision information extraction, provenance, and license management. By working together with Wikidata now, we can help shape it into a trustworthy, unencumbered central node in the Semantic Web of biomedical data.
Wikidata workshop for ISB Biocuration 2016 (Benjamin Good)
This document discusses using Wikidata as a platform for biocuration. Wikidata is presented as a new paradigm that could reduce pain points in current biocuration practices by providing a single platform with persistent data access. It describes Wikidata's structure as a knowledge base of unique items and statements linked together to form a knowledge graph. Examples show how biomedical data like genes and proteins are represented. The document outlines Wikidata's community processes and increasing impact on applications like Wikipedia and genome browsers. It envisions the potential for researchers to contribute new biomedical knowledge through Wikidata.
This document discusses using crowdsourcing to improve knowledge integration from scientific literature. It notes that while scientific literature is growing rapidly, knowledge bases are not being updated quickly enough through traditional manual curation. The document proposes using a "divide and conquer" approach to break the task of knowledge extraction into smaller microtasks that can be distributed to crowds. It describes experiments showing crowds can identify concepts like diseases in text and extract relationships between concepts at a level comparable to experts. The document also discusses Wikidata, an open knowledge base where this kind of distributed curation could take place to build a comprehensive resource for biomedical knowledge.
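The "divide and conquer" idea can be sketched in Python. The sketch splits an abstract into one microtask per sentence and gives each task to several distinct workers, so that their answers can later be aggregated. The sentence splitter, the task format, and the round-robin assignment are illustrative assumptions. They are not the protocol used in the experiments described above.

```python
import re
from itertools import cycle

def split_into_microtasks(doc_id, text):
    """Hypothetical decomposition: one microtask per sentence."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [{"task_id": f"{doc_id}:{i}", "text": s} for i, s in enumerate(sentences)]

def assign(tasks, workers, redundancy=3):
    """Give each task to `redundancy` distinct workers, round-robin."""
    if redundancy > len(workers):
        raise ValueError("redundancy cannot exceed the number of workers")
    pool = cycle(workers)
    # Consecutive draws from a cycle of n >= redundancy workers are distinct.
    return {t["task_id"]: [next(pool) for _ in range(redundancy)] for t in tasks}
```

Redundant assignment is what makes crowd aggregation possible: each small task receives several independent answers that can then be combined, for example by voting.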
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery (Benjamin Good)
PubMed now indexes roughly 25 million articles and is growing by more than a million per year. The scale of this “Big Knowledge” repository renders traditional, article-based modes of user interaction unsatisfactory, demanding new interfaces for integrating and summarizing widely distributed knowledge. Natural language processing (NLP) techniques coupled with rich user interfaces can help meet this demand, providing end-users with enhanced views into public knowledge, stimulating their ability to form new hypotheses.
Knowledge.Bio provides a Web interface for exploring the results from text-mining PubMed. It works with subject, predicate, object assertions (triples) extracted from individual abstracts and with predicted statistical associations between pairs of concepts. While agnostic to the NLP technology employed, the current implementation is loaded with triples from the SemRep-generated SemmedDB database and putative gene-disease pairs obtained using Leiden University Medical Center’s ‘Implicitome’ technology.
Users of Knowledge.Bio begin by identifying a concept of interest using text search. Once a concept is identified, associated triples and concept-pairs are displayed in tables. These tables have text-based and semantic filters to help refine the list of triples to relations of interest. The user then selects relations for insertion into a personal knowledge graph implemented using cytoscape.js. The graph is used as a note-taking or ‘mind-mapping’ structure that can be saved offline and then later reloaded into the application. Clicking on edges within a graph or on the ‘evidence’ element of a triple displays the abstracts where that relation was detected, thus allowing the user to judge the veracity of the statement and to read the underlying articles.
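The workflow described above (find a concept, filter its triples, add selected relations to a personal graph, then look up the supporting abstracts) can be sketched with plain data structures. The class and function names, the predicate labels, and the abstract identifiers are hypothetical. The sketch does not show Knowledge.Bio's actual code or its cytoscape.js front end.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    evidence: tuple  # ids of abstracts where the relation was detected

@dataclass
class PersonalGraph:
    """User-curated subgraph: (subject, predicate, object) -> evidence."""
    edges: dict = field(default_factory=dict)

    def add(self, t):
        self.edges[(t.subject, t.predicate, t.obj)] = t.evidence

    def evidence(self, s, p, o):
        # Clicking an edge would show these abstracts to the user.
        return self.edges.get((s, p, o), ())

def triples_for(concept, triples, predicates=None):
    """Triples mentioning `concept`, optionally limited to some predicates."""
    return [t for t in triples
            if concept in (t.subject, t.obj)
            and (predicates is None or t.predicate in predicates)]
```

The predicate filter plays the role of the "semantic filters" in the tables. Keeping the evidence on every edge preserves the link from the saved graph back to the source abstracts.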
Knowledge.Bio is a free, open-source application that can provide deep, personal, concise, shareable views into the “Big Knowledge” scattered across the biomedical literature.
Application: http://knowledge.bio
Source code: https://bitbucket.org/sulab/kb1/
Gene Wiki and Mark2Cure update for BD2K (Benjamin Good)
The document discusses using crowdsourcing via platforms like Amazon Mechanical Turk and Mark2Cure to extract information from biomedical literature at scale. It summarizes experiments showing non-experts can accurately recognize disease concepts in PubMed abstracts when aggregated. The author proposes expanding this approach to identify genes, drugs, diseases and relationships to build a computable network of biomedical knowledge from the literature. Funding sources and collaborators supporting various related projects are acknowledged at the end.
Building a massive biomedical knowledge graph with citizen science (Benjamin Good)
The life sciences are faced with a rapidly growing array of technologies for measuring the molecular states of living things. From sequencing platforms that can assemble the complete genome sequence of a complex organism involving billions of nucleotides in a few days to imaging systems that can just as rapidly churn out millions of snapshots of cells, biology is truly faced with a data deluge. To translate this information into new knowledge that can guide the search for new medicines, biomedical researchers increasingly need to build on the existing knowledge of the broad community. Prior knowledge can help guide searches through the masses of new data. Unfortunately, most biomedical knowledge is represented solely in the text of journal articles. Given that more than a million such articles are published every year, the challenge of using this knowledge effectively is substantial. Ideally, knowledge such as the interrelations between genes, drugs and diseases would be represented in a knowledge graph that enabled queries like: “show me all the genes related to this disease or related to any drugs used to treat this disease”. Systems exist that attempt to extract this information automatically from text, but the quality of their output remains far below what can be obtained by human readers. We are developing a new platform that taps the language comprehension abilities of citizen scientists to help excavate a queryable knowledge graph from the biomedical literature. In proof-of-concept experiments, we have demonstrated that lay-people are capable of extracting meaningful information from complex biological text. The information extracted using this community intelligence framework can surpass the efforts of individual experts in quality while also offering the potential to achieve massive scale. In this presentation we will describe the results of early experiments and introduce our prototype citizen science platform: http://mark2cure.org.
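The example query above ("show me all the genes related to this disease or related to any drugs used to treat this disease") is a two-hop graph traversal. The sketch below answers it over a toy set of edges. The relation names `related_to` and `treats` and all node names are hypothetical placeholders, not the schema of any particular knowledge graph.

```python
from collections import defaultdict

def build_index(edges):
    """Index (subject, relation, object) edges in both directions."""
    idx = defaultdict(set)
    for s, r, o in edges:
        idx[(s, r)].add(o)
        idx[(o, "inverse:" + r)].add(s)
    return idx

def neighbours(idx, node, relation):
    """Nodes linked to `node` by `relation` in either direction."""
    return idx.get((node, relation), set()) | idx.get((node, "inverse:" + relation), set())

def genes_for_disease(idx, disease, gene_set):
    """Genes related to `disease`, or to any drug that treats it."""
    genes = neighbours(idx, disease, "related_to") & gene_set
    for drug in idx.get((disease, "inverse:treats"), set()):
        genes |= neighbours(idx, drug, "related_to") & gene_set
    return genes
```

When this knowledge exists only as text in articles, the query cannot be run at all. Once it is extracted into edges like these, the query is a few set operations.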
Branch: An interactive, web-based tool for building decision tree classifiers (Benjamin Good)
A crucial task in modern biology is the prediction of complex phenotypes, such as breast cancer prognosis, from genome-wide measurements. Machine learning algorithms can sometimes infer predictive patterns, but there is rarely enough data to train and test them effectively, and the patterns that they identify are often expressed in forms (e.g. support vector machines, neural networks, random forests composed of tens of thousands of trees) that are very difficult to understand. In addition, it is generally unclear how to include prior knowledge in the course of their construction.
Decision trees provide an intuitive visual form that can capture complex interactions between multiple variables. Effective methods exist for inferring decision trees automatically but it has been shown that these techniques can be improved upon via the manual interventions of experts. Here, we introduce Branch, a new Web-based tool for the interactive construction of decision trees from genomic datasets. Branch offers the ability to: (1) upload and share datasets intended for classification tasks (in progress), (2) construct decision trees by manually selecting features such as genes for a gene expression dataset, (3) collaboratively edit decision trees, (4) create feature functions that aggregate content from multiple independent features into single decision nodes (e.g. pathways) and (5) evaluate decision tree classifiers in terms of precision and recall. The tool is optimized for genomic use cases through the inclusion of gene and pathway-based search functions.
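Two of Branch's ideas can be sketched in Python: a "feature function" that aggregates several genes into a single decision node, and evaluation of a hand-built tree in terms of precision and recall. The gene names, thresholds, tree layout, and the use of mean expression as the aggregate are all invented for illustration. They are not taken from Branch.

```python
from statistics import mean

def pathway_score(sample, genes):
    """Feature function: mean expression over a (hypothetical) gene set."""
    return mean(sample[g] for g in genes)

# Manually specified tree: internal nodes test a feature against a threshold,
# leaves carry a class label. All genes and thresholds are invented.
TREE = {
    "feature": lambda s: s["GENE_A"],
    "threshold": 0.5,
    "low": {"label": "poor"},
    "high": {
        "feature": lambda s: pathway_score(s, ["GENE_B", "GENE_C"]),
        "threshold": 1.0,
        "low": {"label": "poor"},
        "high": {"label": "good"},
    },
}

def classify(node, sample):
    """Walk from the root to a leaf and return its label."""
    while "label" not in node:
        node = node["high"] if node["feature"](sample) > node["threshold"] else node["low"]
    return node["label"]

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return (tp / (tp + fp) if tp + fp else 0.0,
            tp / (tp + fn) if tp + fn else 0.0)
```

Because each node is an ordinary function of the sample, a pathway-level node and a single-gene node are handled the same way.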
Branch enables expert biologists to easily engage directly with high-throughput datasets without the need for a team of bioinformaticians. The tree building process allows researchers to rapidly test hypotheses about interactions between biological variables and phenotypes in ways that would otherwise require extensive computational sophistication. In so doing, this tool can both inform biological research and help to produce more accurate, more meaningful classifiers.
A prototype of Branch is available at http://biobranch.org/
Serious games for bioinformatics education. ISMB 2014 education workshop (Benjamin Good)
This document discusses using games and gamification for bioinformatics education. It begins by outlining how games can be used for recruiting and engaging students. Several existing educational bioinformatics games are then described, including games focused on protein folding, sequence alignment, and introducing concepts like BLAST. However, these games only provide shallow learning. Gamification approaches for bioinformatics education like CACAO and Rosalind.info are also summarized. These apply game elements like levels, badges, and leaderboards to make bioinformatics algorithm practice more engaging. Overall, the document argues that while current offerings have limitations, games show promise for improving bioinformatics learning if they can bridge the gap between games and scientific concepts.
Poster: Microtask crowdsourcing for disease mention annotation in PubMed abst... (Benjamin Good)
Benjamin M. Good, Max Nanis, Andrew I. Su
Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses that would otherwise be impossible. As a result, many biological natural language processing (BioNLP) projects attempt to address this challenge. However, the state of the art in BioNLP still leaves much room for improvement in terms of precision, recall and the complexity of knowledge structures that can be extracted automatically. Expert curators are vital to the process of knowledge extraction but are always in short supply. Recent studies have shown that workers on microtasking platforms such as Amazon’s Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text.
Here, we investigated the use of AMT for capturing disease mentions in PubMed abstracts. We used the recently published NCBI Disease corpus as a gold standard for refining and benchmarking the crowdsourcing protocol. After merging the responses from 5 AMT workers per abstract with a simple voting scheme, we were able to achieve a maximum F-measure of 0.815 (precision 0.823, recall 0.807) over 593 abstracts as compared to the NCBI annotations on the same abstracts. Comparisons were based on exact matches to annotation spans. The results can also be tuned to optimize for precision (max = 0.98 when recall = 0.23) or recall (max = 0.89 when precision = 0.45). It took 7 days and cost $192.90 to complete all 593 abstracts considered here (at $0.06/abstract, with 50 additional abstracts used for spam detection).
This experiment demonstrated that microtask-based crowdsourcing can be applied to the disease mention recognition problem in the text of biomedical research articles. The f-measure of 0.815 indicates that there is room for improvement in the crowdsourcing protocol but that, overall, AMT workers are clearly capable of performing this annotation task.
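The abstract does not spell out its voting rule, so the following is a minimal sketch of one plausible scheme. A span is kept when at least `min_votes` of the workers marked exactly that span, and the result is scored against gold spans with exact matching. Raising `min_votes` trades recall for precision, which matches the tuning behavior described above. The reported figures are internally consistent: 2·0.823·0.807/(0.823+0.807) ≈ 0.815, and 5 × $0.06 × (593 + 50) = $192.90. The second calculation suggests, though this is our inference, that $0.06 was paid per worker per abstract.

```python
from collections import Counter

def majority_vote(worker_spans, min_votes):
    """Keep (start, end) spans marked by at least `min_votes` workers."""
    counts = Counter(span for spans in worker_spans for span in set(spans))
    return {span for span, n in counts.items() if n >= min_votes}

def exact_match_prf(predicted, gold):
    """Precision, recall and F-measure using exact span matches."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Assumption (inferred): $0.06 per worker per abstract, 5 workers,
# 593 evaluated abstracts plus 50 spam-detection abstracts.
COST_USD = round(5 * 0.06 * (593 + 50), 2)
```

For example, `round(f_measure(0.823, 0.807), 3)` reproduces the reported 0.815.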
Mark2Cure: a crowdsourcing platform for biomedical literature annotation (Benjamin Good)
Mark2Cure is developing a crowdsourcing platform to engage large groups of people, including microtask workers and volunteers, in annotating biomedical literature. Recent studies show non-experts can generate high-quality annotations. The platform will direct workers to complete "quests" assigned by scientific leaders to extract targeted knowledge. An initial prototype uses Amazon Mechanical Turk to evaluate disease mention annotation on PubMed abstracts, achieving expert-level precision and outperforming natural language processing tools. The goal is to structure all published biomedical knowledge with human-level accuracy and completeness on the same day as publication.
Short update on The Cure game first week (Benjamin Good)
This document summarizes usage data and initial results from a citizen science game called The Cure that aims to crowdsource cancer research. It reports on:
- Website usage statistics from the first week
- Player demographics and time spent
- Community progress defeating boards
- Baseline random gene selection vs gene ranking based on frequency of player selection, showing marginal improvement from crowdsourcing
- Next steps to improve the game and data collection
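The comparison listed above, a random gene baseline versus ranking genes by how often players selected them, can be sketched as follows. The gene symbols are example inputs only, not data from the game. Ties are broken alphabetically, which is an arbitrary choice made here to keep the output deterministic.

```python
import random
from collections import Counter

def rank_by_selection(player_hands):
    """Rank genes by how many players chose them (ties broken by name)."""
    counts = Counter(g for hand in player_hands for g in set(hand))
    return sorted(counts, key=lambda g: (-counts[g], g))

def random_baseline(all_genes, k, seed=0):
    """Baseline: k genes drawn uniformly at random (seeded for repeatability)."""
    return random.Random(seed).sample(sorted(all_genes), k)
```

The top of either list could then be passed to the same classifier, so that any difference in predictive performance can be attributed to how the genes were chosen.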
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only that both are building blocks, or dependencies of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more than that in common.
Join the presentation to immerse yourself in a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: Advocate of free software and of standard, open formats. She was an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in various events, migrations, and training related to LibreOffice. Previously she worked on LibreOffice migrations and training courses for several public administrations and private clients. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (the source of her nickname, deneb_alpha).
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefit it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove redundant or unused accounts to save money. Some practices can also lead to unnecessary spending, e.g. using a Person document instead of a Mail-In for shared mailboxes. We show you such cases and their solutions. And, of course, we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered
- Reducing license costs by finding and fixing misconfigurations and redundant accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, function/test users, etc.
- Practical examples and best practices you can apply immediately
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: libxml2's xmllint, a tool for parsing XML documents, and GNU Binutils' readelf, an essential debugging and security-analysis command-line tool that displays detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus shows how starting with lean, optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
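The abstract does not describe how DIAR decides which bytes are uninteresting. The sketch below is therefore a generic stand-in rather than DIAR itself. It is a delta-debugging-style minimizer, similar in spirit to AFL's afl-tmin, that greedily deletes chunks of a seed as long as a caller-supplied coverage signature stays the same. In a real fuzzer the `coverage` callable would come from instrumentation. Here it is any function you pass in.

```python
def minimize_seed(seed: bytes, coverage, chunk=None):
    """Greedily drop byte chunks that do not change the coverage signature.

    `coverage` maps an input to a comparable signature (e.g. a frozenset of
    edges); it is supplied by the caller (an assumption of this sketch).
    """
    target = coverage(seed)
    data = bytearray(seed)
    size = chunk or max(1, len(data) // 2)
    while size >= 1:
        i = 0
        while i < len(data):
            candidate = data[:i] + data[i + size:]
            if coverage(bytes(candidate)) == target:
                data = candidate  # chunk was uninteresting: drop it, retry at i
            else:
                i += size  # chunk matters: keep it and move on
        size //= 2
    return bytes(data)
```

With a toy signature that only records whether the markers `b"<a>"` and `b"<b>"` occur, the seed `b"xxxx<a>yyyyyyyy<b>zzzz"` shrinks to `b"<a><b>"`.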
- These are the slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW 2022).
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
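To show what "vector search" computes, here is a brute-force sketch that ranks stored embeddings by cosine similarity to a query vector. It is not the MongoDB Atlas API. Production systems typically use approximate nearest-neighbour indexes instead of scoring every document. The toy two-dimensional vectors stand in for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, docs, k=2):
    """Exact nearest neighbours by cosine similarity.

    docs: mapping of document id -> embedding vector.
    """
    scored = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

For example, `top_k([1.0, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]})` returns `["a", "c"]`.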
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
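The retrieval half of GraphRAG can be sketched without any LLM. The sketch finds knowledge-graph facts about entities named in the question, then places them into a prompt that grounds the model's answer. Both the entity matching (case-insensitive substring search) and the prompt wording are simplifying assumptions. The call to the language model itself is left out.

```python
def retrieve_context(question, graph, max_facts=5):
    """Facts whose subject or object is mentioned in the question.

    graph: list of (subject, relation, object) triples. Matching is a naive
    case-insensitive substring test, an assumption made for brevity.
    """
    q = question.lower()
    facts = [t for t in graph if t[0].lower() in q or t[2].lower() in q]
    return facts[:max_facts]

def build_prompt(question, facts):
    """Ground the question in the retrieved facts."""
    lines = [f"- {s} {r} {o}" for s, r, o in facts]
    return ("Answer using only these facts:\n" + "\n".join(lines)
            + f"\n\nQuestion: {question}")
```

Restricting the model to retrieved facts is the mechanism by which graph retrieval is meant to improve answer accuracy. Real systems replace the substring match with entity linking and multi-hop graph queries.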
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will gain a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discussed the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help mitigate climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Sustainability can be added as a quality characteristic and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
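One common technique for "minimizing the number of tests" is test-suite reduction, framed as set cover. The sketch below uses the greedy heuristic: it repeatedly picks the test that covers the most still-uncovered requirements. It is our illustration, not a method from the talk. The test and requirement names are hypothetical, and greedy selection does not guarantee the smallest possible suite.

```python
def minimize_suite(coverage):
    """Greedy set cover over test coverage.

    coverage: mapping test name -> set of requirements it exercises.
    Returns a list of tests that together cover every requirement.
    """
    remaining = set().union(*coverage.values()) if coverage else set()
    chosen = []
    while remaining:
        # sorted() makes tie-breaking deterministic (first name wins).
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & remaining))
        chosen.append(best)
        remaining -= coverage[best]
    return chosen
```

Fewer executed tests means less compute per run. This is one concrete way the footprint reduction described above could be measured.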
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
The Cure: A Game with the Purpose of Gene Selection for Breast Cancer Survival Prediction
1. THE CURE: A GAME WITH THE PURPOSE OF GENE SELECTION FOR BREAST CANCER SURVIVAL PREDICTION
Benjamin Good*, Salvatore Loguercio, Max Nanis, Andrew Su
The Scripps Research Institute
http://genegames.org/cure/
Rocky 2013
2. A QUESTION
How would you get 150 PhD-level scientists to work together on the same problem?
Without any money?
4. WHY GAMES?
It is estimated that 9 billion hours are spent playing Solitaire every year.
Luis von Ahn, Google Tech Talk: Human Computation, 2006.
(Shortly after receiving a $500,000 “Genius Grant” for this work)
5. Seven million hours of human labor built the Empire State Building
ONE YEAR OF SOLITAIRE = 1,285 EMPIRE STATE BUILDINGS
6. 150 billion hours gaming each year
What if we could use a tiny fraction of that
human effort to achieve another purpose?
[Figure: hours of human effort. Empire State Building: 7M; one year of Solitaire: 9B; one year of games: 150B]
McGonigal J. Reality Is Broken: Why Games Make Us Better and How They Can Change the World. New York: Penguin Press; 2011.
7. PURPOSES
Computer science
• Find objects inside images
• Tag songs
• Label all images on the Web
• Rate image quality
Biology
• Figure out how proteins fold
• Teach computers English
• Design RNA molecules
• Build ontologies
• Map connections between neurons
• Link genes with diseases
• Assemble genomes
• Align DNA and protein sequences
• Tag Malaria parasites in blood smears
• Develop better treatments for breast cancer
10. INFERRING SURVIVAL PREDICTORS
[Figure: find patterns in samples labelled by 10-year survival (yes/no), then make predictions on new samples]
van 't Veer, Laura J., et al. “Gene expression profiling predicts clinical outcome of breast cancer.” Nature 415.6871 (2002): 530-536.
11. INFERRING SURVIVAL PREDICTORS
[Figure: find patterns in samples labelled by 10-year survival (yes/no), then make predictions]
1) Select genes: out of the 25,000+ genes, which small set works together the best?
2) Infer a predictor from the data (e.g. decision tree, SVM, etc.)
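A minimal sketch of the two steps above, under stated assumptions: expression data are a dict mapping a gene name to one value per sample; step 1 ranks genes by the absolute difference in mean expression between the two survival classes (a simple filter chosen only for brevity); step 2 swaps in a nearest-centroid classifier for the decision tree or SVM named on the slide. The function names and toy values are hypothetical.

```python
from statistics import mean


def select_genes(expr, labels, k):
    """Step 1 (simple illustrative filter): keep the k genes whose mean expression
    differs most between 10-year survivors (True) and non-survivors (False)."""
    def score(gene):
        yes = [v for v, y in zip(expr[gene], labels) if y]
        no = [v for v, y in zip(expr[gene], labels) if not y]
        return abs(mean(yes) - mean(no))
    return sorted(expr, key=score, reverse=True)[:k]


def train_nearest_centroid(expr, labels, genes):
    """Step 2: infer a predictor; here, the mean profile of each class over the selected genes."""
    centroids = {}
    for cls in (True, False):
        idx = [i for i, y in enumerate(labels) if y == cls]
        centroids[cls] = [mean(expr[g][i] for i in idx) for g in genes]
    return centroids


def predict(centroids, sample):
    """Predict 10-year survival for a new sample (values for the selected genes, same order)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


# Toy data: 3 genes x 4 samples (made-up values, not from any real dataset).
expr = {"G1": [5.0, 6.0, 1.0, 2.0], "G2": [3.0, 3.1, 2.9, 3.0], "G3": [0.5, 0.4, 2.5, 2.6]}
labels = [True, True, False, False]  # 10-year survival?
genes = select_genes(expr, labels, k=2)
model = train_nearest_centroid(expr, labels, genes)
print(genes, predict(model, [5.2, 0.6]))  # prints ['G1', 'G3'] True
```

Selecting genes on the same samples used for training, as in this toy, is one reason reported performance can be optimistic; a careful pipeline would repeat the selection inside each cross-validation fold.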
12. PROBLEM: GENE SELECTION INSTABILITY
instability: different methods, different datasets
produce different gene sets for the same phenotype [1]
[1] Griffith, Obi L., et al. "A robust prognostic signature for hormone-positive node-negative breast cancer." Genome Medicine 5.10 (2013).
13. PROBLEM: THE VALIDATION GAP
[Figure: training data, test data, validation]
validation: predictive signatures often perform
worse on independent data created for validation.
Photograph by Richard Hallman, National Geographic Adventure Blog
14. ADDING PRIOR KNOWLEDGE TO THE DISCOVERY ALGORITHM
[Figure: find patterns in samples labelled <10 yr or >10 yr survival, then make predictions]
15. EXAMPLE: NETWORK-GUIDED FORESTS
Use a network to find good gene combinations
Dutkowski, J., & Ideker, T. (2011). Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology.
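The slide gives only the idea. As a rough illustration of “use a network to find gene combinations”, the sketch below grows a connected set of genes from a random seed in an interaction network, so that a candidate feature set for one tree consists of genes linked to each other. This is a simplified stand-in, not the published network-guided forest algorithm; the adjacency-dict format, the function name and the toy network are assumptions.

```python
import random


def sample_connected_module(network, size, rng):
    """Grow a connected gene set: start from a random seed gene, then repeatedly
    add a random neighbour of any gene already in the set.

    network: dict mapping each gene to the set of genes it interacts with (hypothetical format).
    Returns fewer than `size` genes if the seed's connected component is too small.
    """
    module = {rng.choice(sorted(network))}
    while len(module) < size:
        # Sorting makes the choice reproducible for a given seed.
        frontier = sorted({n for g in module for n in network[g]} - module)
        if not frontier:
            break
        module.add(rng.choice(frontier))
    return module


# Toy undirected network with two components: {A, B, C, D} and {E, F}.
network = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}, "E": {"F"}, "F": {"E"}}
rng = random.Random(0)
modules = [sample_connected_module(network, 3, rng) for _ in range(5)]
print(modules)
```

In a forest, each tree would draw its own module, so different trees explore different linked gene combinations.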
16. BUT MOST KNOWLEDGE IS NOT STRUCTURED
[Chart: number of articles added to PubMed (y-axis 500,000 to 1,000,000)]
112 publications/hour (37 more by the end of this talk)
>160,000 publications linked to “breast cancer” since 2000
http://tinyurl.com/brsince2000
17. HOW CAN WE USE UNSTRUCTURED
KNOWLEDGE FOR GENE SELECTION?
Need an intelligent system that is good at reading and hypothesizing
Like you
28. COMMUNITY BOARD VIEW, CHOOSE OPEN BOARD
[Screenshot callouts: “You beat this one”; “The community finished this board” (e.g. 11 different players completed it); “This board is still open”]
29. BOARDS
• 25 genes each
• randomly selected from 1,250 genes that passed an unsupervised filter for minimum expression level and variance for a particular dataset [1],[2]
• 4 rounds of 100 boards each completed, with some overlap between rounds
• 3,731 distinct genes used in the game
[1] Curtis, Christina, et al. "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups." Nature (2012)
[2] Griffith, Obi L., et al. "A robust prognostic signature for hormone-positive node-negative breast cancer." Genome Medicine (2013)
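A sketch of how boards like these could be generated, assuming expression data as a dict of gene name to per-sample values. The filter uses no survival labels (hence “unsupervised”); the thresholds, function names and toy data are hypothetical, since the actual cut-offs are not given on the slide.

```python
import random
from statistics import mean, pvariance


def filter_genes(expr, min_mean, min_var):
    """Unsupervised filter: keep genes whose mean expression and variance both pass
    the thresholds. No outcome labels are used."""
    return sorted(g for g, values in expr.items()
                  if mean(values) >= min_mean and pvariance(values) >= min_var)


def make_boards(genes, n_boards, board_size, rng):
    """Each board is a random sample, without replacement, of `board_size` genes.
    Boards are drawn independently, so a gene can appear on several boards."""
    return [rng.sample(genes, board_size) for _ in range(n_boards)]


# Toy data: 100 genes x 2 samples; genes whose index is divisible by 3 have zero variance.
expr = {f"g{i}": [float(i % 5), float(i % 5 + 2 * (i % 3))] for i in range(100)}
passed = filter_genes(expr, min_mean=1.0, min_var=1.0)
boards = make_boards(passed, n_boards=4, board_size=25, rng=random.Random(7))
print(len(passed), [len(b) for b in boards])  # prints 66 [25, 25, 25, 25]
```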
30. PLAYERS
1,077 players registered (one year)
http://io9.com/these-cool-games-let-you-do-real-life-science-486173006
[Chart: new player registrations by month over one year (y-axis 0 to 250), broken down by highest degree (PhD, MD, MSc, BA, none, other, did not state), with the share of PhDs on a secondary axis (0 to 0.4); annotation: Sage DREAM7 challenge, game announcement]
32. GAMES PLAYED
• 9,904 games (non-training)
[Chart 1: total games played per player (y-axis log scale, 1 to 10,000; x-axis: player). Chart 2: games played by the top 20 players (y-axis 0 to 800), with several top players labelled PhD, MD or MS]
33. GENE RANKINGS FROM GAMES
[Figure: find patterns in samples labelled <10 yr or >10 yr survival, then make predictions]
34. GENE RANKINGS FROM GAMES
• For each gene:
1. O = number of times it appeared in a game (some genes occur on multiple boards, all boards are played multiple times, all occurrences are counted)
2. S = number of times it was selected by a player
3. F = S/O
• Games can be filtered based on player data
• We can estimate an empirical P value for each (O, S) pair
• P reflects the chance of getting S or more selections by chance, given O
Examples (all games):
• B-cell lymphoma 2 gene (BCL2): O = 13, S = 10, F = 10/13 = 0.77, P < 0.0001
• Alanine and arginine rich domain containing protein (AARD): O = 33, S = 3, F = 3/33 = 0.09, P = 0.91
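The counting and the empirical P value can be sketched as follows. Two hypothetical assumptions: games are recorded as (board genes, genes the player selected) pairs, and under random play each appearance of a gene is selected independently with a fixed probability `p_select`. The slide does not give the null model actually used, so P values from this sketch will not reproduce the ones above.

```python
import random


def gene_stats(games):
    """Return {gene: (O, S, F)} from (board_genes, selected_genes) pairs.
    Every appearance counts, including repeat plays of the same board."""
    O, S = {}, {}
    for board, selected in games:
        for g in board:
            O[g] = O.get(g, 0) + 1
        for g in selected:
            S[g] = S.get(g, 0) + 1
    return {g: (O[g], S.get(g, 0), S.get(g, 0) / O[g]) for g in O}


def empirical_p(o, s, p_select, n_sim=20000, seed=0):
    """One-tailed empirical P: share of simulated random-play series in which a gene
    that appears o times is selected s or more times (hypothetical null model)."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() < p_select for _ in range(o)) >= s
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)  # add-one so P is never reported as exactly 0


# Toy game log (made up): each entry is (genes on the board, genes the player picked).
games = [
    (["BCL2", "AARD", "ESR1"], ["BCL2"]),
    (["BCL2", "AARD"], ["BCL2", "AARD"]),
    (["AARD", "ESR1"], ["ESR1"]),
]
stats = gene_stats(games)
print(stats["BCL2"])  # prints (2, 2, 1.0)
# The slide's BCL2 and AARD counts, under an assumed p_select of 0.2:
print(empirical_p(13, 10, p_select=0.2), empirical_p(33, 3, p_select=0.2))
```

F alone ignores how often a gene was seen: a gene picked on its single appearance has F = 1 but, under this null, P of about `p_select`. The P value is what separates that case from a BCL2-like gene with a high F over many appearances.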
35. GENES SELECTED BY ALL PLAYERS
9,904 GAMES
P < 0.001: 60 GENES
[Chart: top 10 enriched disease annotations (n genes per annotation, adj. P < 2.43e-06; background = 3,731 genes used in any game) and table of top 10 genes]
Wang, Jing, et al. "WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013." Nucleic Acids Research (2013).
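The enrichment statistics here come from WebGestalt. As a rough illustration of what such a test does, the sketch below assumes a one-sided hypergeometric test against the 3,731-gene background, followed by Benjamini-Hochberg adjustment across annotations; these are not asserted to be the exact WebGestalt settings used. The annotation counts in the usage lines are made up.

```python
from math import comb


def hypergeom_sf(k, background, annotated, selected):
    """P(X >= k): chance that `selected` genes drawn from `background` genes include
    k or more of the `annotated` genes carrying a disease annotation."""
    upper = min(annotated, selected)
    hits = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(background, selected)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical annotation carried by 40 of the 3,731 background genes.
p_a = hypergeom_sf(8, 3731, 40, 60)  # 8 of the 60 selected genes carry it
p_b = hypergeom_sf(1, 3731, 40, 60)  # only 1 does
print(p_a, p_b, benjamini_hochberg([p_a, p_b]))
```

With 60 genes drawn from 3,731, fewer than one annotated gene is expected by chance (60 × 40 / 3,731 ≈ 0.64), which is why 8 hits give a very small P while a single hit does not.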
36. GENES SELECTED BY PEOPLE WITH PHDS AND WITH KNOWLEDGE OF CANCER
2,373 GAMES
P < 0.001: 82 GENES (“Expert Gene Set”)
[Chart: top 10 enriched disease annotations (n genes per annotation, adj. P < 5.76e-08) and table of top 10 genes]
37. GENES SELECTED BY PEOPLE WITHOUT PHDS, WITH NO KNOWLEDGE OF CANCER, THAT ARE NOT BIOLOGISTS
3,607 GAMES
P < 0.001: 10 GENES
[Table: top 10 genes]
• Gene set not significantly enriched with any disease annotations
39. EVEN WITHOUT FILTERING, THE DATA CONTAINS THE KNOWLEDGE
• “All Players” still contained a significant cancer signal.
40. PROBLEM: GENE SELECTION INSTABILITY
instability: different methods, different datasets
produce different gene sets for the same phenotype [1]
[1] Griffith, Obi L., et al. "A robust prognostic signature for hormone-positive node-negative breast cancer." Genome Medicine 5.10 (2013).
41. GENE SET OVERLAPS, SOME BUT NOT MUCH
[Venn diagram of the significant gene sets, including the “Expert Gene Set”]
http://bioinformatics.psb.ugent.be/webtools/Venn/
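Pairwise overlaps between gene sets, like those in the Venn diagram (and on slide 60), are easy to tabulate. The sketch below reports the shared genes and, as an extra summary of our own choosing, the Jaccard index; the set names and contents are toy values, not the game's results.

```python
from itertools import combinations


def pairwise_overlaps(gene_sets):
    """For each pair of named gene sets, return the shared genes and the Jaccard index
    |A & B| / |A | B| (0.0 if both sets are empty)."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(sorted(gene_sets.items()), 2):
        shared = a & b
        union = a | b
        result[(name_a, name_b)] = (sorted(shared), len(shared) / len(union) if union else 0.0)
    return result


gene_sets = {
    "expert_toy": {"G1", "G2", "G3"},
    "all_players_toy": {"G2", "G3", "G4", "G5"},
    "no_expertise_toy": {"G6"},
}
overlaps = pairwise_overlaps(gene_sets)
for pair, (shared, jaccard) in overlaps.items():
    print(pair, shared, round(jaccard, 2))
```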
42. PROBLEM: THE VALIDATION GAP
[Figure: training data, test data, validation]
validation: predictive signatures often perform
worse on independent data created for validation.
Photograph by Richard Hallman, National Geographic Adventure Blog
43. CLASSIFIER PERFORMANCE WITH DIFFERENT GENE GROUPS, DIFFERENT DATASETS
[Scatter plot: classifiers predicting 10-year survival (yes/no). X-axis: test-set performance on the Griffith 2013 data. Y-axis: test-set performance with Metabric training and Oslo test data. The “Expert Gene Set” is labelled.]
The only difference between points is the set of genes used to build the SVM classifier.
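The comparison on this slide holds the data and the classifier fixed and varies only the gene set. A minimal sketch of that protocol, with a nearest-centroid classifier standing in for the SVM and two tiny made-up cohorts standing in for the training and test data (e.g. Metabric training, Oslo test); all names and values are hypothetical.

```python
from statistics import mean


def centroid_accuracy(train, test, genes):
    """Fit a nearest-centroid classifier on `train` using only `genes`; return accuracy on `test`.

    Each cohort is (samples, labels); a sample is a dict gene -> expression value.
    """
    samples, labels = train
    centroids = {
        cls: [mean(s[g] for s, y in zip(samples, labels) if y == cls) for g in genes]
        for cls in set(labels)
    }

    def classify(sample):
        return min(centroids, key=lambda cls: sum(
            (sample[g] - c) ** 2 for g, c in zip(genes, centroids[cls])))

    test_samples, test_labels = test
    correct = sum(classify(s) == y for s, y in zip(test_samples, test_labels))
    return correct / len(test_labels)


def compare_gene_sets(train, test, gene_sets):
    """Same cohorts, same classifier: only the gene set changes from one result to the next."""
    return {name: centroid_accuracy(train, test, genes) for name, genes in gene_sets.items()}


train = (
    [{"G1": 5.0, "G2": 1.0}, {"G1": 6.0, "G2": 9.0}, {"G1": 1.0, "G2": 1.5}, {"G1": 2.0, "G2": 8.0}],
    [True, True, False, False],  # 10-year survival
)
test = ([{"G1": 5.5, "G2": 8.5}, {"G1": 1.5, "G2": 9.0}], [True, False])
results = compare_gene_sets(train, test, {"informative": ["G1"], "noisy": ["G2"]})
print(results)  # prints {'informative': 1.0, 'noisy': 0.5}
```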
44. SUMMARY
Pluses
• 1 year
• 1,000 players, 150 PhDs
• 10,000 games
• “expert knowledge” captured through an open game
• New gene ranking method with results competitive with established approaches
• Game is now in use in an undergraduate class
Minuses
• Did not make a significantly better breast cancer survival predictor
• Game could have been better in many ways
  • no beginning, middle or end
  • random guessing can win
  • easy to cheat
46. THE END
Thanks to:
Players!!!!
Andrew Su
Salvatore Loguercio
Max Nanis
Karthik Gangavarapu
Funding
More information at:
http://genegames.org/cure/
bgood@scripps.edu
@bgood
We are hiring! Looking for
postdocs, programmers
interested in crowdsourcing
and bioinformatics.
Contact: asu@scripps.edu
47. GAMES WITH A PURPOSE OF COLLECTING EXPERT-LEVEL KNOWLEDGE
Khatib, Firas, et al. "Algorithm discovery by protein folding game players." Proceedings of the National Academy of Sciences (2011)
Loguercio, Salvatore, et al. "Dizeez: an online game for human gene-disease annotation." PLoS One (2013)
MOLT
The Cure
48. HUMAN GUIDED FOREST (HGF)
Let CURE players build
decision modules
http://i9606.blogspot.com/2012/04/human-guided-forests-hgf.html
49. WHY DID YOU SIGN UP? (83 RESPONSES)
Why did you sign up for The Cure? (select all that apply)
[Bar chart, share of respondents (0% to 90%): To help breast cancer research; To learn something; To have fun playing a game]
50. WAS THE GAME FUN?
[Bar chart, share of respondents (0 to 0.8): Yes, it was very fun; A little bit entertaining; No, not at all]
51. DO YOU KNOW ANYONE THAT HAS OR HAD
BREAST CANCER?
Have you known or do you currently know anyone that has or has had breast cancer?
Yes
No
52. DID YOU LEARN ANYTHING FROM PLAYING?
[Bar chart, number of respondents (0 to 60): Yes, I felt like I learned a lot; Yes, I learned a little bit; No, I did not learn anything]
53. MY KNOWLEDGE OF BREAST CANCER IS:
[Bar chart, share of respondents (0 to 0.6): I am an expert in breast cancer; I have helped conduct cancer research as part of my job; I know some biology and have some understanding of what cancer is; I know a little biology, but nothing specific to cancer; Nothing, I do not know a thing about it]
54. AGE?
Which category below includes your age?
17 or younger
18-20
21-29
30-39
40-49
50-59
60 and above
60. OVERLAP OF SIGNIFICANT GENE SETS FROM
DIFFERENT CURE GAME FILTERS
PhD or MD (3,070 games)
Cancer Knowledge (4,660 games)
Biologist (4,913 games)
PhD & Cancer Knowledge (2,373 games)
No Expertise (3,607 games)
61. MOST RANDOM GENE EXPRESSION SIGNATURES ARE
SIGNIFICANTLY ASSOCIATED WITH BREAST CANCER
OUTCOME
Still need to pick gene sets
Feature selection challenge still relevant
Very useful grain of salt in interpreting these results.
Venet et al. (2011). PLoS Computational Biology.
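One way to apply this grain of salt is to ask how often random gene sets of the same size do at least as well as a chosen signature. The sketch assumes a user-supplied scoring function (for example, held-out accuracy of a classifier built on the genes); the function names and the toy score are hypothetical.

```python
import random


def random_signature_p(score_fn, signature, all_genes, n_random=1000, seed=0):
    """Empirical P: share of random same-size gene sets scoring at least as well as `signature`.

    score_fn: maps a list of genes to a number, higher is better (hypothetical placeholder).
    """
    rng = random.Random(seed)
    target = score_fn(signature)
    pool = sorted(all_genes)
    at_least_as_good = sum(
        score_fn(rng.sample(pool, len(signature))) >= target for _ in range(n_random)
    )
    return (at_least_as_good + 1) / (n_random + 1)


# Toy score: how many genes of a made-up "informative" set a signature contains.
informative = {"G0", "G1", "G2"}
all_genes = [f"G{i}" for i in range(100)]


def score(genes):
    return len(set(genes) & informative)


print(random_signature_p(score, ["G0", "G1", "G2"], all_genes))
print(random_signature_p(score, ["G50", "G51", "G52"], all_genes))  # prints 1.0
```

If many random sets score as well as the chosen one (a large P), the chosen genes add little beyond what a random signature of the same size would give.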
Editor's Notes
What if we could harness just a tiny fraction of that human effort???
All of this work still requires human effort
a, Two-dimensional presentation of transcript ratios for 98 breast tumours. There were 4,968 significant genes across the group. Each row represents a tumour and each column a single gene. As shown in the colour bar, red indicates upregulation, green downregulation, black no change, and grey no data available. The yellow line marks the subdivision into two dominant tumour clusters. b, Selected clinical data for the 98 patients in a: BRCA1 germline mutation carrier (or sporadic patient), ER expression, tumour grade 3 (versus grade 1 and 2), lymphocytic infiltrate, angioinvasion, and metastasis status. White indicates positive, black negative and grey denotes tumours derived from BRCA1 germline carriers who were excluded from the metastasis evaluation. The cluster below the yellow line consists of 36 tumours, of which 34 are ER negative (total 39 ER-negative) and 16 are carriers of the BRCA1 mutation (total 18). c, Enlarged portion from a containing a group of genes that co-regulate with the ER-α gene (ESR1). Each gene is labelled by its gene name or accession number from GenBank. Contig ESTs ending with RC are reverse-complementary of the named contig EST. d, Enlarged portion from a containing a group of co-regulated genes that are the molecular reflection of extensive lymphocytic infiltrate, and comprise a set of genes expressed in T and B cells. (Gene annotation as in c.)
Statistically, the main reasons are inadequate sample size and correlated data structure (Xu 2010). This makes it difficult to trust the predictors when different genes appear every time.
Progress is being made on this issue, though: e.g. Margolin showed very good agreement between cross-validation, test-set, and validation performance for models submitted to the Sage challenge.
Walk you through it as a player and then I’ll explain what is going on.
Playing 10,000 series of 75 games, we would expect 27 or more occurrences of a particular gene in only 1 of the 10,000 series.
BCL2: B-cell lymphoma 2, regulator of apoptosis.
AARD: alanine and arginine rich domain containing protein; no information.
Single-tailed: the chance of getting S or more by chance, given O.
Disease terms came from PharmGKB associations to genes, made using NCBI Gene and PubMed.