This document discusses WikiPathways, an open source pathway database. It began in 2007 with the goals of having an online platform by March 2007 and gaining a first unknown user by January 2008, both of which were successes. WikiPathways has grown significantly since, now containing over 400 human pathways and 6,200 unique human genes. It receives over 1 million pageviews annually. The document advocates for opening up data and code to make omics technology more useful. It describes WikiPathways' various features including its BioPAX format, REST services, and integration with Cytoscape. It also discusses professionalizing open source and collaborating with existing communities and tools rather than trying to change the world alone.
WikiPathways: how open source and open data can make omics technology more useful
1. WikiPathways: how open source code and open data can make omics technology more useful
@Chris_Evelo
Department of Bioinformatics - BiGCaT
Maastricht University
13. Set Early Milestones
• Online (Mar ’07) Success!
• First unknown user (Jan ’08)
14. [Chart: number of human pathways (axis ticks 80 to 400) and number of unique human genes (values shown: 3,200, 4,700, 6,200)]
http://www.wikipathways.org
15. Over 1 million pageviews by 280,000 unique visitors
[Chart: axis 0 to 2,800; label ~22%]
http://www.wikipathways.org
16. Don’t Try to Change the World
Work with (not against) established:
• Models
• Communities
• Tools and pipelines
• Publishing models
17. Go Ahead, Change the World
• Tweak established models
• Grow communities
• Change perspectives
• everyone is a curator
• knowledge should be open
18. Go Ahead, Change the World
• Tweak established models
• Grow communities
• Change perspectives
• New attribution systems
• redefine “publication”
• redefine “productive”
19. Go Ahead, Change the World
• Tweak established models
• Grow communities
• Change perspectives
• New attribution systems
• New analysis pipelines
• connect with other community-curated resources
20.
21. Professional Open Source
• Subversion source repository
• License!
• Development web site
• Bug tracker
• Mailing lists
• Development and Release plans
• Modular (plugins, OSGi)
22. The New Agilent
Academic
Government
Research
Chemical &
Energy
Forensic &
Drug testing
Pharmaceutical
Cancer
Diagnostics
Cytogenetic
research
Environment
Genomics for
Molecular
Analysis
Food
CORE TECHNOLOGY PLATFORMS
Gas chromatography | Liquid Chromatography | Mass spectrometry | Spectroscopy | NMR Spectroscopy | Automation |
Software | Chemistries | Immunohistochemistry | FISH probes | Microarrays | Target enrichment | Bioreagents | Services
26/11/13
23. Key Academic Innovation Partners
About 100 active university collaborations annually
U of Washington
U of Calgary
UC Davis
UCSF
UC Berkeley
Stanford
Technical U of Denmark
Karolinska Inst., Sweden
U of Michigan
CU Boulder
UC San Diego
Baylor U
Yale
MIT
Princeton
U of Kent, UK
Harvard
U of Illinois
Arizona State
Karlsruhe U, Germany
NIBRT, Ireland
Johns Hopkins
U of Maryland
U of Twente, NL
Maastricht U, NL
U of Manchester, UK
Tsinghua U, China
Charles U, Czech Republic
U of Oviedo, Spain
Technion, Israel
Seoul Nat’l U, Korea
Chungnam Nat’l U, Korea
Tohoku U, Japan
U of Alicante, Spain
Hamner Inst.
U of Texas
U of Houston
Southeast U, China
U Teknologi MARA, Malaysia
NUS, Singapore
NTU, Singapore
U of São Paulo, Brazil
U of Queensland
UNICAMP, Brazil
Electronics
Chemical Analysis
Life Sciences
U of Pretoria, S. Africa
Macquarie U
Curtin U.
U of W. Australia
U of Sydney
U Tech, Sydney
RMIT U
Monash U
U of Tasmania
24. Agilent Thought Leader Program
http://www.agilent.com/univ_relation/TLP/index.shtml
TL Network
Thought Leader Awards
Promote fundamental advances in life science
through contribution to research of thought
leaders
• Align societal trends, academic research and
rapidly advancing Agilent measurement platforms:
Synthetic Biology, Structural Biology, OMICS &
Integrated Biology, in vitro Toxicology, and
Environmental & Food Safety
• Candidates selected based on scientific
leadership, productivity, project significance
(invitational program)
• High-level executive sponsorship and active support
throughout Agilent enable breakthrough research
Early Career Professor Award
Establish strong collaborative
relationships with highly promising
early career professors
2013 focus:
Contributions to cancer diagnostics
32. • Map and visualize data from one or two types of -omic data on pathways
• Metabolite data overlay
• Search, browse and filter pathways
• List of all pathway entities, dynamically linked to pathway selection
• Export compound list from pathways
Supports pathways from:
• WikiPathways
• BioCyc
Supported pathway formats:
• BioPAX 3 – Pathway Commons, Reactome, NCI Nature Pathway
• GPML – PathVisio – custom drawing
37. Propose new experiments
based on pathway analysis
• Re-examine acquired untargeted
metabolomics data based on
pathway analysis
• Design new experiments (metabolite,
protein or genes) based on pathway
results interpretation
Build custom metabolite database: PCDL
Custom microarray or NGS design: eArray
Targeted MS/MS: Spectrum Mill
38. Typical workflow used for identification of a relevant pathway using GeneSpring
• Identification of differential expression
• Statistical analysis and filtering
• Curated pathway analysis using WikiPathways
• Network analysis using NLP to identify interaction of pathways
39. Identification of Candidate Genes
Step 1) Identification of differentially expressed genes via hierarchical cluster analysis
Step 2) Volcano plot showing genes significantly differentially expressed between two conditions
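The selection behind a volcano plot can be sketched in a few lines. This is a minimal illustration, not GeneSpring's implementation: the gene names, expression values, p-values and the two cut-offs (absolute log2 fold change of at least 1, p-value of at most 0.05) are all assumptions chosen for the example.

```python
import math


def volcano_filter(genes, min_abs_log2fc=1.0, max_p=0.05):
    """Return genes that pass both the fold-change and the p-value cut-off.

    Each input row is (name, mean in condition A, mean in condition B, p-value).
    Each output row is (name, log2 fold change, -log10 p-value), i.e. the
    x and y coordinates the gene would have on a volcano plot.
    """
    selected = []
    for name, mean_a, mean_b, p_value in genes:
        log2fc = math.log2(mean_b / mean_a)
        if abs(log2fc) >= min_abs_log2fc and p_value <= max_p:
            selected.append((name, round(log2fc, 2), round(-math.log10(p_value), 2)))
    return selected


# Hypothetical measurements, for illustration only.
example_genes = [
    ("GENE_A", 100.0, 400.0, 0.001),  # strongly up, significant
    ("GENE_B", 200.0, 210.0, 0.60),   # essentially unchanged
    ("GENE_C", 300.0, 75.0, 0.01),    # strongly down, significant
    ("GENE_D", 50.0, 200.0, 0.20),    # large change, but not significant
]

candidates = volcano_filter(example_genes)
for row in candidates:
    print(row)
```

Only genes that are both strongly changed and statistically significant survive, which is the upper-left and upper-right region of the plot.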
40. From Differential Expression to Pathways
Step 3) Significantly changed pathways in Müller cells identified using pathway analysis in GeneSpring
Step 4) Protein-protein interaction analysis was then performed in GeneSpring to identify direct interactions between the products of these genes
Pathway analysis showed significant changes in MAPK signaling at both conditions.
Network analysis shows interaction of MAPK with other gene products.
Compare network analysis/extension in Cytoscape.
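One common way to score "significantly changed pathways" is an over-representation test. The sketch below uses a one-sided hypergeometric test; this is an assumption made for illustration and is not necessarily the statistic GeneSpring applies. The gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_changed, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_universe: number of genes measured in total
    n_pathway:  number of measured genes that belong to the pathway
    n_changed:  number of measured genes called differentially expressed
    n_overlap:  number of changed genes that belong to the pathway

    Returns P(X >= n_overlap) when n_changed genes are drawn at random
    from the universe without replacement.
    """
    upper = min(n_pathway, n_changed)
    total = comb(n_universe, n_changed)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_changed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 measured genes, 5 of them on the pathway,
# 4 genes changed, 3 of the changed genes on the pathway.
p = enrichment_p_value(n_universe=20, n_pathway=5, n_changed=4, n_overlap=3)
print(f"p = {p:.4f}")
```

Running this for every pathway and sorting by p-value gives a ranked list; in practice a multiple-testing correction would be applied across pathways.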
41. Combine further with
Open Knowledge
e.g. IMI semantic web project Open PHACTS
Pathway content and extension
Open Data
e.g. ISAtab based study capturing
in phenotype database (dbNP)
pathway analysis and profiling
42. OPS Framework Architecture, Dec 2011
[Architecture diagram. Components: OPS GUI and apps; Web Service API; Identity & Vocabulary Management; SPARQL; OPS Data Model; Semantic Data Workflow Engine; RDF Data Cache; Chemistry Normalisation & Registration. Public data sources (Data 1 to 4) enter as RDF 1 to 4 with descriptors and nanopubs, alongside vocabularies and web services.]
Feed in WikiPathways relationships, use BioPAX to create the RDF
44. Generic Study Capture Framework
[Architecture diagram. GSCF captures subjects, groups, events, protocols, samples and assays, each described by templates. Data input / output: web interface; data import (xls, csv, text); Molgenis; custom programs; custom dbs. External links: NCBO ontologies; EBI repository.]
45. dbNP Architecture
[Architecture diagram. GSCF: subjects, groups, events, protocols, samples, assays, templates. Transcriptomics module: raw data (cell files), clean data (gene expression), result data (p-values, z-values). Epigenetics module: raw data (Nimblegen, Illumina), clean CpG island data, resulting genome feature data. Simple Assay module: body weight, BMI, etc. Query module: full-text querying, structured querying, profile-based analysis, study comparison. Feature data: pathways, GO, metabolite profiles. Web user interface.]
Use PathVisioRPC to use WikiPathways content
Faculty of Health, Medicine and Life Sciences
46. Acknowledgements
Thomas Kelder
Martijn van Iersel
Kristina Hanspers
Martina Kutmon
Andra Waagmeester
Chris Evelo
Bruce Conklin
nrnb.org
wikipathways.org
Editor's Notes
Here, for example, is a typical pathway from WikiPathways. Like most textbook pathways, it depicts proteins and metabolites, reactions and complexes, and their localization into subcellular compartments. But each one of these rectangles is also a data object connected to a database of standard identifiers that can be mapped to a variety of datasets.
Here, for example, we’re seeing differential expression data with up- and down-regulated genes in yellow and blue. When a biologist looks at this, something very special happens. A little movie is triggered in their mind... the room goes quiet and their focus is drawn to a particular area of interest; they think about kinetics, rate-limiting factors, conditions and timing; they consider a number of "what if" scenarios: "what if this cascade of events... could be blocked by increasing this factor?" This is in fact exactly what happens when we take statin drugs to lower our cholesterol. What I’m trying to illustrate here with ppt tricks is the act of visualization...
The data-mapped image is really just sitting there and all of this is just going on in the mind of the researcher, right? The researcher takes in this visual data, which allows it to mix with all the other associations up there from prior observations they’ve made (the majority of which have not been published or put into textbooks) and from conversations they’ve had with colleagues (at conferences like this). This wouldn’t be necessary if we could parameterize all these subtle associations and model them all in a supercomputer. Then we could just mix in all known interactions, molecular concentrations and kinetic rates, and just read out the answers to our questions in concrete units of information. But alas, we can’t do this (at least not yet), so even though this is a conference on intelligent systems, I’d argue that there are a number of situations where humans are actually still really important in data analysis.
Returning now to this basic unit of pathway visualization, the pathway diagram. How exactly are these models constructed? They do not come from direct measurement. Each one is assembled from a wide variety of data types and assays, a curated set of observations left intentionally sparse. This pathway, for example, is showing the mechanism of a common cancer drug called 5-FU: it's metabolized in the liver, and its byproducts enter the bloodstream and are taken up by cancer cells, where they disrupt key pathways for cell survival. But this pathway does not represent all that we know about this process, and new data about these components and their interactions continues to pour in with no end in sight. So all we know for sure is that this model will change over time as we fill in details and learn what’s most relevant.
This is exactly what we had in mind when we started the WikiPathways project. It's a wiki, like Wikipedia, but what we did is we ripped out the text editor and replaced it with our own pathway drawing tool. So anyone can find a pathway, click 'edit', and then add new information [[like a new byproduct of 5-FU that also goes to cancer cells and triggers apoptosis]]. You then click ‘save’ and your changes are immediately available to the rest of the world. You can provide literature references to cite evidence for your changes. And the entire research community is your peer review group: they can approve or undo your changes. In this way, we can keep up with the flood of new data relating to biological processes.
When you’re editing a pathway, you are not only editing the diagram, you’re also editing a standard XML file with BioPAX elements that can be exchanged and accessed programmatically. For the software developers in the audience: in addition to this XML format, there is also web service access to the pathway content. You can programmatically return pathway images with highlighted nodes, for example. There is embed code to insert interactive pathway widgets into your own web sites, and we are starting to represent our pathway content as linked data to support semantic queries. The most common workflow today is to import the XML into tools like PathVisio and Cytoscape.
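To give a feel for what "accessed programmatically" means for the XML, here is a minimal sketch that lists the nodes of a pathway together with their database identifiers. The embedded document is a heavily simplified, made-up fragment in the style of GPML (the PathVisio format named on the slides); the gene labels and identifiers are placeholders, and real files carry many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Namespace of the GPML 2013a schema; other GPML versions use a different URI.
GPML_NS = {"gpml": "http://pathvisio.org/GPML/2013a"}

# Simplified, hypothetical pathway fragment for illustration only.
EXAMPLE_GPML = """
<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Example pathway" Organism="Homo sapiens">
  <DataNode TextLabel="GENE_A" GraphId="n1" Type="GeneProduct">
    <Xref Database="Ensembl" ID="PLACEHOLDER_ID_1"/>
  </DataNode>
  <DataNode TextLabel="GENE_B" GraphId="n2" Type="GeneProduct">
    <Xref Database="Entrez Gene" ID="PLACEHOLDER_ID_2"/>
  </DataNode>
  <DataNode TextLabel="Metabolite X" GraphId="n3" Type="Metabolite">
    <Xref Database="HMDB" ID="PLACEHOLDER_ID_3"/>
  </DataNode>
</Pathway>
""".strip()


def list_data_nodes(gpml_text):
    """Return one dict per DataNode: label, node type, database and identifier."""
    root = ET.fromstring(gpml_text)
    nodes = []
    for node in root.findall("gpml:DataNode", GPML_NS):
        xref = node.find("gpml:Xref", GPML_NS)
        nodes.append({
            "label": node.get("TextLabel"),
            "type": node.get("Type"),
            # A node may lack a cross-reference; report None in that case.
            "database": xref.get("Database") if xref is not None else None,
            "id": xref.get("ID") if xref is not None else None,
        })
    return nodes


nodes = list_data_nodes(EXAMPLE_GPML)
for entry in nodes:
    print(entry)
```

The database and identifier pair on each node is what allows experimental data, keyed on the same identifiers, to be mapped onto the diagram.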
After loading a pathway into Cytoscape, for example, you can then import your own dataset from an Excel spreadsheet and define how you want your data to map in terms of color gradients. This process is dynamic and interactive, so you can explore your dataset in the context of these pathways. And, of course, in Cytoscape you can make use of all the other apps that are available to calculate shortest paths or perform clustering or over-representation analysis.
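The idea of mapping data to a color gradient can be sketched independently of any tool. The function below is an illustration only and does not use the Cytoscape API: it maps a log2 fold change to a hex color on a blue-white-yellow scale (blue for down-regulated, yellow for up-regulated, as in the example earlier in these notes). The saturation limit of 2.0 is an assumed value.

```python
def _blend(start, end, t):
    """Linearly interpolate between two RGB tuples, with t between 0 and 1."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def expression_to_hex(log2fc, limit=2.0):
    """Map a log2 fold change to a hex color.

    Values at or below -limit are fully blue, 0 is white,
    and values at or above +limit are fully yellow.
    """
    blue, white, yellow = (0, 0, 255), (255, 255, 255), (255, 255, 0)
    # Clamp to [-1, 1] so extreme values saturate instead of overflowing.
    t = max(-1.0, min(1.0, log2fc / limit))
    rgb = _blend(white, blue, -t) if t < 0 else _blend(white, yellow, t)
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical fold changes for three pathway nodes.
for gene, value in [("GENE_A", 2.0), ("GENE_B", 0.0), ("GENE_C", -3.5)]:
    print(gene, expression_to_hex(value))
```

Clamping at the limit keeps a single extreme value from washing out the colors of every other node.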
Putting it all together, you can begin to see how we are feeding into this virtuous cycle. Data is synthesized into pathway diagrams. And orthogonal data can be mapped onto these pathway models. Computational analysis, together with the act of visualization, can lead to new explanations and new ideas. And finally, these new ideas can be tested to generate new data, bringing us back to synthesis. And I have to say, the wiki model is really working well here...
We’ve been collecting and curating pathways since 2001. In the years just prior to launching WikiPathways we really struggled, relying on our internal curation team alone. [This was our growth curve for number of pathways in blue and number of unique genes on those pathways in green]. In the years following the launch of WikiPathways we experienced a whole new level of growth. And this last year things are really starting to take off. I might have to start using logarithmic plots for the number of pathways. It’s difficult to quantify the effect on quality, but our curation team has been thrilled by the quality and overall improvements we’re seeing in the content. Basically, no internal team can curate all of biology; this task can only be done by a distributed system.
And in terms of participation, not only has the number of users increased since our launch in 2008, but so has the number of contributors, averaging around 22%. Putting pathway editing and curation tools into the hands of researchers is the best (and only) way to keep up with the flood of new data coming in, and they are actually using them! You only need to register if you want to edit, so we also have lots of folks viewing and downloading pathways: over 1 million pageviews by over a quarter million unique visitors. And these numbers don’t include access through Cytoscape, embed code and web services. I know these numbers don’t compare to Wikipedia, but come on, we’re talking about biological pathways here: a niche market within the niche market of systems biology.
Agilent has a unique combination of scientists from multiple disciplines. It is exciting to be in a place (especially Santa Clara) where on a daily basis you meet experts in biology, engineering, chemistry, data analysis, etc. In many ways, GeneSpring (GS) embodies that collaborative spirit. GeneSpring is not new, but there are many facets to integrative data analysis and we are just at the beginning. Show/mention easy and open import of various file formats; analysis tools, visualization and biological contextualization to provide insights and new hypotheses and make it easier to get to the next experiments.
DNA methylation is an important and common method for regulation of gene expression. DNA is methylated by methyl transferase enzymes which target cytosines. Methylation of DNA serves as a signal to attract proteins that lead to compact and silent chromatin. Methylation of DNA serves as a mechanism for silencing genes and inactivating the X chromosome.
Metabolomics is the study of the metabolome, the collection of small molecule compounds that are essential for life. They are messengers both within cells and between cells and regulators of epigenetic events. Those produced by bacteria may represent the link between the microbiome and chronic diseases in the human host such as cancer, diabetes and obesity.
slide showing your experiences/assessment of how well the whole WikiPathways/open source approach works.
The expression profiles were analyzed using GeneSpring software, and gene ontology analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) were used to select, annotate, and visualize genes by function and pathway. The selected genes of interest were further validated by Quantitative Real-time PCR (qPCR).
The author states that “The expression profiles were analyzed using GeneSpring software, and gene ontology analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) were used to select, annotate, and visualize genes by function and pathway. The selected genes of interest were further validated by Quantitative Real-time PCR (qPCR)”
An overview of the Open PHACTS project, which pulls a lot of information into a semantic web triple store (including information from WikiPathways RDF) and then provides it for use in other tools. In WikiPathways we use that to suggest possible pathway extensions to curators.
I'd like to acknowledge the teams of developers I work with on WikiPathways. I'm also affiliated with NRNB, the National Resource for Network Biology. Part of our mission is to promote the development and use of network biology tools and resources. If you are interested in developing WikiPathways or Cytoscape, for example, let us know. One way we coordinate this is through the annual Google Summer of Code program, where Google pays students from around the world to write open source code for our projects. You can find out more at nrnb.org. Thank you.