The document discusses analyzing gene expression data from a microarray experiment measuring the transcriptional response of human fibroblasts to serum. It describes using Excel functions to preprocess the gene expression data, including normalizing replicate experiments and identifying differentially expressed genes using SAM (Significance Analysis of Microarrays). It then demonstrates applying the full microarray data analysis process on the fibroblast dataset using the online GEPAS suite, including preprocessing, clustering, and identifying differentially expressed genes. The goal is to understand how to analyze microarray data and identify significantly expressed genes in response to stimuli.
1. Theme: Transcriptional Program in the Response of Human
Fibroblasts to Serum.
Etienne Z. Gnimpieba
BRIN WS 2013
Mount Marty College – June 24th 2013
Etienne.gnimpieba@usd.edu
2. Data manipulation Gene expression data analysis
OMIC World
[Figure: OMIC world diagram. Labels: Gene (DNA), Transcription, mRNA, Translation, E, Catalyse (S, P), Degradation, Repression. Levels: Genomics, Functional Genomics, Transcriptomics, Proteomics, Metabolomics]
3. Data manipulation Gene expression data analysis
OMIC World
GENOMICS
4. Data Manipulation Gene Expression Data Analysis
OMIC World
Genomics is the subdiscipline of genetics devoted to the mapping, sequencing, and functional analysis of genomes.
Genomics can be said to have appeared in the 1980s, and took off in the 1990s with the initiation of genome projects for several biological species.
The most important tools here are microarrays and bioinformatics.
DNA microarrays allow rapid measurement and visualization of differential expression between genes at the whole-genome scale. Although implementing the technique is quite complicated, its principle is very simple. The major steps involved in this process are described here.
5. Data Manipulation Gene Expression Data Analysis
Process
Biological question (differentially expressed genes, sample class prediction, etc.)
→ Experimental design
→ Microarray experiment
→ Image analysis
→ Normalization
→ Estimation / Testing / Clustering / Discrimination
→ Biological verification and interpretation
6. Data Manipulation Gene Expression Data Analysis
Process
7. Data Manipulation Gene Expression Data Analysis
Microarray Production Process
High-density filters (macroarrays)
• Size: 12 cm x 8 cm
• 2,400 clones per membrane
• Radioactive labelling
• 1 experimental condition per membrane
Glass slides (microarrays)
• Size: 5.4 cm x 0.9 cm
• 10,000 clones per slide
• Fluorescent labelling
• 2 experimental conditions per slide
Oligonucleotide chips
• Size: 1.28 cm x 1.28 cm
• 300,000 oligonucleotides per slide
• Fluorescent labelling
• 1 experimental condition per slide
8. Data Manipulation Gene Expression Data Analysis
Microarray Production Process
• Frouin, V. & Gidrol, X. (2005)
• CBB group (Berlin)
• Transcriptome ENS (France)
[Figure: microarray workflow panels: Target Preparation, Hybridization, Slide Scanning, Expression Profile Clustering]
9. Data Manipulation Gene Expression Data Analysis
Microarray Production Process
• Image analysis (genepix)
• Normalization (R)
• Pre-treatment
• Differential expression
• Clustering
• Data mining
• Annotation
10. Data Manipulation Gene Expression Data Analysis
Excel Used in Genomics
• How to select columns
• How to use functions
• How to anchor a cell value in a function
• How to copy the function result and not the
function itself
• How to sort data by columns
• How to search and replace
11. Data Manipulation Gene Expression Data Analysis
Excel Used in Genomics: Pre-Treatment
1. Open the file containing the experiment series (your expression matrix) in Excel, using the tab character as the column separator.
2. For one column (corresponding to one DNA microarray experiment), calculate the mean value using the Excel AVERAGE function. Check whether the value obtained is equal to zero.
3. If it is not, subtract the corresponding mean value from each log2(Ratio) value of that experiment. Be careful with missing values (empty cells): replace the empty contents with the string NULL or NA, to avoid introducing a zero value into the Excel calculation for that cell. A missing value is not the same as a true zero!
4. Once this is done, verify that the final mean value is equal to zero, to catch any Excel handling errors. Be careful with the decimal separator used by your Excel version (dot or comma)!
Centering and Scaling Data
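The centering steps above can also be sketched outside Excel. The following is a minimal Python illustration, not part of the original lab: the function name `center_columns` and the toy matrix are hypothetical, and missing values are assumed to be represented as `None` so that they are never counted as zero.

```python
from statistics import mean

def center_columns(matrix):
    """Subtract each column's mean from its values, skipping missing cells (None)."""
    n_cols = len(matrix[0])
    centered = [list(row) for row in matrix]  # work on a copy
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        if not observed:
            continue  # column with no measurements: nothing to center
        col_mean = mean(observed)
        for row in centered:
            if row[j] is not None:
                row[j] -= col_mean
    return centered

# Toy log2(ratio) matrix (hypothetical): rows are genes, columns are experiments.
log_ratios = [
    [1.0, 0.5],
    [3.0, None],  # missing value, kept as missing rather than treated as 0
    [2.0, 1.5],
]
centered = center_columns(log_ratios)
print(centered)
```

After centering, the mean of the observed values in every column is zero, which is the check described in step 4.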
12. Data Manipulation Gene Expression Data Analysis
Excel Used in Genomics : Differential Expression Analysis (1)
Significance Analysis of Microarrays (SAM):
SAM is an Excel macro, freely available to academics on the web, that searches for differentially expressed genes using a bootstrapping method. Website: http://www-stat.stanford.edu/~tibs/SAM/
Running SAM inside an Excel spreadsheet makes the tool easier to use for most microarray users. Using SAM requires several modifications to your data file:
• The ratio or intensity values in the Excel sheet must use points, not commas, as the decimal separator.
• The header line depends on the type of analysis you want to perform; refer to the SAM manual for more information. Because of this, you must duplicate your header if you don’t want to lose the experiment information (see the image on the slide).
• Two annotation columns are available. SAM always references its calculations to the line number in the source sheet.
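The decimal-separator requirement can be checked before the file ever reaches Excel. The sketch below is only an illustration under stated assumptions: the file is tab-separated, commas appear only as decimal separators (never as thousands separators), and missing cells are empty or marked NA / NULL. The helper names `parse_cell` and `parse_line` are hypothetical.

```python
def parse_cell(text):
    """Convert one spreadsheet cell to a float, or None when the value is missing."""
    cleaned = text.strip()
    if cleaned == "" or cleaned.upper() in {"NA", "NULL"}:
        return None
    # Assumption: a comma can only be a decimal separator in this file.
    return float(cleaned.replace(",", "."))

def parse_line(line):
    """Split one tab-separated line into parsed numeric cells."""
    return [parse_cell(cell) for cell in line.rstrip("\n").split("\t")]

row = parse_line("0,75\t-1.2\tNA\t\n")
print(row)
```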
13. Data Manipulation Gene Expression Data Analysis
Excel Used in Genomics : Differential Expression Analysis (2)
When the SAM macro is launched from the toolbar (“SAM”), a settings window appears. For details on the available options, refer to the SAM manual. The first important step is to indicate whether or not the source data have been log2-transformed. Then, because data bootstrapping uses a random number generator, you need to initialize it several times by creating a number of seeds.
Once all the chosen iterations are complete, SAM displays a plot that positions each gene by its score in the real distribution compared with its score in the random distributions. The differentially expressed genes are the ones that move away from the 45° line.
First, display the delta table. For each delta value, this table gives the number of putative differentially expressed genes (the significant genes) and the number of false positives estimated using the False Discovery Rate (FDR). The user sets the delta value according to the number of false positives or significant genes they are willing to accept.
To choose the delta value, go back to the SAM plot sheet and display the “SAM Plot Controller” by clicking the SAM macro button.
The SAM Plot Controller window lets you set the delta value you want with “Manually Enter Delta”. If you then select the “List Significant Genes” button, SAM lists the differentially expressed genes in the “SAM output” sheet according to the delta value you chose.
This sheet summarizes the selected parameters and gives you the list of induced and repressed genes.
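To make the idea behind the SAM plot and the delta threshold more concrete, here is a simplified Python sketch. It is not the SAM macro's implementation: it uses the SAM-style relative difference d = (mean_a - mean_b) / (s + s0) with a fixed, arbitrary s0 (SAM estimates its own), it uses random reassignment of columns to groups as the resampling step, and it calls a gene whenever its observed score is more than delta away from its expected score, which is cruder than SAM's actual rule and omits the FDR estimate. All function names and the toy data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

def relative_difference(group_a, group_b, s0=0.1):
    """SAM-style score d = (mean_a - mean_b) / (s + s0) for one gene."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    sum_sq = (sum((x - mean_a) ** 2 for x in group_a)
              + sum((x - mean_b) ** 2 for x in group_b))
    s = sqrt((1 / n_a + 1 / n_b) * sum_sq / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)

def score_genes(matrix, labels, s0=0.1):
    """Score every gene (row); labels hold 1 or 0 for each column's group."""
    scores = []
    for row in matrix:
        group_a = [v for v, lab in zip(row, labels) if lab == 1]
        group_b = [v for v, lab in zip(row, labels) if lab == 0]
        scores.append(relative_difference(group_a, group_b, s0))
    return scores

def sam_plot_points(matrix, labels, n_perm=100, seed=0, s0=0.1):
    """Return (expected, observed, gene_index) tuples ordered by observed score."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = score_genes(matrix, labels, s0)
    totals = [0.0] * len(matrix)
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # random reassignment of columns to groups
        # Accumulate the sorted scores rank by rank to get expected order statistics.
        for rank, d in enumerate(sorted(score_genes(matrix, shuffled, s0))):
            totals[rank] += d
    order = sorted(range(len(matrix)), key=lambda i: observed[i])
    return [(totals[rank] / n_perm, observed[i], i) for rank, i in enumerate(order)]

def call_significant(points, delta):
    """Genes whose observed score is more than delta away from the expected one."""
    return [i for expected, observed, i in points if abs(observed - expected) > delta]

# Toy data (hypothetical): gene 0 is strongly induced in the first three columns.
matrix = [
    [5.0, 5.1, 4.9, 0.0, 0.1, -0.1],
    [0.1, -0.1, 0.0, 0.0, 0.1, -0.1],
    [0.0, 0.1, -0.1, -0.1, 0.0, 0.1],
    [-0.1, 0.0, 0.1, 0.1, -0.1, 0.0],
]
labels = [1, 1, 1, 0, 0, 0]
points = sam_plot_points(matrix, labels)
```

Plotting the observed score against the expected score for each tuple in `points` gives the kind of plot described above, where unchanged genes sit near the 45° line. A four-gene matrix is far too small for meaningful calls; it only shows the mechanics.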
14. Data Manipulation Gene Expression Data Analysis
GEPAS: Gene Expression Pattern Analysis Suite
Verify that the data file FibroGEPAS.txt is available in your folder.
Open the dataset to see its description.
Open the GEPAS portal at
http://www.transcriptome.ens.fr/gepas/index.html
Click on “Tools”
Preprocessing
- Preprocess DNA array data files: log-transformation,
replicate handling, missing value imputation, filtering and
normalization
- Filtering
Viewing
Clustering
Differential expression
Classification
Data mining
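The preprocessing operations named above (log-transformation, replicate handling, missing value imputation) can be illustrated with a few lines of Python. This is a sketch of the general ideas only; the slides do not describe which algorithms or options GEPAS uses, so the choices here (averaging replicates, filling gaps with the gene's own mean) and all function names are assumptions.

```python
from math import log2
from statistics import mean

def log_transform(row):
    """log2-transform positive ratios; missing or non-positive values stay missing."""
    return [log2(v) if v is not None and v > 0 else None for v in row]

def merge_replicates(rows):
    """Average replicate rows of one gene, column by column, ignoring missing values."""
    merged = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        merged.append(mean(present) if present else None)
    return merged

def impute_row_mean(row):
    """Replace missing values with the mean of the gene's observed values."""
    present = [v for v in row if v is not None]
    if not present:
        return list(row)  # nothing observed: leave the row untouched
    fill = mean(present)
    return [fill if v is None else v for v in row]

# Two replicate measurements of one gene (hypothetical ratios), three conditions.
replicates = [[2.0, 4.0, None], [8.0, None, None]]
logged = [log_transform(r) for r in replicates]
merged = merge_replicates(logged)
imputed = impute_row_mean(merged)
print(imputed)
```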
16. • Gene Expression Measurement
• Microarray Process
• Gene Expression Data Stores
• Data Mining / Querying
• Data Analysis
• Example: ATP13A2 Profile in Stress
Conditions
17. Gene Expression Measurement
• Gene expression technologies
• Microarray process
• Gene expression data stores
• Data mining / querying (problem, query, extraction, load, store, pretreat)
• Data analysis (question-answer, descriptive, predictive, modeling)
• Example: ATP13A2 profile in stress conditions
Higher-plex techniques:
SAGE
DNA microarray
Tiling array
RNA-Seq
NGS
Low-to-mid-plex techniques:
Reporter gene
Northern blot
Western blot
Fluorescent in situ hybridization
Reverse transcription PCR
18. Database
Database | Microarray Experiment Sets | Sample Profiles | Date Reported
ArrayExpress at EBI | 24,838 | 708,914 | October 28, 2011
ArrayTrack™ | 1,622 | 50,953 | February 11, 2012
caArray at NCI | 41 | 1,741 | November 15, 2006
Gene Expression Omnibus - NCBI | 25,859 | 641,770 | October 28, 2011
Genevestigator database | 2,500 | 65,000 | January 2012
MUSC database | ~45 | 555 | April 1, 2007
Stanford Microarray database | 82,542 | Not reported | October 23, 2011
UNC Microarray database | ~31 | 2,093 | April 1, 2007
UNC modENCODE Microarray database | ~6 | 180 | July 17, 2009
UPenn RAD database | ~100 | ~2,500 | September 1, 2007
UPSC-BASE | ~100 | Not reported | November 15, 2007
Other resources: SAGE, GEO, GUDMAP (421), MGI, BIOGPS
Gene Expression Measurement
19. Data Mining / Querying
• Problem specification
• Query
• Extraction
• Storage
• Load
• Pretreat / prepare for analysis
20. Data Analysis
• Question-Answer
– Experimental condition profile: group comparison
– Annotation profile: biological systems involved
– Clustering profile: co-regulation
– Time course profile: time variation
– …
• Descriptive
– Boxplot (SD, mean, median)
– Scatter plot
• Predictive / inference (clustering)
• Modeling (machine learning, simulation)
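The descriptive statistics behind a boxplot can be computed with Python's standard library. The sketch below is illustrative only: the function name `boxplot_summary` and the sample values are hypothetical, missing values are assumed to be `None`, and the "inclusive" quartile method is one of several conventions that plotting tools may use.

```python
from statistics import mean, median, quantiles, stdev

def boxplot_summary(values):
    """Summary statistics for one array (column), ignoring missing values (None)."""
    observed = sorted(v for v in values if v is not None)
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return {
        "min": observed[0],
        "q1": q1,
        "median": median(observed),
        "q3": q3,
        "max": observed[-1],
        "mean": mean(observed),
        "sd": stdev(observed),  # sample standard deviation
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, None])
print(summary)
```

Comparing these summaries across arrays is a quick way to see whether the experiments are on a comparable scale before any group comparison.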
21. • Questions
– What is the right dataset (experimental condition)?
– Is the dataset ready for analysis (quality)?
– What is the expression profile for a given gene?
– Is there significant differential expression in the group comparison?
• Tools
– ArrayExpress (EBI)
– Boxplot
– GEO2R (LIMMA, profile graph)
Data Analysis
22. Boxplot
Data Analysis
23. Example: ATP13A2 Profile
in Stress Conditions
• Specification: ATP13A2 profile in stress
conditions
• Data querying:
– GEO
– Array Express
– Gene Atlas
• Data analysis:
– Online: GEO2R, Genospace, …
– Desktop: R, ArrayTrack, …
24. Resolution Process
Context
Specification & Aims
Lab #2
Preprocessing
Viewing
Clustering
Differential expression
Classification
Data mining
Statement of problem / Case study:
The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a
complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in
this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in
this complex multicellular response than had previously been appreciated.
Gene Expression Data Analysis
16 Vishwanath R. Iyer, Science, 1999
Conclusion: ?
Aim:
The purpose of this lab is to introduce the gene expression data analysis process.
We simulated the analysis on “Transcriptional Program in the Response of
Human Fibroblasts to Serum”. Now we can understand how a researcher can
come to identify a significantly expressed gene from a microarray dataset.
T1. Gene expression overview
T2. Excel used in Genomics
Objective: use of basic Excel functionalities to solve some gene
expression data analysis needs
Acquired skills
- Gene expression data overview
- Excel Used for genomics
- Microarray data analysis using GEPAS
T1.1. Review of the place of genomics in the omics world
T1.2. Microarray data techniques and process
T1.3. Data analysis cycle and tools
T2.1. Column manipulation, functions used, anchoring, copying with
functions, sorting data, search and replace
T2.2. Experiment comparison: data pre-treatment
T2.3. Differentially expressed genes from replicate experiments (SAM)
T3. GEPAS: Gene Expression Pattern Analysis Suite
Objective: use of the GEPAS suite to apply the whole microarray data
analysis process to the fibroblast data.
http://www.transcriptome.ens.fr/gepas/index.html
Expression Profile Clustering:
Slide Scanning:
Target Preparation:
Hybridization:
Editor's Notes
During this lab, we have: a brief review, the lab's template, genome exploration practice…
DNA fragments amplified by PCR are spotted on a microscope glass slide that has been coated with polylysine beforehand. The polylysine coating fixes the DNA to the slide through electrostatic interactions. In our case the PCR fragments are the expressed parts (ORFs) of the 6200 genes of Saccharomyces cerevisiae (baker's yeast). Slide preparation is completed by blocking the polylysine that is not bound to DNA, in order to prevent the target from binding to it. Prior to hybridisation, the DNA is denatured to obtain single-stranded DNA on the microarray, which allows the probe to bind the complementary strand from the target. Apart from glass slide microarrays, other types of chips exist.
Target preparation: RNA is extracted from the two yeast cultures whose expression levels we want to compare. Messenger RNAs are then converted into cDNA by reverse transcription. At this stage, the cDNA from the first culture is labelled with a green dye, whereas the cDNA from the second culture is labelled with a red dye.
Hybridisation: The green-labelled and red-labelled cDNAs are mixed together (the target) and put on the matrix of spotted single-stranded DNA (the probe). The chip is then incubated overnight at 60 degrees. At this temperature, a DNA strand that encounters its complementary strand pairs with it to form double-stranded DNA. The fluorescent DNA thus hybridises to the spotted DNA.
Slide scanning: A laser excites each spot and the fluorescent emission is collected through a photomultiplier tube (PMT) coupled to a confocal microscope. We obtain two images in which grey levels represent the fluorescence intensities read. If we replace the grey scale by a green scale for the first image and a red scale for the second, superimposing the two gives one image whose spots range from green (only DNA from the first condition is bound) to red (only DNA from the second condition is bound), passing through yellow (DNA from both conditions is bound in equal amounts).
Data analysis: We now have two microarray images from which we have to estimate the number of DNA molecules in each experimental condition. To do so, we measure the signal at the emission wavelength of the green dye and at that of the red dye. We then normalise these amounts according to various parameters (amount of yeast in each culture condition, emission power of each dye, …). We assume that the amount of fluorescent DNA bound is proportional to the amount of mRNA initially present in each cell, and we calculate the red/green fluorescence ratio.
If this ratio is greater than 1 (red on the image), gene expression is higher in the second experimental condition; if the ratio is smaller than 1 (green on the image), gene expression is higher in the first condition. We can visualise these differences in expression using software such as ArrayPlot, which was developed in the laboratory (cf. image below). From the list of spot intensities, this software displays the red intensity of each spot as a function of its green intensity.
Expression profile clustering: We can then try to group genes that share the same expression profile across several experiments. This clustering can be done step by step, as in phylogenetic analysis, by calculating a similarity criterion between expression profiles and grouping the most similar ones. We can also use more complex techniques such as principal component analysis or neural networks. In the end, hierarchical clustering is usually displayed as a matrix in which each column represents one experiment and each row a gene. Ratios are displayed using a colour scale going from green (repressed genes) to red (induced genes).
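The step-by-step grouping of similar profiles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and log2 ratios are made up, the similarity criterion is taken to be Pearson correlation, and clusters are merged by average linkage. The notes do not say which criterion or linkage the lab software uses.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)


def cluster_profiles(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average distance over all pairs of genes taken from the two clusters.
        dists = [pearson_distance(profiles[g], profiles[h]) for g in c1 for h in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical log2 ratios over four experiments (made-up values).
profiles = {
    "geneA": [0.1, 1.0, 2.0, 2.5],
    "geneB": [0.0, 0.9, 2.2, 2.4],
    "geneC": [0.2, -1.0, -2.1, -2.6],
    "geneD": [0.1, -0.8, -1.9, -2.2],
}
result = cluster_profiles(profiles, 2)
print(result)
```

With these made-up values the two induced genes end up in one group and the two repressed genes in the other. A real analysis would keep the full merge history to draw the tree next to the coloured matrix.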
Once you have your normalized data file, open it with Excel. You can filter out weak-intensity spots (eliminate the weakest intensities in both channels) and keep spots with a ratio greater than 1 or lower than –1. Remember that we are working with log2(ratio), and log2(2) = 1, so these thresholds correspond to a two-fold change. This method, called “fold change”, was the one used in the early days of microarray analysis and is still useful if you do not have enough replicates to apply statistical treatments. The “fold change” method lacks accuracy regarding the significance threshold to be set. That is why it is useful to apply a statistical method that takes into account intensity variations and, above all, the variability among experiments.
Significance Analysis of Microarrays (SAM): SAM is an Excel macro freely available to academics on the web. Running SAM within an Excel spreadsheet makes this tool easier to use for most microarray users. Using SAM implies several modifications to your data file:
– The ratio or intensity values in the Excel sheet must use points, not commas, as the decimal separator.
– The header line depends on the type of analysis you want to perform (refer to the SAM manual for more information), so you must duplicate your header if you do not want to lose the experiment information (see image below).
– Two annotation columns are available. SAM always references its calculations to the line number in the original sheet.
– Before launching the macro, select the data precisely, because SAM rejects lines with too many missing values (such as empty lines).
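The “fold change” filter described above is done in Excel in this lab, but the same logic can be sketched in Python to make the two steps explicit. The spot values, the function name `fold_change_filter`, and the intensity cut-off of 200 are hypothetical choices for illustration only; the log2 ratio threshold of 1 is the one given in the notes.

```python
from math import log2


def fold_change_filter(spots, min_intensity=200.0, threshold=1.0):
    """Return (gene, log2 ratio) for spots passing the intensity and fold-change cuts.

    spots: iterable of (gene, green_intensity, red_intensity), already normalized
    and strictly positive.
    """
    selected = []
    for gene, green, red in spots:
        # Drop spots that are weak in both channels.
        if green < min_intensity and red < min_intensity:
            continue
        log_ratio = log2(red / green)
        # Keep spots with log2(ratio) above +threshold or below -threshold.
        if abs(log_ratio) > threshold:
            selected.append((gene, log_ratio))
    return selected


# Made-up normalized intensities: (gene, green, red).
spots = [
    ("g1", 1000.0, 4000.0),  # 4-fold induced
    ("g2", 1500.0, 1600.0),  # essentially unchanged
    ("g3", 50.0, 120.0),     # weak in both channels
    ("g4", 3000.0, 600.0),   # 5-fold repressed
]
selected = fold_change_filter(spots)
print(selected)
```

Note that g3 shows more than a two-fold ratio but is discarded because both signals are too weak to be trusted, which is the point of filtering on intensity first.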
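To see why taking variability among experiments into account changes the picture, here is a minimal sketch of a SAM-style score for replicate log2 ratios of one gene: the mean divided by its standard error plus a small constant. This is not the SAM macro itself. The constant `s0` is fixed arbitrarily here, whereas SAM derives it from the data, and the permutation step SAM uses to estimate significance is omitted. All values are made up.

```python
from math import sqrt


def sam_style_score(log_ratios, s0=0.1):
    """Return mean / (standard error + s0) for one gene's replicate log2 ratios.

    s0 is a hypothetical fixed constant that keeps genes with a tiny
    variance from getting an inflated score.
    """
    n = len(log_ratios)
    mean = sum(log_ratios) / n
    # Standard error of the mean across the n replicates.
    std_err = sqrt(sum((x - mean) ** 2 for x in log_ratios) / (n * (n - 1)))
    return mean / (std_err + s0)


# Two made-up genes with the same mean log2 ratio (1.0, i.e. two-fold).
stable_gene = [1.0, 1.1, 0.9, 1.0]
noisy_gene = [3.0, -1.0, 2.5, -0.5]

stable_score = sam_style_score(stable_gene)
noisy_score = sam_style_score(noisy_gene)
print(round(stable_score, 2), round(noisy_score, 2))
```

Both genes sit exactly on the two-fold threshold on average, so a plain fold-change filter treats them alike, while the score ranks the reproducible gene far above the noisy one.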
I cannot turn you into a statistician in 20 minutes. I will just give you a few pointers for a rapid analysis of microarray data.
The following experimental techniques are used to measure gene expression and are listed in roughly chronological order, starting with the older, more established technologies. They are divided into two groups based on their degree of multiplexity.
ArrayTrack™ provides an integrated solution for managing, analyzing, and interpreting microarray gene expression data. Specifically, ArrayTrack™ is MIAME (Minimum Information About A Microarray Experiment)-supportive for storing both microarray data and experiment parameters associated with a pharmacogenomics or toxicogenomics study. Many statistical and visualization tools are available with ArrayTrack™ which provides a rich collection of functional information about genes, proteins, and pathways for biological interpretation. The primary emphasis of ArrayTrack™ is the direct linking of analysis results with functional information to facilitate the interaction between the choice of analysis methods and the biological relevance of analysis results. Using ArrayTrack™, users can easily select a statistical method applied to stored microarray data to determine a list of differentially expressed genes. The gene list can then be directly linked to pathways and gene ontology for functional analysis.
Boxplots are useful for determining where the majority of the data lies
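As a rough illustration of what a boxplot summarises, the sketch below computes the quartiles, the whisker ends, and the outliers for one sample. The values are made-up log2 intensities, and the 1.5 × IQR whisker rule is the common convention, assumed here rather than taken from the slides.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Return quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Made-up log2 intensities for one array.
sample = [7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 16.0]
summary = boxplot_summary(sample)
print(summary)
```

Computing this summary for every sample of a dataset and comparing the medians and box heights side by side gives a quick check of whether the arrays are on a comparable scale before any further analysis.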