This document is a research statement by Chien-Wei (Masaki) Lin that summarizes his past and ongoing methodological and collaborative research projects. It discusses his interest in developing statistical methods for analyzing multi-omics data, including power calculation tools and meta-analysis and integrative analysis methods. It also summarizes several of Lin's collaborative projects that apply these methods to topics such as brain aging, major depressive disorder, and cardiovascular epidemiology. The document references 18 of Lin's publications and gives an overview of his diverse experience and his future plans for developing statistical tools and methods and applying them to biological problems.
Bispecific antibodies (BsAbs) are artificially designed molecules that can simultaneously recognize two or more different antigens. Approaches to bispecific antibody design include target-based, MOA-based, and application-based BsAb design. https://www.creative-biolabs.com/bsab/bsab-design.htm
PubChem and Its Applications for Drug Discovery (Sunghwan Kim)
Presentation delivered at Lehigh University (Bethlehem, PA) on Friday, April 26, 2019.
This presentation provides a brief introduction to PubChem and discusses how to use PubChem for drug discovery. More detailed information on this topic can be found in the following paper:
Getting the most out of PubChem for virtual screening.
Expert Opin Drug Discov. 2016 Aug 5; 11(9):843-55.
https://doi.org/10.1080/17460441.2016.1216967
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5045798/
Cellular signaling pathways have direct implications for our understanding of tumor cell behavior. A general overview is presented here, followed by a brief discussion of some of the major pathways currently implicated in cancer progression: the Ras/RAF/MAP kinase pathway and the PI3K/AKT/mTOR pathway.
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its folding and its secondary and tertiary structure from its primary structure. Structure prediction is fundamentally different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).
A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship: the sequences share a lineage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins.
Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). Homology modeling relies on identifying one or more known protein structures likely to resemble the structure of the query sequence, and on producing an alignment that maps residues in the query sequence to residues in the template sequence. It has been shown that protein structures are more conserved than protein sequences among homologues, but sequences falling below 20% sequence identity can have very different structures.
There are 10 major programming languages to choose from. But they differ in many ways, and they are used in different areas. They also offer different career opportunities and salaries.
When a company grows, it is natural for it to go through a period of change in the way its employees interact, communicate, and collaborate. This presentation tells some of the story of how Dafiti faced this process: the difficulties, the mistakes, and how it overcame them.
How can companies carry out professional marketing when customer addresses are out of date or must first be laboriously compiled? Is targeted dialogue even possible when companies do not know the history of individual customers? The key to success lies in CRM.
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION (csandit)
Sequencing projects arising from high-throughput technologies, including DNA microarrays, have made it possible to simultaneously measure the expression levels of millions of genes in a biological sample, as well as to annotate and identify the roles (functions) of those genes. Consequently, bioinformatics approaches have been developed to better manage and organize this significant amount of information. These approaches provide a representation and a more 'relevant' integration of data in order to test and validate researchers' hypotheses throughout the experimental cycle. In this context, this article describes and discusses some of the techniques used for the functional analysis of gene expression data.
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C... (rahulmonikasharma)
The enormous generation of biological data and the need to analyze it led to the emergence of the field of bioinformatics. Data mining is used to derive and analyze information by exploring hidden patterns in biological data. Although data mining can be applied to many kinds of biological data, such as genomic and proteomic data, this study considers gene expression (GE) data. GE data are generated from microarrays such as DNA and oligonucleotide microarrays, and the generated data are analyzed with the clustering techniques of data mining. This study implements the basic K-means clustering approach along with hierarchical, SOM, CLICK, and basic fuzzy clustering approaches, and then compares these approaches to identify an effective approach for cluster analysis of GE data. The experimental results show that the proposed algorithm achieves higher clustering accuracy and takes less clustering time than existing algorithms.
Genome-wide transcription profiling is a powerful technique for studying the footprints of disease susceptibility. Moreover, when applied to diseased tissue, it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis of the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays are highly complex and pose considerable challenges for data mining. Past research has shown that neurological diseases damage brain network interactions, protein-protein interactions, and gene-gene interactions, and a number of neurological research papers have also analyzed the relationships among the damaged parts. Analysis of a gene-gene interaction network built from a state-of-the-art gene database of Alzheimer's patients can yield a great deal of information. In this paper, we used gene datasets from patients with Alzheimer's disease and from normal subjects, obtained from the NCBI databank. After processing the Affymetrix .CEL data with RMA, we used the processed data to find gene interactions. We then filtered the output files using the probe set filtering attributes p-value and fold change, drew a gene-gene interaction network, and analyzed this network using the GeneMania software.
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A... (CSCJournals)
Logistic regression (LR) is a well-known classification method in statistical learning. It allows probabilistic classification, shows promising results on several benchmark problems, and enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial neural networks (ANNs) are widely used as universal non-linear inference models and have gained extensive popularity in recent years; research activity is considerable and the literature is growing. The goal of this research is to compare the performance of logistic regression and neural network models on publicly available medical datasets. Both methods, with sensitivity analysis, were evaluated for classification effectiveness, using classification accuracy as the performance measure for both models. The experimental results confirm that the neural network model with sensitivity analysis gives more efficient results.
Semantic Web for Health Care and Biomedical Informatics (Amit Sheth)
Amit Sheth, "Semantic Web for Health Care and Biomedical Informatics," Keynote at NSF Biomed Web Workshop, Corbett, Oregon, December 4-5, 2007.
http://www.biomedweb.info/2007/
Controlling informative features for improved accuracy and faster predictions... (Damian R. Mingle, MBA)
Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly.
For more information:
http://societyofdatascientists.com/controlling-informative-features-for-improved-accuracy-and-faster-predictions-in-omentum-cancer-models/?src=slideshare
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering science and technology, new teaching methods, assessment, validation, and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
A Survey on Various Disease Prediction Techniques (ijtsrd)
Various diseases have been predicted using multiple data mining and text mining techniques. In this article, we discuss six prediction techniques. We predict disease outcome using gene expression patterns and implement a pathway-based approach for classifying disease based on hyperbox principles; we also present a novel hybrid prediction model with missing value imputation (HPM-MI), which performs imputation using simple k-means clustering. A technique based on CCAR (Constraint Class Association Rule) has been used to reduce the time needed to predict a particular disease. We also discuss text mining techniques and their applications. Another technique studied concerns hypertriglyceridemia from anthropometric measures, which diverge according to age and gender. Using multilayer classifiers for disease prediction, high diagnostic accuracy and high performance can be achieved. C. Leancy Jannet | G. V. Sumalatha "A Survey on Various Disease Prediction Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-6, October 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18624.pdf
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER... (IJDKP)
Over the past few years, microarray technology has spread considerably across many biological applications, particularly those pertaining to cancers such as leukemia, prostate cancer, and colon cancer. The primary bottleneck in properly understanding such datasets lies in their dimensionality, so studying them efficiently and effectively requires reducing their dimension substantially. This study proposes different algorithms and approaches for reducing the dimensionality of such microarray datasets. It exploits the matrix-like structure of microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms. This technique gives an accuracy of 98%.
Chien-Wei (Masaki) Lin
Research Statement
Through my past work experience and my current PhD training, I have worked on many
methodological and collaborative research projects involving multi-omics high-throughput data
(both microarray and sequencing data). Motivated by real data, I am particularly interested in
developing methods in statistical genetics, bioinformatics, power and sample size calculation
tools, integrative/meta-analysis methods for different omics data types, and
supervised/unsupervised machine learning applications. In collaborative work, I have worked
closely with scientists in many fields, such as cardiovascular epidemiology, psychiatry, and
cancer biology. I have experience with various types of omics data, including single nucleotide
polymorphism (SNP), copy number variation (CNV), DNA methylation, gene expression,
proteomics (peptide), and metabolomics data. In the rest of this statement, I summarize my
past and ongoing research projects and my future research plans.
1. Methodology Research
1.1. Power calculation tool in sequencing data
Unlike data from earlier fluorescence-based technologies such as microarrays, next-generation
sequencing (NGS) data are discrete counts and should be modeled as such. In addition to sample
size, sequencing depth directly affects the experimental cost. Consequently, given a total budget
and pre-specified unit experimental costs, experimental design for NGS is conceptually a more
complex, multi-dimensional constrained optimization problem, rather than the one-dimensional
sample size calculation of the traditional hypothesis-testing setting.
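To make the constrained-design idea concrete, here is a minimal sketch that enumerates (sample size, depth) pairs under a budget and picks the one that maximizes a supplied power function. It is not the SeqDesign implementation: the linear cost model (a fixed per-sample fee plus a per-million-reads sequencing fee), the function names `feasible_designs` and `best_design`, and the toy power proxy are all hypothetical. In a real analysis, the power function would come from a genome-wide power model rather than a closed-form toy.

```python
from itertools import product


def feasible_designs(budget, cost_per_sample, cost_per_million_reads,
                     sample_sizes, depths_millions):
    """Return (n, depth, total_cost) for every design that fits the budget.

    Hypothetical cost model: each sample costs a fixed library-prep fee plus a
    sequencing fee proportional to its depth in millions of reads.
    """
    designs = []
    for n, depth in product(sample_sizes, depths_millions):
        total = n * (cost_per_sample + depth * cost_per_million_reads)
        if total <= budget:
            designs.append((n, depth, total))
    return designs


def best_design(designs, power_fn):
    """Pick the feasible design that maximizes a user-supplied power function."""
    return max(designs, key=lambda d: power_fn(d[0], d[1]))


# Toy proxy that grows with n and saturates in depth (NOT a real power function).
designs = feasible_designs(10000, 200, 10, [10, 20, 30, 40], [10, 20, 30, 50])
print(best_design(designs, lambda n, d: n * d / (d + 20)))  # (20, 30, 10000)
```

Exhaustive enumeration is used only because the grid is tiny; a finer grid or a continuous optimizer would likely be needed in practice.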
Past and Ongoing Research Projects
• Power calculation tool in RNA-Seq and Methyl-Seq data [1, 2]
Most existing methods rely on a single-gene formula: given the type-I error, effect size,
and sample size for one gene, they derive the corresponding statistical power for a particular
association test. In high-throughput data, however, statistical power must be considered for
thousands of genes simultaneously, and the genome-wide analogues of type-I error and power
are the false discovery rate (FDR) and the expected discovery rate (EDR).
We proposed two statistical frameworks, “SeqDesign” [1] and “MethylSeqDesign” [2], that
use pilot data for power calculation and experimental design of RNA-Seq and Methyl-Seq
experiments, respectively. The approach fits a mixture model to the p-value distribution from
pilot data and uses a parametric bootstrap procedure based on approximate Wald test statistics
to infer genome-wide power, from which the optimal sample size and sequencing depth are
determined. Our method contributes in the following ways:
- Our method models count data adequately and controls type-I error through the false
discovery rate. By incorporating pilot data, it reflects the characteristics of the target
experiment.
- We provide an intuitive visualization tool to guide practitioners through various practical
experimental designs.
- We evaluated our methods with simulations and real data applications and compared them
with existing methods; our method outperformed the others.
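The genome-wide quantities above can be illustrated with a short sketch, under stated assumptions. SeqDesign fits a mixture model to pilot p-values; as a simpler, well-known stand-in for one quantity such a model provides, the proportion of non-DE genes (pi0), the sketch uses Storey's estimator. For one simulated dataset in which the truly differentially expressed (DE) genes are known, it then applies the Benjamini-Hochberg procedure (assumed here as the FDR-controlling rule) and records the false discovery proportion and the fraction of DE genes detected; averaging these over many simulated datasets would estimate the FDR and the EDR. All function names and data are hypothetical.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey's estimate of the proportion of null (non-DE) genes.

    Null p-values are Uniform(0, 1), so p-values above `lam` come mostly from
    null genes; their count divided by m * (1 - lam) estimates pi0.
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def benjamini_hochberg(pvalues, alpha=0.05):
    """One boolean per gene: True if rejected by the BH step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k with p_(k) <= k * alpha / m; the k smallest p-values are rejected.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def discovery_summary(rejected, is_de):
    """False discovery proportion and discovery rate for one simulated dataset."""
    true_pos = sum(1 for r, d in zip(rejected, is_de) if r and d)
    false_pos = sum(1 for r, d in zip(rejected, is_de) if r and not d)
    n_rejected = true_pos + false_pos
    n_de = sum(1 for d in is_de if d)
    fdp = false_pos / n_rejected if n_rejected else 0.0
    discovery_rate = true_pos / n_de if n_de else 0.0
    return fdp, discovery_rate


# Toy p-values and simulated truth (which genes are DE) for one dataset.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.6, 0.9]
truth = [True, False, True] + [False] * 7
print(storey_pi0(pvals))
rejected = benjamini_hochberg(pvals, alpha=0.05)
print(discovery_summary(rejected, truth))  # (0.5, 0.5)
```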
Future Research Plan
Technology evolves rapidly. As new types of omics experiments develop, new challenges in power
calculation and experimental design will arise. In the future, I would like to explore extending
this statistical framework to other types of omics data, such as single-cell experiments.
1.2. Meta-analysis and integrative analysis
Many scientific findings today suffer from low reproducibility: they cannot be replicated in
another cohort even under the same or similar experimental conditions, mostly because of the
complexity of omics data analysis and biological variation. Findings that hold across multiple
studies are therefore more desirable, and meta-analysis has been used successfully to obtain
them by combining effect sizes or p-values from multiple studies. Moreover, according to the
central dogma of molecular biology, different types of omics data work jointly as a system.
Biological conclusions drawn from multiple omics data types are therefore inherently more
reliable and interpretable.
A growing number of databases greatly facilitate these types of analysis; for example, dbGaP
for genetic variant data, the NCBI GEO database for gene expression data, and the TCGA/ICGC
databases for pan-cancer data.
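The two combination strategies mentioned above, pooling p-values and pooling effect sizes, can be sketched with two textbook methods: Fisher's method for p-values and the inverse-variance weighted (fixed-effect) estimate for effect sizes. These are generic illustrations, not the specific methods developed in this statement, and the function names are hypothetical. The chi-square tail probability uses the closed form available for even degrees of freedom, so only the standard library is needed.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(log p_k) is chi-square with 2K df under H0."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Chi-square survival function for even df 2K:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{K-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fixed_effect_meta(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se


print(fisher_combined_pvalue([0.05, 0.05]))  # about 0.0175
print(fixed_effect_meta([1.0, 3.0], [1.0, 1.0]))
```

Two studies that are each only marginally significant (p = 0.05) combine to a stronger overall signal, which is the kind of cross-study consistency the paragraph above motivates.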
Past and Ongoing Research Projects
• Meta-Analytic Robust Classifier [3]
Predicting disease diagnosis, prognosis, or survival is an important application in biomedical
research, and robust, interpretable classifiers are usually favored for their clinical and
translational potential. The top scoring pair (TSP) algorithm is one example: it applies a
simple rank-based algorithm to identify gene pairs whose relative ranks change between classes
and uses them to construct a classifier. However, many classification algorithms, including
TSP, perform poorly in cross-study validation. I therefore helped develop a meta-analytic top
scoring pair (MetaKTSP) framework that combines data from multiple transcriptomic studies to
generate a single robust prediction model. Our main findings are as follows:
- In simulations comparing our method with other popular single-study methods, our method
outperformed the others.
- In real data, the biomarkers we identified from multiple studies had robust predictive power
and better biological interpretability.
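For readers unfamiliar with TSP, the sketch below computes the classic TSP score for every gene pair in one study, |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, and then, as a hypothetical simplification, averages the scores across studies to pick one cross-study pair. MetaKTSP builds on the k-TSP variant, which votes over several top pairs, and its meta-analytic combination may differ from the plain averaging used here; the function names and toy data are assumptions for illustration.

```python
from itertools import combinations


def tsp_scores(X, y):
    """Top scoring pair score for every gene pair in one study.

    X: list of samples, each a list of gene expression values.
    y: list of 0/1 class labels, one per sample.
    Score(i, j) = |P(X_i < X_j | y = 0) - P(X_i < X_j | y = 1)|.
    """
    n_genes = len(X[0])
    groups = {c: [x for x, label in zip(X, y) if label == c] for c in (0, 1)}
    scores = {}
    for i, j in combinations(range(n_genes), 2):
        p = [sum(1 for s in groups[c] if s[i] < s[j]) / len(groups[c]) for c in (0, 1)]
        scores[(i, j)] = abs(p[0] - p[1])
    return scores


def meta_top_pair(studies):
    """Hypothetical simplification: average TSP scores over studies, return the best pair."""
    totals = {}
    for X, y in studies:
        for pair, score in tsp_scores(X, y).items():
            totals[pair] = totals.get(pair, 0.0) + score / len(studies)
    return max(totals, key=totals.get)


study1 = ([[1, 5, 3], [2, 6, 1], [5, 1, 2], [6, 2, 3]], [0, 0, 1, 1])
study2 = ([[1, 4, 2], [1, 5, 6], [4, 2, 3], [5, 1, 0]], [0, 0, 1, 1])
print(meta_top_pair([study1, study2]))  # (0, 1)
```

In study 1 alone, pairs (0, 1) and (1, 2) tie; pooling with study 2 favors (0, 1), a toy version of why cross-study evidence can stabilize pair selection.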
• Integrative analysis of SNP and gene expression [4, 5]
SNPs have been shown to be informative for tracing the ancestral ethnicity of individuals.
However, they are useful mainly for distinguishing distinct continental origins and cannot
discriminate individuals from closely related ancestral lineages. We found that gene expression
data also carry ethnic information that complements SNPs [4]. Our contributions are summarized
below:
- To the best of our knowledge, ours is the first study to integrate SNP and gene expression
data to aid classification of subjects from closely related ethnic populations.
- By integrating SNP and gene expression data, we can construct an ancestry prediction model
that uses fewer markers and achieves higher accuracy.
Expression quantitative trait locus (eQTL) analysis has become popular because of its functional
interpretation (SNPs regulating gene expression). However, most such analyses are single-locus
based. We used the partial least squares (PLS) method to investigate the roles of gene-based
eQTL in ancestral ethnicity and pharmacogenetics [5]. We observed that ancestry information is
enriched in eQTL and can be used to construct a prediction model that distinguishes subjects from
closely related ethnic populations. We also identified two ancestry-informative eQTL associated
with adverse drug reactions and/or drug response.
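As a rough illustration of how PLS can summarize several SNPs within one gene into a single component, the sketch below fits a one-component PLS1 regression of expression on SNP dosages: the weight vector is the normalized covariance direction X^T y (after centering), and expression is regressed on the resulting component scores. It is a minimal sketch only; the analysis in [5] may use more components, different preprocessing, or a different PLS variant, and the function name and data are hypothetical.

```python
import math


def pls1_single_component(X, y):
    """One-component PLS1 regression of y (expression) on X (SNP dosages).

    X: list of samples, each a list of SNP dosages (0, 1, 2) within one gene.
    y: list of expression values, one per sample.
    Returns the unit-norm weight vector and the fitted expression values.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # PLS weight: direction of maximal covariance with y, proportional to Xc^T yc.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Component scores t = Xc w, then least-squares regression of yc on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    fitted = [y_mean + q * ti for ti in t]
    return w, fitted


dosages = [[0, 1], [1, 0], [2, 1], [1, 2]]
expression = [1.0, 2.0, 4.0, 3.0]
weights, fitted = pls1_single_component(dosages, expression)
print(weights, fitted)
```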
Future Research Plan
Data science (a.k.a. “Big data”) is an emerging discipline. As a statistician, I find the
question of how to properly extract information from such data very attractive. I am keen to
develop meta-analysis methods for various types of statistical problems, as well as methods for
integrating various types of data (including multi-omics data and brain imaging data).
1.3. General Bioinformatics Problems
Bioinformatics is an interdisciplinary field that provides software and tools to help biologists
understand biological data more deeply. Its applications are very broad, and the demand for them
keeps growing. For example, databases such as TCGA and ICGC, which provide abundant data, and
integrative tools such as UCSC Xena, which collects, analyzes, and visualizes these data, greatly
facilitate work in this area.
Past and Ongoing Research Projects
• SNP array quality control [6]
Genome-wide SNP arrays contain hundreds of thousands of SNPs, and array quality plays an
important role in downstream analysis. We defined a quality index for each array that quantifies
the overall deviation of the individual-based allele frequencies from reference frequencies. Our
method successfully detects poor-quality SNP arrays. We provide software called SAQC (written in
R and R-GUI) as a quality evaluation and visualization tool.
• Meta-analysis suite for transcriptomic data [7]
Transcriptomic data have many applications, for example, identifying genes differentially
expressed between conditions or groups, quality assessment, clustering, classification, and
biological pathway analysis. Many meta-analysis tools have been developed, each for a different
purpose. We have built a comprehensive suite called “MetaOmics” (written in R and Shiny) that
provides an interactive graphical user interface for biologists.
• Statistical metabolomics tool [8]
Metabolomics data provide opportunities to decipher metabolic mechanisms. Data quality has
been shown to be a concern in metabolomics data and must be addressed appropriately alongside
downstream statistical analysis. We developed an R tool called “SMART” that can analyze input
files in different formats, visualize various types of data features, perform peak alignment
and annotation, conduct quality control for samples and peaks, explore batch effects, and
perform association analysis.
Future Research Plan
I enjoy working closely with biologists and learning from data what kinds of tools they need.
Data from genome-wide experiments will clearly keep growing in quantity, dimension, and
complexity. As a statistician, I am eager to develop useful statistical tools that help
biologists analyze, visualize, and interpret these data.
Besides the topics mentioned above, I am also interested in unsupervised/supervised machine
learning problems. Many machine learning algorithms have been proposed in different fields, and
I would like to explore whether any of them could be applied to biological data.
2. Collaboration/Application Research
I have had many opportunities to collaborate with scientists in different research areas, such
as cardiovascular epidemiology, psychiatry, and cancer biology. These experiences have been
excellent training in understanding important biological questions and translating statistical
language for biologists. I am eager to work with biologists to help them decipher underlying
pathological mechanisms, and to draw inspiration from that work to develop useful statistical
methodology.
Past and Ongoing Research Projects
• Age effect in human orbitofrontal cortex [9, 10, 11, 12, 13, 14]
We performed genotyping and gene expression analyses on two brain regions (BA47 and
BA11) of 209 healthy postmortem brain samples (ages 16 to 91) to investigate molecular
mechanisms and genetic modulation in the brain during aging [9]. We defined a “delta age”
measure as an individual's deviation of molecular age from chronological age, which reflects
accelerated or delayed aging in that individual. Finally, we performed a GWAS and developed a
polygenic risk score to investigate the genetic modulation that predicts delta age.
We also used the same dataset to investigate other questions. We conducted an
isoform-specific analysis of the KALRN gene (involved in regulation of the actin cytoskeleton
within dendrites) [10]. Overexpression of two isoforms, KAL9 and KAL12, was hypothesized to be
associated with age. Our analysis indicated that the age effect is significant but modest. Our
work also concluded that global KALRN expression analysis might be misleading and that future
studies should focus on isoform-specific quantification.
We also investigated the VSNL1 gene, which is a peripheral biomarker for Alzheimer
disease (AD). We found VSNL1 was significantly co-expressed with genes in pathways for
calcium signaling, AD, long-term potentiation, long-term depression, and trafficking of
AMPA receptors [11]. These findings provide an unbiased link between VSNL1 and
molecular mechanisms of AD, including pathways implicated in synaptic pathology in AD.
Another gene, FREM3, has been associated with major depressive disorder (MDD) in a GWAS. We
investigated how nearby SNPs affect FREM3 expression in the brain [12]. Our work suggested that
common genetic variation associated with reduced FREM3 expression may confer risk for
accelerated aging.
BDNF and SST expression levels are known to decrease robustly with age, and lower expression of
both genes has been observed in many brain disorders. However, the mechanism underlying this
decrease is unknown; our work suggests that DNA methylation may be the proximal mechanism [13].
There is also a consistent set of age-related genes, and another study of ours extends the view
from these two genes to global age-related genes [14]. There, we again investigated whether DNA
methylation is the mechanism underlying the age-related changes in those genes.
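The delta age and polygenic risk score described for the orbitofrontal cortex study can be sketched as follows, under strong simplifying assumptions: molecular age is predicted from a single hypothetical expression score by ordinary least squares, whereas a real analysis would likely use many genes and out-of-sample predictions (for example, cross-validation) to avoid overfitting. The PRS is the usual weighted sum of risk-allele dosages times GWAS effect sizes. All function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def delta_age(molecular_scores, chronological_ages):
    """Predicted (molecular) age minus chronological age for each sample.

    Positive values suggest accelerated aging, negative values delayed aging.
    """
    intercept, slope = fit_line(molecular_scores, chronological_ages)
    return [intercept + slope * s - age
            for s, age in zip(molecular_scores, chronological_ages)]


def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages (0, 1, 2) using GWAS effect sizes."""
    return sum(d * b for d, b in zip(dosages, effect_sizes))


print(delta_age([1, 2, 3, 4], [20, 40, 50, 70]))  # [1.0, -3.0, 3.0, -1.0]
print(polygenic_risk_score([2, 1, 0], [0.5, 0.25, 0.125]))  # 1.25
```

In the study's design, a PRS like this would then be tested for association with delta age; that downstream regression is omitted here.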
• Major Depressive Episode/Disorder [15, 16, 17, 18]
We proposed a gene coexpression analysis to investigate biological pathways associated
with antidepressant treatment response predisposition and regulation by microRNAs in
major depressive episode (MDE) samples and control samples [15]. Our work underlines
the importance of inflammation-related pathways and the involvement of a large miRNA
program as biological processes associated with antidepressant treatment response.
We investigated the neurobiological abnormalities related to late-life depression (LLD) using
a peripheral proteomic panel and structural brain imaging in LLD patients and control
subjects [16]. We found that differentially expressed proteins were enriched in biological
pathways related to abnormal immune-inflammatory control, cell survival and proliferation,
proteostasis control, lipid metabolism, and intracellular signaling, which increase brain and
systemic allostatic load, leading to the downstream negative outcomes of LLD.
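A finding that differentially expressed proteins are enriched in certain pathways usually rests on an over-representation test. One standard choice is the one-sided hypergeometric test sketched below. It assumes a fixed background of measured proteins and only illustrates the reasoning; it is not necessarily the test used in [16]. The function name and the counts in the usage lines are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """One-sided over-representation p-value, P(X >= k).

    N: background size (all measured proteins)
    K: background members annotated to the pathway
    n: number of differentially expressed proteins
    k: differentially expressed proteins that fall in the pathway
    """
    upper = min(K, n)
    total = comb(N, n)
    # Exact integer arithmetic until the final division.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical: 200 measured proteins, 20 in a pathway, 30 differentially
# expressed, 10 of them in that pathway (about 3 would be expected by chance).
p = hypergeom_enrichment_p(N=200, K=20, n=30, k=10)
print(f"{p:.2e}")
```

With many pathways tested at once, these p-values would then be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.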
Using the same proteomic dataset, we investigated whether a systemic molecular pattern
associated with aging, the senescence-associated secretory phenotype (SASP), is elevated in
adults with LLD [17]. Our results suggest that individuals with LLD display enhanced
aging-related molecular patterns that are associated with higher medical comorbidity and
worse cognitive function.
In another study, we analyzed 11 transcriptomic datasets from postmortem brains of subjects
with MDD [18]. We developed a meta-analytic clustering method to identify coexpression
modules across the 11 studies. We further incorporated results from GWAS of brain
disorders and identified a module consistently and significantly associated
with MDD and other complex brain disorders. Our work demonstrates the importance of
integrating transcriptome data and incorporating GWAS results to decipher the molecular
pathology of MDD and other complex brain disorders.
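Combining coexpression evidence across 11 studies requires putting per-study correlations on a common scale. One generic approach, not necessarily the meta-analytic clustering method of [18], is a fixed-effect meta-analysis of Fisher z-transformed correlations weighted by n - 3. The pooled correlations for all gene pairs could then be converted to a distance such as 1 - r and passed to a clustering algorithm. The function name `fisher_z_meta` and the study values below are hypothetical.

```python
import math

def fisher_z_meta(rs, ns):
    """Fixed-effect combination of one gene pair's correlation across studies.

    rs: per-study correlations, each strictly between -1 and 1.
    ns: per-study sample sizes, each greater than 3.
    Weights each Fisher z = atanh(r) by n - 3 (its inverse variance) and
    returns (pooled r, z-statistic for H0: rho = 0).
    """
    ws = [n - 3 for n in ns]
    zbar = sum(w * math.atanh(r) for w, r in zip(ws, rs)) / sum(ws)
    se = 1.0 / math.sqrt(sum(ws))
    return math.tanh(zbar), zbar / se

# Hypothetical correlations of one gene pair in three studies of different sizes.
pooled_r, z = fisher_z_meta([0.42, 0.35, 0.51], [30, 25, 40])
print(round(pooled_r, 3), round(z, 2))
```

A fixed-effect model assumes one shared correlation across studies. If the studies are heterogeneous, a random-effects model would be a more cautious choice.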
Future Research Plan
In my future research, I will continue to actively seek collaboration opportunities with local
biologists. I am particularly interested in working on applied problems that also inspire my
methodology research.
References
[1] Lin, C.-W.*, Liao, G.*, Lee, M. L. T., Park, Y., & Tseng, G. C. (2017). SeqDesign: A framework
for RNA-Seq genome-wide power calculation and experimental design issues. (in
preparation)
[2] Lin, C.-W.*, Liu, P.*, Park, Y., & Tseng, G. C. (2017). MethylSeqDesign: A framework for
Methyl-Seq genome-wide power calculation and experimental design issues. (in
preparation)
[3] Kim, S., Lin, C.-W., & Tseng, G. C. (2016). MetaKTSP: A Meta-Analytic Top Scoring Pair
Method for Robust Cross-Study Validation of Omics Prediction Analysis. Bioinformatics,
32, btw115.
[4] Yang, H.-C., Wang, P.-L., Lin, C.-W., Chen, C.-H., & Chen, C.-H. (2012). Integrative analysis
of single nucleotide polymorphisms and gene expression efficiently distinguishes samples
from closely related ethnic populations. BMC Genomics, 13(1), 346.
[5] Yang, H.-C., Lin, C.-W., Chen, C.-W., & Chen, J. (2014). Applying genome-wide gene-based
expression quantitative trait locus mapping to study population ancestry and
pharmacogenetics. BMC Genomics, 15(1), 319.
[6] Yang, H.-C., Lin, H.-C., Kang, M., Chen, C.-H., Lin, C.-W., Li, L.-H., … Pan, W.-H. (2011). SAQC:
SNP array quality control. BMC Bioinformatics, 12, 100.
[7] Ma, T., Huo, Z., Kuo, A., Zeng, X., Zhu, L., Fang, A., Wang, L., Lin, C. W., Rahman, T., Liu, S.,
Park, Y., Kim, S., Li, J., Chang, L. C., Song, C., & Tseng, G. C. (2017). MetaOmics - a
Comprehensive Software Suite with Interactive Visualization for Transcriptomic
Meta-Analysis. (in preparation)
[8] Liang, Y. J., Lin, Y. T., Chen, C. W., Lin, C. W., Chao, K. M., Pan, W. H., & Yang, H. C. (2016).
SMART: Statistical Metabolomics Analysis - An R Tool. Analytical Chemistry, 88(12), 6334–
6341.
[9] Lin, C.-W., Chang, L. C., Ma, T., Oh, H., Tseng, G. C., Lewis, D. A., & Sibille, E. (2017). Genetic
Modulation of Brain Molecular Aging. (in preparation)
[10] Grubisha, M. J.*, Lin, C.-W.*, Tseng, G. C., Penzes, P., Sibille, E., & Sweet, R. A. (2016).
Age-dependent increase in Kalirin-9 and Kalirin-12 transcripts in human orbitofrontal cortex.
European Journal of Neuroscience, 44(7), 2483–2492.
[11] Lin, C. W., Chang, L. C., Tseng, G. C., Kirkwood, C. M., Sibille, E. L., & Sweet, R. A. (2015).
VSNL1 co-expression networks in aging include calcium signaling, synaptic plasticity, and
Alzheimer’s disease pathways. Frontiers in Psychiatry, 6, 30.
[12] Nikolova, Y. S., Iruku, S. P., Lin, C.-W., Conley, E. D., Puralewski, R., French, B., … Sibille, E.
(2015). FRAS1-related extracellular matrix 3 (FREM3) single-nucleotide polymorphism
effects on gene expression, amygdala reactivity and perceptual processing speed: An
accelerated aging pathway of depression risk. Frontiers in Psychology, 6, 1377.
[13] McKinney, B. C., Lin, C.-W., Oh, H., Tseng, G. C., Lewis, D. A., & Sibille, E. (2015).
Hypermethylation of BDNF and SST Genes in the Orbital Frontal Cortex of Older
Individuals: A Putative Mechanism for Declining Gene Expression with Age.
Neuropsychopharmacology, 40(11), 2604–2613.
[14] McKinney, B. C.*, Lin, C.-W.*, Oh, H., Tseng, G. C., Lewis, D. A., & Sibille, E. (2017). DNA
Methylation in the Human Frontal Cortex Reveals a Putative Mechanism for Age-by-Disease
Interactions. (in preparation)
[15] Belzeaux, R., Lin, C.-W., Ding, Y., Bergon, A., Ibrahim, E. C., Turecki, G., … Sibille, E. (2016).
Predisposition to treatment response in major depressive episode: A peripheral blood
gene coexpression network analysis. Journal of Psychiatric Research, 81, 119–126.
[16] Diniz, B. S., Lin, C. W., Sibille, E., Tseng, G., Lotrich, F., Aizenstein, H. J., … Butters, M. A.
(2016). Circulating biosignatures of late-life depression (LLD): Towards a comprehensive,
data-driven approach to understanding LLD pathophysiology. Journal of Psychiatric
Research, 82, 1–7.
[17] Diniz, B. S., Reynolds, C. F., Sibille, E., Lin, C.-W., Tseng, G., Lotrich, F., … Butters, M. A.
(2016). Enhanced Molecular Aging in Late-Life Depression: the Senescent Associated
Secretory Phenotype. The American Journal of Geriatric Psychiatry.
[18] Chang, L. C., Jamain, S., Lin, C. W., Rujescu, D., Tseng, G. C., & Sibille, E. (2014). A conserved
BDNF, glutamate- and GABA-enriched gene module related to human depression
identified by coexpression meta-analysis and DNA variant genome-wide association
studies. PLoS ONE, 9(3), e90980.