Markov Chains Analyzing Dataset of Patients with
Metabolic Associated Fatty Liver Disease
Iman M. Attia (imanattiathesis1972@gmail.com)
Institute of Statistical Studies and Research, Cairo University https://orcid.org/0000-0002-7333-9713
Data Note
Keywords: non-alcoholic fatty liver disease, metabolic associated fatty liver disease, steatohepatitis,
continuous time Markov chains, mean sojourn time, life expectancy
Posted Date: July 1st, 2022
DOI: https://doi.org/10.21203/rs.3.rs-1810831/v1
License:   This work is licensed under a Creative Commons Attribution 4.0 International License.  
Iman M. Attia *
*Department of Mathematical Statistics, Faculty of Graduate Studies for Statistical Research, Cairo University, Egypt
Corresponding author: Iman M. Attia (imanattiathesis1972@gmail.com, imanattia1972@gmail.com)
Abstract
The prevalence of obesity and type 2 diabetes has reached epidemic levels that parallel the rates of the widely distributed non-alcoholic fatty liver disease (NAFLD). Nearly one billion people worldwide suffer from NAFLD. The estimated annual medical costs of NAFLD exceed €35 billion in four large European countries (the United Kingdom, France, Germany, and Italy) and $100 billion in the United States. According to the American Association for the Study of Liver Diseases, NAFLD requires the presence of hepatic steatosis in more than 5% of hepatocytes, detected by histology or imaging, with little or no alcohol consumption and exclusion of other causes of chronic liver disease.
The risk factors for NAFLD include age > 45 years; male sex (males are more susceptible than females); ethnicity (Hispanics have higher prevalence than whites, who in turn are more susceptible than blacks); a diet high in fat and cholesterol; genetic background, such as the patatin-like phospholipase domain-containing protein 3 (PNPLA3) gene, which is most prevalent in Hispanics, followed by non-Hispanic whites and African Americans; and features of the metabolic syndrome.
The newly proposed name is metabolic associated fatty liver disease (MAFLD). This new definition requires evidence of hepatic steatosis, as described above, plus one of three features: overweight or obesity (BMI > 25 kg/m² in white and > 23 kg/m² in Asian individuals), type 2 diabetes, or lean or normal weight with evidence of metabolic dysregulation. Metabolic dysregulation requires at least two metabolic risk factors: waist circumference ≥ 102 cm in males and ≥ 88 cm in females in Western countries (≥ 90 cm and ≥ 80 cm, respectively, in Asian and Eastern males and females); prediabetes; homeostasis model assessment of insulin resistance (HOMA-IR) ≥ 2.5; elevated high-sensitivity serum C-reactive protein (CRP), denoting inflammation; elevated blood pressure or specific drug treatment; decreased high-density lipoprotein (HDL) cholesterol levels; and increased plasma triglycerides or drug treatment. The pathogenesis of this disease can be explained by the "two-hit theory", which has since been updated to the "multiple or parallel hit theory". The first hit is liver fat content exceeding five percent of hepatocytes together with concomitant insulin resistance. This fatty liver is more vulnerable to the second hit: inflammation and necrosis (cell death). This inflammation, called steatohepatitis, stimulates fibrosis. Other hits that augment steatohepatitis are the interactions between genetic and environmental factors and the cross-talk between different organs and tissues, such as the adipose tissue, the pancreas, the gut (microbiota), and the liver. Liver biopsy, although invasive and subject to limitations such as sampling error, the need for hospital admission, high cost, and observer dependence, is the gold standard method for diagnosis. Rigorous control of risk factors through lifestyle modifications, namely reduced caloric intake and exercise, can protect the liver. Newly emerging anti-fibrotic and anti-inflammatory drugs are promising for improving the histopathological picture of the disease.
Key words: non-alcoholic fatty liver disease, metabolic associated fatty liver disease,
steatohepatitis, continuous time Markov chains, mean sojourn time, life expectancy.
Specification Table
Subject: Medicine, Hepatology, Endocrinology, Diabetes, Obesity
Specific subject area: Biostatistics, Epidemiology, Healthcare Science
Type of data: Tables and figures, Excel workbook, MATLAB code
How the data were acquired: The data are fictitious; they depict how such data might look in reality.
Data format: Raw data, represented by the liver biopsy findings recorded during each visit. Processed data, with MATLAB code that calculates the transition counts in each time interval. Analyzed data, using homogeneous continuous time Markov chains.
Description of the data: This is a fictitious dataset of 310 participants with risk factors for NAFLD, such as type 2 diabetes, hypercholesterolemia, hypertriglyceridemia, obesity, and hypertension, acting separately or together as a metabolic syndrome. These participants were followed up for a total period of eight years.
Parameters of the dataset: The inclusion criteria are clinical, biochemical, and radiological evidence of insulin resistance, type 2 diabetes, hypercholesterolemia, hypertriglyceridemia, obesity, or hypertension. The exclusion criteria are clinical, biochemical, and radiological evidence of hepatitis B or C infection, primary biliary cirrhosis, primary sclerosing cholangitis, autoimmune hepatitis, Wilson disease, hemochromatosis, alpha-1 antitrypsin deficiency, celiac disease, drug intake, and alcohol consumption.
Data source location: The data are fictitious; they are intended to give insight into epidemiological and clinical studies and to illustrate how the homogeneous continuous time Markov chain model can be implemented to analyze such data.
Data accessibility: Within the article. The data are also available on IEEE DataPort at https://ieee-dataport.org/documents/ctmc-analyzing-nafld-progression-small-model (DOI: 10.21227/az1b-x326). The MATLAB code is available on Code Ocean at https://codeocean.com/capsule/8641183/tree/v2 (DOI: 10.24433/CO.6022979.v2).
Related research article: This dataset was mentioned as supplementary material in the article "Novel Approach of Multistate Markov Chains to Evaluate Progression in the Expanded Model of Non-Alcoholic Fatty Liver Disease", Frontiers in Applied Mathematics and Statistics, 7, https://www.frontiersin.org/article/10.3389/fams.2021.766085, to illustrate the comparison between the simplest model and the expanded model of the disease process.
Value of the data
1- This dataset can give insight into the behavior of NAFLD. Statistical analysis of this dataset provides the following statistical indices: the transition rate matrix, the transition probability matrix, the mean sojourn time in each state, the state probability distribution at a specific time point, the life expectancy of a patient in each state, the stationary probability distribution of the disease process, and the expected number of patients in each state at a specific time point.
2- These statistical indices can be offered to healthcare policy makers and medical insurance managers to allocate human and financial resources for investigating and treating patients with different phenotypes of NAFLD. They can be provided to epidemiologists to estimate the prevalence and incidence of NAFLD [1]. They can also be presented to pharmaceutical companies to assess the effectiveness of the anti-fibrotic and anti-inflammatory drugs used to treat NAFLD patients. They can supply physicians with proposed strategies for formulating treatment protocols. Nutritionists can also use these indices to develop new foodstuffs that are both healthy and palatable, in an attempt to prevent the disease. All of these stakeholders can help the community reduce the prevalence of this disease.
3- This dataset can be reproduced in different communities and populations with different ethnic backgrounds to study the behavior of the disease.
4- The statistical indices obtained from the analysis of this dataset help in pharmacoeconomic evaluation. The predicted number of patients in each state at a specific time point, combined with the costs of investigating and treating each patient, helps assess the total costs and economic burden of the disease. The three major categories of pharmacoeconomic evaluation are cost-benefit analysis, cost-effectiveness analysis, and cost-utility analysis. These evaluations are made possible by the statistical indices supplied by the analysis of this dataset.
1. Data description
In a governmental healthcare unit, 310 patients underwent clinical, biochemical, and radiological examination. These examinations were performed to include patients with overweight or obesity, type 2 diabetes, lean subjects with metabolic dysregulation [2], and patients with metabolic syndrome [3]. They were also performed to exclude subjects with chronic liver diseases such as hepatitis B and C infection; autoimmune diseases such as primary biliary cirrhosis, autoimmune hepatitis, and primary sclerosing cholangitis; hereditary diseases such as hemochromatosis and Wilson disease; genetic diseases such as alpha-1 antitrypsin deficiency; and other causes such as celiac disease. The diagnostic criteria for metabolic syndrome are abdominal adiposity, defined as waist circumference > 94 cm for males and > 80 cm for females in Eastern countries and > 102 cm for males and > 88 cm for females in Western countries, plus two or more of the following criteria: fasting blood glucose ≥ 100 mg/dL or drug treatment; arterial blood pressure ≥ 130/85 mmHg or drug treatment; triglyceride level ≥ 150 mg/dL or drug treatment; and HDL cholesterol < 40 mg/dL for males and < 50 mg/dL for females, or drug treatment. Participants with metabolic dysregulation, as defined in the abstract, were also included in the dataset [4].
The biochemical tests were fasting blood glucose (FBG), serum insulin, homeostatic model assessment of insulin resistance (HOMA-IR), serum alkaline phosphatase (ALP), serum alanine aminotransferase (ALT), serum aspartate aminotransferase (AST), gamma-glutamyl transpeptidase (GGT), serum albumin, serum creatinine and blood urea nitrogen, international normalized ratio (INR), hemoglobin, platelet count and red blood cell count, total cholesterol, low-density lipoprotein cholesterol (LDL-Chol), high-density lipoprotein cholesterol (HDL-Chol), serum triglyceride level, laboratory tests to exclude hepatitis B and C infection, such as hepatitis B surface antigen (HBsAg) and hepatitis C virus antibodies (HCVAb), autoantibodies, serum copper and ceruloplasmin, serum iron, serum ferritin and transferrin saturation, serum alpha-1 antitrypsin levels, and C-reactive protein (CRP). Only patients with features of obesity or type 2 diabetes, and lean persons with metabolic derangements as previously described, were included. Patients with metabolic syndrome, as defined above, were also included. Alcohol consumption had to be less than 20 g per day for females and less than 30-40 g per day for males. Participants taking drugs that can induce NASH, such as corticosteroids or amiodarone, were excluded.
NAFLD is a dynamic process, as described in the abstract. Fig. 1 illustrates this process [5].
Fig. 1 Dynamic model of NAFLD.
The non-alcoholic fatty liver (NAFL) phenotype is characterized by hepatic steatosis alone or hepatic steatosis plus either hepatic ballooning or hepatic inflammation. If the risk factors inducing this phenotype are not treated, the patient passes to the more aggressive phenotype, non-alcoholic steatohepatitis (NASH), which is characterized by steatosis, hepatic ballooning, and inflammation of any grade. If the risk factors contributing to NASH are rigorously treated and well controlled, the patient can regress to the less severe form (NAFL) or be cured; if they are left untreated, NASH induces fibrogenesis. So, finding NAFL or NASH on the initial liver biopsy does not impact the course of the disease, although NAFL patients have a lower risk of fibrosis progression than NASH patients. As seen in the figure, in the early stage of the disease the patient cycles between NAFL and NASH. Regardless of whether the biopsy findings are NAFL or NASH, about 80% of patients are slow progressors who are unlikely to progress beyond mild fibrosis (F0 to F2); they typically evolve to F0 or F1 over 8 years. Approximately 20% of NASH patients are rapid progressors who develop severe fibrosis (F3 to F4) within a few years (about 2 to 6 years).
For each eligible participant, a liver biopsy was performed and its findings were recorded according to the algorithm of Bedossa et al. (2012) [6]. Fig. 2 illustrates this algorithm. The algorithm defines the absence of NAFLD by steatosis grade 0; NAFLD requires the presence of steatosis at any grade. The two main phenotypes of the disease are non-alcoholic fatty liver (NAFL) and non-alcoholic steatohepatitis (NASH). NAFL requires steatosis at any grade plus one of two features: hepatocyte ballooning of any grade or inflammatory cells of any grade. NASH requires steatosis at any grade together with both hepatocyte ballooning and inflammatory cells of any grade. Table 1 compares the NAFLD activity score (NAS) with the SAF score. In addition to the activity scoring system, which is composed of steatosis, ballooning, and inflammation, a fibrosis score is defined; this fibrosis score is almost the same in the NAS and SAF systems [7].
Fig. 2 Bedossa et al. (2012) algorithm
NAFLD is characterized by three main histopathologic features: steatosis; liver injury in the form of hepatic ballooning and inflammation (steatohepatitis, NASH); and fibrosis. Absence of steatosis (steatosis grade = 0) excludes NAFLD, because hepatic steatosis is a prerequisite for establishing NAFLD. Steatosis of any grade plus either hepatic ballooning or inflammation points to the diagnosis of NAFL. The presence of all three elements (steatosis, ballooning, and inflammation) indicates NASH.
Table 1 (a and b): Comparison between the NAS and SAF scores
The NAFLD activity score (NAS), proposed by the NASH Clinical Research Network (CRN) sponsored by the U.S. National Institutes of Health, combines the assessment of steatosis, inflammation, and ballooning into a score ranging from 0 to 8 points, with a separate fibrosis score ranging from 0 to 4. It is a useful research tool for clinical trials, but it is not a suitable prognostic tool for clinical practice. To avoid these limitations in routine clinical practice, the steatosis-activity-fibrosis (SAF) score was developed. With SAF, steatosis, activity, and fibrosis are assessed separately, and an algorithm is then applied to categorize biopsies into one of three diagnostic groups: normal, NAFL, or NASH. The fibrosis score is the same in both scoring systems.
The participants were scheduled for follow-up every year, but not all of them kept to this schedule. Some attended every year, while others attended every two or even every three years during the course of follow-up. The overall follow-up period was eight years (visits at t = 0 to t = 8).
Although liver biopsy has the limitations stated in the abstract, the analysis of the dataset was based on the liver biopsy findings because biopsy is the gold standard for diagnosis. The analysis describes the simplest "health, disease, death" model using homogeneous continuous time Markov chains [8].
During each visit, the liver biopsy findings were recorded for each participant, and these findings define the states of the Markov model used in the statistical analysis. State 1 describes susceptible participants with no biopsy findings suggesting NAFLD. State 2 describes cases with biopsy findings suggesting NAFL or NASH. State 3 is death due to complications of the liver disease process. State 4 is death due to causes unrelated to liver disease. The transition from state 1 to state 2 occurs at rate lambda12 ($\lambda_{12}$). The transition from state 2 to state 1 occurs at rate mu21 ($\mu_{21}$). The transition from state 2 to state 3 occurs at rate lambda23 ($\lambda_{23}$). The transition from state 1 to state 4 occurs at rate lambda14 ($\lambda_{14}$) [9], and the transition from state 2 to state 4 occurs at rate lambda24 ($\lambda_{24}$). Fig. 3 shows the general model structure of the Markov chain for this disease process [10].
Fig. 3 general model structure for NAFLD
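The model structure in Fig. 3 can be encoded as a 4 × 4 generator (transition intensity) matrix Q: each off-diagonal entry is a transition rate, each diagonal entry makes its row sum to zero, and the rows of the absorbing death states 3 and 4 are zero. The following is a minimal plain-Python sketch under those assumptions; the function name `build_generator` is hypothetical, and the example rates are the Δt = 1 year initial estimates reported in Table 8.

```python
def build_generator(lam12, lam14, mu21, lam23, lam24):
    """Return the 4x4 generator Q (list of rows) for states 1..4.

    Off-diagonal entries are transition rates; each diagonal entry is minus
    the sum of the other entries in its row, so every row sums to zero.
    States 3 and 4 (death) are absorbing, so their rows are all zero.
    """
    q = [
        [0.0, lam12, 0.0, lam14],   # state 1: -> 2 (lambda12), -> 4 (lambda14)
        [mu21, 0.0, lam23, lam24],  # state 2: -> 1 (mu21), -> 3 (lambda23), -> 4 (lambda24)
        [0.0, 0.0, 0.0, 0.0],       # state 3: absorbing
        [0.0, 0.0, 0.0, 0.0],       # state 4: absorbing
    ]
    for i, row in enumerate(q):
        # The diagonal is still 0.0 here, so sum(row) is the total exit rate.
        # Writing 0.0 - ... avoids a signed -0.0 on the absorbing rows.
        row[i] = 0.0 - sum(row)
    return q


Q = build_generator(lam12=0.3, lam14=0.022, mu21=0.02, lam23=0.18, lam24=0.06)
for row in Q:
    print([round(x, 3) for x in row])
```

The printed rows match the initial Q matrix for Δt = 1 year given in Section 2.2.1.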
Table 2 summarizes the liver biopsy findings at each time point of the follow-up period for each participant. For example, the participant with ID = 1 was in state 1 at the start of follow-up (t = 0) and still in state 1 one year later (t = 1). He was in state 2 at t = 2 and at t = 3, and then stopped visiting the clinic. The participant with ID = 2 was in state 1 at t = 0 and t = 1, in state 2 at t = 2, and in state 1 at t = 3. He did not attend at t = 4, was in state 2 at t = 5, did not attend at t = 6, and was in state 1 at t = 7, after which he stopped visiting the clinic. The other participants are read in the same way.
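The counting rule implied by this walk-through, in which each pair of consecutive observed visits contributes one transition whose interval length is the gap between the two visits, can be sketched in Python as follows. This is an illustration under that assumption, not the author's MATLAB code; the `visits` dictionary holds only participants 1 and 2 as described above, and `interval_transitions` is a hypothetical helper name.

```python
from collections import Counter

# Visit records for participants 1 and 2 as described in the text:
# time (years since baseline) -> observed state. Missed visits are absent.
visits = {
    1: {0: 1, 1: 1, 2: 2, 3: 2},
    2: {0: 1, 1: 1, 2: 2, 3: 1, 5: 2, 7: 1},
}


def interval_transitions(records):
    """Yield (dt, from_state, to_state) for each pair of consecutive observed visits."""
    for times_states in records.values():
        times = sorted(times_states)
        for t0, t1 in zip(times, times[1:]):
            yield t1 - t0, times_states[t0], times_states[t1]


counts = Counter(interval_transitions(visits))
for (dt, i, j), n in sorted(counts.items()):
    print(f"dt={dt} year(s): {i} -> {j}: {n}")
```

For participant 2, the gaps t = 3 to t = 5 and t = 5 to t = 7 are counted as two-year transitions, which is how the Δt = 2 and Δt = 3 tables in Section 2.1 arise.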
Table 2: Liver biopsy findings (Markov states) for each participant at each visit of the follow-up period
time
Patient
ID
time
Patient
ID 8
7
6
5
4
3
2
1
0
8
7
6
5
4
3
2
1
0
4
2
2
1
1
1
53
2
2
1
1
1
2
2
1
54
1
2
1
2
1
1
2
2
2
1
55
3
1
3
4
2
2
1
1
1
1
56
1
1
1
1
4
2
2
1
57
2
2
1
2
1
5
4
2
2
1
1
1
1
1
1
58
2
2
1
2
1
6
2
2
2
1
59
2
2
1
2
1
7
3
1
1
1
1
60
3
1
8
3
1
61
4
2
2
1
1
1
9
4
2
2
1
1
1
62
3
1
10
2
2
2
1
1
1
1
1
63
1
2
1
1
1
11
2
2
2
2
1
1
64
3
1
12
2
2
2
1
1
1
1
65
1
1
1
13
2
2
2
1
1
66
3
1
14
2
2
2
1
1
1
1
67
1
1
1
1
1
15
2
2
2
1
68
2
1
1
1
1
1
1
1
1
16
4
2
2
1
69
4
2
2
1
1
17
1
1
1
1
1
70
2
1
1
1
1
18
1
1
1
1
1
1
71
1
2
1
1
1
1
1
19
2
2
1
1
1
1
1
1
72
1
1
1
1
1
1
1
20
2
2
2
1
73
3
1
1
21
2
2
2
1
74
4
2
2
1
1
22
3
1
1
1
75
3
1
23
3
1
1
1
76
3
1
24
2
2
2
2
2
2
2
1
77
3
1
25
1
1
1
1
1
1
78
3
1
26
2
2
2
2
1
1
1
1
79
3
1
27
3
1
80
4
2
2
1
1
28
3
1
81
4
1
29
2
2
1
1
1
82
4
1
30
2
2
2
1
1
1
1
1
83
4
1
31
2
2
1
1
1
1
1
84
4
2
2
1
1
32
3
2
2
2
2
2
2
1
85
4
2
2
1
33
4
2
2
1
86
2
2
2
1
34
1
1
1
1
1
87
2
2
2
1
35
1
1
1
1
1
88
3
1
36
1
1
1
1
1
89
3
1
37
3
2
2
1
1
1
1
90
4
2
2
1
1
38
3
2
2
1
1
1
1
1
91
4
1
39
1
1
1
1
1
92
4
1
40
1
1
1
1
1
93
4
1
41
3
1
1
1
94
4
2
2
1
1
1
42
3
2
2
1
1
1
1
95
4
1
43
3
1
1
1
1
96
4
1
44
1
1
1
1
1
1
1
97
4
1
45
3
2
2
1
98
3
1
1
46
3
2
2
1
1
1
99
4
1
47
3
2
2
1
1
1
100
4
1
48
3
2
2
1
1
1
1
101
4
2
2
1
1
1
1
1
49
3
2
2
1
102
2
2
1
1
1
1
1
1
50
3
1
1
1
103
3
1
51
3
1
1
1
104
3
1
52
time
Patient
ID
time
Patient
ID 8
7
6
5
4
3
2
1
0
8
7
6
5
4
3
2
1
0
3
2
2
1
1
1
1
1
157
3
1
1
1
105
3
2
2
1
158
3
1
106
2
2
1
159
3
2
2
1
1
1
1
1
107
3
2
2
1
1
1
160
1
1
1
1
1
108
3
2
2
1
1
1
161
3
2
2
1
1
1
1
109
3
2
2
1
1
1
1
1
162
3
1
1
1
1
1
110
2
2
1
163
3
2
2
1
111
2
2
1
1
1
164
3
1
1
1
1
112
2
2
1
165
3
2
2
1
113
3
2
2
1
1
1
166
3
1
1
1
114
2
2
1
167
3
1
1
1
115
1
1
1
1
1
1
168
4
2
2
1
116
3
2
2
2
1
169
3
1
1
1
117
2
2
2
1
170
1
1
1
1
1
1
118
3
2
2
1
1
171
2
2
1
1
1
1
1
119
2
2
2
2
2
1
172
1
1
1
1
1
120
3
2
2
1
173
1
1
1
1
1
1
121
2
2
1
174
3
1
1
1
1
122
2
2
2
2
1
1
175
2
1
1
1
1
1
1
123
2
2
1
176
2
2
1
1
1
1
1
1
124
2
2
2
2
2
1
177
4
2
2
1
125
3
2
2
1
178
4
2
2
1
126
2
2
1
1
1
179
4
2
2
1
1
1
1
1
127
3
2
2
2
2
1
180
2
2
1
1
1
1
1
128
3
2
2
1
181
3
2
2
1
1
129
3
1
182
2
2
1
130
3
1
183
3
1
1
1
131
2
2
2
1
1
1
1
1
184
2
2
1
132
3
2
2
2
2
1
185
1
1
1
1
1
1
133
2
2
1
186
2
2
1
134
2
2
2
2
1
1
1
1
187
2
2
1
135
2
2
1
188
3
2
2
1
1
1
136
3
2
2
1
1
1
189
2
2
1
137
3
2
2
1
1
1
1
190
2
2
2
2
2
1
138
3
2
2
2
1
191
1
1
1
1
1
139
2
2
1
192
2
2
2
2
2
1
140
2
2
1
1
193
2
2
1
1
1
1
1
1
1
141
2
2
1
194
1
1
1
1
1
142
3
2
2
1
1
1
1
195
3
1
1
1
1
1
143
2
2
1
196
1
1
1
1
1
144
2
2
1
197
3
2
2
2
2
1
1
1
145
3
2
2
1
1
1
198
1
1
146
2
2
1
199
3
2
2
2
2
1
1
1
147
3
2
2
1
1
1
200
1
1
1
1
1
148
2
2
1
201
2
2
1
1
1
1
1
149
3
1
1
1
1
202
1
1
1
1
1
150
2
2
1
203
1
1
1
1
1
151
3
1
1
1
1
1
1
204
3
2
2
1
1
1
1
152
2
2
1
205
3
1
153
3
2
2
1
1
1
1
206
3
2
2
1
154
2
2
1
207
3
2
2
1
155
1
1
1
1
1
1
1
208
1
1
1
1
1
156
time
Patient
ID
time
Patient
ID 8
7
6
5
4
3
2
1
0
8
7
6
5
4
3
2
1
0
2
2
1
261
3
2
2
1
209
2
1
262
2
2
1
210
2
1
263
2
2
1
1
1
1
211
2
2
1
264
3
2
2
1
212
2
1
265
3
2
2
1
1
1
213
2
1
266
2
2
1
214
2
1
267
3
2
2
1
1
1
1
215
2
2
1
268
3
2
2
1
216
2
1
269
3
2
2
1
217
2
1
270
2
2
1
218
2
1
271
2
2
1
219
2
1
272
3
2
2
1
1
1
220
2
1
273
3
2
2
1
1
1
1
1
221
2
1
274
3
2
2
1
222
2
1
275
3
2
2
1
223
3
1
276
3
2
2
1
1
224
3
1
277
3
1
225
3
1
278
3
2
2
1
1
1
226
3
1
279
3
1
227
3
1
280
3
2
2
1
1
1
1
1
228
3
1
281
3
2
2
1
229
3
1
282
3
1
230
3
1
283
2
2
1
1
1
1
1
231
3
1
284
2
2
1
232
3
1
285
3
1
1
1
1
1
1
1
233
4
1
286
3
2
2
1
234
1
1
287
2
2
1
235
1
1
288
3
1
1
1
1
236
1
1
1
289
2
2
1
237
2
1
1
290
3
2
2
1
1
1
238
2
1
1
291
2
2
1
239
3
1
1
292
2
2
1
1
1
1
1
240
1
1
1
293
2
2
1
241
3
1
1
294
3
2
2
1
1
242
2
1
1
295
2
2
1
243
3
1
1
296
2
2
1
244
1
1
1
297
1
1
2
1
1
245
2
1
1
298
2
1
246
3
1
1
299
3
2
1
1
247
1
1
1
300
2
1
248
2
1
1
301
4
2
1
1
249
3
1
1
302
3
2
1
250
3
1
1
303
2
1
251
2
1
304
3
2
1
252
2
1
305
3
2
1
253
2
1
306
4
1
254
3
1
307
2
2
1
255
4
1
308
2
2
1
256
4
1
309
2
2
1
257
4
1
310
2
1
258
2
1
259
2
1
260
2. Experimental Design, Materials and Methods
2.1. Analysis of the counts during various lengths of observation period
The interval between consecutive visits was one, two, or three years. The transition counts among the states for each of these interval lengths were calculated using MATLAB code (see supplementary materials). For example, for the one-year interval, the transitions from state 1 to state 1 (green arrow in Table 2), from state 1 to state 2 (red arrows in Table 2), from state 1 to state 3 (blue arrows in Table 2), and from state 1 to state 4 (orange arrow in Table 2) were counted, and the same was done for every transition among the states in each interval [11]. Table 3, Fig. 4, and Fig. 5 show the distribution of transition counts for the time interval Δt = 1 year.
Table 4, Fig. 6, and Fig. 7 show the distribution of transition counts for the time interval Δt = 2 years.
Table 5, Fig. 8, and Fig. 9 show the distribution of transition counts for the time interval Δt = 3 years.
Table 6, Table 7, Fig. 10, and Fig. 11 show the distribution of transition counts over the whole follow-up period.
Table (3) shows the observed counts of transitions during the time interval Δt = 1 year
To State 1 To State 2 To State 3 To State 4 Total counts
Transitions out from State 1 330 163 45 12 550
Transitions out from State 2 5 185 45 15 250
Transitions out from State 3 0 0 0 0 0
Transitions out from State 4 0 0 0 0 0
Total 335 348 90 27 800
Fig. 4 Total marginal counts of transitions from state 1 and state 2 during the time interval Δt = 1 year
Fig. 5 Transition counts from state 1 and state 2 to the other states during the time interval Δt = 1 year
Table (4) shows the observed counts of transitions during the time interval Δt = 2 years
To State 1 To State 2 To State 3 To State 4 Total counts
Transitions out from State 1 70 30 10 1 111
Transitions out from State 2 2 20 13 4 39
Transitions out from State 3 0 0 0 0 0
Transitions out from State 4 0 0 0 0 0
Total 72 50 23 5 150
Fig. 6 Total marginal counts of transitions from state 1 and state 2 during the time interval Δt = 2 years
Fig. 7 Transition counts from state 1 and state 2 to the other states during the time interval Δt = 2 years
Table (5) shows the observed counts of transitions during the time interval Δt = 3 years
To State 1 To State 2 To State 3 To State 4 Total counts
Transitions out from State 1 21 8 7 3 39
Transitions out from State 2 1 6 3 1 11
Transitions out from State 3 0 0 0 0 0
Transitions out from State 4 0 0 0 0 0
Total 22 14 10 4 50
Fig. 8 Total marginal counts of transitions from state 1 and state 2 during the time interval Δt = 3 years
Fig. 9 Transition counts from state 1 and state 2 to the other states during the time interval Δt = 3 years
Table (6) shows the numbers of observed transitions (from state, to state) among the states of the NAFLD process during the time intervals Δt = 1, 2, 3 years
Δt (years) (1,1) (1,2) (1,3) (1,4) (2,1) (2,2) (2,3) (2,4)
1 330 163 45 12 5 185 45 15
2 70 30 10 1 2 20 13 4
3 21 8 7 3 1 6 3 1
Total 421 201 62 16 8 211 61 20
Table (7) shows the total counts of transitions throughout the whole follow-up period (8 years)
To State 1 To State 2 To State 3 To State 4 Total counts
Transitions out from State 1 421 201 62 16 700
Transitions out from State 2 8 211 61 20 300
Transitions out from State 3 0 0 0 0 0
Transitions out from State 4 0 0 0 0 0
Total 429 412 123 36 1000
Fig. 10 Total marginal counts of transitions from state 1 and state 2 during the whole follow-up period
Fig. 11 Transition counts from state 1 and state 2 to the other states during the follow-up period
Fig. 12 shows the share of the transition counts in each time interval. For Δt = 1 year there were 800 of the 1000 transitions, for Δt = 2 years there were 150, and for Δt = 3 years there were 50, giving ratios of 0.8, 0.15, and 0.05, respectively.
Fig. 12 Ratio of the transition counts in the different time intervals
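These ratios are each interval's share of all observed transitions; they are later used as weights when the interval-specific rate estimates are combined. A minimal plain-Python sketch of that arithmetic, using the counts in Table 6 (variable names are illustrative):

```python
# Transition counts from Table 6, one row per interval length (years), in the
# column order (1,1) (1,2) (1,3) (1,4) (2,1) (2,2) (2,3) (2,4).
table6 = {
    1: [330, 163, 45, 12, 5, 185, 45, 15],
    2: [70, 30, 10, 1, 2, 20, 13, 4],
    3: [21, 8, 7, 3, 1, 6, 3, 1],
}

# Column totals reproduce the "Total" row of Table 6 (the entries of Table 7).
column_totals = [sum(col) for col in zip(*table6.values())]

# Each interval's weight is its share of all observed transitions.
grand_total = sum(column_totals)
weights = {dt: sum(row) / grand_total for dt, row in table6.items()}

print(column_totals)
print(grand_total, weights)
```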
2.2. Steps to get the previously mentioned statistical indices
2.2.1. Estimation of transition rate matrix and the variance-covariance matrix
The transition count tables (Tables 3, 4, and 5) are used to obtain the estimated rates and the variance-covariance matrix for each interval [12].
Calculations in interval Δ𝑡 = 1 year: Table 8 shows the initial rates.
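Table 8: Initial rates in the time interval Δt = 1 year (each count divided by the total number of transitions out of the same state)
$\lambda_{12}$ = 163/550 ≈ 0.3
$\lambda_{14}$ = 12/550 ≈ 0.022
$\mu_{21}$ = 5/250 = 0.02
$\lambda_{23}$ = 45/250 = 0.18
$\lambda_{24}$ = 15/250 = 0.06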
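Table 8 divides each count in Table 3 by the total number of transitions out of the same state; the state 1 → state 3 count enters only the row total, because the model has no direct 1 → 3 rate. A minimal plain-Python sketch of this step (the function name `initial_rates` is hypothetical):

```python
def initial_rates(counts):
    """Crude initial rates: each transition count divided by its row total.

    counts[i][j] is the number of observed transitions from state i+1 to
    state j+1 in one interval (rows for states 1 and 2 only).
    """
    n1, n2 = sum(counts[0]), sum(counts[1])
    return {
        "lambda12": counts[0][1] / n1,
        "lambda14": counts[0][3] / n1,
        "mu21": counts[1][0] / n2,
        "lambda23": counts[1][2] / n2,
        "lambda24": counts[1][3] / n2,
    }


# Rows "out from state 1" and "out from state 2" of Table 3 (dt = 1 year).
table3 = [[330, 163, 45, 12], [5, 185, 45, 15]]
for name, rate in initial_rates(table3).items():
    print(f"{name}: {rate:.4f}")
```

Passing the corresponding rows of Table 4 or Table 5 gives the initial rates of Tables 9 and 10 in the same way.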
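Construct the initial Q transition rate matrix:
$$
Q_{\text{initial}} =
\begin{bmatrix}
-0.322 & 0.3 & 0 & 0.022\\
0.02 & -0.26 & 0.18 & 0.06\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix},
\qquad
\begin{bmatrix}\lambda_{12}\\ \lambda_{14}\\ \mu_{21}\\ \lambda_{23}\\ \lambda_{24}\end{bmatrix}
=
\begin{bmatrix}0.3\\ 0.022\\ 0.02\\ 0.18\\ 0.06\end{bmatrix}
$$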
Step 1: calculate the nonzero eigenvalues of this $Q_{\text{initial}}$ matrix at Δt = 1: −0.3744 and −0.2076 (the remaining two eigenvalues are 0, because rows 3 and 4 are zero).
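Because rows 3 and 4 of Q are zero, Q is block upper triangular, so its two nonzero eigenvalues are those of the 2 × 2 block for the transient states 1 and 2. A minimal plain-Python sketch of Step 1 under that observation (the function name is hypothetical; a closed-form quadratic is used instead of a general eigen-solver):

```python
import math


def transient_eigenvalues(q11, q12, q21, q22):
    """Eigenvalues of the 2x2 block of Q for the transient states 1 and 2.

    They solve x^2 - trace*x + det = 0. The discriminant equals
    (q11 - q22)^2 + 4*q12*q21, which is non-negative because the
    off-diagonal rates are non-negative, so both roots are real.
    """
    trace = q11 + q22
    det = q11 * q22 - q12 * q21
    root = math.sqrt(trace * trace - 4.0 * det)
    return (trace - root) / 2.0, (trace + root) / 2.0


# Q_initial for dt = 1 year: row 1 = (-0.322, 0.3), row 2 = (0.02, -0.26).
print(transient_eigenvalues(-0.322, 0.3, 0.02, -0.26))
```

Applied to the Δt = 2 block (−0.279, 0.27; 0.05, −0.486), the Δt = 3 block (−0.282, 0.205; 0.091, −0.455), and the final estimated block (−0.3135, 0.2907; 0.0280, −0.3036), the same function gives approximately (−0.5381, −0.2269), (−0.5302, −0.2068), and (−0.3989, −0.2182), matching the values reported in this section.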
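Step 2: calculate the partial derivative of the eigenvalue function with respect to each rate $\theta_h$, i.e.
$$\frac{\partial}{\partial \theta_h} P_{ij}(t) = t\, e^{\Lambda t}\, d\Lambda,$$
to get the score function (substituting $t = 1$):
$$
\begin{bmatrix} v_1\\ v_2\\ v_3\\ v_4\\ v_5\end{bmatrix}
=
\begin{bmatrix} -0.712\\ -0.7269\\ -0.5488\\ -0.7733\\ -0.7733\end{bmatrix}
$$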
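For reference, the expression in Step 2 has the form of the diagonal terms of the standard derivative of $P(t)$ used for panel-observed Markov chains (Kalbfleisch and Lawless, 1985). Assuming $Q$ is diagonalizable, $Q = A D A^{-1}$ with $D = \operatorname{diag}(d_1,\dots,d_4)$, the general form is sketched below; how the five-element score vector above is obtained from it is not shown in this section.

```latex
P(t) = A\,\operatorname{diag}\!\left(e^{d_1 t},\dots,e^{d_4 t}\right)A^{-1},
\qquad
\frac{\partial P(t)}{\partial \theta_h} = A\,V_h\,A^{-1},
\qquad
G^{(h)} = A^{-1}\,\frac{\partial Q}{\partial \theta_h}\,A,

(V_h)_{ij} =
\begin{cases}
g^{(h)}_{ij}\,\dfrac{e^{d_i t} - e^{d_j t}}{d_i - d_j}, & d_i \neq d_j,\\[2mm]
g^{(h)}_{ij}\,t\,e^{d_i t}, & d_i = d_j .
\end{cases}
```

When $d_i = d_j$ (including $i = j$), the ratio is replaced by its limit $t\,e^{d_i t}$, which is the term that appears in Step 2.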
Step 3: scale the above score function by the factor $4 \times 550 + 4 \times 250 = 3200$:
$$
\begin{bmatrix} -2278.244\\ -2326.141\\ -1756.172\\ -2472.620\\ -2472.620\end{bmatrix}
$$
Step 4: multiply the scaled score function by its transpose to get the Hessian matrix:
$$
M(\theta) = 10^{6}
\begin{bmatrix}
5.1904 & 5.2995 & 4.0010 & 5.6378 & 5.6378\\
5.2995 & 5.4109 & 4.0851 & 5.7563 & 5.7563\\
4.0010 & 4.0851 & 3.0841 & 4.3459 & 4.3459\\
5.6378 & 5.7563 & 4.3459 & 6.1237 & 6.1237\\
5.6378 & 5.7563 & 4.3459 & 6.1237 & 6.1237
\end{bmatrix}
$$
Step 5: scale the above Hessian matrix by the factor 53096.45:
$$
\text{scaled } M(\theta) = 10^{11}
\begin{bmatrix}
2.7559 & 2.8138 & 2.1244 & 2.9934 & 2.9934\\
2.8138 & 2.8730 & 2.1690 & 3.0564 & 3.0564\\
2.1244 & 2.1690 & 1.6376 & 2.3075 & 2.3075\\
2.9934 & 3.0564 & 2.3075 & 3.2515 & 3.2515\\
2.9934 & 3.0564 & 2.3075 & 3.2515 & 3.2515
\end{bmatrix}
$$
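Step 6: invert the scaled Hessian matrix. Because this matrix is an outer product of rank one, it is singular; the values below correspond to its Moore-Penrose pseudo-inverse:
$$
[\text{scaled } M(\theta)]^{-1} = 10^{-12}
\begin{bmatrix}
0.1454 & 0.1484 & 0.1120 & 0.1579 & 0.1579\\
0.1484 & 0.1515 & 0.1144 & 0.1612 & 0.1612\\
0.1120 & 0.1144 & 0.0864 & 0.1217 & 0.1217\\
0.1579 & 0.1612 & 0.1217 & 0.1715 & 0.1715\\
0.1579 & 0.1612 & 0.1217 & 0.1715 & 0.1715
\end{bmatrix}
$$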
Step 7: multiply the inverted scaled Hessian matrix by the scaled score function:
$$
10^{-8}
\begin{bmatrix} -0.1655\\ -0.1689\\ -0.1275\\ -0.1797\\ -0.1797\end{bmatrix}
$$
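Step 8: apply the quasi-Newton formula
$$\vec{\theta}_1 = \vec{\theta}_0 + \left[M(\vec{\theta}_0)\right]^{-1} S(\vec{\theta}_0),$$
i.e. add the initial rate values, in vector form, to the vector calculated above:
$$
\text{estimated rates during } \Delta t = 1 \text{ year} =
\begin{bmatrix}\hat{\lambda}_{12}\\ \hat{\lambda}_{14}\\ \hat{\mu}_{21}\\ \hat{\lambda}_{23}\\ \hat{\lambda}_{24}\end{bmatrix}
=
\begin{bmatrix}0.3\\ 0.022\\ 0.02\\ 0.18\\ 0.06\end{bmatrix}
$$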
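Steps 3 to 8 can be reproduced in a few lines. Because the matrix in Step 4 is an outer product $s\,s^{\mathsf T}$, it has rank one and no ordinary inverse; the values reported in Step 6 agree with its Moore-Penrose pseudo-inverse, $s\,s^{\mathsf T}/(c\,\lVert s\rVert^4)$, which is what this plain-Python sketch assumes. The score vector, the factor $c = 53096.45$, and the initial rates are copied from the steps above; the variable names are illustrative.

```python
# Unscaled score vector from Step 2 and the Step 5 factor (dt = 1 year).
score = [-0.712, -0.7269, -0.5488, -0.7733, -0.7733]
theta0 = [0.3, 0.022, 0.02, 0.18, 0.06]  # lambda12, lambda14, mu21, lambda23, lambda24
n1, n2 = 550, 250
c = 53096.45

# Step 3: scale the score by 4*n1 + 4*n2 (= 3200).
s = [(4 * n1 + 4 * n2) * v for v in score]

# Step 4: M = s s^T is an outer product, hence rank one and singular.
M = [[si * sj for sj in s] for si in s]

# Steps 5-6: the Moore-Penrose pseudo-inverse of c * s s^T is
# s s^T / (c * ||s||^4), so no general matrix inversion is needed.
norm2 = sum(v * v for v in s)
M_pinv = [[si * sj / (c * norm2 ** 2) for sj in s] for si in s]

# Step 7: pseudo-inverse times the scaled score (equals s / (c * ||s||^2)).
step = [sum(M_pinv[i][j] * s[j] for j in range(5)) for i in range(5)]

# Step 8: quasi-Newton update.
theta1 = [t + d for t, d in zip(theta0, step)]
print(step)
print(theta1)
```

The update is of order $10^{-9}$, which is why the estimated rates after Step 8 equal the initial rates at the reported precision.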
Calculations in interval Δt = 2 years: Table 9 shows the initial rates.
Table 9: Initial rates in the time interval Δt = 2 years
$\lambda_{12}$ = 30/111 ≈ 0.27
$\lambda_{14}$ = 1/111 ≈ 0.009
$\mu_{21}$ = 2/39 ≈ 0.05
$\lambda_{23}$ = 13/39 ≈ 0.333
$\lambda_{24}$ = 4/39 ≈ 0.103
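Construct the initial Q transition rate matrix:
$$
Q_{\text{initial}} =
\begin{bmatrix}
-0.279 & 0.27 & 0 & 0.009\\
0.05 & -0.486 & 0.333 & 0.103\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix},
\qquad
\begin{bmatrix}\lambda_{12}\\ \lambda_{14}\\ \mu_{21}\\ \lambda_{23}\\ \lambda_{24}\end{bmatrix}
=
\begin{bmatrix}0.27\\ 0.009\\ 0.05\\ 0.333\\ 0.103\end{bmatrix}
$$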
Step 1: calculate the nonzero eigenvalues of this $Q_{\text{initial}}$ matrix at Δt = 2: −0.2269 and −0.5381.
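Step 2: calculate the partial derivative of the eigenvalue function with respect to each rate $\theta_h$, i.e.
$$\frac{\partial}{\partial \theta_h} P_{ij}(t) = t\, e^{\Lambda t}\, d\Lambda,$$
to get the score function (substituting $t = 2$):
$$
\begin{bmatrix} v_1\\ v_2\\ v_3\\ v_4\\ v_5\end{bmatrix}
=
\begin{bmatrix} -1.0773\\ -1.1719\\ -0.2696\\ -0.7803\\ -0.7803\end{bmatrix}
$$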
Step 3: scale the above score function by the factor $4 \times 111 + 4 \times 39 = 600$:
$$
\begin{bmatrix} -646.3779\\ -703.1237\\ -161.7689\\ -468.1962\\ -468.1962\end{bmatrix}
$$
Step 4: multiply the scaled score function by its transpose to get the Hessian matrix:
$$
M(\theta) = 10^{5}
\begin{bmatrix}
4.178 & 4.5448 & 1.0456 & 3.0263 & 3.0263\\
4.5448 & 4.9438 & 1.1374 & 3.2920 & 3.2920\\
1.0456 & 1.1374 & 0.2617 & 0.7574 & 0.7574\\
3.0263 & 3.2920 & 0.7574 & 2.1921 & 2.1921\\
3.0263 & 3.2920 & 0.7574 & 2.1921 & 2.1921
\end{bmatrix}
$$
Step 5: scale the above Hessian matrix by the factor 15476.614:
$$
\text{scaled } M(\theta) = 10^{9}
\begin{bmatrix}
6.4651 & 7.0327 & 1.6180 & 4.6829 & 4.6829\\
7.0327 & 7.6501 & 1.7601 & 5.0940 & 5.0940\\
1.6180 & 1.7601 & 0.4049 & 1.1720 & 1.1720\\
4.6829 & 5.0940 & 1.1720 & 3.3920 & 3.3920\\
4.6829 & 5.0940 & 1.1720 & 3.3920 & 3.3920
\end{bmatrix}
$$
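Step 6: invert the scaled Hessian matrix (as for Δt = 1, this rank-one matrix is singular, and the values below correspond to its Moore-Penrose pseudo-inverse):
$$
[\text{scaled } M(\theta)]^{-1} = 10^{-10}
\begin{bmatrix}
0.1424 & 0.1550 & 0.0356 & 0.1032 & 0.1032\\
0.1550 & 0.1686 & 0.0388 & 0.1122 & 0.1122\\
0.0356 & 0.0388 & 0.0089 & 0.0258 & 0.0258\\
0.1032 & 0.1122 & 0.0258 & 0.0747 & 0.0747\\
0.1032 & 0.1122 & 0.0258 & 0.0747 & 0.0747
\end{bmatrix}
$$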
Step 7: multiply the inverted scaled Hessian matrix by the scaled score function:
$$
10^{-7}
\begin{bmatrix} -0.3034\\ -0.3300\\ -0.0759\\ -0.2198\\ -0.2198\end{bmatrix}
$$
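Step 8: apply the quasi-Newton formula
$$\vec{\theta}_1 = \vec{\theta}_0 + \left[M(\vec{\theta}_0)\right]^{-1} S(\vec{\theta}_0),$$
i.e. add the initial rate values, in vector form, to the vector calculated above:
$$
\text{estimated rates during } \Delta t = 2 \text{ years} =
\begin{bmatrix}\hat{\lambda}_{12}\\ \hat{\lambda}_{14}\\ \hat{\mu}_{21}\\ \hat{\lambda}_{23}\\ \hat{\lambda}_{24}\end{bmatrix}
=
\begin{bmatrix}0.27\\ 0.009\\ 0.05\\ 0.333\\ 0.103\end{bmatrix}
$$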
Calculations in interval Δt = 3 years: Table 10 shows the initial rates.
Table 10: Initial rates in the time interval Δt = 3 years
$\lambda_{12}$ = 8/39 ≈ 0.205
$\lambda_{14}$ = 3/39 ≈ 0.077
$\mu_{21}$ = 1/11 ≈ 0.091
$\lambda_{23}$ = 3/11 ≈ 0.273
$\lambda_{24}$ = 1/11 ≈ 0.091
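Construct the initial Q transition rate matrix:
$$
Q_{\text{initial}} =
\begin{bmatrix}
-0.282 & 0.205 & 0 & 0.077\\
0.091 & -0.455 & 0.273 & 0.091\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix},
\qquad
\begin{bmatrix}\lambda_{12}\\ \lambda_{14}\\ \mu_{21}\\ \lambda_{23}\\ \lambda_{24}\end{bmatrix}_{\text{initial}}
=
\begin{bmatrix}0.205\\ 0.077\\ 0.091\\ 0.273\\ 0.091\end{bmatrix}
$$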
Step 1: calculate the nonzero eigenvalues of this $Q_{\text{initial}}$ matrix at Δt = 3: −0.2068 and −0.5302.
Step 2: calculate the partial derivative of the eigenvalue function with respect to each rate $\theta_h$, i.e.
$$\frac{\partial}{\partial \theta_h} P_{ij}(t) = t\, e^{\Lambda t}\, d\Lambda,$$
to get the score function (substituting $t = 3$):
$$
\begin{bmatrix} v_1\\ v_2\\ v_3\\ v_4\\ v_5\end{bmatrix}
=
\begin{bmatrix} -1.0983\\ -1.3802\\ -0.2093\\ -0.8443\\ -0.8443\end{bmatrix}
$$
Step 3: scale the above score function by the factor $4 \times 39 + 4 \times 11 = 200$:
$$
\begin{bmatrix} -219.663\\ -276.039\\ -41.8608\\ -168.862\\ -168.862\end{bmatrix}
$$
Step 4: multiply the scaled score function by the transposed scaled score function to get the Hessian matrix:
$$
M(\theta) = 10^{4}
\begin{bmatrix}
4.8252 & 6.0636 & 0.9195 & 3.7093 & 3.7093\\
6.0636 & 7.6198 & 1.1555 & 4.6613 & 4.6613\\
0.9195 & 1.1555 & 0.1752 & 0.7069 & 0.7069\\
3.7093 & 4.6613 & 0.7069 & 2.8514 & 2.8514\\
3.7093 & 4.6613 & 0.7069 & 2.8514 & 2.8514
\end{bmatrix}
$$
Step 5: scale the Hessian matrix above by multiplying it by 1289.338:
$$
\text{scaled } M(\theta) = 10^{7}
\begin{bmatrix}
6.2213 & 7.8180 & 1.1856 & 4.7825 & 4.7825\\
7.8180 & 9.8245 & 1.4899 & 6.0099 & 6.0099\\
1.1856 & 1.4899 & 0.2259 & 0.9114 & 0.9114\\
4.7825 & 6.0099 & 0.9114 & 3.6765 & 3.6765\\
4.7825 & 6.0099 & 0.9114 & 3.6765 & 3.6765
\end{bmatrix}
$$
Step 6: invert the scaled Hessian matrix:
$$
[\text{scaled } M(\theta)]^{-1} = 10^{-8}
\begin{bmatrix}
0.1115 & 0.1401 & 0.0212 & 0.0857 & 0.0857\\
0.1401 & 0.1760 & 0.0267 & 0.1077 & 0.1077\\
0.0212 & 0.0267 & 0.0040 & 0.0163 & 0.0163\\
0.0857 & 0.1077 & 0.0163 & 0.0659 & 0.0659\\
0.0857 & 0.1077 & 0.0163 & 0.0659 & 0.0659
\end{bmatrix}
$$
Step 7: multiply the inverted scaled Hessian matrix by the scaled score function:
$$
10^{-5}
\begin{bmatrix}-0.0930\\ -0.1168\\ -0.0177\\ -0.0715\\ -0.0715\end{bmatrix}
$$
Step 8: apply the quasi-Newton formula
$$
\vec{\theta}_1 = \vec{\theta}_0 + [M(\vec{\theta}_0)]^{-1} S(\vec{\theta}_0),
$$
i.e. add the vector of initial rates to the vector calculated above.
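The estimated rates during Δt = 3 years are:
$$
\begin{bmatrix}\hat{\lambda}_{12}\\ \hat{\lambda}_{14}\\ \hat{\mu}_{21}\\ \hat{\lambda}_{23}\\ \hat{\lambda}_{24}\end{bmatrix}
=
\begin{bmatrix}0.205\\ 0.077\\ 0.091\\ 0.273\\ 0.091\end{bmatrix}
$$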
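The matrix in Step 4 is the outer product of a single vector, so it has rank one and no ordinary inverse (its fourth and fifth rows are even identical). The printed Step 6 entries do agree with the Moore-Penrose pseudo-inverse, which for c·S Sᵀ equals S Sᵀ/(c‖S‖⁴). The sketch below reproduces Steps 4 to 8 for Δt = 3 under that reading. Two further assumptions are made because they are what reproduce the printed numbers: the score vector from Step 2 before division by 200 enters both the outer product and the Step 7 product, and the factor 1289.338 is simply copied from Step 5, since its derivation is not shown in this section. The function name quasi_newton_step is illustrative.

```python
def quasi_newton_step(theta0, score, c):
    """theta1 = theta0 + pinv(c * S S^T) S, using the closed-form pseudo-inverse."""
    n = len(score)
    norm2 = sum(s * s for s in score)                      # ||S||^2
    # Moore-Penrose pseudo-inverse of the rank-one matrix c * S S^T
    pinv = [[score[i] * score[j] / (c * norm2 ** 2) for j in range(n)]
            for i in range(n)]
    step = [sum(pinv[i][j] * score[j] for j in range(n)) for i in range(n)]
    theta1 = [t + s for t, s in zip(theta0, step)]
    return pinv, step, theta1

theta0 = [0.205, 0.077, 0.091, 0.273, 0.091]               # Table 10
score = [-219.663, -276.039, -41.8608, -168.862, -168.862]  # Step 2, t = 3
pinv, step, theta1 = quasi_newton_step(theta0, score, c=1289.338)

print(f"{pinv[0][0]:.4e}")              # compare with Step 6, entry (1,1)
print([f"{s:.4e}" for s in step])       # compare with Step 7
print([round(t, 3) for t in theta1])    # compare with Step 8
```

Because the update term is of order 10⁻⁶, θ₁ rounds back to θ₀, which is why the estimates in each interval coincide with the initial rates.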
To get the final rate vector and variance-covariance matrix, the estimated rate vector of each interval is weighted by that interval's share of the transition counts, and the weighted vectors are summed:
$$
0.8\begin{bmatrix}0.3\\ 0.022\\ 0.02\\ 0.18\\ 0.06\end{bmatrix}
+ 0.15\begin{bmatrix}0.27\\ 0.009\\ 0.05\\ 0.333\\ 0.103\end{bmatrix}
+ 0.05\begin{bmatrix}0.205\\ 0.077\\ 0.091\\ 0.273\\ 0.091\end{bmatrix}
=
\begin{bmatrix}0.2907\\ 0.0228\\ 0.0280\\ 0.2076\\ 0.068\end{bmatrix}
$$
The estimated final Q matrix is
$$
\hat{Q} =
\begin{bmatrix}
-0.3135 & 0.2907 & 0 & 0.0228\\
0.0280 & -0.3036 & 0.2076 & 0.068\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
$$
For this $\hat{Q}$ matrix, $w_1 = -0.3989$ and $w_2 = -0.2182$, i.e. the eigenvalues are $\{-0.3989, -0.2182, 0, 0\}$.
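A sketch of the pooling step and of Q̂, assuming the interval weights are each interval's share of the observed transitions out of S1 and S2: the row totals in Tables 11 to 13 are 800, 150 and 50 out of 1000, which gives exactly the weights 0.8, 0.15 and 0.05 used above. The eigenvalues again come from the trace and determinant of the transient block. Small differences in the fourth decimal are expected, because the text builds Q̂ from the rounded pooled rates (for example 0.29075 rounded to 0.2907), whereas the sketch keeps them unrounded.

```python
import math

# per-interval estimates [l12, l14, m21, l23, l24] for dt = 1, 2, 3 years
estimates = [
    [0.30, 0.022, 0.02, 0.18, 0.06],
    [0.27, 0.009, 0.05, 0.333, 0.103],
    [0.205, 0.077, 0.091, 0.273, 0.091],
]
totals = [800, 150, 50]                       # transitions observed per interval
weights = [n / sum(totals) for n in totals]   # 0.8, 0.15, 0.05

pooled = [sum(w * est[k] for w, est in zip(weights, estimates)) for k in range(5)]
l12, l14, m21, l23, l24 = pooled

q_hat = [
    [-(l12 + l14), l12, 0.0, l14],
    [m21, -(m21 + l23 + l24), l23, l24],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# nonzero eigenvalues from the 2x2 transient block; absorbing rows add 0, 0
a, b, c, d = q_hat[0][0], q_hat[0][1], q_hat[1][0], q_hat[1][1]
tr, det = a + d, a * d - b * c
disc = math.sqrt(tr * tr - 4.0 * det)
eigs = sorted([(tr - disc) / 2.0, (tr + disc) / 2.0, 0.0, 0.0])
print([round(x, 4) for x in pooled])
print([round(x, 4) for x in eigs])
```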
The weighted sum of the inverted scaled Hessian matrices is likewise used as the variance-covariance matrix of the parameter vector θ:
$$
[\text{scaled } M(\theta)]^{-1} = 10^{-10}
\begin{bmatrix}
0.5788 & 0.7237 & 0.1116 & 0.4440 & 0.4440\\
0.7237 & 0.9055 & 0.1394 & 0.5554 & 0.5554\\
0.1116 & 0.1394 & 0.0216 & 0.0856 & 0.0856\\
0.4440 & 0.5554 & 0.0856 & 0.3407 & 0.3407\\
0.4440 & 0.5554 & 0.0856 & 0.3407 & 0.3407
\end{bmatrix}
$$
2.2.2. Estimation of transition probability matrix
Solving the forward Kolmogorov differential equations (here with the differential-operator method) yields the eight PDFs that make up the empirical transition probability matrix. A MATLAB code illustrating these calculations is given in the supplementary materials [13]. This matrix equals the exponential of the estimated transition rate matrix, which supports the time homogeneity of the disease process.
The transition probability matrix after one year, obtained either way, is:
$$
P(t = 1) =
\begin{bmatrix}
0.7339 & 0.2138 & 0.0246 & 0.0277\\
0.0206 & 0.7412 & 0.1793 & 0.0590\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
$$
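The agreement between the Kolmogorov solution and exp(Q̂t) can be checked numerically. The sketch below is a dependency-free stand-in, chosen only to keep the example self-contained: it evaluates the matrix exponential by scaling and squaring with a truncated Taylor series in pure Python, whereas in practice a library routine such as MATLAB's expm or SciPy's scipy.linalg.expm would be used. The entries of Q̂ are copied from the estimated final Q matrix; the numbers of squarings and Taylor terms are arbitrary but ample for a matrix of this size.

```python
def mat_mul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def expm(a, squarings=10, terms=20):
    """exp(A) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    scaled = [[v / 2 ** squarings for v in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):          # undo the scaling: exp(A) = exp(A/2^s)^(2^s)
        result = mat_mul(result, result)
    return result

Q_HAT = [
    [-0.3135, 0.2907, 0.0, 0.0228],
    [0.0280, -0.3036, 0.2076, 0.068],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

def transition_matrix(q, t):
    """P(t) = exp(Q t) for a time-homogeneous CTMC."""
    return expm([[v * t for v in row] for row in q])

for row in transition_matrix(Q_HAT, 1.0):
    print([round(v, 4) for v in row])
```

Calling transition_matrix(Q_HAT, 20.0) gives the 20-year matrix used in the next subsection in the same way.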
2.2.3. Mean sojourn time in each state and its variance
$$
E(s_1) = \frac{1}{\lambda_{12}+\lambda_{14}} = \frac{1}{0.2907+0.0228} = 3.1898 \text{ years}
$$
The average time spent in S1 is about 3 years.
$$
E(s_2) = \frac{1}{\mu_{21}+\lambda_{23}+\lambda_{24}} = \frac{1}{0.0280+0.2076+0.068} = 3.2938 \text{ years}
$$
The average time spent in S2 is about 3 years.
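$$
var(s_1) = \frac{1}{(0.2907+0.0228)^4}\,
[-1\ \ -1\ \ -1\ \ -1\ \ -1]\,[M(\theta)]^{-1}\big|_{\theta=\hat{\theta}}
\begin{bmatrix}-1\\ -1\\ -1\\ -1\\ -1\end{bmatrix}
= 9.4815\times10^{-8}
$$
$$
var(s_2) = \frac{1}{(0.0280+0.2076+0.068)^4}\,
[-1\ \ -1\ \ -1\ \ -1\ \ -1]\,[M(\theta)]^{-1}\big|_{\theta=\hat{\theta}}
\begin{bmatrix}-1\\ -1\\ -1\\ -1\\ -1\end{bmatrix}
= 1.0780\times10^{-7}
$$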
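The two sojourn-time formulas and their delta-method variances are sketched below. The gradient vector of five −1 entries is copied exactly as written in the variance formulas (so all five rates contribute, following the author's formula), and the 5 × 5 matrix is the weighted inverse scaled Hessian given at the end of the rate estimation. The function names are illustrative.

```python
RATES = {"l12": 0.2907, "l14": 0.0228, "m21": 0.0280, "l23": 0.2076, "l24": 0.068}

# weighted inverse scaled Hessian (printed in units of 1e-10)
COV = [[v * 1e-10 for v in row] for row in [
    [0.5788, 0.7237, 0.1116, 0.4440, 0.4440],
    [0.7237, 0.9055, 0.1394, 0.5554, 0.5554],
    [0.1116, 0.1394, 0.0216, 0.0856, 0.0856],
    [0.4440, 0.5554, 0.0856, 0.3407, 0.3407],
    [0.4440, 0.5554, 0.0856, 0.3407, 0.3407],
]]

def quad_form(g, m):
    """g^T M g for a vector g and square matrix M."""
    n = len(g)
    return sum(g[i] * m[i][j] * g[j] for i in range(n) for j in range(n))

def sojourn(exit_rate, cov):
    """Mean sojourn 1/exit_rate and its variance (1/exit_rate^4) g^T M g."""
    mean = 1.0 / exit_rate
    grad = [-1.0] * len(cov)          # the [-1 -1 -1 -1 -1] vector of the text
    var = quad_form(grad, cov) / exit_rate ** 4
    return mean, var

e1, v1 = sojourn(RATES["l12"] + RATES["l14"], COV)
e2, v2 = sojourn(RATES["m21"] + RATES["l23"] + RATES["l24"], COV)
print(f"E(s1) = {e1:.4f} years, var(s1) = {v1:.4e}")
print(f"E(s2) = {e2:.4f} years, var(s2) = {v2:.4e}")
```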
2.2.3. State probability distribution at a specific time point
Once the rate matrix is obtained, the estimated rates are substituted into the PDFs obtained from the solved differential equations to get the state probability distribution at any point in time, as well as the expected number of patients in each state [14].
Consider a cohort of 3000 patients with the initial distribution [0.7 0.3 0 0], i.e. initial numbers of patients in each state of [2100 900 0 0].
At 1 year, the state probability distribution is approximately:
$$
P(1) = [0.7\ \ 0.3\ \ 0\ \ 0]
\begin{bmatrix}
0.7339 & 0.2138 & 0.0246 & 0.0277\\
0.0206 & 0.7412 & 0.1793 & 0.0590\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
= [0.5199\ \ 0.3720\ \ 0.0710\ \ 0.0371]
$$
The expected numbers of patients in each state are:
$$
[2100\ \ 900\ \ 0\ \ 0]
\begin{bmatrix}
0.7339 & 0.2138 & 0.0246 & 0.0277\\
0.0206 & 0.7412 & 0.1793 & 0.0590\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
\approx [1560\ \ 1116\ \ 213\ \ 111]
$$
At 20 years, the state probability distribution is approximately:
$$
P(20) = [0.7\ \ 0.3\ \ 0\ \ 0]
\begin{bmatrix}
0.0062 & 0.0199 & 0.6742 & 0.2997\\
0.0019 & 0.0069 & 0.7413 & 0.2499\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
= [0.0049\ \ 0.0160\ \ 0.6943\ \ 0.2848]
$$
The expected numbers of patients in each state are:
$$
[2100\ \ 900\ \ 0\ \ 0]
\begin{bmatrix}
0.0062 & 0.0199 & 0.6742 & 0.2997\\
0.0019 & 0.0069 & 0.7413 & 0.2499\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
\approx [15\ \ 48\ \ 2083\ \ 854]
$$
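At 60 years, the state probability distribution is approximately (matrix entries rounded):
$$
P(60) = [0.7\ \ 0.3\ \ 0\ \ 0]
\begin{bmatrix}
0 & 0 & 0.7 & 0.3\\
0 & 0 & 0.75 & 0.25\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
\approx [0\ \ 0\ \ 0.7097\ \ 0.2903]
$$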
The expected numbers of patients in each state are:
$$
[2100\ \ 900\ \ 0\ \ 0]\,P(60) = 3000 \times [0\ \ 0\ \ 0.7097\ \ 0.2903] \approx [0\ \ 0\ \ 2129\ \ 871]
$$
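The state distribution and the expected counts are both a row vector multiplied by P(t). A minimal sketch with the 1-year matrix printed above; the function name propagate is illustrative, and any other P(t), such as P(20), can be passed in the same way.

```python
P1 = [
    [0.7339, 0.2138, 0.0246, 0.0277],
    [0.0206, 0.7412, 0.1793, 0.0590],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

def propagate(row_vec, p):
    """Row vector times matrix: p(t) = p(0) P(t)."""
    return [sum(row_vec[i] * p[i][j] for i in range(len(row_vec)))
            for j in range(len(p[0]))]

p0 = [0.7, 0.3, 0.0, 0.0]        # initial distribution
n0 = [2100, 900, 0, 0]           # initial counts for 3000 patients
dist = propagate(p0, P1)
counts = propagate(n0, P1)
print([round(x, 4) for x in dist])
print([round(x) for x in counts])
```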
2.2.4. Stationary probability distribution
This distribution is attained at 42 years. The asymptotic variance-covariance matrix is calculated as follows.
From 42 years onward, the state probability distribution is [0 0 0.7097 0.2903].
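For an absorbing chain, the limiting distribution can also be obtained without evaluating P(t) at a large t. With B the transient block of Q̂ and R its transient-to-absorbing block (the block called A in Section 2.2.5), the absorption probabilities are H = −B⁻¹R, and the limit is the initial transient distribution multiplied by H. A sketch under that standard result, using the Q̂ entries and the initial distribution [0.7 0.3] over S1 and S2; the helper names are illustrative.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

B = [[-0.3135, 0.2907], [0.0280, -0.3036]]   # S1, S2 -> S1, S2
R = [[0.0, 0.0228], [0.2076, 0.068]]        # S1, S2 -> S3, S4

# absorption probabilities H = -B^{-1} R (rows: start in S1, S2; cols: S3, S4)
H = [[-v for v in row] for row in mat_mul(inv2(B), R)]
p_transient = [0.7, 0.3]
limit = [sum(p_transient[i] * H[i][j] for i in range(2)) for j in range(2)]
print([[round(v, 4) for v in row] for row in H])
print([round(v, 4) for v in limit])
```

The rows of H, approximately [0.6934 0.3066] from S1 and [0.7477 0.2523] from S2, are the unrounded values behind the rounded matrices P(42) and P(60).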
Step 1: the transition probability matrix at t ≥ 42 years, when all participants are either in S3 or in S4 (absorbing death states), is (entries rounded):
$$
P(t = 42) =
\begin{bmatrix}
0 & 0 & 0.7 & 0.3\\
0 & 0 & 0.75 & 0.25\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
$$
Step 2: partially differentiate the transposed rate matrix with respect to each θ; the result is a vector of five ones:
[1 1 1 1 1]
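Step 3: multiply the column vector of the state probability distribution attained at 42 years (i.e. the stationary probability distribution) by the row vector of ones to get C(θ):
$$
C(\theta) =
\begin{bmatrix}0\\ 0\\ 0.7097\\ 0.2903\end{bmatrix}
[1\ \ 1\ \ 1\ \ 1\ \ 1]
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0.7097 & 0.7097 & 0.7097 & 0.7097 & 0.7097\\
0.2903 & 0.2903 & 0.2903 & 0.2903 & 0.2903
\end{bmatrix}
$$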
Step 4: calculate the pseudo-inverse of the transposed estimated rate matrix (the final estimated rate matrix) via singular value decomposition:
$$
[Q']^{+} =
\begin{bmatrix}
-2.4852 & 0.7142 & 1.1891 & 0.5819\\
-1.4878 & -1.6733 & 2.2828 & 0.8783\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
$$
Then multiply this matrix by −1:
$$
-[Q']^{+} =
\begin{bmatrix}
2.4852 & -0.7142 & -1.1891 & -0.5819\\
1.4878 & 1.6733 & -2.2828 & -0.8783\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
$$
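Step 5: multiply $-[Q']^{+}$ by $C(\theta)$ to get $A(\theta)$:
$$
A(\theta) = -[Q']^{+}\,C(\theta) =
\begin{bmatrix}
-1.0128 & -1.0128 & -1.0128 & -1.0128 & -1.0128\\
-1.875 & -1.875 & -1.875 & -1.875 & -1.875\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$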
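Step 6: apply the multivariate delta method to get the variance-covariance matrix:
$$
A(\theta)\,[\text{scaled } M(\theta)]^{-1}\,[A(\theta)]^{T} = 10^{-8}
\begin{bmatrix}
0.0939 & 0.1739 & 0 & 0\\
0.1739 & 0.3220 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
$$
where $[\text{scaled } M(\theta)]^{-1}$ is the inverted scaled Hessian matrix.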
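Steps 3 to 6 reduce to matrix products. The sketch copies −[Q′]⁺ from Step 4, the stationary distribution [0 0 0.7097 0.2903] and the weighted inverse scaled Hessian, and assumes (as the printed A(θ) indicates) that the unrounded values 0.7097 and 0.2903 enter C(θ); with 0.7 and 0.3 the first entry of A(θ) would be about −1.0069 instead of −1.0128. The helper names are illustrative.

```python
def mat_mul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(m):
    return [list(col) for col in zip(*m)]

NEG_PINV_QT = [                                  # -[Q']^+ from Step 4
    [2.4852, -0.7142, -1.1891, -0.5819],
    [1.4878, 1.6733, -2.2828, -0.8783],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
PI = [0.0, 0.0, 0.7097, 0.2903]                  # stationary distribution
COV = [[v * 1e-10 for v in row] for row in [     # weighted inverse scaled Hessian
    [0.5788, 0.7237, 0.1116, 0.4440, 0.4440],
    [0.7237, 0.9055, 0.1394, 0.5554, 0.5554],
    [0.1116, 0.1394, 0.0216, 0.0856, 0.0856],
    [0.4440, 0.5554, 0.0856, 0.3407, 0.3407],
    [0.4440, 0.5554, 0.0856, 0.3407, 0.3407],
]]

C = [[p] * 5 for p in PI]                        # Step 3: pi^T [1 1 1 1 1]
A = mat_mul(NEG_PINV_QT, C)                      # Step 5
SIGMA = mat_mul(mat_mul(A, COV), transpose(A))   # Step 6: A M^-1 A^T
print([round(v, 4) for v in A[0]], [round(v, 4) for v in A[1]])
print([[f"{v:.4e}" for v in row] for row in SIGMA[:2]])
```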
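2.2.5. Life expectancy of the patient (mean time to absorption)
Step 1: specify the Q rate matrix (2 transient states and 2 absorbing states) and partition it as follows:
$$
\hat{Q} =
\begin{bmatrix}
-(\lambda_{12}+\lambda_{14}) & \lambda_{12} & 0 & \lambda_{14}\\
\mu_{21} & -(\mu_{21}+\lambda_{23}+\lambda_{24}) & \lambda_{23} & \lambda_{24}\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
=
\begin{bmatrix} B & A\\ 0 & 0 \end{bmatrix},
\qquad \text{so } A = BZ,
$$
$$
\hat{Q} =
\begin{bmatrix}
-0.3135 & 0.2907 & 0 & 0.0228\\
0.0280 & -0.3036 & 0.2076 & 0.068\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
$$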
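Step 2: calculate the Z matrix.
Step 3: get the inverse of the B matrix.
Step 4: apply the formula for the mean time to absorption:
$$
E(\tau_k) = -\left.\frac{d f_k^{*}(s)}{ds}\right|_{s=0}
= P(0)\,[sI - B]^{-2}A\,\Big|_{s=0}
= P(0)\,[B]^{-1}Z
$$
Starting from a given transient state i (i.e. P(0) is the indicator of state i), this gives
$$
E(\tau_{ik}) = [B]^{-1}Z =
\begin{bmatrix}-3.48691 & -3.33935\\ -0.32212 & -3.60174\end{bmatrix}
\begin{bmatrix}-0.69325 & -0.30675\\ -0.74772 & -0.25228\end{bmatrix}
=
\begin{bmatrix}4.9159 & 1.9121\\ 2.9163 & 1.0072\end{bmatrix}
$$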
$E(\tau_{13}) = 4.9159$ years, $E(\tau_{14}) = 1.9121$ years.
$E(\tau_{23}) = 2.9163$ years, $E(\tau_{24}) = 1.0072$ years.
In this dataset, the average time from S1 to S3 is about 5 years, from S1 to S4 about 2 years, from S2 to S3 about 3 years, and from S2 to S4 about 1 year.
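Steps 2 to 4 can be sketched with the same 2 × 2 helpers. The sketch recomputes B⁻¹ from the Q̂ entries rather than copying the printed B⁻¹, whose entries differ slightly in the third decimal; the resulting E(τik) still agrees with the printed values to about 10⁻⁴. A useful consistency check is that each row of B⁻¹Z sums to the overall expected time to absorption from that state, −B⁻¹1, because the rows of Q̂ sum to zero (A1 = −B1). The helper names are illustrative.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

B = [[-0.3135, 0.2907], [0.0280, -0.3036]]   # transient -> transient
A = [[0.0, 0.0228], [0.2076, 0.068]]        # transient -> absorbing (S3, S4)

B_inv = inv2(B)                              # Step 3
Z = mat_mul(B_inv, A)                        # Step 2: from A = B Z
E = mat_mul(B_inv, Z)                        # Step 4: rows S1, S2; columns S3, S4

total = [sum(row) for row in E]              # row sums of B^{-1} Z
check = [-sum(row) for row in B_inv]         # expected time to absorption, -B^{-1} 1
print([[round(v, 4) for v in row] for row in Z])
print([[round(v, 4) for v in row] for row in E])
print([round(v, 4) for v in total], [round(v, 4) for v in check])
```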
According to the American Association for the Study of Liver Diseases [15], the most common cause of death in patients with NAFLD is cardiovascular disease (CVD), independent of other metabolic comorbidities, whereas liver-related mortality is the third most common cause of death in these patients. Cancer-related mortality is also among the top three causes of death in subjects with NAFLD. As the calculations show, the mean time to absorption can be broken down as follows: the mean time from state 1 (susceptible individuals with risk factors) to state 3 (liver-related mortality) is approximately 5 years, while the mean time from state 1 to state 4 (causes of death other than liver-related ones, for example CVD) is approximately 2 years. The mean time from state 2 (NAFLD) to state 3 (liver-related mortality) is approximately 3 years, and it decreases to approximately 1 year from state 2 (NAFLD) to state 4 (causes unrelated to liver complications).
Some remarks on this dataset are in order.
From Table 7, the observed initial rates, i.e. the initial Q matrix, are:
$$
Q =
\begin{bmatrix}
-0.31 & 0.287 & 0 & 0.023\\
0.027 & -0.297 & 0.203 & 0.067\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
$$
The rates estimated by MLE with the quasi-Newton update are nearly equal to these observed initial rates. The advantage of this approach is therefore that the estimated rates are obtained in the first iteration. Its limitation is the degree of the polynomial that expresses the eigenvalues as a function of the rates. For a second-degree polynomial, the eigenvalues have a closed-form formula that is easily differentiated with respect to each of the rates. Closed-form formulas also exist for third- and fourth-degree polynomials, so the method can still be used there. Polynomials of higher degree lack such a formula, which makes the method difficult to use.
2.3. A continuous-time Markov chain (CTMC) should be tested for time homogeneity and the Markov property.
2.3.1. Test the time homogeneity hypothesis
The empirical transition probability matrix equals the exponential of the estimated rate matrix, which supports time homogeneity [16].
If the process is treated as a discrete-time Markov chain embedded in a continuous-time Markov chain, the discrete-time transition probability matrix, according to the data collected in Table 7, is
$$
P =
\begin{bmatrix}
0.601 & 0.287 & 0.089 & 0.023\\
0.027 & 0.703 & 0.203 & 0.067\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
$$
Taking the matrix logarithm of this transition probability matrix gives the Q of the corresponding CTMC:
$$
\hat{Q} =
\begin{bmatrix}
-0.5189 & 0.4438 & 0.0626 & 0.0125\\
0.0418 & -0.3612 & 0.2400 & 0.0794\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
$$
This Q matrix fulfills the criteria for a transition rate matrix:
1. $\sum_{j \in S} q_{ij}(t) = 0$
2. $q_{ij}(t) \ge 0$, $i \ne j$
3. $q_{ii}(t) = -\sum_{j \in S,\ j \ne i} q_{ij}(t)$
So there is no embedding problem, which supports the time homogeneity of the disease process [17].
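The three criteria translate directly into a check on a candidate generator. In this sketch the tolerance 1e-3 is an arbitrary allowance for entries printed to four decimals, criterion 3 is checked separately although it follows from criterion 1, and the matrix logarithm itself (for example MATLAB's logm or SciPy's scipy.linalg.logm) is not recomputed; the log-Q entries are copied from above. The function name is_generator is illustrative.

```python
def is_generator(q, tol=1e-3):
    """Check the three rate-matrix criteria for a square matrix q."""
    for i, row in enumerate(q):
        off_diag = [v for j, v in enumerate(row) if j != i]
        if abs(sum(row)) > tol:                  # criterion 1: row sums are 0
            return False
        if any(v < 0 for v in off_diag):         # criterion 2: q_ij >= 0, i != j
            return False
        if abs(row[i] + sum(off_diag)) > tol:    # criterion 3: q_ii = -sum_{j!=i} q_ij
            return False
    return True

Q_LOG = [
    [-0.5189, 0.4438, 0.0626, 0.0125],
    [0.0418, -0.3612, 0.2400, 0.0794],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(is_generator(Q_LOG))
```

A negative off-diagonal entry in the logarithm would signal an embedding problem, in which case no valid time-homogeneous CTMC reproduces the observed P.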
2.3.2. Test the Markovian hypothesis
The difference between the empirical transition matrix $P^{(e)}_{0t}$, calculated over the time interval [0, t], and the product of the half-period matrices $P^{(e)}_{0\frac{t}{2}}$ and $P^{(e)}_{\frac{t}{2}t}$ is calculated and measured with the $L_2$-norm, as defined in equations (1) and (2). If the difference approaches zero, the process is Markovian [18].
$$
\|Z\| = \rho_{max}(Z) \qquad (1)
$$
i.e. the $L_2$-norm of the matrix Z is its maximum singular value. The difference is
$$
Z = P^{(e)}_{0t} - \left[P^{(e)}_{0\frac{t}{2}} \times P^{(e)}_{\frac{t}{2}t}\right] \qquad (2)
$$
A MATLAB code illustrating these concepts is given in the supplementary materials. Running this code gives:
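$$
\begin{bmatrix}
0.7339 & 0.2138 & 0.0246 & 0.0277\\
0.0206 & 0.7412 & 0.1793 & 0.059\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
-
\begin{bmatrix}
0.8558 & 0.1246 & 0.0068 & 0.0128\\
0.0120 & 0.8600 & 0.0963 & 0.0316\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
\times
\begin{bmatrix}
0.8558 & 0.1246 & 0.0068 & 0.0128\\
0.0120 & 0.8600 & 0.0963 & 0.0316\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
= 10^{-3} \times
\begin{bmatrix}
0.0112 & 0.0113 & -0.0184 & 0.0084\\
0.0104 & 0.1048 & 0.1004 & 0.0704\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
$$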
The eigenvalues of Z are $10^{-3} \times [0.0099\ \ 0.106\ \ 0\ \ 0]$.
The maximum singular value of Z is approximately $1.6 \times 10^{-4}$, i.e. practically zero.
So the process obeys the Markov property, i.e. the Chapman-Kolmogorov equations.
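Equation (1) can be evaluated on the printed Z with a short power iteration on ZᵀZ. This is an assumed, dependency-free substitute for MATLAB's norm(Z) or numpy.linalg.norm(Z, 2), both of which return the largest singular value of a matrix; the function name spectral_norm and the iteration count are illustrative. Because Z is copied to four decimals, the result is only as precise as those printed entries.

```python
import math

# Z from the run above, printed in units of 1e-3
Z = [[v * 1e-3 for v in row] for row in [
    [0.0112, 0.0113, -0.0184, 0.0084],
    [0.0104, 0.1048, 0.1004, 0.0704],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]]

def spectral_norm(m, iters=200):
    """Largest singular value of m via power iteration on m^T m."""
    rows, n = len(m), len(m[0])
    mtm = [[sum(m[k][i] * m[k][j] for k in range(rows)) for j in range(n)]
           for i in range(n)]
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(mtm[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = norm        # ||(m^T m) v|| for unit v tends to the top eigenvalue
    return math.sqrt(lam)

print(f"{spectral_norm(Z):.3e}")
```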
2.4. Goodness of fit for the Markov chain
The goodness of fit of this simple multistate model is assessed with the same procedure as for a contingency table; the statistic is calculated in each interval and then summed.
Step 1: $H_0$: the future state does not depend on the current state.
$H_1$: the future state depends on the current state.
Step 2: calculate
$$
P_{ij}(\Delta t = 1) =
\begin{bmatrix}
0.7338 & 0.2139 & 0.0247 & 0.0277\\
0.0206 & 0.7411 & 0.1793 & 0.059\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
$$
Step 3: calculate the expected counts in this interval by multiplying each row of the probability matrix by the corresponding marginal total of the observed transition counts matrix in the same interval, as shown in Table 11. The marginal totals are 550 for S1 and 250 for S2; the observed counts for this time interval are shown in Table 3.
Table 11: The expected counts during Δt = 1 year
           State 1    State 2    State 3    State 4    Total
State 1    403.645    117.59     13.53      15.235     550
State 2    5.15       185.3      44.825     14.75      250.025
State 3    0          0          0          0          0
State 4    0          0          0          0          0
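Step 4: compute
$$
\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 104.866 \sim \chi^2_{(4-1)(4-1)}(0.05)
$$
where cells with $E_{ij} = 0$ (the absorbing rows) are omitted.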
The same steps are applied to the observed transition counts in Δt = 2 and Δt = 3, with the following results:
$$
P_{ij}(\Delta t = 2) =
\begin{bmatrix}
0.543 & 0.3154 & 0.0811 & 0.0606\\
0.0304 & 0.5537 & 0.3126 & 0.1033\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
$$
Table 12: The expected counts during Δt = 2 years
           State 1    State 2    State 3    State 4    Total
State 1    60.273     35.0094    9.0021     6.7266     111
State 2    1.1856     21.5943    12.1914    4.0287     39
State 3    0          0          0          0          0
State 4    0          0          0          0          0
$$
\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 8.003 \sim \chi^2_{(4-1)(4-1)}(0.05)
$$
The same steps are applied to the observed transition counts in Δt = 3, with the following results:
$$
P_{ij}(\Delta t = 3) =
\begin{bmatrix}
0.405 & 0.3498 & 0.151 & 0.0943\\
0.0337 & 0.4169 & 0.4127 & 0.1368\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
$$
Table 13: The expected counts during Δt = 3 years
           State 1    State 2    State 3    State 4    Total
State 1    15.795     13.6422    5.889      3.6777     39
State 2    0.3707     4.5859     4.5397     1.5048     11
State 3    0          0          0          0          0
State 4    0          0          0          0          0
$$
\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 6.579 \sim \chi^2_{(4-1)(4-1)}(0.05)
$$
Step 5: sum the above results to get
$$
\sum_{l=1}^{3}\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{(O_{ijl}-E_{ijl})^2}{E_{ijl}} = 119.449 \sim \chi^2_{(df=27)}(0.05)
$$
Since 119.449 exceeds the critical value $\chi^2_{27}(0.05) \approx 40.11$, the null hypothesis is rejected in favor of the alternative hypothesis: the future state depends on the current state, and the model, with the estimated transition rate and probability matrices obtained above, fits the data.
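Steps 2 to 5 are simple arithmetic; the sketch below reproduces the Table 11 expected counts and the summed statistic. The observed counts of Table 3 are not reproduced in this section, so the Pearson function is shown without them. The Table 11 entries match the 1-year matrix of Section 2.2.2 (0.7339, 0.2138, ...) rather than the slightly different Step 2 matrix, so that matrix is used here, and the degrees of freedom are taken as three intervals of (4 − 1)(4 − 1), as in the text. The function names are illustrative.

```python
def expected_counts(p_rows, row_totals):
    """E_ij = (observed row total for state i) * P_ij(dt)."""
    return [[n * v for v in row] for n, row in zip(row_totals, p_rows)]

def pearson(observed, expected):
    """Sum of (O - E)^2 / E over cells with E > 0."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row) if e > 0)

# transient rows of the 1-year matrix; rows S3 and S4 have no outgoing counts
P1_ROWS = [[0.7339, 0.2138, 0.0246, 0.0277],
           [0.0206, 0.7412, 0.1793, 0.0590]]
E1 = expected_counts(P1_ROWS, [550, 250])

interval_stats = [104.866, 8.003, 6.579]          # reported for dt = 1, 2, 3
total_stat = sum(interval_stats)
df = len(interval_stats) * (4 - 1) * (4 - 1)
print([[round(v, 3) for v in row] for row in E1])
print(round(total_stat, 3), df)
```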
Supplementary materials
The supplementary materials contain a file with the theoretical background for the mathematical and statistical calculations, an Excel file with the data (Table 2), and MATLAB code for all the calculations.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable
Availability of data and material
Not applicable. Data sharing is not applicable to this article, as no datasets were generated
or analyzed during the current study.
Competing interests
The author declares no competing interests.
Funding
No funding was received. No funder had a role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.
Authors’ contribution
I am the author who carried out the mathematical analysis and applied these mathematical and statistical concepts to the hypothetical example.
Acknowledgement
Not applicable
Declaration of competing interest
The author declares no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
CRediT author Statement
Attia IM: conceptualization, formal analysis, data generation and creation, methodology, software computation, writing, review, and editing.
ORCID: 0000-0002-7333-9713
References
[1] Z. Younossi et al., “Global burden of NAFLD and NASH: trends, predictions, risk
factors and prevention,” Nat Rev Gastroenterol Hepatol, vol. 15, no. 1, pp. 11–20,
Jan. 2018, doi: 10.1038/nrgastro.2017.109.
[2] M. Eslam et al., “MAFLD: A Consensus-Driven Proposed Nomenclature for
Metabolic Associated Fatty Liver Disease,” Gastroenterology, vol. 158, no. 7, pp.
1999-2014.e1, May 2020, doi: 10.1053/j.gastro.2019.11.312.
[3] P. L. Huang, “A comprehensive definition for metabolic syndrome,” Disease Models
& Mechanisms, vol. 2, no. 5–6, pp. 231–237, Apr. 2009, doi: 10.1242/dmm.001180.
[4] H. Tilg and M. Effenberger, “From NAFLD to MAFLD: when pathophysiology
succeeds,” Nat Rev Gastroenterol Hepatol, vol. 17, no. 7, Art. no. 7, Jul. 2020, doi:
10.1038/s41575-020-0316-6.
[5] A. De and A. Duseja, “Natural History of Simple Steatosis or Nonalcoholic Fatty
Liver,” Journal of Clinical and Experimental Hepatology, vol. 10, no. 3, pp. 255–262,
May 2020, doi: 10.1016/j.jceh.2019.09.005.
[6] P. Bedossa et al., “Histopathological algorithm and scoring system for evaluation of
liver lesions in morbidly obese patients,” Hepatology, vol. 56, no. 5, pp. 1751–1759,
2012, doi: 10.1002/hep.25889.
[7] D. E. Kleiner et al., “Design and validation of a histological scoring system for
nonalcoholic fatty liver disease,” Hepatology, vol. 41, no. 6, pp. 1313–1321, 2005,
doi: 10.1002/hep.20701.
[8] L. J. S. Allen, An Introduction to Stochastic Processes with Applications to Biology,
2nd edition. Boca Raton, FL: Chapman and Hall/CRC, 2010.
[9] Z. M. Younossi et al., “The economic and clinical burden of nonalcoholic fatty liver
disease in the United States and Europe,” Hepatology, vol. 64, no. 5, pp. 1577–
1586, 2016, doi: 10.1002/hep.28785.
[10] Z. M. Younossi et al., “Economic and Clinical Burden of Nonalcoholic
Steatohepatitis in Patients With Type 2 Diabetes in the U.S,” Diabetes Care, vol. 43,
no. 2, pp. 283–289, Feb. 2020, doi: 10.2337/dc19-1113.
[11] J. H. Klotz and L. D. Sharples, “Estimation for a Markov Heart Transplant Model,”
Journal of the Royal Statistical Society: Series D (The Statistician), vol. 43, no. 3,
pp. 431–438, 1994, doi: 10.2307/2348579.
[12] J. D. Kalbfleisch and J. F. Lawless, “The Analysis of Panel Data under a Markov
Assumption,” Journal of the American Statistical Association, vol. 80, no. 392, pp.
863–871, Dec. 1985, doi: 10.1080/01621459.1985.10478195.
[13] C. G. Cassandras and S. Lafortune, Eds., “Introduction to Discrete-Event
Simulation,” in Introduction to Discrete Event Systems, Boston, MA: Springer US,
2008, pp. 557–615. doi: 10.1007/978-0-387-68612-7_10.
[14] C. L. Chiang, “Introduction to stochastic processes in biostatistics,” 1968, doi:
10.2307/2986707.
[15] N. Chalasani et al., “The diagnosis and management of nonalcoholic fatty liver
disease: Practice guidance from the American Association for the Study of Liver
Diseases,” Hepatology, vol. 67, no. 1, pp. 328–357, 2018, doi: 10.1002/hep.29367.
[16] R. B. Israel, J. S. Rosenthal, and J. Z. Wei, “Finding Generators for Markov
Chains via Empirical Transition Matrices, with Applications to Credit Ratings,”
Mathematical Finance, vol. 11, no. 2, pp. 245–265, 2001, doi: 10.1111/1467-9965.00114.
[17] K. L. Verbyla, V. B. Yap, A. Pahwa, Y. Shao, and G. A. Huttley, “The embedding
problem for markov models of nucleotide substitution,” PLoS One, vol. 8, no. 7, p.
e69187, 2013, doi: 10.1371/journal.pone.0069187.
[18] P. Lencastre, F. Raischel, P. Lind, and T. Rogers, “Are credit ratings time-
homogeneous and Markov?,” Mar. 2014.
Supplementary Files
This is a list of supplementary files associated with this preprint.
MATLABcodetransitioncountsineachinterval.pdf
MATLABcodeforcalculatethefinalratevectoranditsvariance.pdf
MATLABcodeforcalculationoftheratematrixatallinterval.pdf
MATLABcodeforthecalculationoftheprobabilitymatrixatfirstyear.pdf
MATLABcodetocalculatetheEstimatedMeanSojournTimeandLifeExpectancy.pdf
MATLABcodetocalculateTheVarianceOfTheStationaryDistribution.pdf
theoreticalsupplementarymaterials.pdf

More Related Content

Similar to Metabolic associated fatty liver disease and continuous time Markov chains

Cardiovasc diabetol 2009_sep_26_8_52[1]
Cardiovasc diabetol 2009_sep_26_8_52[1]Cardiovasc diabetol 2009_sep_26_8_52[1]
Cardiovasc diabetol 2009_sep_26_8_52[1]
Manuel F. Miyamoto
 
Cardiovasc diabetol 2009_sep_26_8_52[1]
Cardiovasc diabetol 2009_sep_26_8_52[1]Cardiovasc diabetol 2009_sep_26_8_52[1]
Cardiovasc diabetol 2009_sep_26_8_52[1]
Manuel F. Miyamoto
 
Normal-Weight Obesity & Risk of Subclinical Atherosclerosis
Normal-Weight Obesity & Risk of Subclinical AtherosclerosisNormal-Weight Obesity & Risk of Subclinical Atherosclerosis
Normal-Weight Obesity & Risk of Subclinical Atherosclerosis
Jain hospital,Mahavir Sikshan Sansthan
 
Iisrt zz srinivas ravi
Iisrt zz srinivas raviIisrt zz srinivas ravi
Iisrt zz srinivas ravi
IISRT
 
Diabetes Mellitus Prediction System Using Data Mining
Diabetes Mellitus Prediction System Using Data MiningDiabetes Mellitus Prediction System Using Data Mining
Diabetes Mellitus Prediction System Using Data Mining
paperpublications3
 
ACS Spring 2016 Combining semantic triple stores across knowledge domains
ACS Spring 2016 Combining semantic triple stores across knowledge domainsACS Spring 2016 Combining semantic triple stores across knowledge domains
ACS Spring 2016 Combining semantic triple stores across knowledge domains
Matthew Clark
 

Similar to Metabolic associated fatty liver disease and continuous time Markov chains (20)

H017315053
H017315053H017315053
H017315053
 
Application of Support Vector Machine and Fuzzy Logic for Detecting and Ident...
Application of Support Vector Machine and Fuzzy Logic for Detecting and Ident...Application of Support Vector Machine and Fuzzy Logic for Detecting and Ident...
Application of Support Vector Machine and Fuzzy Logic for Detecting and Ident...
 
Industrial-Project-Report
Industrial-Project-ReportIndustrial-Project-Report
Industrial-Project-Report
 
Cardiovasc diabetol 2009_sep_26_8_52[1]
Cardiovasc diabetol 2009_sep_26_8_52[1]Cardiovasc diabetol 2009_sep_26_8_52[1]
Cardiovasc diabetol 2009_sep_26_8_52[1]
 
Cardiovasc diabetol 2009_sep_26_8_52[1]
Cardiovasc diabetol 2009_sep_26_8_52[1]Cardiovasc diabetol 2009_sep_26_8_52[1]
Cardiovasc diabetol 2009_sep_26_8_52[1]
 
Core Components of the Metabolic Syndrome in Nonalcohlic Fatty Liver Disease
Core Components of the Metabolic Syndrome in Nonalcohlic Fatty Liver DiseaseCore Components of the Metabolic Syndrome in Nonalcohlic Fatty Liver Disease
Core Components of the Metabolic Syndrome in Nonalcohlic Fatty Liver Disease
 
Prevalence of Chronic Kidney disease in Patients with Metabolic Syndrome in S...
Prevalence of Chronic Kidney disease in Patients with Metabolic Syndrome in S...Prevalence of Chronic Kidney disease in Patients with Metabolic Syndrome in S...
Prevalence of Chronic Kidney disease in Patients with Metabolic Syndrome in S...
 
Normal-Weight Obesity & Risk of Subclinical Atherosclerosis
Normal-Weight Obesity & Risk of Subclinical AtherosclerosisNormal-Weight Obesity & Risk of Subclinical Atherosclerosis
Normal-Weight Obesity & Risk of Subclinical Atherosclerosis
 
LIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUES
LIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUESLIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUES
LIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUES
 
Systems Medicine and Metabolic Diseases
Systems Medicine and Metabolic DiseasesSystems Medicine and Metabolic Diseases
Systems Medicine and Metabolic Diseases
 
Iisrt zz srinivas ravi
Iisrt zz srinivas raviIisrt zz srinivas ravi
Iisrt zz srinivas ravi
 
Diabetes Mellitus Prediction System Using Data Mining
Diabetes Mellitus Prediction System Using Data MiningDiabetes Mellitus Prediction System Using Data Mining
Diabetes Mellitus Prediction System Using Data Mining
 
ACS Spring 2016 Combining semantic triple stores across knowledge domains
ACS Spring 2016 Combining semantic triple stores across knowledge domainsACS Spring 2016 Combining semantic triple stores across knowledge domains
ACS Spring 2016 Combining semantic triple stores across knowledge domains
 
International Journal of Clinical Endocrinology
International Journal of Clinical EndocrinologyInternational Journal of Clinical Endocrinology
International Journal of Clinical Endocrinology
 
Selfmaps
SelfmapsSelfmaps
Selfmaps
 
C0151216
C0151216C0151216
C0151216
 
C0151216
C0151216C0151216
C0151216
 
Unravelling the molecular linkage of co morbid diseases
Unravelling the molecular linkage of co morbid diseasesUnravelling the molecular linkage of co morbid diseases
Unravelling the molecular linkage of co morbid diseases
 
Unravelling the molecular linkage of co morbid
Unravelling the molecular linkage of co morbidUnravelling the molecular linkage of co morbid
Unravelling the molecular linkage of co morbid
 
UK BMJ - A systematic review on the Impact of dietary fibre on cardiovascular...
UK BMJ - A systematic review on the Impact of dietary fibre on cardiovascular...UK BMJ - A systematic review on the Impact of dietary fibre on cardiovascular...
UK BMJ - A systematic review on the Impact of dietary fibre on cardiovascular...
 

Recently uploaded

PODOCARPUS...........................pptx
PODOCARPUS...........................pptxPODOCARPUS...........................pptx
PODOCARPUS...........................pptx
Cherry
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Cherry
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
Cherry
 
COMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demeritsCOMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demerits
Cherry
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
Cherry
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
Cherry
 

Recently uploaded (20)

GBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolationGBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolation
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
PODOCARPUS...........................pptx
PODOCARPUS...........................pptxPODOCARPUS...........................pptx
PODOCARPUS...........................pptx
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Genome Projects : Human, Rice,Wheat,E coli and Arabidopsis.
Genome Projects : Human, Rice,Wheat,E coli and Arabidopsis.Genome Projects : Human, Rice,Wheat,E coli and Arabidopsis.
Genome Projects : Human, Rice,Wheat,E coli and Arabidopsis.
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
GBSN - Microbiology (Unit 4) Concept of Asepsis
GBSN - Microbiology (Unit 4) Concept of AsepsisGBSN - Microbiology (Unit 4) Concept of Asepsis
GBSN - Microbiology (Unit 4) Concept of Asepsis
 
GBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) MetabolismGBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) Metabolism
 
Efficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationEfficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence acceleration
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
FS P2 COMBO MSTA LAST PUSH past exam papers.
FS P2 COMBO MSTA LAST PUSH past exam papers.FS P2 COMBO MSTA LAST PUSH past exam papers.
FS P2 COMBO MSTA LAST PUSH past exam papers.
 
COMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demeritsCOMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demerits
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Understanding Partial Differential Equations: Types and Solution Methods
Understanding Partial Differential Equations: Types and Solution MethodsUnderstanding Partial Differential Equations: Types and Solution Methods
Understanding Partial Differential Equations: Types and Solution Methods
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Terpineol and it's characterization pptx
Terpineol and it's characterization pptxTerpineol and it's characterization pptx
Terpineol and it's characterization pptx
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 

Metabolic associated fatty liver disease and continuous time Markov chains

  • 1. Markov Chains Analyzing Dataset of Patients with Metabolic Associated Fatty Liver Disease Iman M. Attia  (  imanattiathesis1972@gmail.com ) Institute of Statistical Studies and Research, Cairo University https://orcid.org/0000-0002-7333-9713 Data Note Keywords: non-alcoholic fatty liver disease, metabolic associated fatty liver disease, steatohepatitis, continuous time Markov chains, mean sojourn time, life expectancy Posted Date: July 1st, 2022 DOI: https://doi.org/10.21203/rs.3.rs-1810831/v1 License:   This work is licensed under a Creative Commons Attribution 4.0 International License.   Read Full License
  • 2. Markov Chains Analyzing Dataset of Patients with Metabolic Associated Fatty Liver Disease Iman M. Attia * *Department of Mathematical Statistics, Faculty of Graduate Studies for Statistical Research, Cairo University, Egypt Corresponding author : Iman M. Attia (imanattiathesis1972@gmail.com ,imanattia1972@gmail.com ) Abstract The prevalence of obesity and type 2 diabetes has reached epidemic levels that parallel the rates of the widely distributed non-alcoholic fatty liver disease (NAFLD). Nearly one billion people worldwide suffered from NAFLD. The estimated annual medical costs for NAFLD exceed €35 billion in four large European countries (the United Kingdom, France, Germany, and Italy) and $100 billion in the United States. According to the American Association for the study of liver disease, NAFLD requires the presence of hepatic steatosis in more than 5 % of hepatocytes detected by histology or imaging with little consumption of alcohol and exclusion of other causes of chronic liver diseases. The risk factors for NAFLD are age>45, males are more susceptible than females, ethnicity; the Hispanics have more prevalent rates than the whites who are more susceptible than the blacks, ingestion of high fat and high cholesterol diet, genetic backgrounds like patatin-like phospholipase domain-containing protein 3 (PNPLA3) gene which is most prevalent in Hispanics followed by non-Hispanics whites and African Americans, and features of metabolic syndrome. The newly proposed name is metabolic associated fatty liver disease (MAFLD). This new definition requires evidence of hepatic steatosis as previously mentioned plus one of three features: obesity or overweight (BMI > 25 kg/m2 in white and > 23 kg/m2 in Asian Individuals), type 2 diabetes, or lean or normal weight with evidence of metabolic dysregulation. For the definition of metabolic dysregulation , at least two risk metabolic risk factors should be present. 
These factors are:
- waist circumference ≥ 102 cm for males and ≥ 88 cm for females in Western countries (≥ 90 cm and ≥ 80 cm, respectively, for Asian and Eastern males and females);
- prediabetes;
- homeostasis model assessment of insulin resistance (HOMA-IR) ≥ 2.5;
- elevated high-sensitivity serum C-reactive protein (CRP), denoting inflammation;
- elevated blood pressure or specific drug treatment;
- decreased high-density lipoprotein (HDL) cholesterol;
- increased plasma triglycerides or specific drug treatment.

The pathogenesis of the disease can be explained by the “two-hit theory”, which has been updated to the “multiple or parallel hit theory”. The first hit is initiated by liver fat content exceeding five percent of
the total hepatocytes, with concomitant insulin resistance. This fatty liver is more vulnerable to the second hit: inflammation and necrosis (cell death). The resulting inflammation, called steatohepatitis, stimulates fibrosis. Other hits augment steatohepatitis: the interactions between genetic and environmental factors, and the cross-talk between organs and tissues such as adipose tissue, the pancreas, the gut (microbiota), and the liver.

Liver biopsy is the gold standard method for diagnosis. However, it is invasive and has limitations such as sampling error, the need for hospital admission, high costs, and observer dependence. Rigorous control of risk factors through lifestyle modification (reduced caloric intake and exercise) can protect the liver. Newly emerging anti-fibrotic and anti-inflammatory drugs are promising for improving the histopathological picture of the disease.
Key words: non-alcoholic fatty liver disease, metabolic associated fatty liver disease, steatohepatitis, continuous time Markov chains, mean sojourn time, life expectancy.

Specification Table
Subject: Medicine, Hepatology, Endocrinology, Diabetes, Obesity
Specific subject area: Biostatistics, Epidemiology, Healthcare Science
Type of data: Tables and figures, Excel workbook, MATLAB codes
How the data were acquired: The data are fictitious; they depict what real data could look like.
Data format: Raw data, represented by the liver biopsy findings recorded at each visit. Processed data, with a MATLAB code that calculates the transition counts in each time interval. Analyzed data, using homogeneous continuous time Markov chains.
Description of the data: This is a fictitious dataset of 310 participants with risk factors for NAFLD, such as type 2 diabetes, hypercholesterolemia, hypertriglyceridemia, obesity, and hypertension. These factors act separately or together as a metabolic syndrome. The participants were followed up for a total period of eight years.
Parameters of the dataset: The inclusion criteria were clinical, biochemical, and radiological evidence of insulin resistance, type 2 diabetes, hypercholesterolemia, hypertriglyceridemia, obesity, or hypertension. The exclusion criteria were clinical, biochemical, and radiological evidence of any of the following: hepatitis B or C infection, primary biliary cirrhosis, primary sclerosing cholangitis, autoimmune hepatitis, Wilson disease, hemochromatosis, alpha-1 antitrypsin deficiency, celiac disease, drug intake, or alcohol consumption.
Data source location: The data are fictitious. They are intended to give insight for epidemiological and clinical studies and to illustrate how a homogeneous continuous time Markov chain model can be used to analyze such data.
Data accessibility: Within the article. The data are also available on the IEEE DataPort site (URL: https://ieee-dataport.org/documents/ctmc-analyzing-nafld-progression-small-model, DOI: 10.21227/az1b-x326). The MATLAB codes are available on the CodeOcean site (URL: https://codeocean.com/capsule/8641183/tree/v2, DOI: 10.24433/CO.6022979.v2).
Related research article: This dataset was mentioned as supplementary material in the article “Novel Approach of Multistate Markov Chains to Evaluate Progression in the Expanded Model of Non-Alcoholic Fatty Liver Disease”, Frontiers in Applied Mathematics and Statistics, 7, https://www.frontiersin.org/article/10.3389/fams.2021.766085. There it illustrates the comparison between the simplest model and the expanded model of the disease process.

Value of the data
1- This dataset can give insight into the behavior of NAFLD. Statistical analysis of this dataset provides the following statistical indices:
- the transition rate matrix;
- the transition probability matrix;
- the mean sojourn time in each state;
- the state probability distribution at a specific time point;
- the life expectancy of the patient in each state;
- the stationary probability distribution of the disease process;
- the expected number of patients in each state at a specific time point.

2- Healthcare policy makers and medical insurance managers can use these indices to allocate human and financial resources for investigating and treating patients with different phenotypes of NAFLD. Epidemiologists can use them to estimate the prevalence and incidence of NAFLD cases [1].
Pharmaceutical companies can use these indices to assess the effectiveness of the anti-fibrotic and anti-inflammatory drugs used to treat NAFLD patients. Physicians can use them to formulate treatment protocols. Nutritionists can also benefit from these indices when developing new foodstuffs that are both healthy and palatable, in an attempt to prevent the disease. All
the above parties are urged to help the community reduce the prevalence of this disease.
3- This dataset can be reproduced in different communities and populations with different ethnic backgrounds to study the behavior of the disease.
4- The statistical indices obtained from the analysis of this dataset help in pharmacoeconomic evaluation. One example is predicting the expected number of patients in each state at a specific time point. Combined with the cost of investigating and treating each patient, this prediction helps assess the total cost and economic burden of the disease. The three major categories of pharmacoeconomic evaluation are cost-benefit analysis, cost-effectiveness analysis, and cost-utility analysis. The statistical indices supplied by the analysis of this dataset make this evaluation possible.

1. Data description
In a governmental healthcare unit, 310 patients underwent clinical, biochemical, and radiological examination. These examinations were done to include patients with overweight or obesity, type 2 diabetes, lean subjects with metabolic dysregulation [2], and metabolic syndrome [3].

They were also done to exclude subjects with chronic liver diseases, namely:
- hepatitis B and C infections;
- autoimmune diseases such as primary biliary cirrhosis, autoimmune hepatitis, and primary sclerosing cholangitis;
- hereditary diseases such as hemochromatosis and Wilson disease;
- genetic diseases such as alpha-1 antitrypsin deficiency;
- other causes such as celiac disease.
The diagnostic criteria for metabolic syndrome are abdominal adiposity plus at least two of four additional criteria. Abdominal adiposity is a waist circumference > 94 cm for males and > 80 cm for females in Eastern countries, or > 102 cm for males and > 88 cm for females in Western countries.

The additional criteria are:
- fasting blood glucose ≥ 100 mg/dL or drug treatment;
- arterial blood pressure ≥ 130/85 mmHg or drug treatment;
- triglyceride level ≥ 150 mg/dL or drug treatment;
- HDL cholesterol < 40 mg/dL for males and < 50 mg/dL for females, or drug treatment.

Participants with metabolic dysregulation, as defined in the abstract, were also included in the dataset [4]. The biochemical tests were fasting blood glucose (FBG), serum insulin, homeostatic model assessment of insulin resistance (HOMA-IR), serum alkaline phosphatase (ALP), serum alanine aminotransferase (ALT), serum aspartate aminotransferase (AST), gamma-glutamyl transpeptidase (GGT), serum albumin, serum creatinine and blood urea nitrogen, international normalized ratio (INR), hemoglobin, platelet count and red blood cell count, total cholesterol, low-density lipoprotein cholesterol (LDL-Chol), high-density lipoprotein cholesterol (HDL-Chol), serum
triglyceride level, and C-reactive protein (CRP). Further laboratory tests excluded other conditions: hepatitis B surface antigen (HBsAg) and hepatitis C virus antibodies (HCVAb), autoantibodies, serum copper and ceruloplasmin, serum iron, serum ferritin and transferrin saturation, and serum alpha-1 antitrypsin level. Only patients with obesity or type 2 diabetes, and lean persons with the metabolic derangements described above, were included. Patients with metabolic syndrome, as defined above, were also included. Alcohol consumption had to be less than 20 g/day for females and less than 30 to 40 g/day for males. Participants taking corticosteroids, amiodarone, or any other drugs that induce NASH were excluded. NAFLD is a dynamic process, as described in the abstract. Fig. 1 clarifies this process [5].
Fig. 1 Dynamic model of NAFLD.
The non-alcoholic fatty liver (NAFL) phenotype is characterized by hepatic steatosis alone, or by hepatic steatosis plus either hepatic ballooning or hepatic inflammation. If the risk factors inducing this phenotype are not treated, the patient passes to the more aggressive phenotype, non-alcoholic steatohepatitis (NASH). NASH is characterized by steatosis, hepatic ballooning, and inflammation of any grade. If the contributing risk factors are rigorously treated and well controlled, the patient can regress to the less severe form (NAFL) or be cured. If these factors are left untreated, NASH induces fibrogenesis. Thus, finding NAFL or NASH on the initial liver biopsy does not determine the course of the disease. NAFL patients have a lower risk of fibrosis progression than NASH patients. As seen in the figure, in the early stage of the disease the patient cycles between NAFL and NASH. Whether the biopsy findings are NAFL or NASH, about 80% of patients are slow progressors. They are unlikely to progress beyond mild fibrosis (F0 to F2), and over 8 years most progress no further than F0 or F1. Approximately 20% of NASH patients are rapid progressors who develop severe fibrosis (F3 to F4) within a few years (about 2 to 6 years).

For each participant eligible for the study, a liver biopsy was performed. Its findings were recorded according to the algorithm of Bedossa et al. (2012) [6], which Fig. 2 illustrates. This algorithm defines the absence of NAFLD as steatosis grade 0; NAFLD requires steatosis of any grade. The two main phenotypes of the disease are NAFL and NASH. NAFL requires the mandatory presence of steatosis of any grade plus one of two features: hepatocyte
ballooning of any grade or inflammatory cells of any grade. NASH requires steatosis of any grade together with both hepatocyte ballooning and inflammatory cells of any grade. Table 1 compares the NAFLD activity score (NAS) with the SAF score. Both systems define a fibrosis score in addition to the activity components (steatosis, ballooning, and inflammation), and this fibrosis score is almost the same in the two systems [7].
Fig. 2 Bedossa et al. (2012) algorithm
NAFLD is characterized by three main histopathologic features:
- steatosis;
- liver injury in the form of hepatic ballooning and inflammation (steatohepatitis, NASH);
- fibrosis.

Absence of steatosis (steatosis grade = 0) excludes NAFLD; hepatic steatosis is a mandatory prerequisite for NAFLD. Steatosis of any grade plus either hepatic ballooning or inflammation points to a diagnosis of NAFL. The presence of all three elements (steatosis, ballooning, and inflammation) indicates NASH.
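The decision rule summarized above can be sketched as a small function. This is a minimal illustration, not the author's code. It assumes integer SAF grades for steatosis, ballooning, and inflammation. It follows the first paragraph of this section in treating steatosis alone, or steatosis with only one of the other two features, as NAFL. The function names `classify_biopsy` and `markov_state` are hypothetical. The mapping to states 1 and 2 follows the Markov state definitions used in this paper: state 1 means no NAFLD findings, and state 2 means NAFL or NASH.

```python
# Minimal sketch of the SAF-based decision rule (Bedossa et al., 2012) as
# summarized in the text. Grades are non-negative integers; the helper names
# are hypothetical and not part of the original MATLAB code.


def classify_biopsy(steatosis: int, ballooning: int, inflammation: int) -> str:
    """Return 'no NAFLD', 'NAFL' or 'NASH' from the three SAF activity grades."""
    if min(steatosis, ballooning, inflammation) < 0:
        raise ValueError("grades must be non-negative integers")
    if steatosis == 0:
        return "no NAFLD"  # absence of steatosis excludes NAFLD
    if ballooning > 0 and inflammation > 0:
        return "NASH"  # steatosis + ballooning + inflammation, any grade
    return "NAFL"  # steatosis alone, or with only one of the two features


def markov_state(diagnosis: str) -> int:
    """Map a biopsy diagnosis to the living states of the Markov model."""
    return 1 if diagnosis == "no NAFLD" else 2


# Example: steatosis grade 2, ballooning 1, inflammation 0
diagnosis = classify_biopsy(2, 1, 0)
print(diagnosis, markov_state(diagnosis))
```

Keeping the classification and the state mapping separate means the same rule could feed an expanded model that distinguishes NAFL from NASH.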
Table 1 b.
Table 1.a & b Comparison between the NAS and SAF scores
The NAFLD activity score (NAS) was proposed by the U.S. National Institutes of Health-sponsored NASH Clinical Research Network (NASH CRN). It combines the assessment of steatosis, inflammation, and ballooning into a score ranging from 0 to 8 points, with a separate fibrosis score ranging from 0 to 4. It is a useful research tool in clinical trials, but it is not a suitable prognostic tool for clinical practice. To avoid these limitations in routine clinical practice, an important scoring system called the steatosis-activity-fibrosis (SAF) score has been developed. With SAF, steatosis, activity, and fibrosis are assessed separately. An algorithm is then applied to categorize biopsies into one of three diagnostic groups: normal, NAFL, or NASH. The fibrosis score is the same in both systems.

The participants were scheduled to be followed up every year, but not all of them kept to this schedule. Some attended every year, while others attended every two or even every three years. The overall follow-up period spanned eight years (time points t = 0 to t = 8). Although liver biopsy has the limitations stated in the abstract, the analysis relies on the biopsy findings because biopsy is the gold standard for diagnosis. This analysis describes the simplest “health, disease, death” model using a homogeneous continuous time Markov chain [8].
At each visit, the liver biopsy findings were recorded for each participant, and these findings define the states of the Markov model used in the statistical analysis:
- State 1: susceptible participants with no biopsy findings suggesting NAFLD.
- State 2: cases with biopsy findings suggesting NAFL or NASH.
- State 3: death due to complications of the liver disease.
- State 4: death from causes unrelated to liver disease.

The transitions between the states occur at the following rates [9]:
- state 1 to state 2: lambda12 ($\lambda_{12}$);
- state 2 to state 1: mu21 ($\mu_{21}$);
- state 2 to state 3: lambda23 ($\lambda_{23}$);
- state 1 to state 4: lambda14 ($\lambda_{14}$);
- state 2 to state 4: lambda24 ($\lambda_{24}$).

Fig. 3 shows the general structure of the Markov chain for this disease process [10].
Fig. 3 General model structure for NAFLD
Table 2 summarizes the liver biopsy findings at each time point of the follow-up for each participant. For example, the participant with ID = 1 was in state 1 at the first visit (t = 0) and again one year later (t = 1). At t = 2 and t = 3 he was in state 2, after which he stopped attending the clinic. The participant with ID = 2 was in state 1 at t = 0 and t = 1, in state 2 at t = 2, and back in state 1 at t = 3. He did not attend at t = 4, was in state 2 at t = 5, and did not attend at t = 6. At t = 7 he was in state 1, after which he stopped attending. The records of the other participants are read in the same way.
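The model in Fig. 3 can be written as a 4 × 4 generator (transition rate) matrix Q. Its off-diagonal entries are the five rates, and its diagonal entries make each row sum to zero. The two death states are absorbing, so their rows are zero. The sketch below is a hedged illustration with a hypothetical helper `build_generator`. The example rates are the final estimates reported later in Section 2.2.1.

```python
# Sketch: 4-state generator matrix for the "health, disease, death" model.
# States (0-based index): 0 = state 1 (no NAFLD), 1 = state 2 (NAFL/NASH),
# 2 = state 3 (liver-related death), 3 = state 4 (other death).
# `build_generator` is a hypothetical helper, not the author's MATLAB code.


def build_generator(l12, l14, mu21, l23, l24):
    """Return Q as a list of rows; every row of a generator sums to zero."""
    q = [[0.0] * 4 for _ in range(4)]
    q[0][1], q[0][3] = l12, l14  # 1 -> 2, 1 -> 4
    q[1][0], q[1][2], q[1][3] = mu21, l23, l24  # 2 -> 1, 2 -> 3, 2 -> 4
    for i in range(4):
        exit_rate = sum(q[i][j] for j in range(4) if j != i)
        # Diagonal = minus the total exit rate; absorbing rows stay all zero.
        q[i][i] = -exit_rate if exit_rate else 0.0
    return q


Q = build_generator(l12=0.2907, l14=0.0228, mu21=0.0280, l23=0.2076, l24=0.068)
for row in Q:
    print([round(x, 4) for x in row])
```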
  • 12. Table 2: liver biopsy findings for each participant during each visit of the follow-up period time Patient ID time Patient ID 8 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 4 2 2 1 1 1 53 2 2 1 1 1 2 2 1 54 1 2 1 2 1 1 2 2 2 1 55 3 1 3 4 2 2 1 1 1 1 56 1 1 1 1 4 2 2 1 57 2 2 1 2 1 5 4 2 2 1 1 1 1 1 1 58 2 2 1 2 1 6 2 2 2 1 59 2 2 1 2 1 7 3 1 1 1 1 60 3 1 8 3 1 61 4 2 2 1 1 1 9 4 2 2 1 1 1 62 3 1 10 2 2 2 1 1 1 1 1 63 1 2 1 1 1 11 2 2 2 2 1 1 64 3 1 12 2 2 2 1 1 1 1 65 1 1 1 13 2 2 2 1 1 66 3 1 14 2 2 2 1 1 1 1 67 1 1 1 1 1 15 2 2 2 1 68 2 1 1 1 1 1 1 1 1 16 4 2 2 1 69 4 2 2 1 1 17 1 1 1 1 1 70 2 1 1 1 1 18 1 1 1 1 1 1 71 1 2 1 1 1 1 1 19 2 2 1 1 1 1 1 1 72 1 1 1 1 1 1 1 20 2 2 2 1 73 3 1 1 21 2 2 2 1 74 4 2 2 1 1 22 3 1 1 1 75 3 1 23 3 1 1 1 76 3 1 24 2 2 2 2 2 2 2 1 77 3 1 25 1 1 1 1 1 1 78 3 1 26 2 2 2 2 1 1 1 1 79 3 1 27 3 1 80 4 2 2 1 1 28 3 1 81 4 1 29 2 2 1 1 1 82 4 1 30 2 2 2 1 1 1 1 1 83 4 1 31 2 2 1 1 1 1 1 84 4 2 2 1 1 32 3 2 2 2 2 2 2 1 85 4 2 2 1 33 4 2 2 1 86 2 2 2 1 34 1 1 1 1 1 87 2 2 2 1 35 1 1 1 1 1 88 3 1 36 1 1 1 1 1 89 3 1 37 3 2 2 1 1 1 1 90 4 2 2 1 1 38 3 2 2 1 1 1 1 1 91 4 1 39 1 1 1 1 1 92 4 1 40 1 1 1 1 1 93 4 1 41 3 1 1 1 94 4 2 2 1 1 1 42 3 2 2 1 1 1 1 95 4 1 43 3 1 1 1 1 96 4 1 44 1 1 1 1 1 1 1 97 4 1 45 3 2 2 1 98 3 1 1 46 3 2 2 1 1 1 99 4 1 47 3 2 2 1 1 1 100 4 1 48 3 2 2 1 1 1 1 101 4 2 2 1 1 1 1 1 49 3 2 2 1 102 2 2 1 1 1 1 1 1 50 3 1 1 1 103 3 1 51 3 1 1 1 104 3 1 52
  • 13. time Patient ID time Patient ID 8 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 3 2 2 1 1 1 1 1 157 3 1 1 1 105 3 2 2 1 158 3 1 106 2 2 1 159 3 2 2 1 1 1 1 1 107 3 2 2 1 1 1 160 1 1 1 1 1 108 3 2 2 1 1 1 161 3 2 2 1 1 1 1 109 3 2 2 1 1 1 1 1 162 3 1 1 1 1 1 110 2 2 1 163 3 2 2 1 111 2 2 1 1 1 164 3 1 1 1 1 112 2 2 1 165 3 2 2 1 113 3 2 2 1 1 1 166 3 1 1 1 114 2 2 1 167 3 1 1 1 115 1 1 1 1 1 1 168 4 2 2 1 116 3 2 2 2 1 169 3 1 1 1 117 2 2 2 1 170 1 1 1 1 1 1 118 3 2 2 1 1 171 2 2 1 1 1 1 1 119 2 2 2 2 2 1 172 1 1 1 1 1 120 3 2 2 1 173 1 1 1 1 1 1 121 2 2 1 174 3 1 1 1 1 122 2 2 2 2 1 1 175 2 1 1 1 1 1 1 123 2 2 1 176 2 2 1 1 1 1 1 1 124 2 2 2 2 2 1 177 4 2 2 1 125 3 2 2 1 178 4 2 2 1 126 2 2 1 1 1 179 4 2 2 1 1 1 1 1 127 3 2 2 2 2 1 180 2 2 1 1 1 1 1 128 3 2 2 1 181 3 2 2 1 1 129 3 1 182 2 2 1 130 3 1 183 3 1 1 1 131 2 2 2 1 1 1 1 1 184 2 2 1 132 3 2 2 2 2 1 185 1 1 1 1 1 1 133 2 2 1 186 2 2 1 134 2 2 2 2 1 1 1 1 187 2 2 1 135 2 2 1 188 3 2 2 1 1 1 136 3 2 2 1 1 1 189 2 2 1 137 3 2 2 1 1 1 1 190 2 2 2 2 2 1 138 3 2 2 2 1 191 1 1 1 1 1 139 2 2 1 192 2 2 2 2 2 1 140 2 2 1 1 193 2 2 1 1 1 1 1 1 1 141 2 2 1 194 1 1 1 1 1 142 3 2 2 1 1 1 1 195 3 1 1 1 1 1 143 2 2 1 196 1 1 1 1 1 144 2 2 1 197 3 2 2 2 2 1 1 1 145 3 2 2 1 1 1 198 1 1 146 2 2 1 199 3 2 2 2 2 1 1 1 147 3 2 2 1 1 1 200 1 1 1 1 1 148 2 2 1 201 2 2 1 1 1 1 1 149 3 1 1 1 1 202 1 1 1 1 1 150 2 2 1 203 1 1 1 1 1 151 3 1 1 1 1 1 1 204 3 2 2 1 1 1 1 152 2 2 1 205 3 1 153 3 2 2 1 1 1 1 206 3 2 2 1 154 2 2 1 207 3 2 2 1 155 1 1 1 1 1 1 1 208 1 1 1 1 1 156
  • 14. time Patient ID time Patient ID 8 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 2 2 1 261 3 2 2 1 209 2 1 262 2 2 1 210 2 1 263 2 2 1 1 1 1 211 2 2 1 264 3 2 2 1 212 2 1 265 3 2 2 1 1 1 213 2 1 266 2 2 1 214 2 1 267 3 2 2 1 1 1 1 215 2 2 1 268 3 2 2 1 216 2 1 269 3 2 2 1 217 2 1 270 2 2 1 218 2 1 271 2 2 1 219 2 1 272 3 2 2 1 1 1 220 2 1 273 3 2 2 1 1 1 1 1 221 2 1 274 3 2 2 1 222 2 1 275 3 2 2 1 223 3 1 276 3 2 2 1 1 224 3 1 277 3 1 225 3 1 278 3 2 2 1 1 1 226 3 1 279 3 1 227 3 1 280 3 2 2 1 1 1 1 1 228 3 1 281 3 2 2 1 229 3 1 282 3 1 230 3 1 283 2 2 1 1 1 1 1 231 3 1 284 2 2 1 232 3 1 285 3 1 1 1 1 1 1 1 233 4 1 286 3 2 2 1 234 1 1 287 2 2 1 235 1 1 288 3 1 1 1 1 236 1 1 1 289 2 2 1 237 2 1 1 290 3 2 2 1 1 1 238 2 1 1 291 2 2 1 239 3 1 1 292 2 2 1 1 1 1 1 240 1 1 1 293 2 2 1 241 3 1 1 294 3 2 2 1 1 242 2 1 1 295 2 2 1 243 3 1 1 296 2 2 1 244 1 1 1 297 1 1 2 1 1 245 2 1 1 298 2 1 246 3 1 1 299 3 2 1 1 247 1 1 1 300 2 1 248 2 1 1 301 4 2 1 1 249 3 1 1 302 3 2 1 250 3 1 1 303 2 1 251 2 1 304 3 2 1 252 2 1 305 3 2 1 253 2 1 306 4 1 254 3 1 307 2 2 1 255 4 1 308 2 2 1 256 4 1 309 2 2 1 257 4 1 310 2 1 258 2 1 259 2 1 260
2. Experimental Design, Materials and Methods
2.1. Analysis of the counts during various lengths of observation period
The interval between consecutive visits was one, two, or three years. For each interval length, the transition counts among the states were calculated using a MATLAB code (see supplementary materials). For example, for an interval of one year, the following transitions were counted:
- state 1 to state 1 (green arrows in Table 2);
- state 1 to state 2 (red arrows in Table 2);
- state 1 to state 3 (blue arrows in Table 2);
- state 1 to state 4 (orange arrows in Table 2).

The same was done for every transition among the states for each interval length [11]. The distribution of transition counts is shown as follows:
- Δt = 1 year: Table 3, Fig. 4, and Fig. 5;
- Δt = 2 years: Table 4, Fig. 6, and Fig. 7;
- Δt = 3 years: Table 5, Fig. 8, and Fig. 9;
- whole follow-up period: Tables 6 and 7, Fig. 10, and Fig. 11.

Table 3: Observed counts of transitions during the time interval Δt = 1 year

From \ To    State 1   State 2   State 3   State 4   Total
State 1          330       163        45        12     550
State 2            5       185        45        15     250
State 3            0         0         0         0       0
State 4            0         0         0         0       0
Total            335       348        90        27     800
Fig. 4 Total marginal counts of transitions from state 1 and state 2 during the time interval Δt = 1 year
Fig. 5 Transition counts from state 1 and state 2 to the other states during the time interval Δt = 1 year
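The counting behind Table 3 can be sketched as follows. This is a hedged illustration, not the supplementary MATLAB code. It assumes each participant's record is a list indexed by year (t = 0, 1, ...), with `None` for a missed visit. A transition is counted between each pair of consecutive attended visits and keyed by the gap Δt between them. The function name `count_transitions` is hypothetical. The example uses participants 1 and 2 as described in the text.

```python
from collections import Counter

# Hedged sketch: count transitions between consecutive attended visits,
# grouped by the gap (in years) between them. `None` marks a missed visit.
# The function name is hypothetical.


def count_transitions(records):
    counts = {}  # gap in years -> Counter of (from_state, to_state)
    for record in records:
        visits = [(t, s) for t, s in enumerate(record) if s is not None]
        for (t0, s0), (t1, s1) in zip(visits, visits[1:]):
            counts.setdefault(t1 - t0, Counter())[(s0, s1)] += 1
    return counts


participant_1 = [1, 1, 2, 2]                    # t = 0..3, then no more visits
participant_2 = [1, 1, 2, 1, None, 2, None, 1]  # missed t = 4 and t = 6
result = count_transitions([participant_1, participant_2])
for gap in sorted(result):
    print(gap, dict(result[gap]))
```

Applied to all 310 records, the per-gap counters would correspond to the one-, two-, and three-year tables, under the stated pairing assumption.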
Table 4: Observed counts of transitions during the time interval Δt = 2 years

From \ To    State 1   State 2   State 3   State 4   Total
State 1           70        30        10         1     111
State 2            2        20        13         4      39
State 3            0         0         0         0       0
State 4            0         0         0         0       0
Total             72        50        23         5     150

Fig. 6 Total marginal counts of transitions from state 1 and state 2 during the time interval Δt = 2 years
Fig. 7 Transition counts from state 1 and state 2 to the other states during the time interval Δt = 2 years
Table 5: Observed counts of transitions during the time interval Δt = 3 years

From \ To    State 1   State 2   State 3   State 4   Total
State 1           21         8         7         3      39
State 2            1         6         3         1      11
State 3            0         0         0         0       0
State 4            0         0         0         0       0
Total             22        14        10         4      50

Fig. 8 Total marginal counts of transitions from state 1 and state 2 during the time interval Δt = 3 years
Fig. 9 Transition counts from state 1 and state 2 to the other states during the time interval Δt = 3 years
Table 6: Numbers of observed transitions among the states of the NAFLD process during the time intervals Δt = 1, 2, 3 years

Δt (years)   (1,1)   (1,2)   (1,3)   (1,4)   (2,1)   (2,2)   (2,3)   (2,4)
1              330     163      45      12       5     185      45      15
2               70      30      10       1       2      20      13       4
3               21       8       7       3       1       6       3       1
Total          421     201      62      16       8     211      61      20

Table 7: Total counts of transitions throughout the whole follow-up period (8 years)

From \ To    State 1   State 2   State 3   State 4   Total
State 1          421       201        62        16     700
State 2            8       211        61        20     300
State 3            0         0         0         0       0
State 4            0         0         0         0       0
Total            429       412       123        36    1000

Fig. 10 Total marginal counts of transitions from state 1 and state 2 during the whole follow-up period
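Table 7 is the element-wise sum of Tables 3 to 5. The interval totals (800, 150, and 50 of the 1000 transitions) give each interval's share of the observed transitions. Below is a minimal sketch of this bookkeeping. Only the two transient rows are stored, because the rows for states 3 and 4 are all zero.

```python
# Hedged sketch: pool the interval-specific count tables into Table 7 and
# compute each interval's share of all observed transitions.
# Rows: from state 1, from state 2; columns: to states 1..4.
tables = {
    1: [[330, 163, 45, 12], [5, 185, 45, 15]],  # Table 3 (1-year gaps)
    2: [[70, 30, 10, 1], [2, 20, 13, 4]],       # Table 4 (2-year gaps)
    3: [[21, 8, 7, 3], [1, 6, 3, 1]],           # Table 5 (3-year gaps)
}

pooled = [
    [sum(tables[dt][i][j] for dt in tables) for j in range(4)]
    for i in range(2)
]
totals = {dt: sum(map(sum, rows)) for dt, rows in tables.items()}
grand_total = sum(totals.values())
shares = {dt: n / grand_total for dt, n in totals.items()}

print("pooled counts:", pooled)
print("interval totals:", totals, "shares:", shares)
```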
Fig. 11 Transition counts from state 1 and state 2 to the other states during the whole follow-up period
Fig. 12 shows the ratio of the transition counts in the different time intervals:
- Δt = 1 year: 800 of the 1000 transitions;
- Δt = 2 years: 150 of the 1000 transitions;
- Δt = 3 years: 50 of the 1000 transitions.

The ratios are therefore 0.8, 0.15, and 0.05, respectively.
Fig. 12 Ratio of the transition counts in the different time intervals
2.2. Steps to obtain the statistical indices
2.2.1. Estimation of the transition rate matrix and the variance-covariance matrix
The transition count tables (Table 3, Table 4, and Table 5) are used to obtain the estimated rates and the variance-covariance matrix in each interval [12].
Calculations in the interval Δt = 1 year: Table 8 shows the initial rates.
Table 8: Initial rates in the time interval Δt = 1 year

Initial rate   λ12              λ14               μ21             λ23              λ24
Calculation    163/550 ≈ 0.3    12/550 ≈ 0.022    5/250 = 0.02    45/250 = 0.18    15/250 = 0.06

Construct the initial transition rate matrix Q:
$$Q_{initial} = \begin{bmatrix} -0.322 & 0.3 & 0 & 0.022 \\ 0.02 & -0.26 & 0.18 & 0.06 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
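The initial rates are simple count ratios: each transition count divided by the total number of transitions out of the origin state. Below is a minimal sketch for Δt = 1 year. It also computes the eigenvalues of the 2 × 2 transient block of $Q_{initial}$ from its trace and determinant. The zero rows of the absorbing states add two zero eigenvalues. As in Table 8, the eigenvalues are evaluated with the rounded rates. The function name `transient_eigenvalues` is hypothetical.

```python
import math

# Hedged sketch: crude initial rates for the 1-year interval from Table 3, and
# the eigenvalues of the transient (states 1-2) block of the rate matrix.
out_of_1 = [330, 163, 45, 12]  # counts from state 1 to states 1..4
out_of_2 = [5, 185, 45, 15]    # counts from state 2 to states 1..4

l12 = out_of_1[1] / sum(out_of_1)   # 163 / 550
l14 = out_of_1[3] / sum(out_of_1)   # 12 / 550
mu21 = out_of_2[0] / sum(out_of_2)  # 5 / 250
l23 = out_of_2[2] / sum(out_of_2)   # 45 / 250
l24 = out_of_2[3] / sum(out_of_2)   # 15 / 250


def transient_eigenvalues(l12, l14, mu21, l23, l24):
    """Eigenvalues of [[-(l12+l14), l12], [mu21, -(mu21+l23+l24)]]."""
    a, b = -(l12 + l14), l12
    c, d = mu21, -(mu21 + l23 + l24)
    trace, det = a + d, a * d - b * c
    # trace^2 - 4 det = (a - d)^2 + 4bc >= 0, so both eigenvalues are real.
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace - disc) / 2, (trace + disc) / 2


print([round(x, 4) for x in (l12, l14, mu21, l23, l24)])
# Table 8 rounds the rates to 0.3, 0.022, 0.02, 0.18, 0.06 before Step 1.
eig = transient_eigenvalues(0.3, 0.022, 0.02, 0.18, 0.06)
print([round(x, 4) for x in eig])
```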
$$[\lambda_{12},\ \lambda_{14},\ \mu_{21},\ \lambda_{23},\ \lambda_{24}] = [0.3,\ 0.022,\ 0.02,\ 0.18,\ 0.06]$$
Step 1: Calculate the eigenvalues of $Q_{initial}$ for Δt = 1: −0.3744 and −0.2076.
Step 2: Calculate the partial derivative of the eigenvalue function with respect to each rate $\theta_h$, i.e. $\frac{\partial}{\partial \theta_h} P_{ij}(t) = t\, e^{\Lambda t}\, d\Lambda$, to obtain the score function (substitute t = 1):
$$[v_1,\ v_2,\ v_3,\ v_4,\ v_5] = [-0.712,\ -0.7269,\ -0.5488,\ -0.7733,\ -0.7733]$$
Step 3: Scale the score function by the factor $4 \times 550 + 4 \times 250 = 3200$:
$$[-2278.244,\ -2326.141,\ -1756.172,\ -2472.620,\ -2472.620]$$
Step 4: Multiply the scaled score vector by its transpose (outer product) to obtain the Hessian matrix:
$$M(\theta) = 10^{6} \begin{bmatrix} 5.1904 & 5.2995 & 4.0010 & 5.6378 & 5.6378 \\ 5.2995 & 5.4109 & 4.0851 & 5.7563 & 5.7563 \\ 4.0010 & 4.0851 & 3.0841 & 4.3459 & 4.3459 \\ 5.6378 & 5.7563 & 4.3459 & 6.1237 & 6.1237 \\ 5.6378 & 5.7563 & 4.3459 & 6.1237 & 6.1237 \end{bmatrix}$$
Step 5: Scale the Hessian matrix by the factor 53096.45:
$$\text{scaled } M(\theta) = 10^{11} \begin{bmatrix} 2.7559 & 2.8138 & 2.1244 & 2.9934 & 2.9934 \\ 2.8138 & 2.8730 & 2.1690 & 3.0564 & 3.0564 \\ 2.1244 & 2.1690 & 1.6376 & 2.3075 & 2.3075 \\ 2.9934 & 3.0564 & 2.3075 & 3.2515 & 3.2515 \\ 2.9934 & 3.0564 & 2.3075 & 3.2515 & 3.2515 \end{bmatrix}$$
Step 6: Invert the scaled Hessian matrix (the matrix has rank one, so the inverse reported here is its Moore-Penrose pseudo-inverse):
$$[\text{scaled } M(\theta)]^{-1} = 10^{-12} \begin{bmatrix} 0.1454 & 0.1484 & 0.1120 & 0.1579 & 0.1579 \\ 0.1484 & 0.1515 & 0.1144 & 0.1612 & 0.1612 \\ 0.1120 & 0.1144 & 0.0864 & 0.1217 & 0.1217 \\ 0.1579 & 0.1612 & 0.1217 & 0.1715 & 0.1715 \\ 0.1579 & 0.1612 & 0.1217 & 0.1715 & 0.1715 \end{bmatrix}$$
Step 7: Multiply the inverted scaled Hessian matrix by the scaled score function:
$$10^{-8}\,[-0.1655,\ -0.1689,\ -0.1275,\ -0.1797,\ -0.1797]$$
Step 8: Apply the quasi-Newton formula
$$\vec{\theta}_1 = \vec{\theta}_0 + [M(\vec{\theta}_0)]^{-1} S(\vec{\theta}_0),$$
i.e. add the initial rate vector to the vector above. The estimated rates during Δt = 1 year are
$$[\hat{\lambda}_{12},\ \hat{\lambda}_{14},\ \hat{\mu}_{21},\ \hat{\lambda}_{23},\ \hat{\lambda}_{24}] = [0.3,\ 0.022,\ 0.02,\ 0.18,\ 0.06]$$
Calculations in the interval Δt = 2 years: Table 9 shows the initial rates.
Table 9: Initial rates in the time interval Δt = 2 years

Initial rate   λ12              λ14               μ21            λ23               λ24
Calculation    30/111 ≈ 0.27    1/111 ≈ 0.009     2/39 ≈ 0.05    13/39 ≈ 0.333     4/39 ≈ 0.103

Construct the initial transition rate matrix Q:
$$Q_{initial} = \begin{bmatrix} -0.279 & 0.27 & 0 & 0.009 \\ 0.05 & -0.486 & 0.333 & 0.103 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad [\lambda_{12},\ \lambda_{14},\ \mu_{21},\ \lambda_{23},\ \lambda_{24}] = [0.27,\ 0.009,\ 0.05,\ 0.333,\ 0.103]$$
Step 1: Calculate the eigenvalues of $Q_{initial}$ for Δt = 2: −0.2269 and −0.5381.
Step 2: Calculate the partial derivative of the eigenvalue function with respect to each rate $\theta_h$, i.e. $\frac{\partial}{\partial \theta_h} P_{ij}(t) = t\, e^{\Lambda t}\, d\Lambda$, to obtain the score function (substitute t = 2):
$$[v_1,\ v_2,\ v_3,\ v_4,\ v_5] = [-1.0773,\ -1.1719,\ -0.2696,\ -0.7803,\ -0.7803]$$
Step 3: Scale the score function by the factor $4 \times 111 + 4 \times 39 = 600$:
$$[-646.3779,\ -703.1237,\ -161.7689,\ -468.1962,\ -468.1962]$$
Step 4: Multiply the scaled score vector by its transpose (outer product) to obtain the Hessian matrix:
$$M(\theta) = 10^{5} \begin{bmatrix} 4.178 & 4.5448 & 1.0456 & 3.0263 & 3.0263 \\ 4.5448 & 4.9438 & 1.1374 & 3.2920 & 3.2920 \\ 1.0456 & 1.1374 & 0.2617 & 0.7574 & 0.7574 \\ 3.0263 & 3.2920 & 0.7574 & 2.1921 & 2.1921 \\ 3.0263 & 3.2920 & 0.7574 & 2.1921 & 2.1921 \end{bmatrix}$$
Step 5: Scale the Hessian matrix by the factor 15476.614:
$$\text{scaled } M(\theta) = 10^{9} \begin{bmatrix} 6.4651 & 7.0327 & 1.6180 & 4.6829 & 4.6829 \\ 7.0327 & 7.6501 & 1.7601 & 5.0940 & 5.0940 \\ 1.6180 & 1.7601 & 0.4049 & 1.1720 & 1.1720 \\ 4.6829 & 5.0940 & 1.1720 & 3.3920 & 3.3920 \\ 4.6829 & 5.0940 & 1.1720 & 3.3920 & 3.3920 \end{bmatrix}$$
Step 6: Invert the scaled Hessian matrix:
$$[\text{scaled } M(\theta)]^{-1} = 10^{-10} \begin{bmatrix} 0.1424 & 0.1550 & 0.0356 & 0.1032 & 0.1032 \\ 0.1550 & 0.1686 & 0.0388 & 0.1122 & 0.1122 \\ 0.0356 & 0.0388 & 0.0089 & 0.0258 & 0.0258 \\ 0.1032 & 0.1122 & 0.0258 & 0.0747 & 0.0747 \\ 0.1032 & 0.1122 & 0.0258 & 0.0747 & 0.0747 \end{bmatrix}$$
Step 7: Multiply the inverted scaled Hessian matrix by the scaled score function:
$$10^{-7}\,[-0.3034,\ -0.3300,\ -0.0759,\ -0.2198,\ -0.2198]$$
Step 8: Apply the quasi-Newton formula
$$\vec{\theta}_1 = \vec{\theta}_0 + [M(\vec{\theta}_0)]^{-1} S(\vec{\theta}_0),$$
i.e. add the initial rate vector to the vector above. The estimated rates during Δt = 2 years are
$$[\hat{\lambda}_{12},\ \hat{\lambda}_{14},\ \hat{\mu}_{21},\ \hat{\lambda}_{23},\ \hat{\lambda}_{24}] = [0.27,\ 0.009,\ 0.05,\ 0.333,\ 0.103]$$
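Steps 3 to 8 can be reproduced numerically. The outer product $s s^T$ of the scaled score vector has rank one, so it has no ordinary inverse. For $A = c\, s s^T$ with $c > 0$, the Moore-Penrose pseudo-inverse is $s s^T / (c \lVert s \rVert^4)$. The Step 7 vector therefore reduces to $s / (c \lVert s \rVert^2)$. The sketch below is a hedged illustration using the Δt = 2 values reported above. It shows why the quasi-Newton step leaves the initial rates essentially unchanged.

```python
# Hedged sketch of Steps 3-8 for the 2-year interval, using the reported numbers.
v = [-1.0773, -1.1719, -0.2696, -0.7803, -0.7803]  # Step 2 score values
score_factor = 4 * 111 + 4 * 39                     # Step 3 factor (600)
hessian_factor = 15476.614                          # Step 5 factor

s = [x * score_factor for x in v]                   # scaled score vector
norm_sq = sum(x * x for x in s)

# Steps 4-6: pseudo-inverse of hessian_factor * s s^T (rank one), built
# explicitly from the closed form s s^T / (c * |s|^4).
pinv = [[si * sj / (hessian_factor * norm_sq ** 2) for sj in s] for si in s]

# Step 7: pseudo-inverse times the scaled score vector.
step = [sum(pinv[i][j] * s[j] for j in range(5)) for i in range(5)]

# Step 8: quasi-Newton update of the initial rates (Table 9).
theta0 = [0.27, 0.009, 0.05, 0.333, 0.103]
theta1 = [t + d for t, d in zip(theta0, step)]

print(["%.4e" % x for x in step])
print([round(x, 3) for x in theta1])
```

The update is of order 1e-8, far below the three-decimal precision of the rates. This is consistent with the estimates in Step 8 equalling the initial rates.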
Calculations in the interval Δt = 3 years: Table 10 shows the initial rates.
Table 10: Initial rates in the time interval Δt = 3 years

Initial rate   λ12              λ14              μ21              λ23              λ24
Calculation    8/39 ≈ 0.205     3/39 ≈ 0.077     1/11 ≈ 0.091     3/11 ≈ 0.273     1/11 ≈ 0.091

Construct the initial transition rate matrix Q:
$$Q_{initial} = \begin{bmatrix} -0.282 & 0.205 & 0 & 0.077 \\ 0.091 & -0.455 & 0.273 & 0.091 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad [\lambda_{12},\ \lambda_{14},\ \mu_{21},\ \lambda_{23},\ \lambda_{24}]_{initial} = [0.205,\ 0.077,\ 0.091,\ 0.273,\ 0.091]$$
Step 1: Calculate the eigenvalues of $Q_{initial}$ for Δt = 3: −0.2068 and −0.5302.
Step 2: Calculate the partial derivative of the eigenvalue function with respect to each rate $\theta_h$, i.e. $\frac{\partial}{\partial \theta_h} P_{ij}(t) = t\, e^{\Lambda t}\, d\Lambda$, to obtain the score function (substitute t = 3):
$$[v_1,\ v_2,\ v_3,\ v_4,\ v_5] = [-1.0983,\ -1.3802,\ -0.2093,\ -0.8443,\ -0.8443]$$
Step 3: Scale the score function by the factor $4 \times 39 + 4 \times 11 = 200$:
$$[-219.663,\ -276.039,\ -41.8608,\ -168.862,\ -168.862]$$
Step 4: Multiply the scaled score vector by its transpose (outer product) to obtain the Hessian matrix:
$$M(\theta) = 10^{4} \begin{bmatrix} 4.8252 & 6.0636 & 0.9195 & 3.7093 & 3.7093 \\ 6.0636 & 7.6198 & 1.1555 & 4.6613 & 4.6613 \\ 0.9195 & 1.1555 & 0.1752 & 0.7069 & 0.7069 \\ 3.7093 & 4.6613 & 0.7069 & 2.8514 & 2.8514 \\ 3.7093 & 4.6613 & 0.7069 & 2.8514 & 2.8514 \end{bmatrix}$$
Step 5: Scale the Hessian matrix by the factor 1289.338:
$$\text{scaled } M(\theta) = 10^{7} \begin{bmatrix} 6.2213 & 7.8180 & 1.1856 & 4.7825 & 4.7825 \\ 7.8180 & 9.8245 & 1.4899 & 6.0099 & 6.0099 \\ 1.1856 & 1.4899 & 0.2259 & 0.9114 & 0.9114 \\ 4.7825 & 6.0099 & 0.9114 & 3.6765 & 3.6765 \\ 4.7825 & 6.0099 & 0.9114 & 3.6765 & 3.6765 \end{bmatrix}$$
Step 6: Invert the scaled Hessian matrix:
$$[\text{scaled } M(\theta)]^{-1} = 10^{-8} \begin{bmatrix} 0.1115 & 0.1401 & 0.0212 & 0.0857 & 0.0857 \\ 0.1401 & 0.1760 & 0.0267 & 0.1077 & 0.1077 \\ 0.0212 & 0.0267 & 0.0040 & 0.0163 & 0.0163 \\ 0.0857 & 0.1077 & 0.0163 & 0.0659 & 0.0659 \\ 0.0857 & 0.1077 & 0.0163 & 0.0659 & 0.0659 \end{bmatrix}$$
Step 7: Multiply the inverted scaled Hessian matrix by the scaled score function:
$$10^{-5}\,[-0.0930,\ -0.1168,\ -0.0177,\ -0.0715,\ -0.0715]$$
Step 8: Apply the quasi-Newton formula
$$\vec{\theta}_1 = \vec{\theta}_0 + [M(\vec{\theta}_0)]^{-1} S(\vec{\theta}_0),$$
i.e. add the initial rate vector to the vector above. The estimated rates during Δt = 3 years are
$$[\hat{\lambda}_{12},\ \hat{\lambda}_{14},\ \hat{\mu}_{21},\ \hat{\lambda}_{23},\ \hat{\lambda}_{24}] = [0.205,\ 0.077,\ 0.091,\ 0.273,\ 0.091]$$
The final rates and variance-covariance matrix are obtained by weighting each interval's estimated rate vector by that interval's share of the transition counts and summing:
$$0.8\,[0.3,\ 0.022,\ 0.02,\ 0.18,\ 0.06] + 0.15\,[0.27,\ 0.009,\ 0.05,\ 0.333,\ 0.103] + 0.05\,[0.205,\ 0.077,\ 0.091,\ 0.273,\ 0.091] = [0.2907,\ 0.0228,\ 0.0280,\ 0.2076,\ 0.068]$$
The estimated final Q matrix is
$$\hat{Q} = \begin{bmatrix} -0.3135 & 0.2907 & 0 & 0.0228 \\ 0.0280 & -0.3036 & 0.2076 & 0.068 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
For this $\hat{Q}$ matrix: $w_1 = -0.3989$ and $w_2 = -0.2182$.
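The weighting step and the eigenvalues of $\hat{Q}$ can be checked with a short sketch. The weights are the interval shares 0.8, 0.15, and 0.05 of the 1000 observed transitions. The eigenvalues are those of the 2 × 2 transient block; the absorbing rows add the two zeros. The weighted values are 0.29075 and 0.02805, which the text reports as 0.2907 and 0.0280. This rounding causes small differences in the fourth decimal of the eigenvalues.

```python
import math

# Hedged sketch: weighted combination of the interval-specific estimates and
# the eigenvalues of the transient block of the resulting rate matrix.
weights = [0.8, 0.15, 0.05]  # shares of transitions for 1-, 2-, 3-year gaps
estimates = [                # [l12, l14, mu21, l23, l24] per interval
    [0.3, 0.022, 0.02, 0.18, 0.06],
    [0.27, 0.009, 0.05, 0.333, 0.103],
    [0.205, 0.077, 0.091, 0.273, 0.091],
]
final = [sum(w * est[k] for w, est in zip(weights, estimates)) for k in range(5)]
l12, l14, mu21, l23, l24 = final

# Transient block [[a, b], [c, d]] of Q-hat; (a - d)^2 + 4bc >= 0, so the
# eigenvalues are real.
a, b = -(l12 + l14), l12
c, d = mu21, -(mu21 + l23 + l24)
trace, det = a + d, a * d - b * c
disc = math.sqrt(trace * trace - 4 * det)
w1, w2 = (trace - disc) / 2, (trace + disc) / 2

print([round(x, 5) for x in final])
print(round(w1, 4), round(w2, 4))
```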
For this $\hat{Q}$ matrix, the eigenvalues are {−0.3989, −0.2182, 0, 0}. The weighted sum of the inverted scaled Hessian matrices is used as the variance-covariance matrix of the parameter θ:
$$[\text{scaled } M(\theta)]^{-1} = 10^{-10} \begin{bmatrix} 0.5788 & 0.7237 & 0.1116 & 0.4440 & 0.4440 \\ 0.7237 & 0.9055 & 0.1394 & 0.5554 & 0.5554 \\ 0.1116 & 0.1394 & 0.0216 & 0.0856 & 0.0856 \\ 0.4440 & 0.5554 & 0.0856 & 0.3407 & 0.3407 \\ 0.4440 & 0.5554 & 0.0856 & 0.3407 & 0.3407 \end{bmatrix}$$
2.2.2. Estimation of the transition probability matrix
Solving the forward Kolmogorov differential equations (for this dataset, with a differential operator) yielded the eight transition probability functions (PDFs) $P_{ij}(t)$, i = 1, 2, j = 1, ..., 4. Together they form the empirical transition probability matrix. A MATLAB code illustrates these calculations [13] (see supplementary materials). This matrix equals the exponential of the estimated transition rate matrix, $P(t) = e^{\hat{Q} t}$, which is consistent with the time homogeneity of the disease process. The transition probability matrix after one year (obtained either way) is:
$$P(t=1) = \begin{bmatrix} 0.7339 & 0.2138 & 0.0246 & 0.0277 \\ 0.0206 & 0.7412 & 0.1793 & 0.0590 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
2.2.3. Mean sojourn time in each state and its variance
$$E(s_1) = \frac{1}{\lambda_{12} + \lambda_{14}} = \frac{1}{0.2907 + 0.0228} = 3.1898 \text{ years}$$
The average time spent in S1 is about 3 years.
$$E(s_2) = \frac{1}{\mu_{21} + \lambda_{23} + \lambda_{24}} = \frac{1}{0.0280 + 0.2076 + 0.068} = 3.2938 \text{ years}$$
The average time spent in S2 is about 3 years.
$$\operatorname{var}(s_1) = \frac{1}{(0.2907 + 0.0228)^4}\,[-1,\ -1,\ -1,\ -1,\ -1]\,[M(\theta)]^{-1}\big|_{\theta = \hat{\theta}}\,[-1,\ -1,\ -1,\ -1,\ -1]^{T} = 9.4815 \times 10^{-8}$$
$$\operatorname{var}(s_2) = \frac{1}{(0.02805 + 0.2076 + 0.068)^4}\,[-1,\ -1,\ -1,\ -1,\ -1]\,[M(\theta)]^{-1}\big|_{\theta = \hat{\theta}}\,[-1,\ -1,\ -1,\ -1,\ -1]^{T} = 1.0780 \times 10^{-7}$$
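For a time-homogeneous chain, the transition probability matrix is the matrix exponential $P(t) = e^{\hat{Q}t}$. Below is a minimal pure-Python sketch that computes it by scaling and squaring with a truncated Taylor series, which is adequate for this small 4 × 4 case. It also computes the mean sojourn times as the reciprocals of the total exit rates. This is an illustration under those assumptions, not the supplementary MATLAB code. The helper names are hypothetical, and a library routine would normally be used in practice.

```python
# Hedged sketch: P(t) = expm(Q_hat * t) via scaling and squaring with a
# truncated Taylor series, plus mean sojourn times 1 / (total exit rate).

Q_HAT = [
    [-0.3135, 0.2907, 0.0, 0.0228],
    [0.0280, -0.3036, 0.2076, 0.068],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def expm(a, terms=20):
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a)  # infinity norm
    squarings = 0
    while norm > 0.5:  # scale so the Taylor series converges quickly
        norm /= 2
        squarings += 1
    scaled = [[v / 2 ** squarings for v in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):  # undo the scaling: e^A = (e^(A / 2^s))^(2^s)
        result = matmul(result, result)
    return result


def transition_matrix(t):
    return expm([[q * t for q in row] for row in Q_HAT])


P1 = transition_matrix(1.0)
for row in P1:
    print([round(p, 4) for p in row])

sojourn = [-1 / Q_HAT[0][0], -1 / Q_HAT[1][1]]  # years in state 1, state 2
print([round(x, 4) for x in sojourn])
```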
2.2.3. State probability distribution at a specific time point
Once the rate matrix is obtained, the estimated rates are substituted into the PDFs obtained from the solved differential equations to give the state probability distribution at any point in time, as well as the expected number of patients in each state [14]. Consider a cohort of 3000 patients with initial distribution [0.7 0.3 0 0], i.e., initial numbers of patients in each state [2100 900 0 0].
At 1 year, the state probability distribution is approximately
$$P(1) = [0.7\;\; 0.3\;\; 0\;\; 0]\begin{bmatrix} 0.7339 & 0.2138 & 0.0246 & 0.0277\\ 0.0206 & 0.7412 & 0.1793 & 0.0590\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} = [0.5199\;\; 0.3720\;\; 0.0710\;\; 0.0371]$$
and the expected numbers of patients in each state are
$$[2100\;\; 900\;\; 0\;\; 0]\begin{bmatrix} 0.7339 & 0.2138 & 0.0246 & 0.0277\\ 0.0206 & 0.7412 & 0.1793 & 0.0590\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} = [1559\;\; 1117\;\; 214\;\; 110]$$
At 20 years, the state probability distribution is approximately
$$P(20) = [0.7\;\; 0.3\;\; 0\;\; 0]\begin{bmatrix} 0.0062 & 0.0199 & 0.6742 & 0.2997\\ 0.0019 & 0.0069 & 0.7413 & 0.2499\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} = [0.0049\;\; 0.0160\;\; 0.6943\;\; 0.2848]$$
and the expected numbers of patients in each state are
$$[2100\;\; 900\;\; 0\;\; 0]\begin{bmatrix} 0.0062 & 0.0199 & 0.6742 & 0.2997\\ 0.0019 & 0.0069 & 0.7413 & 0.2499\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} = [15\;\; 45\;\; 2085\;\; 855]$$
At 60 years, the state probability distribution is approximately (matrix entries rounded)
$$P(60) = [0.7\;\; 0.3\;\; 0\;\; 0]\begin{bmatrix} 0 & 0 & 0.7 & 0.3\\ 0 & 0 & 0.75 & 0.25\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} = [0\;\; 0\;\; 0.7097\;\; 0.2903]$$
and the expected numbers of patients in each state are
$$[2100\;\; 900\;\; 0\;\; 0]\begin{bmatrix} 0 & 0 & 0.7 & 0.3\\ 0 & 0 & 0.75 & 0.25\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} = [0\;\; 0\;\; 2145\;\; 855]$$
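The propagation $p(t) = p(0)P(t)$ and the long-run limit can be checked with a small stdlib-only Python sketch. It copies $P(1)$ and $P(20)$ from the text in row convention and, as an assumption about how the limit arises, obtains the limiting distribution from the absorbing-chain absorption probabilities $N A$ with $N = -B^{-1}$, where $B$ and $A$ are the transient and transient-to-absorbing blocks of $\hat Q$ (the same partition used in Section 2.2.5).

```python
# Row-convention transition matrices copied from the text (rows = "from" state).
P1 = [
    [0.7339, 0.2138, 0.0246, 0.0277],
    [0.0206, 0.7412, 0.1793, 0.0590],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
P20 = [
    [0.0062, 0.0199, 0.6742, 0.2997],
    [0.0019, 0.0069, 0.7413, 0.2499],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def propagate(p0, p_t):
    """Row vector times matrix: p(t) = p(0) P(t)."""
    return [sum(p0[i] * p_t[i][j] for i in range(len(p0))) for j in range(len(p_t[0]))]


p0 = [0.7, 0.3, 0.0, 0.0]
cohort = 3000
p_1 = propagate(p0, P1)
p_20 = propagate(p0, P20)
counts_1 = [cohort * x for x in p_1]  # expected patients per state at 1 year

# Long-run limit: absorption probabilities N A, with N = -B^{-1}.
lam12, lam14, mu21, lam23, lam24 = 0.2907, 0.0228, 0.0280, 0.2076, 0.068
B = [[-(lam12 + lam14), lam12], [mu21, -(mu21 + lam23 + lam24)]]
A = [[0.0, lam14], [lam23, lam24]]  # columns: S3, S4
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
N = [[-B[1][1] / det, B[0][1] / det], [B[1][0] / det, -B[0][0] / det]]
absorb = [[sum(N[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
p_limit = [0.0, 0.0] + [p0[0] * absorb[0][j] + p0[1] * absorb[1][j] for j in range(2)]

print("p(1)   =", [round(x, 4) for x in p_1])
print("p(20)  =", [round(x, 4) for x in p_20])
print("p(inf) =", [round(x, 4) for x in p_limit])
```

The expected counts at 1 year come out at about [1560, 1116, 213, 111], within about one patient of the figures in the text; the small differences presumably reflect rounding of the displayed matrix.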
2.2.4. Stationary probability distribution
This distribution is attained at about 42 years: from 42 years onward, the state probability distribution is [0 0 0.7097 0.2903]. Its asymptotic variance–covariance matrix is calculated as follows.
Step 1: the transition probability matrix at $t \ge 42$ years, when all participants are either in S3 or in S4 (absorbing death states), is
$$P(t=42) = \begin{bmatrix} 0 & 0 & 0.7 & 0.3\\ 0 & 0 & 0.75 & 0.25\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Step 2: partially differentiate the transposed rate matrix with respect to each $\theta$; the result is a vector of five ones, [1 1 1 1 1].
Step 3: multiply the (column) vector of the state probability distribution attained at 42 years (i.e., the stationary probability distribution) by this row vector of ones to get $C(\boldsymbol\theta)$:
$$C(\boldsymbol\theta) = \begin{bmatrix} 0\\ 0\\ 0.7097\\ 0.2903 \end{bmatrix}[1\;\; 1\;\; 1\;\; 1\;\; 1] = \begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0.7097 & 0.7097 & 0.7097 & 0.7097 & 0.7097\\ 0.2903 & 0.2903 & 0.2903 & 0.2903 & 0.2903 \end{bmatrix}$$
Step 4: calculate the pseudo-inverse of the transposed estimated (final) rate matrix via singular value decomposition:
$$[Q']^{+} = \begin{bmatrix} -2.4852 & 0.7142 & 1.1891 & 0.5819\\ -1.4878 & -1.6733 & 2.2828 & 0.8783\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$
Then multiply this matrix by $-1$:
$$-[Q']^{+} = \begin{bmatrix} 2.4852 & -0.7142 & -1.1891 & -0.5819\\ 1.4878 & 1.6733 & -2.2828 & -0.8783\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$
Step 5: multiply $-[Q']^{+}$ by $C(\boldsymbol\theta)$ to get $A(\boldsymbol\theta)$:
$$A(\boldsymbol\theta) = -[Q']^{+}C(\boldsymbol\theta) = \begin{bmatrix} -1.0128 & -1.0128 & -1.0128 & -1.0128 & -1.0128\\ -1.875 & -1.875 & -1.875 & -1.875 & -1.875\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
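Steps 3–5, together with the delta-method product of Step 6 that follows, can be sketched in stdlib-only Python. Instead of a general SVD, the sketch uses the fact that $Q'$ has only two nonzero, linearly independent columns, so its Moore–Penrose pseudo-inverse is $[X^{+};\,0]$ with $X^{+} = (X^{T}X)^{-1}X^{T}$; this shortcut is an assumption specific to this 4-state model. $\Sigma$ is the weighted inverted scaled Hessian reported in the text.

```python
# Row-convention Q-hat from the text; rows 3-4 (absorbing states) are zero.
Q = [
    [-0.3135, 0.2907, 0.0, 0.0228],
    [0.0280, -0.3036, 0.2076, 0.068],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
Qt = [list(col) for col in zip(*Q)]  # Q' (transpose); its columns 3-4 are zero


def pinv_first_two_columns(m):
    """Moore-Penrose pseudo-inverse of a 4x4 matrix [X 0] whose first two
    columns X are linearly independent and whose last two columns are zero:
    pinv([X 0]) = [X^+; 0] with X^+ = (X^T X)^{-1} X^T."""
    x = [row[:2] for row in m]  # 4x2 block X
    g = [[sum(x[r][i] * x[r][j] for r in range(4)) for j in range(2)] for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    x_plus = [[sum(g_inv[i][k] * x[c][k] for k in range(2)) for c in range(4)]
              for i in range(2)]
    return x_plus + [[0.0] * 4, [0.0] * 4]


pinv_qt = pinv_first_two_columns(Qt)
pi = [0.0, 0.0, 0.7097, 0.2903]  # stationary distribution for t >= 42 years
n_par = 5
C = [[p] * n_par for p in pi]    # C(theta) = pi^T [1 1 1 1 1], a 4x5 matrix
A = [[-sum(pinv_qt[i][k] * C[k][j] for k in range(4)) for j in range(n_par)]
     for i in range(4)]          # A(theta) = -[Q']^+ C(theta), 4x5

# Variance-covariance of theta (weighted inverted scaled Hessian), from the text.
SIGMA = [[v * 1e-10 for v in row] for row in [
    [0.5788, 0.7237, 0.1116, 0.4440, 0.4440],
    [0.7237, 0.9055, 0.1394, 0.5554, 0.5554],
    [0.1116, 0.1394, 0.0216, 0.0856, 0.0856],
    [0.4440, 0.5554, 0.0856, 0.3407, 0.3407],
    [0.4440, 0.5554, 0.0856, 0.3407, 0.3407],
]]
A_sigma = [[sum(A[i][k] * SIGMA[k][j] for k in range(n_par)) for j in range(n_par)]
           for i in range(4)]
V = [[sum(A_sigma[i][k] * A[j][k] for k in range(n_par)) for j in range(4)]
     for i in range(4)]          # delta method: A Sigma A^T, 4x4

print("pinv row 1:", [round(x, 4) for x in pinv_qt[0]])
print("A(theta) column:", round(A[0][0], 4), round(A[1][0], 4))
print("V / 1e-8:", [[round(v / 1e-8, 4) for v in row] for row in V])
```

Because every row of $A(\boldsymbol\theta)$ is constant, each entry of $A\Sigma A^{T}$ reduces to $a_i a_j$ times the sum of all entries of $\Sigma$.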
Step 6: apply the multivariate delta method to get the variance–covariance matrix:
$$A(\boldsymbol\theta)\,[\text{scaled } M(\boldsymbol\theta)]^{-1}\,[A(\boldsymbol\theta)]^{T} = 1\times10^{-8}\begin{bmatrix} 0.0939 & 0.1739 & 0 & 0\\ 0.1739 & 0.3220 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$
where $[\text{scaled } M(\boldsymbol\theta)]^{-1}$ is the inverted scaled Hessian matrix.

2.2.5. Life expectancy of the patient (mean time to absorption)
Step 1: specify the Q rate matrix (two transient states and two absorbing states) and partition it as follows:
$$\hat Q = \begin{bmatrix} -(\lambda_{12}+\lambda_{14}) & \lambda_{12} & 0 & \lambda_{14}\\ \mu_{21} & -(\mu_{21}+\lambda_{23}+\lambda_{24}) & \lambda_{23} & \lambda_{24}\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} B & A\\ 0 & 0 \end{bmatrix}, \qquad \text{so } A = BZ,$$
$$\hat Q = \begin{bmatrix} -0.3135 & 0.2907 & 0 & 0.0228\\ 0.0280 & -0.3036 & 0.2076 & 0.068\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$
Step 2: calculate the Z matrix.
Step 3: calculate the inverse of the B matrix.
Step 4: apply the formula for the mean time to absorption:
$$E(\tau_k) = (-1)\,\frac{d f_k^{*}(s)}{ds}\bigg|_{s=0} = P(0)\,[sI-B]^{-2}A\,\Big|_{s=0} = P(0)\,[B]^{-1}Z$$
$$E(\tau_{ik}) = [B]^{-1}Z = \begin{bmatrix} -3.48691 & -3.33935\\ -0.32212 & -3.60174 \end{bmatrix}\begin{bmatrix} -0.69325 & -0.30675\\ -0.74772 & -0.25228 \end{bmatrix} = \begin{bmatrix} 4.9159 & 1.9121\\ 2.9163 & 1.0072 \end{bmatrix}$$
$E(\tau_{13}) = 4.9159$ years, $E(\tau_{14}) = 1.9121$ years, $E(\tau_{23}) = 2.9163$ years and $E(\tau_{24}) = 1.0072$ years. In this dataset, the average time from S1 to S3 is about 5 years, from S1 to S4 about 2 years, from S2 to S3 about 3 years, and from S2 to S4 about 1 year.
According to the American Association for the Study of Liver Diseases [15], the most common cause of death in patients with NAFLD is cardiovascular disease (CVD), independent of other metabolic comorbidities, whereas liver-related mortality is the third most common cause of death among patients with NAFLD; cancer-related mortality is also among the top three causes of death in these patients. As shown by the calculations, the mean times to absorption can be classified as follows: the mean time from state 1 (susceptible individuals with risk factors) to state 3 (liver-related mortality) is approximately 5 years, while the mean time from state 1 to state 4 (causes of death other than liver-related ones, for example CVD) is approximately 2 years. The mean time from state 2 (NAFLD) to state 3 (liver-related mortality) is approximately 3 years, and it decreases to approximately 1 year from state 2 (NAFLD) to state 4 (causes unrelated to liver complications).
Some remarks on this dataset should be mentioned. From Table 7, the observed initial rates, i.e., the initial Q matrix, are:
$$Q = \begin{bmatrix} -0.31 & 0.287 & 0 & 0.023\\ 0.027 & -0.297 & 0.203 & 0.067\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$
The rates estimated by MLE with the quasi-Newton step are nearly equal to these initial observed rates. The advantage of this approach is therefore that the estimated rates can be obtained from the first iteration. Its limitation is the degree of the polynomial that expresses the eigenvalues as a function of the rates. For a second-degree polynomial, the quadratic formula gives the eigenvalues explicitly, and this formula is easily differentiated with respect to each rate. Third- and fourth-degree polynomials also have closed-form root formulas that can be differentiated, so the method still applies. For polynomials of degree five or higher, no general formula in radicals exists, which makes this MLE method difficult to use.
2.3. A continuous-time Markov chain (CTMC) should be tested for time homogeneity and the Markov property.
2.3.1. Test of the time homogeneity hypothesis
The empirical transition probability matrix is exactly equal to the matrix exponential of the estimated rate matrix, which supports time homogeneity [16].
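The embedding check in the next paragraph takes the matrix logarithm of the discrete-time matrix estimated from Table 7 and tests the generator criteria. A minimal stdlib-only Python sketch follows; it assumes the row-convention layout (rows = "from" state) and uses the series $\log(I+X) = \sum_{k\ge1} (-1)^{k+1}X^k/k$ with $X = P - I$, a swapped-in technique that is valid only because $\|P - I\|_\infty < 1$ for this matrix (the text's own computation is in MATLAB and may use a different algorithm).

```python
# One-step (discrete-time) transition matrix from Table 7, row convention.
P_DISCRETE = [
    [0.601, 0.287, 0.089, 0.023],
    [0.027, 0.703, 0.203, 0.067],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def mat_mul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def logm_series(p, max_terms=500, tol=1e-15):
    """Principal matrix logarithm from log(I + X) = sum_k (-1)^(k+1) X^k / k,
    with X = P - I. Requires the max row-sum norm of X to be below 1."""
    n = len(p)
    x = [[p[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    if max(sum(abs(v) for v in row) for row in x) >= 1.0:
        raise ValueError("series not guaranteed to converge for this matrix")
    result = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in x]  # holds X^k, starting with k = 1
    for k in range(1, max_terms + 1):
        coef = (-1) ** (k + 1) / k
        result = [[r + coef * v for r, v in zip(r_row, p_row)]
                  for r_row, p_row in zip(result, power)]
        if max(abs(v) for row in power for v in row) < tol:
            break
        power = mat_mul(power, x)
    return result


def is_generator(q, tol=1e-9):
    """Generator criteria: rows sum to zero, off-diagonal rates non-negative."""
    n = len(q)
    rows_sum_to_zero = all(abs(sum(row)) < tol for row in q)
    off_diag_nonneg = all(q[i][j] >= -tol for i in range(n) for j in range(n) if i != j)
    return rows_sum_to_zero and off_diag_nonneg


Q_LOG = logm_series(P_DISCRETE)
for row in Q_LOG:
    print([round(v, 4) for v in row])
print("valid generator:", is_generator(Q_LOG))
```

A negative off-diagonal entry in `Q_LOG` would signal an embedding problem, i.e., no valid time-homogeneous CTMC generator for that discrete-time matrix.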
If the process is treated as a discrete-time Markov chain embedded in a continuous-time Markov chain, the discrete-time transition probability matrix, according to the data collected in Table 7, is
$$P = \begin{bmatrix} 0.601 & 0.287 & 0.089 & 0.023\\ 0.027 & 0.703 & 0.203 & 0.067\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Taking the matrix logarithm of this transition probability matrix gives the Q matrix of the corresponding CTMC:
$$\hat Q = \begin{bmatrix} -0.5189 & 0.4438 & 0.0626 & 0.0125\\ 0.0418 & -0.3612 & 0.2400 & 0.0794\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$
This Q matrix fulfills the criteria for a transition rate matrix:
1. $\sum_{j\in S} q_{ij}(t) = 0$ for every state $i$;
2. $q_{ij}(t) \ge 0$ for $i \ne j$;
3. $q_{ii}(t) = -\sum_{j\in S,\, j\ne i} q_{ij}(t)$.
Hence there is no embedding problem, which supports time homogeneity of the disease process [17].
2.3.2. Test of the Markovian hypothesis
The difference between the empirical transition matrix $P_{0t}(e)$ calculated over the time interval $[0, t]$ and the product of the half-period matrices $P_{0,t/2}(e)$ and $P_{t/2,t}(e)$ is calculated and assessed using the $L_2$-norm, as defined in equations (1) and (2). If the difference approaches zero, the process is Markovian [18].
$$\|Z\| = \rho_{\max}(Z) \qquad (1)$$
i.e., the $L_2$-norm of the matrix Z is its maximum singular value, where the difference is
$$Z = P_{0t}(e) - \left[P_{0,t/2}(e) \times P_{t/2,t}(e)\right] \qquad (2)$$
A MATLAB code illustrates these concepts (see supplementary materials). The results obtained from running this code are:
$$\begin{bmatrix} 0.7339 & 0.2138 & 0.0246 & 0.0277\\ 0.0206 & 0.7412 & 0.1793 & 0.059\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 0.8558 & 0.1246 & 0.0068 & 0.0128\\ 0.0120 & 0.8600 & 0.0963 & 0.0316\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}\times\begin{bmatrix} 0.8558 & 0.1246 & 0.0068 & 0.0128\\ 0.0120 & 0.8600 & 0.0963 & 0.0316\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} = 1\times10^{-3}\times\begin{bmatrix} 0.0112 & 0.0113 & -0.0184 & 0.0084\\ 0.0104 & 0.1048 & 0.1004 & 0.0704\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$
The eigenvalues of Z are $1\times10^{-3}\times[0.0099\;\; 0.106\;\; 0\;\; 0]$, and its maximum singular value is likewise of order $10^{-4}$, i.e., approximately zero. So the process obeys the Markov property, i.e., the Chapman–Kolmogorov equations.
2.4. Goodness of fit for the Markov chains
Goodness of fit for this simple multistate model is assessed with a procedure like that used for contingency tables: the statistic is calculated in each interval and then summed.
Step 1: $H_0$: the future state does not depend on the current state. $H_1$: the future state does depend on the current state.
Step 2: calculate
$$P_{ij}(\Delta t = 1) = \begin{bmatrix} 0.7338 & 0.2139 & 0.0247 & 0.0277\\ 0.0206 & 0.7411 & 0.1793 & 0.059\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Step 3: calculate the expected counts in this interval by multiplying each row of the probability matrix by the corresponding total marginal count of the observed transition counts matrix in the same interval, giving the expected counts shown in Table 11. The total marginal counts are 550 for S1 and 250 for S2. The observed counts for this time interval are shown in Table 3.
Table 11: Expected counts during Δt = 1 year
From \ To   State 1   State 2   State 3   State 4   Total
State 1     403.645   117.59    13.53     15.235    550
State 2     5.15      185.3     44.825    14.75     250.025
State 3     0         0         0         0         0
State 4     0         0         0         0         0

Step 4: apply
$$\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 104.866 \sim \chi^2_{(4-1)(4-1)}(0.05),$$
where cells with zero expected count are omitted; the 5% critical value of $\chi^2_{9}$ is about 16.92.
The same steps are applied to the observed transition counts for $\Delta t = 2$, with the following results:
$$P_{ij}(\Delta t = 2) = \begin{bmatrix} 0.543 & 0.3154 & 0.0811 & 0.0606\\ 0.0304 & 0.5537 & 0.3126 & 0.1033\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Table 12: Expected counts during Δt = 2 years
From \ To   State 1   State 2   State 3   State 4   Total
State 1     60.273    35.0094   9.0021    6.7266    111
State 2     1.1856    21.5943   12.1914   4.0287    39
State 3     0         0         0         0         0
State 4     0         0         0         0         0

$$\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 8.003 \sim \chi^2_{(4-1)(4-1)}(0.05)$$
The same steps are applied to the observed transition counts for $\Delta t = 3$, with the following results:
$$P_{ij}(\Delta t = 3) = \begin{bmatrix} 0.405 & 0.3498 & 0.151 & 0.0943\\ 0.0337 & 0.4169 & 0.4127 & 0.1368\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Table 13: Expected counts during Δt = 3 years
From \ To   State 1   State 2   State 3   State 4   Total
State 1     15.795    13.6422   5.889     3.6777    39
State 2     0.3707    4.5859    4.5397    1.5048    11
State 3     0         0         0         0         0
State 4     0         0         0         0         0
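Step 3 amounts to scaling each transient row of $P(\Delta t)$ by that state's observed row total. The stdlib-only Python sketch below reproduces the expected counts in Tables 11–13 from the matrices and totals given in the text and defines a Pearson statistic that skips cells with zero expected count. The observed counts (Table 3 and its counterparts for $\Delta t = 2$ and 3) are not reproduced in this section, so the statistic is demonstrated only on a toy input; the helper names are illustrative.

```python
# Transient rows of P(dt) (row convention) copied from the text; the dt = 1 rows
# use the values that reproduce Table 11. Row totals are the observed marginals.
P_DT = {
    1: [[0.7339, 0.2138, 0.0246, 0.0277], [0.0206, 0.7412, 0.1793, 0.0590]],
    2: [[0.543, 0.3154, 0.0811, 0.0606], [0.0304, 0.5537, 0.3126, 0.1033]],
    3: [[0.405, 0.3498, 0.151, 0.0943], [0.0337, 0.4169, 0.4127, 0.1368]],
}
ROW_TOTALS = {1: [550, 250], 2: [111, 39], 3: [39, 11]}


def expected_counts(p_rows, totals):
    """E_ij = (observed row total of state i) * P_ij(dt)."""
    return [[total * p for p in row] for row, total in zip(p_rows, totals)]


def pearson_chi2(observed, expected):
    """Sum of (O - E)^2 / E over cells with E > 0."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
        if e > 0
    )


tables = {dt: expected_counts(P_DT[dt], ROW_TOTALS[dt]) for dt in P_DT}
for dt, table in tables.items():
    print(dt, [[round(v, 4) for v in row] for row in table])

# Toy example only (not study data): (10 - 5)^2/5 + (0 - 5)^2/5
print(pearson_chi2([[10, 0]], [[5, 5]]))  # → 10.0
```

Applied to the observed counts of each interval, `pearson_chi2` gives the per-interval statistics, which Step 5 then sums.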
$$\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 6.579 \sim \chi^2_{(4-1)(4-1)}(0.05)$$
Step 5: sum the above results to get
$$\sum_{l=1}^{3}\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{(O_{ijl}-E_{ijl})^2}{E_{ijl}} = 119.449 \sim \chi^2_{df=27}(0.05),$$
where the 5% critical value of $\chi^2_{27}$ is about 40.11. Since 119.449 exceeds this value, the null hypothesis is rejected in favor of the alternative: the future state depends on the current state, and the model, with the estimated transition rate and probability matrices obtained above, fits the data.

Supplementary materials
The supplementary materials contain a file with the theoretical background for the mathematical and statistical calculations, an Excel file with the data (Table 2), and MATLAB codes for all the calculations.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and material
Not applicable. Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Competing interests
The author declares no competing interests.
Funding
No funding was received. No funding body played a role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.
Authors’ contribution
I am the sole author; I carried out the mathematical analysis and applied these mathematical statistical concepts to the hypothetical example.
Acknowledgement
Not applicable.
Declaration of competing interest
The author declares no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
CRediT author statement
Attia IM: conceptualization, formal analysis, data generation and creation, methodology, software computation, writing, review, and editing.
ORCID: 0000-0002-7333-9713
References
[1] Z. Younossi et al., “Global burden of NAFLD and NASH: trends, predictions, risk factors and prevention,” Nat Rev Gastroenterol Hepatol, vol. 15, no. 1, pp. 11–20, Jan. 2018, doi: 10.1038/nrgastro.2017.109.
[2] M. Eslam et al., “MAFLD: A Consensus-Driven Proposed Nomenclature for Metabolic Associated Fatty Liver Disease,” Gastroenterology, vol. 158, no. 7, pp. 1999–2014.e1, May 2020, doi: 10.1053/j.gastro.2019.11.312.
[3] P. L. Huang, “A comprehensive definition for metabolic syndrome,” Disease Models & Mechanisms, vol. 2, no. 5–6, pp. 231–237, Apr. 2009, doi: 10.1242/dmm.001180.
[4] H. Tilg and M. Effenberger, “From NAFLD to MAFLD: when pathophysiology succeeds,” Nat Rev Gastroenterol Hepatol, vol. 17, no. 7, Art. no. 7, Jul. 2020, doi: 10.1038/s41575-020-0316-6.
[5] A. De and A. Duseja, “Natural History of Simple Steatosis or Nonalcoholic Fatty Liver,” Journal of Clinical and Experimental Hepatology, vol. 10, no. 3, pp. 255–262, May 2020, doi: 10.1016/j.jceh.2019.09.005.
[6] P. Bedossa et al., “Histopathological algorithm and scoring system for evaluation of liver lesions in morbidly obese patients,” Hepatology, vol. 56, no. 5, pp. 1751–1759, 2012, doi: 10.1002/hep.25889.
[7] D. E. Kleiner et al., “Design and validation of a histological scoring system for nonalcoholic fatty liver disease,” Hepatology, vol. 41, no. 6, pp. 1313–1321, 2005, doi: 10.1002/hep.20701.
[8] L. J. S. Allen, An Introduction to Stochastic Processes with Applications to Biology, 2nd edition. Boca Raton, FL: Chapman and Hall/CRC, 2010.
[9] Z. M. Younossi et al., “The economic and clinical burden of nonalcoholic fatty liver disease in the United States and Europe,” Hepatology, vol. 64, no. 5, pp. 1577–1586, 2016, doi: 10.1002/hep.28785.
[10] Z. M. Younossi et al., “Economic and Clinical Burden of Nonalcoholic Steatohepatitis in Patients With Type 2 Diabetes in the U.S.,” Diabetes Care, vol. 43, no. 2, pp. 283–289, Feb. 2020, doi: 10.2337/dc19-1113.
[11] J. H. Klotz and L. D. Sharples, “Estimation for a Markov Heart Transplant Model,” Journal of the Royal Statistical Society: Series D (The Statistician), vol. 43, no. 3, pp. 431–438, 1994, doi: 10.2307/2348579.
[12] J. D. Kalbfleisch and J. F. Lawless, “The Analysis of Panel Data under a Markov Assumption,” Journal of the American Statistical Association, vol. 80, no. 392, pp. 863–871, Dec. 1985, doi: 10.1080/01621459.1985.10478195.
[13] C. G. Cassandras and S. Lafortune, Eds., “Introduction to Discrete-Event Simulation,” in Introduction to Discrete Event Systems, Boston, MA: Springer US, 2008, pp. 557–615, doi: 10.1007/978-0-387-68612-7_10.
[14] C. L. Chiang, “Introduction to Stochastic Processes in Biostatistics,” 1968, doi: 10.2307/2986707.
[15] N. Chalasani et al., “The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American Association for the Study of Liver Diseases,” Hepatology, vol. 67, no. 1, pp. 328–357, 2018, doi: 10.1002/hep.29367.
[16] R. B. Israel, J. S. Rosenthal, and J. Z. Wei, “Finding Generators for Markov Chains via Empirical Transition Matrices, with Applications to Credit Ratings,” Mathematical Finance, vol. 11, no. 2, pp. 245–265, 2001, doi: 10.1111/1467-9965.00114.
[17] K. L. Verbyla, V. B. Yap, A. Pahwa, Y. Shao, and G. A. Huttley, “The embedding problem for Markov models of nucleotide substitution,” PLoS One, vol. 8, no. 7, p. e69187, 2013, doi: 10.1371/journal.pone.0069187.
[18] P. Lencastre, F. Raischel, P. Lind, and T. Rogers, “Are credit ratings time-homogeneous and Markov?,” Mar. 2014.
Supplementary Files
This is a list of supplementary files associated with this preprint:
MATLABcodetransitioncountsineachinterval.pdf
MATLABcodeforcalculatethefinalratevectoranditsvariance.pdf
MATLABcodeforcalculationoftheratematrixatallinterval.pdf
MATLABcodeforthecalculationoftheprobabilitymatrixatfirstyear.pdf
MATLABcodetocalculatetheEstimatedMeanSojournTimeandLifeExpectancy.pdf
MATLABcodetocalculateTheVarianceOfTheStationaryDistribution.pdf
theoreticalsupplementarymaterials.pdf