Exploiting Big Data in Time Series Forecasting: A Cross-Sectional Approach
Molham Al-Maleh
This seminar summarizes the published research "Exploiting Big Data in Time Series Forecasting: A Cross-Sectional Approach".
It breaks with traditional forecasting on two points. First, it takes only as much data as necessary to create accurate forecasts, rather than whole histories.
Second, it abandons the concept of one model per time series and instead models whole sets of time series.
This approach is called cross-sectional forecasting.
Data Trend Analysis by Assigning Polynomial Function For Given Data Set
IJCERT
This paper explains a method of creating a polynomial equation from a given data set, which can serve as a representation of the data itself and can be used to run aggregations to obtain results. The approach uses the least-squares technique to construct a model of the data by fitting a polynomial. Differential calculus is then applied to this equation to generate aggregated results that represent the original data set.
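As a rough illustration of the idea only (not the paper's code or data): a cubic polynomial fitted by ordinary least squares, with a trend summary taken from its first derivative. The bundled auto dataset and the variable names are stand-ins.

* Illustrative sketch only (not the paper's code): least-squares polynomial fit
* and its derivative as a trend summary, using Stata's bundled auto data.
sysuse auto, clear
generate double x  = weight/1000        // predictor, rescaled
generate double x2 = x^2
generate double x3 = x^3
regress mpg x x2 x3                     // cubic polynomial fitted by least squares
* Trend (first derivative of the fitted polynomial) evaluated at x = 3:
display _b[x] + 2*_b[x2]*3 + 3*_b[x3]*3^2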
Data Structures and Algorithms (DSA) Tutorial for Beginners - Learn data structures and algorithms using C, C++ and Java in simple and easy steps, starting from basic to advanced concepts, with examples.
Parameter Estimation for the Weibull Distribution Model Using Least-Squares Me...
IJMERJOURNAL
Abstract: We find survival rate estimates and parameter estimates for the Weibull distribution model using the least-squares estimation method. For the case when partial derivatives were not available, simplex optimization methods (Nelder and Mead, and Hooke and Jeeves) were used; for the case when first partial derivatives were available, quasi-Newton methods (Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS)) were applied. Medical data sets of 21 leukemia cancer patients with a time span of 35 weeks were used.
2018 Global Azure Bootcamp Azure Machine Learning for neural networks
Setu Chokshi
This was the introduction session for the 2018 Global Azure Bootcamp to get users started with neural networks on Azure Machine Learning Studio. It gives an initial introduction to developing and writing neural networks. We started by building the LeNet architecture on Azure Machine Learning Studio to identify handwritten digits and then moved on to cats and dogs.
This was also presented in the first workshop of my meetup,
Microsoft AI, ML Community, which can be reached here:
https://www.meetup.com/Microsoft-AI-ML-Community/
Extended Fuzzy C-Means with Random Sampling Techniques for Clustering Large Data
AM Publications
Big data are any data that you cannot load into your computer's primary memory. Clustering is a primary task in pattern recognition and data mining, and we need algorithms that scale well with data size. The original implementation, literal fuzzy c-means, is linear, or serialized. The FCM algorithm attempts to partition a finite collection of n elements into a collection of c fuzzy clusters: given a finite set of data, the algorithm returns a list of c cluster centers. However, it does not scale well and slows down as the data size increases, which makes it impractical and sometimes undesirable. In this paper, we propose an extended version of the fuzzy c-means clustering algorithm using various random sampling techniques, to study which method scales well for large or very large data.
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
cscpconf
This paper presents a new fault detection and isolation (FDI) technique applied to an industrial system. The technique is based on Neural Networks fault-free and Faulty behaviours Models (NNFMs). NNFMs are used for residual generation, while a decision tree architecture is used for residual evaluation. The decision tree is built with data collected from the NNFMs' outputs and is used to isolate detectable faults depending on a computed threshold. Each part of the tree corresponds to a specific residual. With the decision tree, it becomes possible to take the appropriate decision regarding the actual process behaviour by evaluating a small number of residuals. Compared with the usual systematic evaluation of all residuals, the proposed technique requires less computational effort and can be used for on-line diagnosis. An application example is presented to illustrate and confirm the effectiveness and accuracy of the proposed approach.
Finding a Dynamical Model of a Social Norm Physical Activity Intervention
Victor Asanza
Low levels of physical activity in sedentary individuals are a major public health concern.
Physical activity interventions can be designed relying on mobile technologies such as smartphones.
The purpose of this work is to find a dynamical model of a social norm physical activity intervention relying on Social Cognitive Theory, using a data set obtained from a previous experiment.
The model will serve as a framework for the design of future optimized interventions. To obtain model parameters, two strategies are developed: first, an algorithm is proposed that randomly varies the values of each model parameter around initial guesses.
The second approach uses traditional system identification concepts to obtain model parameters through semi-physical identification routines. For both cases, the obtained model is assessed through the computation of percentage fits to a validation data set and through a correlation analysis.
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Peea Bal Chakraborty
Machine learning is the scientific study of algorithms and statistical models that machines use to perform a specific task based on patterns and inference rather than explicit instructions. This research aims to observe how precisely a machine can predict whether a patient suspected of breast cancer has a malignant or benign tumour. In this paper, the classification of cancer type and the prediction of risk levels are done with various machine learning models and are pictorially depicted with visual analytics tools.
Introduction:
RNA interference (RNAi), or post-transcriptional gene silencing (PTGS), is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that cause the silencing, through RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA
Length: 23-25 nt
Trans-acting
Binds its target mRNA with mismatches
Translation inhibition
siRNA
Length: 21 nt
Cis-acting
Binds its target mRNA as a perfectly complementary sequence
piRNA
Length: 25 to 36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAi:
First, the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is a large (>500 kDa) RNA-protein binding complex which triggers mRNA degradation.
Unwinding of the double-stranded siRNA is carried out by an ATP-independent helicase.
The active components of RISC are the Argonaute (Ago) proteins (endonucleases), which cleave the target mRNA.
DICER: an endonuclease (RNase III family)
Argonaute: the central component of the RNA-induced silencing complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute.
ARGONAUTE PROTEIN domains:
1. PAZ (PIWI/Argonaute/Zwille): recognition of the target mRNA
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H activity)
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini...
Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been built with various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed-tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism maintains physiological relevance and provides insights into the progression of disease, response to treatments, and developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system's unique features and user-friendly software enable researchers to probe fast, dynamic biological processes such as immune cell tracking, cell-cell interaction, vascularization and tumor metastasis in exceptional detail. This webinar also gives an overview of IVM in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic interventions in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R_1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10^7–10^8 M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
They monitor common gases, weather parameters, and particulates.
1. The Right Way to code simulation studies in Stata
Tim Morris, MRC CTU at UCL
Michael Crowther, University of Leicester
25th UK Stata Conference
2. MRC CTU at UCL
https://github.com/tpmorris/TheRightWay
tldr:
Michael’s way is unambiguously wrong
My way is not unambiguously right
The Right Way is unambiguously right
3. MRC CTU at UCL
What is a simulation study?
Use of (pseudo-)random numbers to produce data from some distribution, to help us study the properties of a statistical method.
An example:
1. Generate data from a distribution with parameter θ
2. Apply the analysis method to the data, producing an estimate θ̂
3. Repeat (1) and (2) nsim times
4. Compare θ with E[θ̂] – if we had not generated the data, we would not know θ and so could not do this.
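As a minimal sketch of these four steps in Stata (a hypothetical example estimating a normal mean, not the study from the talk; file and variable names are illustrative):

* Minimal sketch of steps 1-4 (hypothetical example: estimating a normal mean)
clear all
set seed 12345
local nsim  1000
local theta 2                                // true parameter value
tempname res
postfile `res' double(thetahat) using minimal_example.dta, replace
forvalues i = 1/`nsim' {                     // step 3: repeat nsim times
    quietly {
        drop _all
        set obs 100
        generate y = `theta' + rnormal()     // step 1: generate data
        summarize y                          // step 2: apply the analysis method
        post `res' (r(mean))                 // store the estimate
    }
}
postclose `res'
use minimal_example.dta, clear
summarize thetahat                           // step 4: compare E[thetahat] with theta = 2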
4. MRC CTU at UCL
Some background
• Consistent terminology with definitions
• ADEMP (Aims, Data-generating mechanisms, Estimands, Methods, Performance measures): D, E and M are important in coding simulation studies
5. MRC CTU at UCL
Four datasets (possibly)
• Simulated: e.g. a simulated hypothetical study
• Estimates: some summary of nsim repetitions
• States: record of nsim + 1 RNG states – at the beginning of each repetition and one after the final repetition
• Performance: summarises estimates of performance (bias, empirical SE, coverage etc.), and (hopefully) their Monte Carlo SE, for each D, E, M
6. MRC CTU at UCL
This talk
This talk focuses on the code that produces a simulated dataset and returns the estimates and states datasets.
I teach simulation studies a lot. Errors in coding occur primarily in generating data in the way you want, and in storing summaries of each repetition (estimates data).
7. MRC CTU at UCL
A simple simulation study:
Aims
Suppose we are interested in the analysis of a randomised trial with a survival outcome and unknown baseline hazard function.
Aim: to evaluate the impacts of:
1. misspecifying the baseline hazard function on the estimate of the treatment effect
2. fitting a more complex model than necessary
3. avoiding the issue by using a semiparametric model
8. MRC CTU at UCL
Data-generating mechanisms
Simulate nobs = 100 and then nobs = 500 observations from a Weibull distribution with Xi ~ Bern(0.5) and hazard
h(t) = λ γ t^(γ−1) exp(Xi θ), where λ = 0.1, θ = −0.5
(administrative censoring at 5 years)
Study γ = 1, then γ = 1.5
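One way this data-generating mechanism might be coded in Stata, using the inverse-transform method for a Weibull proportional-hazards model (a sketch; variable names are illustrative and this is not necessarily the authors' code):

* Sketch of one repetition of the DGM (illustrative; not necessarily the talk's code)
clear
set obs 100                                   // nobs = 100 (then 500)
local lambda 0.1
local gamma  1                                // then 1.5 for the second DGM
local theta  -0.5
generate byte trt = rbinomial(1, 0.5)         // Xi ~ Bern(0.5)
* Inverse-transform draw from the Weibull PH model:
*   T = ( -ln(U) / (lambda * exp(trt*theta)) )^(1/gamma)
generate double t = (-ln(runiform()) / (`lambda' * exp(trt*`theta')))^(1/`gamma')
generate byte died = (t <= 5)                 // administrative censoring at 5 years
replace t = 5 if t > 5
stset t, failure(died)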
9. MRC CTU at UCL
Estimands and methods
The estimand is θ, the log hazard ratio for treatment vs. control.
Methods:
1. Exponential model
2. Weibull model
3. Cox model
(We don't need to consider performance measures for this talk; see the London Stata Conference 2020!)
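Continuing the illustrative sketch, the three methods can be applied to the stset data from the DGM sketch as follows; in each case _b[trt] holds the estimated log hazard ratio:

* Sketch: apply the three analysis methods to the stset data from the DGM sketch
streg trt, distribution(exponential) nohr     // 1. exponential model
local b1 = _b[trt]
local s1 = _se[trt]
streg trt, distribution(weibull) nohr         // 2. Weibull model
local b2 = _b[trt]
local s2 = _se[trt]
stcox trt                                     // 3. Cox model
local b3 = _b[trt]
local s3 = _se[trt]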
12. MRC CTU at UCL
The simulate approach
From the help file: 'simulate eases the programming task of performing Monte Carlo-type simulations'
… 'questionable' to 'no'.
13. MRC CTU at UCL
The simulate approach
If you haven't used it, simulate works as follows:
1. You write a program (rclass or eclass) that follows standard Stata syntax and returns quantities of interest as scalars.
2. Your program will generate ≥1 simulated dataset and return estimates for ≥1 estimands obtained by ≥1 methods.
3. You use simulate to repeatedly call the program.
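A sketch of how this might look for the study above (the program name, options and stored-result names are illustrative, not taken from the talk):

* Sketch: an rclass program that simulate can call repeatedly (illustrative names)
capture program drop onerep
program define onerep, rclass
    syntax [, NOBS(integer 100) GAMMA(real 1)]
    clear
    set obs `nobs'
    generate byte trt = rbinomial(1, 0.5)
    generate double t = (-ln(runiform()) / (0.1 * exp(trt * -0.5)))^(1/`gamma')
    generate byte died = (t <= 5)
    replace t = 5 if t > 5
    stset t, failure(died)
    streg trt, distribution(weibull) nohr
    return scalar b_wei  = _b[trt]
    return scalar se_wei = _se[trt]
    stcox trt
    return scalar b_cox  = _b[trt]
    return scalar se_cox = _se[trt]
end

* simulate calls the program nsim times and collects the returned scalars
simulate b_wei=r(b_wei) se_wei=r(se_wei) b_cox=r(b_cox) se_cox=r(se_cox), ///
    reps(1000) seed(12345): onerep, nobs(100) gamma(1)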
14. MRC CTU at UCL
The simulate approach
I've wished-&-grumbled here and on Statalist that simulate:
– Does not allow posting of the repetition number (an oversight?)
– Precludes putting strings into the estimates dataset, meaning non-numerical inputs (D) and the contents of c(rngstate) cannot be stored.
– Produces ultra-wide data (if E, M and D vary, the resulting estimates must be stored across a single row!)
Your code is clean; your estimates dataset is a mess.
15. MRC CTU at UCL
The post approach
Structure:
tempname tim
postfile `tim' int(rep) str5(dgm estimand) ///
    double(theta se) using estimates.dta, replace
forval i = 1/`nsim' {
    <1st DGM>
    <apply method>
    post `tim' (`i') ("thing") ("theta") (_b[trt]) (_se[trt])
    <2nd DGM>
}
postclose `tim'
16. MRC CTU at UCL
The post approach
+ No shortcomings of simulate
+ Produces a well-formed estimates dataset
– post commands become entangled in the code for generating and analysing data
– post lines are more error-prone. Suppose you are using different n. An efficient way to code this is to generate a dataset (with n observations) and then analyse subsets of this data for the 'smaller n' data-generating mechanisms. The code can get inelegant and you mis-post.
Your estimates dataset is clean; your code is a mess.
17. MRC CTU at UCL
The right approach
One can mash up the two!
1. Write a program, as you would with simulate
2. Use postfile
3. Call the program
4. Post inputs and returned results using post
5. Use a second postfile for storing rngstates
Why?
1. Appease Michael: tidy code that is less error-prone.
2. Appease Tim: a tidy estimates (and states) dataset that avoids error-prone reshaping and formatting acrobatics.
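A sketch of the combined structure, reusing the illustrative onerep program from the simulate sketch above; the file names, the dgm/method codes and the three-way split of c(rngstate) are assumptions for illustration, not the talk's code:

* Sketch: program + postfile + post, plus a second postfile for RNG states
local nsim 1000
set seed 754832

tempname est states
postfile `est' int(rep) str10(dgm) byte(method) double(theta se) ///
    using estimates.dta, replace
postfile `states' int(rep) str2000(state1 state2 state3) using states.dta, replace

forvalues i = 1/`nsim' {
    * record the RNG state at the start of each repetition
    * (split into chunks because a str variable holds at most 2045 characters)
    post `states' (`i') (substr(c(rngstate),1,2000)) ///
        (substr(c(rngstate),2001,2000)) (substr(c(rngstate),4001,2000))
    quietly onerep, nobs(100) gamma(1)                          // program defined earlier
    post `est' (`i') ("n100, g=1") (2) (r(b_wei)) (r(se_wei))   // 2 = Weibull
    post `est' (`i') ("n100, g=1") (3) (r(b_cox)) (r(se_cox))   // 3 = Cox
}
post `states' (`nsim'+1) (substr(c(rngstate),1,2000)) ///
    (substr(c(rngstate),2001,2000)) (substr(c(rngstate),4001,2000))
postclose `est'
postclose `states'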
18. MRC CTU at UCL
A query (grumble?)
• None of the options allows for a well-formatted dataset. I want to define a (unique) sort order, label variables & values, use chars… (for value labels, order matters; see below)
• I believe this stuff has to be done afterwards (?)
• To use 1 "Exponential" 2 "Weibull" and 3 "Cox" (I do), I have to open estimates.dta, label define and label values. Could this be done up-front, so you could e.g. fill in method codes with "Cox":method_label rather than the number 3?
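For example, the after-the-fact formatting described above might look like this (a sketch against the illustrative estimates.dta from the previous block; variable and label names are assumptions):

* Sketch: post-hoc formatting of the estimates dataset (illustrative names)
use estimates.dta, clear
label define method_label 1 "Exponential" 2 "Weibull" 3 "Cox"
label values method method_label
label variable theta "Estimated log hazard ratio"
label variable se    "Standard error of theta"
sort dgm method rep
save estimates.dta, replace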