An efficient one-pass online algorithm for triclustering of binary data (triadic formal contexts) is proposed. The algorithm is a modified version of the basic algorithm of the OAC-triclustering approach, but it has linear time and memory complexity with respect to the cardinality of the underlying ternary relation and can be easily parallelised for the analysis of big datasets. The results of computer experiments show the efficiency of the proposed algorithm.
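The one-pass idea can be sketched in a few lines (a simplified illustration, not the authors' implementation; the dictionary-based indexing and hash-based deduplication are our assumptions):

```python
from collections import defaultdict

def online_oac_triclusters(triples):
    """One-pass sketch: index each pair of components to the set of
    third components, then emit one candidate tricluster per triple
    via the three 'prime' sets."""
    users = defaultdict(set)   # (tag, resource) -> users
    tags = defaultdict(set)    # (user, resource) -> tags
    ress = defaultdict(set)    # (user, tag) -> resources
    for u, t, r in triples:    # single pass over the ternary relation
        users[(t, r)].add(u)
        tags[(u, r)].add(t)
        ress[(u, t)].add(r)
    seen = set()
    for u, t, r in triples:
        tri = (frozenset(users[(t, r)]),
               frozenset(tags[(u, r)]),
               frozenset(ress[(u, t)]))
        if tri not in seen:    # hashing keeps the pass near-linear
            seen.add(tri)
            yield tri
```

Each triple contributes one candidate tricluster built from its three "prime" sets, so time and memory stay linear in the number of triples up to hashing.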
Context-Aware Recommender System Based on Boolean Matrix Factorisation (Dmitrii Ignatov)
In this work, we propose and study an approach to collaborative filtering that is based on Boolean matrix factorisation and exploits additional (context) information about users and items. To avoid similarity loss in the Boolean representation, we use an adjusted type of projection of a target user into the obtained factor space.
We compared the proposed method with an SVD-based approach on the MovieLens dataset. The experiments demonstrate that the proposed method achieves better MAE and Precision and comparable Recall and F-measure. We also report an increase in quality when context information is present.
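The adjusted projection can be illustrated with a small sketch (the coverage threshold and the max-based reconstruction are our assumptions, not the paper's exact formulas):

```python
import numpy as np

def project_user(user, item_factors, threshold=0.8):
    """Project a Boolean user vector onto Boolean factors.
    Instead of requiring a factor's items to be fully contained in the
    user's items (which loses similarity), score each factor by the
    fraction of its items the user has rated.
    user: (n_items,) 0/1 vector; item_factors: (k, n_items) 0/1."""
    sizes = item_factors.sum(axis=1)              # items per factor
    overlap = item_factors @ user                 # items shared with user
    weights = np.where(sizes > 0, overlap / np.maximum(sizes, 1), 0.0)
    membership = (weights >= threshold).astype(int)
    # reconstruct predicted items as the Boolean product of memberships
    scores = (item_factors * membership[:, None]).max(axis=0)
    return membership, scores
```

For example, a user matching all items of the first factor but only half of the second is assigned to the first factor only, and the prediction is read off that factor's items.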
Accelerating Pseudo-Marginal MCMC using Gaussian Processes (Matt Moores)
The grouped independence Metropolis-Hastings (GIMH) and Markov chain within Metropolis (MCWM) algorithms are pseudo-marginal methods used to perform Bayesian inference in latent variable models. These methods replace intractable likelihood calculations with unbiased estimates within Markov chain Monte Carlo algorithms. The GIMH method has the posterior of interest as its limiting distribution, but suffers from poor mixing if it is too computationally intensive to obtain high-precision likelihood estimates. The MCWM algorithm has better mixing properties, but less theoretical support. In this paper we accelerate the GIMH method by using a Gaussian process (GP) approximation to the log-likelihood and train this GP using a short pilot run of the MCWM algorithm. Our new method, GP-GIMH, is illustrated on simulated data from a stochastic volatility and a gene network model. Our approach produces reasonable estimates of the univariate and bivariate posterior distributions, and the posterior correlation matrix in these examples with at least an order of magnitude improvement in computing time.
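The pseudo-marginal mechanism behind GIMH can be sketched as follows (a generic toy, not the paper's GP-accelerated method; the Gaussian random-walk proposal and step size are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gimh(log_lik_hat, log_prior, theta0, n_iter=2000, step=0.5):
    """Grouped-independence MH sketch: the unbiased likelihood
    estimate is *recycled* for the current state (GIMH); re-estimating
    it at every iteration would give MCWM instead.
    log_lik_hat(theta) returns a noisy, unbiased log-likelihood estimate."""
    theta, ll = theta0, log_lik_hat(theta0)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        ll_prop = log_lik_hat(prop)          # fresh estimate at proposal
        log_alpha = (ll_prop + log_prior(prop)) - (ll + log_prior(theta))
        if np.log(rng.random()) < log_alpha:
            theta, ll = prop, ll_prop        # keep the estimate (GIMH)
        chain.append(theta)
    return np.array(chain)
```

Recycling the estimate is what makes the chain exact for the true posterior, at the cost of sticky mixing when the estimate is noisy, which is the trade-off the GP approximation targets.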
1. Motivation: why do we need low-rank tensors
2. Tensors of the second order (matrices)
3. CP, Tucker and tensor train tensor formats
4. Many classical kernels have (or can be approximated in) a low-rank tensor format
5. Post-processing: computation of mean, variance, level sets, frequency
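Point 3's tensor-train format can be illustrated with a minimal TT-SVD sketch (NumPy assumed; the truncation tolerance `eps` is our choice):

```python
import numpy as np

def tt_decompose(tensor, eps=1e-10):
    """Tensor-train via sequential truncated SVD (TT-SVD sketch).
    Returns a list of 3-way cores G_k of shape (r_{k-1}, n_k, r_k)."""
    dims = tensor.shape
    cores, r = [], 1
    mat = tensor
    for n in dims[:-1]:
        mat = mat.reshape(r * n, -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        rank = max(1, int((S > eps * S[0]).sum()))
        cores.append(U[:, :rank].reshape(r, n, rank))
        mat = S[:rank, None] * Vt[:rank]   # carry the remainder forward
        r = rank
    cores.append(mat.reshape(r, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the cores back into a full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])
```

Storage drops from the product of the mode sizes to a sum of core sizes, which is the "exponential to linear" gain mentioned for the PDE application.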
Full paper: https://arxiv.org/pdf/1804.02339.pdf
We propose and analyze a novel adaptive step size variant of the Davis-Yin three-operator splitting, a method that can solve optimization problems composed of a sum of a smooth term for which we have access to its gradient and an arbitrary number of potentially non-smooth terms for which we have access to their proximal operator. The proposed method leverages local information of the objective function, allowing for larger step sizes while preserving the convergence properties of the original method. It only requires two extra function evaluations per iteration and does not depend on any step size hyperparameter besides an initial estimate. We provide a convergence rate analysis of this method, showing a sublinear convergence rate for general convex functions and linear convergence under stronger assumptions, matching the best known rates of its non-adaptive variant. Finally, an empirical comparison with related methods on six different problems illustrates the computational advantage of the adaptive step size strategy.
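For reference, the underlying fixed-step Davis-Yin iteration looks as follows (a sketch: the paper's adaptive step-size rule is omitted, so `step` is a constant the caller must tune):

```python
import numpy as np

def davis_yin(grad_h, prox_f, prox_g, z0, step, n_iter=500):
    """Fixed-step Davis-Yin three-operator splitting for
    min f(x) + g(x) + h(x), with h smooth and f, g proximable."""
    z = z0.copy()
    for _ in range(n_iter):
        x = prox_f(z, step)
        y = prox_g(2 * x - z - step * grad_h(x), step)
        z = z + y - x
    return prox_f(z, step)
```

For example, minimising 0.5(x - 3)^2 + |x| over x in [0, 2] (soft-thresholding for the l1 prox, clipping for the box prox) converges to x = 2.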
Faster Practical Block Compression for Rank/Select Dictionaries (Rakuten Group, Inc.)
We present faster practical encoding and decoding procedures for block compression. Such encoding and decoding procedures are important to efficiently support rank/select queries on compressed bit vectors. This paper was presented at the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017) in Palermo, Italy.
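The rank/select queries such structures support can be sketched with an uncompressed block index (the bit vector here is stored verbatim; the paper's contribution, compressing the blocks themselves, is omitted):

```python
class RankDict:
    """Rank/select sketch: store cumulative popcounts per fixed-size
    block and scan inside one block at query time, giving O(block)
    queries with n/block extra counters."""
    def __init__(self, bits, block=64):
        self.bits, self.block = bits, block
        self.super = [0]                       # cumulative popcounts
        for i in range(0, len(bits), block):
            self.super.append(self.super[-1] + sum(bits[i:i + block]))

    def rank1(self, i):
        """Number of 1s in bits[0:i]."""
        b = i // self.block
        return self.super[b] + sum(self.bits[b * self.block:i])

    def select1(self, k):
        """Position of the k-th 1 (1-based), by binary search on rank."""
        lo, hi = 0, len(self.bits)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo
```

Practical designs replace the in-block scan with popcount instructions and store the blocks in compressed form, which is where faster encoding/decoding pays off.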
In this talk, we discuss some recent advances in probabilistic schemes for high-dimensional PIDEs. It is known that traditional PDE solvers, e.g., finite element and finite difference methods, do not scale well with increasing dimension. The idea of probabilistic schemes is to link a wide class of nonlinear parabolic PIDEs to stochastic Lévy processes via a nonlinear version of the Feynman-Kac theory. As such, the solution of the PIDE can be represented by a conditional expectation (i.e., a high-dimensional integral) with respect to a stochastic dynamical system driven by Lévy processes. In other words, we can solve the PIDEs by performing high-dimensional numerical integration. A variety of quadrature methods could be applied, including MC, QMC, sparse grids, etc. Probabilistic schemes have been used in many application problems, e.g., particle transport in plasmas (Vlasov-Fokker-Planck equations), nonlinear filtering (Zakai equations), and option pricing.
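The quadrature idea can be shown on the simplest case, the heat equation u_t = 0.5 u_xx with u(0, .) = g, whose Feynman-Kac representation is u(t, x) = E[g(x + W_t)] (a Brownian toy without the jumps and nonlinearity of the PIDE setting; sample size is our choice):

```python
import numpy as np

rng = np.random.default_rng(1)

def feynman_kac_heat(g, t, x, n_samples=200_000):
    """Monte Carlo estimate of u(t, x) = E[g(x + W_t)], the
    Feynman-Kac solution of u_t = 0.5 u_xx with u(0, .) = g."""
    w = rng.normal(0.0, np.sqrt(t), n_samples)   # W_t ~ N(0, t)
    return g(x + w).mean()
```

With g(y) = y^2 the exact solution is u(t, x) = x^2 + t, so the estimate at t = 1, x = 2 should be close to 5; replacing plain MC by QMC or sparse grids changes only the quadrature rule.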
This paper presents an interesting idea of how to compute a consensus of several k-partitions of a set by finding an antichain in the concept lattice of an appropriate formal context.
AIST is a scientific conference on Analysis of Images, Social Networks, and Texts. The conference is intended for computer scientists and practitioners whose research interests involve Internet mathematics and other related fields of data science. Similar to the previous year, the conference will be focused on applications of data mining and machine learning techniques to various problem domains: image processing, analysis of social networks, and natural language processing. We hope that the participants will benefit from the interdisciplinary nature of the conference and exchange experience.
In our previous work, an efficient one-pass online algorithm for triclustering of binary data (triadic formal contexts) was proposed. This algorithm is a modified version of the basic algorithm of the OAC-triclustering approach; it has linear time and memory complexity. In this paper we parallelise it via the MapReduce framework in order to make it suitable for big datasets. The results of computer experiments show the efficiency of the proposed algorithm; for example, it outperforms its online counterpart on the BibSonomy dataset with ≈800,000 triples.
Experimental Economics and Machine Learning workshop (Dmitrii Ignatov)
This presentation summarises recent activities around the organisation of the EEML workshop. It is a successful event which attracts economists and computer scientists who would like to use recent advances in machine learning and data mining to understand human behavior in different domains related to Economics and Social Science.
On the Family of Concept-Forming Operators in Polyadic FCA (Dmitrii Ignatov)
Triadic Formal Concept Analysis (3FCA) was introduced by Lehmann and Wille almost two decades ago, and many researchers in Data Mining and Formal Concept Analysis work with the notions of closed sets, Galois and closure operators, and closure systems. However, to date, even though different researchers actively work on mining triadic and n-ary relations, a proper closure operator for the enumeration of triconcepts, i.e. maximal triadic cliques of tripartite hypergraphs, has not been introduced. In this talk we show that the previously introduced operators for obtaining triconcepts are not always consistent, describe their family, and study their properties. We also introduce the notion of a maximal switching generator to explain why such concept-forming operators are not closure operators, due to violation of the monotonicity property.
Boolean matrix factorisation for collaborative filtering (Dmitrii Ignatov)
We propose a new approach to collaborative filtering based on Boolean Matrix Factorisation (BMF) and Formal Concept Analysis. In a series of experiments on real data (the MovieLens dataset) we compare the approach with an SVD-based one in terms of Mean Absolute Error (MAE). One of the experimental findings is that binary-scaled rating data is enough for BMF to obtain almost the same MAE as the SVD-based algorithm achieves on non-scaled data.
NIPS 2016, Tensor-Learn@NIPS, and IEEE ICDM 2016 (Dmitrii Ignatov)
Some photo impressions from NIPS & ICDM 2016 in Barcelona mixed with workshops like Learning with Tensors (http://tensor-learn.org/) and related stuff.
Pattern-based classification of demographic sequences (Dmitrii Ignatov)
We have proposed prefix-based gapless sequential patterns for the classification of demographic sequences. In comparison to black-box machine learning techniques, this approach provides interpretable patterns suitable for treatment by professional demographers. As the underlying formalism, we used Pattern Structures, an extension of Formal Concept Analysis to complex data such as sequences, graphs, and intervals.
A short introduction to Sequential Pattern Mining, in Russian. We consider frequent and frequent closed sequences along with two algorithms (SPADE and PrefixSpan). A demographic case study is provided as well, and one can find links and references to relevant literature, libraries, and implementations of some basic algorithms. The exposition mainly follows the Han & Kamber Data Mining book (2nd edition, Chapter 8.3).
Mining frequent itemsets (products) and association rules (Dmitrii Ignatov)
A brief introduction to association rule mining in terms of Formal Concept Analysis. Example applications: near-duplicate document detection, website traffic analysis, and contextual advertising.
RAPS: A Recommender Algorithm Based on Pattern Structures (Dmitrii Ignatov)
We propose a new algorithm for recommender systems with numeric ratings based on Pattern Structures (RAPS). As input, the algorithm takes a rating matrix, e.g., one containing movies rated by users. For a target user, the algorithm returns a ranked list of items (movies) based on the user's previous ratings and the ratings of other users. We compare the results of the proposed algorithm, in terms of precision and recall, with Slope One, one of the state-of-the-art item-based algorithms, on the MovieLens dataset; RAPS demonstrates the best or comparable quality.
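The pattern-structure machinery underlying RAPS can be hinted at with the interval "meet" of two users' rating descriptions (a sketch of the similarity operation only, not the full RAPS pipeline; the dictionary representation is our assumption):

```python
def interval_meet(d1, d2):
    """Meet of two interval descriptions over shared items: the
    smallest interval covering both users' ratings per item, the
    similarity operation of interval pattern structures."""
    return {i: (min(d1[i][0], d2[i][0]), max(d1[i][1], d2[i][1]))
            for i in d1.keys() & d2.keys()}
```

Tight resulting intervals over many shared items indicate similar users, whose ratings can then back a recommendation for the target user.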
Pattern Mining and Machine Learning for Demographic Sequences (Dmitrii Ignatov)
In this talk, we present the results of our first studies applying pattern mining and machine learning to the analysis of demographic sequences in Russia, based on data covering 11 generations from 1930 to 1984. The main goal is not prediction or the data mining methods themselves, but rather the extraction of interesting patterns and knowledge acquisition from substantial demographic datasets. We use decision trees as a technique for demographic event prediction and emerging patterns for finding significant and potentially useful sequences.
Online Recommender System for Radio Station Hosting: Experimental Results Rev... (Dmitrii Ignatov)
We present a new recommender system developed for the Russian interactive radio network FMhost, based on a previously proposed model. The underlying model combines a collaborative user-based approach with information from tags of listened tracks in order to match user and radio station profiles. It follows an adaptive online learning strategy based on the user history. We compare the proposed algorithms with an industry-standard technique based on singular value decomposition (SVD) in terms of precision, recall, and NDCG measures; the experiments show that in our case the fusion-based approach gives the best results.
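NDCG, one of the reported measures, can be computed as follows (log2 discount, one of several common DCG conventions):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for one ranked list of graded relevances: DCG of the
    list divided by the DCG of the ideal (sorted) ordering."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; pushing relevant items down the ranking lowers the score because of the logarithmic position discount.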
Searching for optimal patterns in Boolean tensors (Dmitrii Ignatov)
These are our slides for a spotlight talk at the Learning with Tensors workshop at NIPS 2016. We briefly summarise a comparison of five different triclustering algorithms (TRIAS, TriBox, OACPrime, OACBox, and SpecTric).
Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P... (Dmitrii Ignatov)
Mining ternary relations or triadic Boolean tensors is one of the recent trends in knowledge discovery that allows one to take into account various modalities of input object-attribute data.
For example, in movie databases like IMDb, an analyst may find not only movies grouped by specific genres but also see their common keywords. In so-called folksonomies, users can be grouped according to their shared resources and used tags. In gene expression analysis, genes can be grouped along with tissue samples and time intervals, providing comprehensible patterns. However, pattern explosion effects are seriously aggravated by even one more dimension. In this paper, we continue our previous study on searching for a smaller collection of "optimal" patterns in triadic data with respect to a set of quality criteria such as pattern cardinality, density, diversity, coverage, etc. We show how a simple data preprocessing step has enabled us to use a frequent itemset mining algorithm.
My talk at the International Conference on Monte Carlo Methods and Applications (MCM2023), devoted to advances in mathematical aspects of stochastic simulation and Monte Carlo methods, held at Sorbonne Université on June 28, 2023. The talk is about my recent works (i) "Numerical Smoothing with Hierarchical Adaptive Sparse Grids and Quasi-Monte Carlo Methods for Efficient Option Pricing" (link: https://doi.org/10.1080/14697688.2022.2135455) and (ii) "Multilevel Monte Carlo with Numerical Smoothing for Robust and Efficient Computation of Probabilities and Densities" (link: https://arxiv.org/abs/2003.05708).
To describe the dynamics taking place in networks that structurally change over time, we propose an approach to search for attributes whose value changes impact the topology of the graph. In several applications, it appears that the variations of a group of attributes are often followed by some structural changes in the graph that, one may assume, they generate. We formalize the triggering pattern discovery problem as a method jointly rooted in sequence mining and graph analysis. We apply our approach to three real-world dynamic graphs of different natures: a co-authoring network, an airline network, and a social bookmarking system, assessing the relevancy of the triggering pattern mining approach.
Simple representations for learning: factorizations and similarities (Gael Varoquaux)
Real-life data seldom comes in the ideal form for statistical learning. This talk focuses on high-dimensional problems for signals and discrete entities: when dealing with many correlated signals or entities, it is useful to extract representations that capture these correlations.
Matrix factorization models provide simple but powerful representations. They are used for recommender systems across discrete entities such as users and products, or to learn good dictionaries to represent images. However, they entail large computing costs on very high-dimensional data, databases with many products, or high-resolution images. I will present an algorithm to factorize huge matrices based on stochastic subsampling that gives up to 10-fold speed-ups [1].
With discrete entities, the explosion of dimensionality may be due to variations in how a smaller number of categories are represented. Such a problem of "dirty categories" is typical of uncurated data sources. I will discuss how encoding this data based on similarities recovers a useful category structure with no preprocessing. I will show how it interpolates between one-hot encoding and techniques used in character-level natural language processing.
[1] A. Mensch, J. Mairal, B. Thirion, G. Varoquaux. Stochastic subsampling for factorizing huge matrices. IEEE Transactions on Signal Processing 66(1), 113-128.
[2] P. Cerda, G. Varoquaux, B. Kégl. Similarity encoding for learning with dirty categorical variables. Machine Learning (2018): 1-18.
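The similarity-encoding idea of [2] can be sketched with n-gram Jaccard similarity (the trigram size and the Jaccard choice are illustrative assumptions; the paper studies several string similarities):

```python
def ngrams(s, n=3):
    """Set of character n-grams of a padded, lowercased string."""
    s = f" {s.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity_encode(value, categories, n=3):
    """Replace one-hot's exact match with an n-gram Jaccard similarity
    to each known category, so dirty variants of a category
    (e.g. 'Sr. Engineer' vs 'senior engineer') get close vectors."""
    g = ngrams(value, n)
    out = []
    for c in categories:
        gc = ngrams(c, n)
        out.append(len(g & gc) / len(g | gc) if g | gc else 0.0)
    return out
```

An exact match still yields 1.0 in its own column, so the encoding interpolates between one-hot encoding and a character-level representation, as described above.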
Typically quantifying uncertainty requires many evaluations of a computational model or simulator. If a simulator is computationally expensive and/or high-dimensional, working directly with a simulator often proves intractable. Surrogates of expensive simulators are popular and powerful tools for overcoming these challenges. I will give an overview of surrogate approaches from an applied math perspective and from a statistics perspective with the goal of setting the stage for the "other" community.
Simplifying Gaussian Mixture Models Via Entropic Quantization (EUSIPCO 2009) (Frank Nielsen)
Slides for the paper presented at EUSIPCO 2009:
Simplifying Gaussian Mixture Models Via Entropic Quantization
http://www.eurasip.org/Proceedings/Eusipco/Eusipco2009/contents/papers/1569187249.pdf
We apply the tensor train (TT) data format to solve an elliptic PDE with uncertain coefficients. We reduce complexity and storage from exponential to linear in the dimension. Post-processing in the TT format is also provided.
Interpretable Concept-Based Classification with Shapley Values (Dmitrii Ignatov)
The slides contain our talk on Shapley values as an interpretable machine learning technique for the JSM-method, a rule-based classification and reasoning technique, used for ranking particular attributes of an undetermined example under classification.
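Shapley values themselves can be computed exactly by enumerating coalitions (exponential in the number of attributes, which is fine for ranking a handful of them; this generic sketch is not the paper's JSM integration):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: for each player, average its marginal
    contribution value(S + {p}) - value(S) over all coalitions S,
    weighted by |S|! (n - |S| - 1)! / n!."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(S) | {p}) - value(set(S)))
        phi[p] = total
    return phi
```

In the classification setting, `value` would score a subset of attributes (e.g., by how strongly the induced rules support a class), and the resulting phi values rank the attributes.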
https://doi.org/10.1007/978-3-030-57855-8_7
These are the opening slides of the 8th International Conference on Analysis of Images, Social Networks and Texts (AIST 2019). We summarise general facts about the AIST conference series. See the http://aistconf.org website for more details.
Social Learning in Networks: Extraction of Deterministic Rules (Dmitrii Ignatov)
In this talk, we want to introduce experimental economics to the field of data mining and vice versa. It continues related work on mining deterministic behaviour rules of human subjects in data gathered from experiments. Game-theoretic predictions partially fail to work with this data: equilibria, also known as game-theoretic predictions, succeed only with experienced subjects in specific games, conditions which are rarely given. Contemporary experimental economics offers a number of alternative models apart from game theory. In the relevant literature, these models are always biased by philosophical plausibility considerations and are claimed to fit the data. An agnostic data mining approach to the problem is introduced in this paper: the philosophical plausibility considerations follow after the correlations are found. No biases other than determinism are assumed. The dataset of the paper "Social Learning in Networks" by Choi et al. (2012) is used for evaluation. As a result, we come up with new findings. As future work, the design of a new infrastructure is discussed.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making. They monitor common gases, weather parameters, and particulates.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior composed of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Cancer Cell Metabolism: Special Reference to the Lactate Pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
In cancer cells:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the Warburg phenomenon:
Warburg effect: cancer cells are usually highly glycolytic ("glucose addiction") and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the Nobel Prize in Physiology or Medicine in 1931 for his "discovery of the nature and mode of action of the respiratory enzyme."
Warburg effect: the tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg observed that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space, studied in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system's unique features and user-friendly software enable researchers to probe fast dynamic biological processes such as immune cell tracking and cell-cell interaction, as well as vascularization and tumor metastasis, with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
What are greenhouse gases, and how many gases affect the Earth? (moosaasad1975)
An overview of what greenhouse gases are, how they affect the Earth and its environment, the future of the environment and the Earth, and how they influence weather and climate.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5−15. These objects show compact half-light radii of R1/2 ∼ 50−200 pc, stellar masses of
M⋆ ∼ 10⁷−10⁸ M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr⁻¹. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
A One-Pass Triclustering Approach: Is There any Room for Big Data?
1. A One-Pass Triclustering Approach: Is There any Room
for Big Data?
Dmitry V. Gnatyshak1 Dmitry I. Ignatov1 Sergei O. Kuznetsov1 Lhouari
Nourine2
National Research University Higher School of Economics, Russian Federation
Blaise Pascal University, LIMOS, CNRS, France
10.10.2014
Dmitry V. Gnatyshak et al. A One-Pass Triclustering Approach 10.10.2014 1 / 26
6. Outline
1 Motivation
2 Prime OAC-triclustering
Formal concept analysis
Basic algorithm
Online version of the algorithm
3 Experiments
Description of the experiments
Datasets
Results
4 Conclusion
7. Motivation
Large amounts of multimodal data:
Gene expression data
Folksonomies
. . .
Non-binary data can be scaled (possibly increasing the dimensionality)
Growing volumes of big data: fast algorithms are required (linear or sublinear, one-pass)
Existing methods find all p-clusters satisfying some conditions (often an exponential number of them)
9. Prime OAC-triclustering
Formal concept analysis: triadic case
Definition
Let G, M, B be some sets. Let the ternary relation I be a subset of their Cartesian
product: I ⊆ G × M × B. Then the tuple K = (G, M, B, I) is called a triadic
formal context.
G — a set of objects, M — a set of attributes, B — a set of conditions.
[Cross-table of the triadic context: rows g1–g4 (objects), attribute columns m1–m3 repeated under each condition b1, b2, b3; crosses mark the triples of I]
10. Prime OAC-triclustering
Formal concept analysis: triadic case
Definition
Galois operators (prime operators) are defined in the same way as in the dyadic case:
2^G → 2^(M×B)
2^M → 2^(G×B)
2^B → 2^(G×M)
2^G × 2^M → 2^B
2^G × 2^B → 2^M
2^M × 2^B → 2^G
11. Prime OAC-triclustering
Formal concept analysis: triadic case
({g1, g2}, {m1,m2})′ = {b1, b3}
12. Prime OAC-triclustering
Formal concept analysis: triadic case
m2′ = {(g1, b1), (g2, b1), (g3, b1), (g1, b2), (g1, b3), (g2, b3), (g4, b3)}
13. Prime OAC-triclustering
Formal concept analysis: triadic case
Definition
The triple (X, Y, Z) is called a triadic formal concept of the context
K = (G, M, B, I) if X ⊆ G, Y ⊆ M, Z ⊆ B, (X, Y)′ = Z, (X, Z)′ = Y, and
(Y, Z)′ = X.
X is called the (formal) extent, Y the (formal) intent, and Z the (formal) modus.
14. Prime OAC-triclustering
Basic algorithm
This method uses the following types of prime operators (for the context
K = (G,M,B, I )):
(g,m)′ = {b ∈ B | (g,m, b) ∈ I },
(g, b)′ = {m ∈ M | (g,m, b) ∈ I },
(m, b)′ = {g ∈ G | (g,m, b) ∈ I }
Definition
Then the triple T = ((m, b)′, (g, b)′, (g, m)′) is called a prime OAC-tricluster based
on the triple (g, m, b) ∈ I. The components of the tricluster are called, respectively, its extent,
intent, and modus. The triple (g, m, b) is called the generating triple of the tricluster T.
Definition
Density of a tricluster: ρ(X, Y, Z) = |I ∩ (X × Y × Z)| / (|X| · |Y| · |Z|)
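The pairwise prime operators and the density measure above can be sketched in a few lines of Python (a minimal illustration, not the authors' implementation; the toy context below is hypothetical):

```python
from itertools import product

# A toy triadic context given as a set of (object, attribute, condition) triples.
I = {("g1", "m1", "b1"), ("g1", "m2", "b1"),
     ("g2", "m1", "b1"), ("g2", "m2", "b2")}

def prime_gm(g, m):  # (g, m)' = {b in B | (g, m, b) in I}
    return {b for (gg, mm, b) in I if (gg, mm) == (g, m)}

def prime_gb(g, b):  # (g, b)' = {m in M | (g, m, b) in I}
    return {m for (gg, m, bb) in I if (gg, bb) == (g, b)}

def prime_mb(m, b):  # (m, b)' = {g in G | (g, m, b) in I}
    return {g for (g, mm, bb) in I if (mm, bb) == (m, b)}

def density(X, Y, Z):
    # rho(X, Y, Z) = |I ∩ (X × Y × Z)| / (|X| · |Y| · |Z|)
    filled = sum(1 for t in product(X, Y, Z) if t in I)
    return filled / (len(X) * len(Y) * len(Z))

# The prime OAC-tricluster generated by the triple (g1, m1, b1):
T = (prime_mb("m1", "b1"), prime_gb("g1", "b1"), prime_gm("g1", "m1"))
```

Here T comes out as ({g1, g2}, {m1, m2}, {b1}); its density is 3/4, since the cell (g2, m2, b1) is empty in this toy context.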
15. Prime OAC-triclustering
Basic algorithm
An example of a tricluster based on the triple (g̃, m̃, b̃):
16. Prime OAC-triclustering
Basic algorithm
Require: K = (G,M, B, I ) — triadic context;
ρmin — density threshold
Ensure: T = {T = (X, Y, Z)}
1: T := ∅
2: for all (g,m) : g ∈ G,m ∈ M do
3: PrimesObjAttr [g,m] = (g,m)′
4: end for
5: for all (g, b) : g ∈ G,b ∈ B do
6: PrimesObjCond[g, b] = (g, b)′
7: end for
8: for all (m, b) : m ∈ M,b ∈ B do
9: PrimesAttrCond[m, b] = (m, b)′
10: end for
11: for all (g,m, b) ∈ I do
12: T = (PrimesAttrCond[m, b], PrimesObjCond[g, b], PrimesObjAttr [g,m])
13: Tkey = hash(T)
14: if Tkey ∉ T.keys ∧ ρ(T) ≥ ρmin then
15: T [Tkey] := T
16: end if
17: end for
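The basic algorithm above translates almost line-for-line into Python. The following is a hedged sketch (the `frozenset` key plays the role of hash(T); the dictionary names follow the slides):

```python
from collections import defaultdict

def oac_triclusters(I, rho_min=0.0):
    """Basic prime OAC-triclustering over a set of triples I."""
    # Precompute the three families of prime sets, one pass over I each.
    primes_gm = defaultdict(set)  # (g, m) -> {b}
    primes_gb = defaultdict(set)  # (g, b) -> {m}
    primes_mb = defaultdict(set)  # (m, b) -> {g}
    for (g, m, b) in I:
        primes_gm[(g, m)].add(b)
        primes_gb[(g, b)].add(m)
        primes_mb[(m, b)].add(g)

    def density(X, Y, Z):
        filled = sum(1 for g in X for m in Y for b in Z if (g, m, b) in I)
        return filled / (len(X) * len(Y) * len(Z))

    result = {}
    for (g, m, b) in I:
        T = (primes_mb[(m, b)], primes_gb[(g, b)], primes_gm[(g, m)])
        key = tuple(frozenset(s) for s in T)  # plays the role of hash(T)
        if key not in result and density(*T) >= rho_min:
            result[key] = T
    return list(result.values())
```

For example, on the toy context {(g1,m1,b1), (g1,m2,b1), (g2,m1,b1), (g2,m2,b2)} this yields four distinct triclusters, three of which have density 1.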
17. Prime OAC-triclustering
Online version of the algorithm
Let K = (G,M,B, I ) be a triadic context. We do not know G, M, B, I , or their
cardinalities.
Input on each iteration: {(g,m, b)} = J ⊆ I .
Goal — maintain an updated version of the results and efficiently update them
when new triples are received.
We need to keep in memory the results of prime operators’ application (prime
sets):
PrimesObjAttr — dictionary with elements of type ((g,m), {b ∈ B}), g ∈ G,
m ∈ M;
PrimesObjCond — dictionary with elements of type ((g, b), {m ∈ M}),
g ∈ G, b ∈ B;
PrimesAttrCond — dictionary with elements of type ((m, b), {g ∈ G}),
m ∈ M, b ∈ B.
18. Prime OAC-triclustering
Online version of the algorithm
Remark
In this case we need to consider triclusters based on different generating triples as
different, even if their extents, intents, and modi are equal.
19. Prime OAC-triclustering
Online version of the algorithm
Algorithm of triples addition (standard):
Require: J — a set of triples to add;
T = {T = (∗X, ∗Y , ∗Z)} — current tricluster set;
PrimesObjAttr , PrimesObjCond, PrimesAttrCond;
Ensure: T = {T = (∗X, ∗Y , ∗Z)};
PrimesObjAttr , PrimesObjCond, PrimesAttrCond;
1: for all (g, m, b) ∈ J do
2: PrimesObjAttr[g, m] := PrimesObjAttr[g, m] ∪ {b}
3: PrimesObjCond[g, b] := PrimesObjCond[g, b] ∪ {m}
4: PrimesAttrCond[m, b] := PrimesAttrCond[m, b] ∪ {g}
5: T := T ∪ {(&PrimesAttrCond[m, b], &PrimesObjCond[g, b], &PrimesObjAttr[g, m])}
6: end for
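The pointer semantics of the addition step (the & symbols above) can be imitated in Python by storing each tricluster as a triple of references to the shared, mutable prime sets, so later additions are visible inside already-created triclusters. A sketch under that assumption, with names mirroring the slides:

```python
from collections import defaultdict

primes_obj_attr = defaultdict(set)   # (g, m) -> {b}
primes_obj_cond = defaultdict(set)   # (g, b) -> {m}
primes_attr_cond = defaultdict(set)  # (m, b) -> {g}
triclusters = []  # each entry holds references to three live prime sets

def add_triples(J):
    """Online addition of a batch J of (g, m, b) triples."""
    for (g, m, b) in J:
        primes_obj_attr[(g, m)].add(b)
        primes_obj_cond[(g, b)].add(m)
        primes_attr_cond[(m, b)].add(g)
        # Python sets are mutable objects, so this stores pointers, not
        # copies: future additions update existing triclusters as well.
        triclusters.append((primes_attr_cond[(m, b)],
                            primes_obj_cond[(g, b)],
                            primes_obj_attr[(g, m)]))
```

After `add_triples([("g1", "m1", "b1")])` the first tricluster is ({g1}, {m1}, {b1}); adding ("g2", "m1", "b1") afterwards enlarges that same tricluster's extent to {g1, g2} through the shared set.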
20. Prime OAC-triclustering
Online version of the algorithm
Algorithm of triples removal (optional):
Triclusters must be kept in the dictionary with generating triples being keys.
Require: J — a set of triples to remove;
T = {(key(T),T = (∗X, ∗Y , ∗Z))} — current tricluster dictionary;
PrimesObjAttr , PrimesObjCond, PrimesAttrCond;
Ensure: T = {T = (∗X, ∗Y , ∗Z)};
PrimesObjAttr , PrimesObjCond, PrimesAttrCond;
1: for all (g, m, b) ∈ J do
2: T := T \ {T[(g, m, b)]}
3: PrimesObjAttr[g, m] := PrimesObjAttr[g, m] \ {b}
4: PrimesObjCond[g, b] := PrimesObjCond[g, b] \ {m}
5: PrimesAttrCond[m, b] := PrimesAttrCond[m, b] \ {g}
6: end for
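Removal, as the slide notes, only works when triclusters are kept in a dictionary keyed by their generating triples. A hypothetical minimal version (it assumes each triple was added exactly once):

```python
from collections import defaultdict

primes_obj_attr = defaultdict(set)   # (g, m) -> {b}
primes_obj_cond = defaultdict(set)   # (g, b) -> {m}
primes_attr_cond = defaultdict(set)  # (m, b) -> {g}
triclusters = {}  # generating triple -> tricluster (shared prime sets)

def add_triple(g, m, b):
    primes_obj_attr[(g, m)].add(b)
    primes_obj_cond[(g, b)].add(m)
    primes_attr_cond[(m, b)].add(g)
    triclusters[(g, m, b)] = (primes_attr_cond[(m, b)],
                              primes_obj_cond[(g, b)],
                              primes_obj_attr[(g, m)])

def remove_triple(g, m, b):
    del triclusters[(g, m, b)]           # T := T \ {T[(g, m, b)]}
    primes_obj_attr[(g, m)].discard(b)   # (g, m)' \ {b}
    primes_obj_cond[(g, b)].discard(m)   # (g, b)' \ {m}
    primes_attr_cond[(m, b)].discard(g)  # (m, b)' \ {g}
```

Because the surviving triclusters reference the same prime sets, shrinking a prime set here shrinks every tricluster that points to it.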
21. Prime OAC-triclustering
Online version of the algorithm
If a user has asked for output, we may need to remove triclusters with the
same extent, intent, and modus at the post-processing stage. At this stage we can
also check various conditions (for instance, a minimal density condition).
Require: T = {T = (∗X, ∗Y , ∗Z)} — current tricluster set;
Ensure: T = {T = (∗X, ∗Y , ∗Z)} — processed tricluster hash-set;
1: for all T ∈ T do
2: Compute hash(T)
3: if hash(T) ∉ T then
4: T := T ∪ {T}
5: end if
6: end for
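The post-processing pass can be sketched as follows (a hedged illustration; `frozenset` triples stand in for the hash values on the slide, and the density check is folded in):

```python
def postprocess(triclusters, I, rho_min=0.0):
    """Deduplicate triclusters and apply a minimal-density condition."""
    def density(X, Y, Z):
        filled = sum(1 for g in X for m in Y for b in Z if (g, m, b) in I)
        return filled / (len(X) * len(Y) * len(Z))

    seen = set()
    out = []
    for (X, Y, Z) in triclusters:
        key = (frozenset(X), frozenset(Y), frozenset(Z))  # hash(T)
        if key not in seen and density(X, Y, Z) >= rho_min:
            seen.add(key)
            out.append((X, Y, Z))
    return out
```

This keeps one representative per distinct (extent, intent, modus) triple, in first-seen order.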
22. Prime OAC-triclustering
Online version of the algorithm
Remark 1
To allow efficient access to the prime sets, the dictionaries PrimesObjAttr,
PrimesObjCond, and PrimesAttrCond must be implemented as hash tables.
Remark 2
For an efficient computation of triclusters' hash values we can store the hash
value of each prime set along with the set itself. The hash value of a tricluster
is then some function of its prime sets' hash values (for instance, their sum
with non-repeating coefficients).
It is important not to use an LSH (Locality-Sensitive Hashing) function here:
deduplication requires exact equality of hash values for equal triclusters.
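Remark 2 can be illustrated by caching one hash per prime set and combining the three with fixed distinct coefficients; the coefficients below are arbitrary illustrative choices, not values from the slides:

```python
def set_hash(s):
    # Order-independent hash of a set; Python's frozenset hash fits.
    return hash(frozenset(s))

def tricluster_hash(X, Y, Z):
    # Combine the cached component hashes with distinct (non-repeating)
    # coefficients so swapping extent/intent/modus changes the value.
    return 3 * set_hash(X) + 5 * set_hash(Y) + 7 * set_hash(Z)
```

In the online setting, set_hash values would be kept alongside the prime sets and updated on each addition, so tricluster_hash is a constant-time combination rather than a fresh pass over the sets.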
23. Prime OAC-triclustering
Online version of the algorithm
Complexities:
Time complexity: O(|I |) (as there is a constant number of operations on
each step);
More precisely: 8|I | operations in total;
1 Modification of 3 prime sets (3);
2 Creation of a new tricluster (1);
3 Addition of pointers to its extent, intent, and modus (3);
4 Addition of the tricluster to the set of all triclusters (1).
Memory complexity: O(|I |) (as we need to keep in memory only the prime sets:
at most |I | elements in each dictionary, plus keys).
24. Prime OAC-triclustering
Online version of the algorithm
Example:
25. Prime OAC-triclustering
Online version of the algorithm
→ (g1,m1, b1)
1 PrimesObjAttr = {((g1,m1), {b1})}
2 PrimesObjCond = {((g1, b1), {m1})}
3 PrimesAttrCond = {((m1, b1), {g1})}
4 T := T ∪ {PrimesAttrCond[m1, b1], PrimesObjCond[g1, b1], PrimesObjAttr [g1,m1]}
26. Prime OAC-triclustering
Online version of the algorithm
→ (g1,m2, b1)
1 PrimesObjAttr = {((g1,m1), {b1}), ((g1,m2), {b1})}
2 PrimesObjCond = {((g1, b1), {m1,m2})}
3 PrimesAttrCond = {((m1, b1), {g1}), ((m2, b1), {g1})}
4 T := T ∪ {PrimesAttrCond[m2, b1], PrimesObjCond[g1, b1], PrimesObjAttr [g1,m2]}
27. Prime OAC-triclustering
Online version of the algorithm
→ (g2,m1, b1)
1 PrimesObjAttr = {((g1,m1), {b1}), ((g1,m2), {b1}), ((g2,m1), {b1})}
2 PrimesObjCond = {((g1, b1), {m1,m2}), ((g2, b1), {m1})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1})}
4 T := T ∪ {PrimesAttrCond[m1, b1], PrimesObjCond[g2, b1], PrimesObjAttr [g2,m1]}
28. Prime OAC-triclustering
Online version of the algorithm
→ (g2,m2, b1)
1 PrimesObjAttr = {((g1,m1), {b1}), ((g1,m2), {b1}), ((g2,m1), {b1}), ((g2,m2), {b1})}
2 PrimesObjCond = {((g1, b1), {m1,m2}), ((g2, b1), {m1,m2})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2})}
4 T := T ∪ {PrimesAttrCond[m2, b1], PrimesObjCond[g2, b1], PrimesObjAttr [g2,m2]}
29. Prime OAC-triclustering
Online version of the algorithm
→ (g3,m3, b1)
1 PrimesObjAttr =
{((g1,m1), {b1}), ((g1,m2), {b1}), ((g2,m1), {b1}), ((g2,m2), {b1}), ((g3,m3), {b1})}
2 PrimesObjCond = {((g1, b1), {m1,m2}), ((g2, b1), {m1,m2}), ((g3, b1), {m3})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3})}
4 T := T ∪ {PrimesAttrCond[m3, b1], PrimesObjCond[g3, b1], PrimesObjAttr [g3,m3]}
30. Prime OAC-triclustering
Online version of the algorithm
→ (g1,m2, b2)
1 PrimesObjAttr = {((g1,m1), {b1}), ((g1,m2), {b1, b2}), ((g2,m1),
{b1}), ((g2,m2), {b1}), ((g3,m3), {b1})}
2 PrimesObjCond = {((g1, b1), {m1,m2}), ((g2, b1), {m1,m2}), ((g3, b1),
{m3}), ((g1, b2), {m2})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1),
{g3}), ((m2, b2), {g1})}
4 T := T ∪ {PrimesAttrCond[m2, b2], PrimesObjCond[g1, b2], PrimesObjAttr [g1,m2]}
31. Prime OAC-triclustering
Online version of the algorithm
→ (g2,m1, b2)
1 PrimesObjAttr = {((g1,m1), {b1}), ((g1,m2), {b1, b2}), ((g2,m1), {b1, b2}),
((g2,m2), {b1}), ((g3,m3), {b1})}
2 PrimesObjCond = {((g1, b1), {m1,m2}), ((g2, b1), {m1,m2}), ((g3, b1), {m3}),
((g1, b2), {m2}), ((g2, b2), {m1})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}),
((m2, b2), {g1}), ((m1, b2), {g2})}
4 T := T ∪ {PrimesAttrCond[m1, b2], PrimesObjCond[g2, b2], PrimesObjAttr [g2,m1]}
32. Prime OAC-triclustering
Online version of the algorithm
→ (g2,m2, b2)
1 PrimesObjAttr = {((g1,m1), {b1}), ((g1,m2), {b1, b2}), ((g2,m1), {b1, b2}),
((g2,m2), {b1, b2}), ((g3,m3), {b1})}
2 PrimesObjCond = {((g1, b1), {m1,m2}), ((g2, b1), {m1,m2}), ((g3, b1), {m3}),
((g1, b2), {m2}), ((g2, b2), {m1,m2})}
3 PrimesAttrCond = {((m1, b1), {g1, g2}), ((m2, b1), {g1, g2}), ((m3, b1), {g3}),
((m2, b2), {g1, g2}), ((m1, b2), {g2})}
4 T := T ∪ {PrimesAttrCond[m2, b2], PrimesObjCond[g2, b2], PrimesObjAttr [g2,m2]}
34. Prime OAC-triclustering
Online version of the algorithm
Postprocessing:
1 T(g1,m1,b1) = ({g1, g2}, {m1, m2}, {b1}) ← add
2 T(g1,m2,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← add
3 T(g2,m1,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
4 T(g2,m2,b1) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
5 T(g3,m3,b1) = ({g3}, {m3}, {b1, b2}) ← add
6 T(g1,m2,b2) = ({g1, g2}, {m2}, {b1, b2}) ← add
7 T(g2,m1,b2) = ({g2}, {m1, m2}, {b1, b2}) ← add
8 T(g2,m2,b2) = ({g1, g2}, {m1, m2}, {b1, b2}) ← the same as T(g1,m2,b1), skip
9 T(g3,m3,b2) = ({g3}, {m3}, {b1, b2}) ← the same as T(g3,m3,b1), skip
35. Prime OAC-triclustering
Online version of the algorithm
The final output set of triclusters:
1 T1 = ({g1, g2}, {m1,m2}, {b1})
2 T2 = ({g1, g2}, {m1,m2}, {b1, b2})
3 T3 = ({g3}, {m3}, {b1, b2})
4 T4 = ({g1, g2}, {m2}, {b1, b2})
5 T5 = ({g2}, {m1,m2}, {b1, b2})
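Putting the pieces together, the whole running example from the preceding slides can be replayed in a few lines of Python. This is a sketch, not the authors' code, but it reproduces the five triclusters listed above, in the same order:

```python
from collections import defaultdict

# The nine triples streamed in the example.
stream = [("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1"),
          ("g2", "m2", "b1"), ("g3", "m3", "b1"), ("g1", "m2", "b2"),
          ("g2", "m1", "b2"), ("g2", "m2", "b2"), ("g3", "m3", "b2")]

p_gm, p_gb, p_mb = defaultdict(set), defaultdict(set), defaultdict(set)
generated = []  # one tricluster reference per generating triple

for (g, m, b) in stream:  # online addition of each triple
    p_gm[(g, m)].add(b)
    p_gb[(g, b)].add(m)
    p_mb[(m, b)].add(g)
    # Store references to the shared prime sets (the pointer semantics).
    generated.append((p_mb[(m, b)], p_gb[(g, b)], p_gm[(g, m)]))

seen, final = set(), []  # post-processing: drop duplicate triclusters
for (X, Y, Z) in generated:
    key = (frozenset(X), frozenset(Y), frozenset(Z))
    if key not in seen:
        seen.add(key)
        final.append(key)
```

Running this leaves exactly the five triclusters T1–T5 above in `final`, confirming the step-by-step trace.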
37. Experiments
Description of the experiments
Goals:
Show that the online algorithm for p-dimensional attribute clustering
outperforms the basic algorithm
Confirm the complexity estimates
For each dataset and each version of the algorithm, 11 experiments were conducted:
one per density threshold (from 0 to 1 in steps of 0.1). To measure running time
more precisely, each algorithm was run 5 times on each context and the average
result was recorded.
Additional tests were run to check performance on big datasets and to confirm the
linearity of the online algorithm.
All experiments were conducted on a computer with an Intel Core i7-351U 2.40
GHz processor, 8 GB RAM, and the Windows 8 operating system.
38. Experiments
Datasets
5 pseudo-random uniform contexts of size 50 × 50 × 50. The probability of each
triple's presence varied from 0.02 to 0.1 in steps of 0.02
10 pseudo-random uniform contexts with average density equal to 0.001.
Cardinalities of the sets varied from 100 to 1000 in steps of 100
Top-250 list of IMDB (Internet Movie Database) (triples: (movie, tag,
genre))
Sample of 3000 triples of the first 100 000 triples of Bibsonomy.org dataset
(triples: (user, bookmark, tag))
43. Experiments
Results
Densities for the contexts:
46. Conclusion
The prime OAC-triclustering approach and its basic algorithm were described
A one-pass, linear-time online version of the basic algorithm was proposed
The efficiency of the online algorithm and the complexity estimates of both
algorithms were confirmed experimentally.
47. Thank you!
Questions?