Instead of just modelling word frequencies (or in general word characteristics) by learning a weight vector w, we are also learning a weight for each user (u). Thus, from a linear regression model, we now go bilinear. This idea was applied for predicting voting intention polls from tweets in two countries (Austria and the UK), but it is also applicable on various other NLP tasks, such as the extraction of socioeconomic patterns from the news.
This is a progress report presented to the Phylogenomics Group at UVigo in May 2013, about the current status of the software guenomu and the Bayesian model implemented.
At that time I was experimenting with a mixture model, that has been since then abandoned, and the Hdist that is still experimental. The presentation also describes the exhange algorithm to solve doubly-intractable distributions, the generalized Multiple-Try Metropolis, and the parallel PRNG used to minimize communication between jobs.
The world is ever changing. As a result, many of the systems and phenomena we are interested in evolve over time resulting in time evolving datasets. Timeseries often display any interesting properties and levels of correlation. In this tutorial we will introduce the students to the use of Recurrent Neural Networks and LSTMs to model and forecast different kinds of timeseries.
GitHub: https://github.com/DataForScience/RNN
User-generated content: collective and personalised inference tasksVasileios Lampos
Over the past few years large-scale, user-generated, online resources, such as social media or search engine queries, have been driving various interesting research efforts. This talk will showcase a series of ideas that we have developed based on the hypothesis that this online stream of content should represent a biased -at least- fraction of real-world situations, opinions or behaviour. Firstly, core algorithms have been proposed for mapping Twitter or Google search data to estimations about the current rate of an infectious disease, such as influenza. Then, through the addition of a user dimension in linear regularised text regression models, a family of bilinear models has been introduced, capable of improving approximations of voting intention trends. Finally, by redirecting the previous modelling principle, social media content has been used to learn as well as
qualitatively interpret user attributes, such as impact, occupation,
income and socioeconomic status.
Learning To Hop Using Guided Policy Search / ETH Zurich Computer Science Mast...Julian Viereck
A 5 minute presentation on the thesis "Learning To Hop Using Guided Policy Search". Presented during the ETH Zurich Computer Science Master Ceremony 2017.
This is a progress report presented to the Phylogenomics Group at UVigo in May 2013, about the current status of the software guenomu and the Bayesian model implemented.
At that time I was experimenting with a mixture model, that has been since then abandoned, and the Hdist that is still experimental. The presentation also describes the exhange algorithm to solve doubly-intractable distributions, the generalized Multiple-Try Metropolis, and the parallel PRNG used to minimize communication between jobs.
The world is ever changing. As a result, many of the systems and phenomena we are interested in evolve over time resulting in time evolving datasets. Timeseries often display any interesting properties and levels of correlation. In this tutorial we will introduce the students to the use of Recurrent Neural Networks and LSTMs to model and forecast different kinds of timeseries.
GitHub: https://github.com/DataForScience/RNN
User-generated content: collective and personalised inference tasksVasileios Lampos
Over the past few years large-scale, user-generated, online resources, such as social media or search engine queries, have been driving various interesting research efforts. This talk will showcase a series of ideas that we have developed based on the hypothesis that this online stream of content should represent a biased -at least- fraction of real-world situations, opinions or behaviour. Firstly, core algorithms have been proposed for mapping Twitter or Google search data to estimations about the current rate of an infectious disease, such as influenza. Then, through the addition of a user dimension in linear regularised text regression models, a family of bilinear models has been introduced, capable of improving approximations of voting intention trends. Finally, by redirecting the previous modelling principle, social media content has been used to learn as well as
qualitatively interpret user attributes, such as impact, occupation,
income and socioeconomic status.
Learning To Hop Using Guided Policy Search / ETH Zurich Computer Science Mast...Julian Viereck
A 5 minute presentation on the thesis "Learning To Hop Using Guided Policy Search". Presented during the ETH Zurich Computer Science Master Ceremony 2017.
The first part of the talk will cover our latest research on 1-million-agent reinforcement learning and its potential applications. Our findings show that the dynamics of the population from AI agents, driven by reinforcement learning and self-interest, share a similar pattern as those found in Nature. At the second part of the talk I shall move to a reinforcement learning setting where the game environment is strategic and designable. We present a simple case on how to design a difficult Maze, but the techniques can be used for various applications where the system level objectives are inconsistent with agents’ goals. I will finally conclude the talk by pointing out the future direction on this exciting field of AI.
The slides majorly covers
1.Introduction to CNN.
2. Visual Representation of Perceptron being trained on a dataset. 3. Key Concepts used in CNN modeling.
Newsletter Lika Electronic Aprile 2016 - Prodotti del mese: encoder bearingless robusti per condizioni difficili.
MIK36, MSK36 e MMK36 sono gli encoder senza albero e senza cuscinetti progettati per garantire solidità e affidabilità nelle condizioni ambientali più severe. Disponibili sia in versione incrementale che assoluta, anche multigiro.
SMAX e SMAZ sono gli encoder assoluti per applicazioni lineari dotati di un pacchetto completo di caratteristiche e funzioni al prezzo di un comune potenziometro! Garantiscono una protezione fino a IP69K e un funzionamento sicuro negli ambienti più difficili.
The first part of the talk will cover our latest research on 1-million-agent reinforcement learning and its potential applications. Our findings show that the dynamics of the population from AI agents, driven by reinforcement learning and self-interest, share a similar pattern as those found in Nature. At the second part of the talk I shall move to a reinforcement learning setting where the game environment is strategic and designable. We present a simple case on how to design a difficult Maze, but the techniques can be used for various applications where the system level objectives are inconsistent with agents’ goals. I will finally conclude the talk by pointing out the future direction on this exciting field of AI.
The slides majorly covers
1.Introduction to CNN.
2. Visual Representation of Perceptron being trained on a dataset. 3. Key Concepts used in CNN modeling.
Newsletter Lika Electronic Aprile 2016 - Prodotti del mese: encoder bearingless robusti per condizioni difficili.
MIK36, MSK36 e MMK36 sono gli encoder senza albero e senza cuscinetti progettati per garantire solidità e affidabilità nelle condizioni ambientali più severe. Disponibili sia in versione incrementale che assoluta, anche multigiro.
SMAX e SMAZ sono gli encoder assoluti per applicazioni lineari dotati di un pacchetto completo di caratteristiche e funzioni al prezzo di un comune potenziometro! Garantiscono una protezione fino a IP69K e un funzionamento sicuro negli ambienti più difficili.
Charles Bargue Drawing Course. This book is a complete reprint of the fabled but rare Drawing course (Cours de dessin) of Charles Bargue and Jean-Léon Gérôme, published in Paris en the 1860s and 1870s. For moste of the next half-century, this set of nearly 200 masterful lithographs was copied by art students worldwide before they attempted to draw from a live model. This book will be valuable to a wide range of artists, students, historians and collectors, even as it introduces them to the hitherto-neglected master, Charles Bargue. The drawing course is separated into three sections, in an ascending order of difficulty. The first section consists of lithographs by Bargue after casts of sculptures, mostly antique examples, that present the structure of the human body with remarkable clarty and intelligence. The second part contains the lithographs that Bargue made after master drawings by Renaissance and moderns artists and the third section almost 60 exemplary drawings of nude male models.
August White Paper 1/2015: Getting a Better Grip on External SpendingAugust Associates
Traditional savings reporting and accounting keep business owners and finance in the dark about external expenditure development. It doesn’t have to be that way.
Wilson Laleau nommé Directeur du Cabinet du Président de la République, avec rang de ministre.
Rénald Lubérice nommé Secrétaire Général du Conseil des ministres avec rang de ministre.
Yves Germain Joseph nommé Secrétaire Général de la Présidence avec rang de ministre
Marc Marie Yves Mazile chef du Protocole du Palais National avec rang d'ambassadeur
Dans le coffre à outils du gestionnaire marketing d'une marque en tourisme, il existe une panoplie de possibilités et il peut parfois être tentant de succomber à la saveur du jour, à la nouvelle tactique ou envisager une nouvelle technologie...
Présentation par François Mistral de l'engrenage entre LibQUAL et une démarche qualité, illustré par le cas lorrain dans le cadre de la journée Libqual Fr 2013
Mining socio-political and socio-economic signals from social media contentVasileios Lampos
Over the past few years user-generated content, primarily originating from social media platforms, has been the focus of considerable research effort. One of the main motivations is the potential of using this data as an additional source of information for enhancing our understanding of societal or more personalised
signals. This talk will showcase a series of ideas which explored that basic hypothesis, proposing statistical natural language processing frameworks for the estimation of collective patterns (e.g. emotions or voting intentions) or of more focused demographic user attributes (e.g. the occupation, income and socioeconomic status). Apart from the essential statistical evaluation of the proposed methods, interesting insights, stemming from a qualitative analysis, will be presented as well.
Physics, Astrophysics & Simulation of Gravitational Wave Source (Lecture 2)Christian Ott
Lecture on the physics, astrophysics, and simulation of gravitational wave sources delivered in March 2015 at the International School on Gravitational Wave Physics, Yukawa Institute for Theoretical Physics, Kyoto University
A Network-Aware Approach for Searching As-You-Type in Social MediaINRIA-OAK
We present a novel approach for as-you-type top-k keyword search over social media. We adopt a natural "network-aware" interpretation for information relevance, by which information produced by users who are closer to the seeker is considered more relevant. In practice, this query model poses new challenges for effectiveness and efficiency in online search, even when a complete query is given as input in one keystroke. This is mainly because it requires a joint exploration of the social space and classic IR indexes such as inverted lists. We describe a memory-efficient and incremental prefix-based retrieval algorithm, which also exhibits an anytime behavior, allowing to output the most likely answer within any chosen running-time limit. We evaluate it through extensive experiments for several applications and search scenarios , including searching for posts in micro-blogging (Twitter and Tumblr), as well as searching for businesses based on reviews in Yelp. They show that our solution is effective in answering real-time as-you-type searches over social media.
These slides refer to the talk I gave at the last ASE/IEEE SocialCom 2013 International Conference, where I presented the research work entitled "Trending Topics on Twitter Improve the Prediction of Google Hot Queries", which turned to be selected among the top-5% best accepted papers.
Once every five minutes, Twitter publishes a list of trending topics by monitoring and analyzing tweets from its users. Similarly, Google makes available hourly a list of hot queries that have been issued to the search engine. In this work, we analyze the time series derived from the daily volume index of each trend, either by Twitter or Google. Our study on a real-world dataset reveals that about 26% of the trending topics raising from Twitter "as-is" are also found as hot queries issued to Google. Also, we find that about 72% of the similar trends appear first on Twitter. Thus, we assess the relation between comparable Twitter and Google trends by testing three classes of time series regression models. We validate the forecasting power of Twitter by showing that models, which use Google as the dependent variable and Twitter as the explanatory variable, retain as significant the past values of Twitter 60% of times.
Le développement du Web et des réseaux sociaux ou les numérisations massives de documents contribuent à un renouvellement des Sciences Humaines et Sociales, des études des patrimoines littéraires ou culturels, ou encore de la façon dont est exploitée la littérature scientifique en général.
Les humanités numériques, qui croisent diverses disciplines avec l’informatique, posent comme centrales les questions du volume des données, de leur diversité, de leur origine, de leur véracité ou de leur représentativité. Les informations sont véhiculées au sein de « documents » textuels (livres, pages Web, tweets...), audio, vidéo ou multimédia. Ils peuvent comporter des illustrations ou des graphiques.
Appréhender de telles ressources nécessite le développement d'approches informatiques robustes, capables de passer à l’échelle et adaptées à la nature fondamentalement ambiguë et variée des informations manipulées (langage naturel ou images à interpréter, points de vue multiples…).
Si les approches d’apprentissage statistique sont monnaie courante pour des tâches de classification ou d’extraction d’information, elles doivent faire face à des espaces vectoriels creux et de dimension très élevées (plusieurs millions), être en mesure d’exploiter des ressources (par exemple des lexiques ou des thesaurus) et tenir compte ou produire des annotations sémantiques qui devront pouvoir être réutilisées.
Pour faire face à ces enjeux, des infrastructures ont été créées telle HumaNum à l’échelle nationale, DARIAH ou CLARIN à l’échelle européenne et des recommandations établies à l’échelle mondiale telle que la TEI (Text Encoding Initiative). Des plateformes au service de l’information scientifique comme l’équipement d’excellence OpenEdition.org sont une autre brique essentielle pour la préservation et l’accès aux « Big Digital Humanities » mais aussi pour favoriser la reproductibilité et la compréhension des expérimentations et des résultats obtenus.
Slides from my lecture for the Information Retrieval and Data Mining course at University College London
The slides cover introductory concepts on topic models, vector semantics and basic end applications
Mining online data for public health surveillanceVasileios Lampos
Online user-generated data (search engine activity, social media) offers health-related insights that traditional health monitoring systems cannot always provide. A central paradigm for this has been the estimation of the rate of a disease in a population by modelling the corresponding online search engine activity. However, the various inaccuracies in the estimates of Google Flu Trends (GFT) ―a popular approach for estimating influenza-like illness (ILI) rates from online search logs― have generated a debate as to whether this is something feasible. In this talk, we will first go through GFT and try to explain why it has been unsuccessful. We will then present alternative approaches which deploy more adequate statistical as well as semantic modelling techniques to successfully capture ILI rates both in the US and England. Long-term empirical analysis will showcase that models retain a strong predictive performance across many consecutive flu seasons. Expanding on the previous approaches, we will also present a brief overview of more recent work, where multi-task learning across different geographies is used to further improve ILI models, especially in situations where sporadic historical health reports are available.
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...Vasileios Lampos
This paper presents a method to classify social media users based on their socioeconomic status. Our experiments are conducted on a curated set of Twitter profiles, where each user is represented by the posted text, topics of discussion, interactive behaviour and estimated impact on the microblogging platform. Initially, we formulate a 3-way classification task, where users are classified as having an upper, middle or lower socioeconomic status. A nonlinear, generative learning approach using a composite Gaussian Process kernel provides significantly better classification accuracy (75%) than a competitive linear alternative. By turning this task into a binary classification - upper vs. medium and lower class - the proposed classifier reaches an accuracy of 82%.
An introduction to digital health surveillance from online user-generated con...Vasileios Lampos
This talk provides a brief introduction on methods for performing health surveillance tasks using online user-generated data. Four case studies are being presented: a) the original Google Flu Trends model, b) a basic model for mapping Twitter data to influenza rates, c) an improved Google Flu Trends model, and d) a method for assessing the impact of a health intervention from Internet data.
Extracting interesting concepts from large-scale textual dataVasileios Lampos
Over the past few years large-scale textual resources, such as news articles, social media, search queries or even digitised books, have been the centre of various research efforts. In particular, the open nature of the microblogging platform of Twitter provided the unique opportunity for various appealing ideas to be evaluated. Based on the hypothesis that this online stream of content should represent at least a biased fraction of real-world situations or opinions, we have proposed core algorithms for estimating the current rate of an infectious disease, such as influenza, or even of a natural, less stable phenomenon like rainfall rates. A simplified emotion analysis on a longitudinal set of tweets revealed interesting patterns, including signs of rising anger and fear before the UK riots in August, 2011. A similar analysis on the Google N-gram book database uncovered collective patterns of affect over the course of the 20th century. Through the extension of linear text regression approaches by the addition of a user dimension, we proposed a family of bilinear regularised regression models, which found application in the approximation of voting intention trends. Finally, we reversed the previous modelling principle in an attempt to qualitatively analyse user attributes or behaviours.
Assessing the impact of a health intervention via user-generated Internet con...Vasileios Lampos
Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas, where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/ velocity and then from this we derive the Pouiselle flow equation, the transition flow equation and the turbulent flow equation. In the situations where there are no viscous effects , the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes equation of terminal velocity and turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing the study of interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflect spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light received by the analyte.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Studia Poinsotiana
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Toxic effects of heavy metals : Lead and Arsenicsanjana502982
Heavy metals are naturally occuring metallic chemical elements that have relatively high density, and are toxic at even low concentrations. All toxic metals are termed as heavy metals irrespective of their atomic mass and density, eg. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
Deep Software Variability and Frictionless Reproducibility
Bilinear text regression and applications
1. Bilinear Text Regression
and Applications
Vasileios Lampos
Department of Computer Science
University College London
March, 2015
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 1/45
1
/45
7. Regression basics — Ridge Regression (2/2)
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
Ridge Regression (RR)
argmin
www∗
XXX∗www∗ − yyy 2
2
+ λ www 2
2
+++ size constraint on the weight coefficients (regularisation)
→ resolves problems caused by collinear variables
+++ less degrees of freedom, better predictive accuracy than OLS
−−− does not perform feature selection (nonzero coefficients)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 7/45
7
/45
8. Regression basics — Lasso
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
111–norm regularisation or lasso (Tibshirani, 1996)
argmin
www,β
n
i=1
yi − β −
m
j=1
xijwj
2
+ λ
m
j=1
|wj|
or argmin
www∗
XXX∗www∗ − yyy 2
2
+ λ www 1
−−− no closed form solution — quadratic programming problem
+++ Least Angle Regression explores entire reg. path (Efron et al., 2004)
+++ sparse www, interpretability, better performance (Hastie et al., 2009)
−−− if m > n, at most n variables can be selected
−−− strongly corr. predictors → model-inconsistent (Zhao & Yu, 2009)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 8/45
8
/45
9. Regression basics — Lasso for Text Regression
• n-gram frequencies xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• flu rates yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
111–norm regularisation or lasso
or argmin
www∗
XXX∗www∗ − yyy 2
2
+ λ www 1
‘unwel’, ‘temperatur’, ‘headach’, ‘appetit’, ‘symptom’, ‘diarrhoea’, ‘muscl’, ‘feel’, ...
180 200 220 240 260 280 300 320 340
0
50
100
Day Number (2009)
Flurate
HPA
Inferred
0 10 20 30 40 50 60 70 80 90
0
50
100
150
Days
Flurate
HPA
Inferred
A B C D E
Figure 1 : Flu rate predictions for the UK by applying lasso on Twitter data
(Lampos & Cristianini, 2010)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 9/45
9
/45
10. Regression basics — Elastic Net
• observations xxxi ∈ Rm, i ∈ {1, ..., n} — XXX
• responses yi ∈ R, i ∈ {1, ..., n} — yyy
• weights, bias wj, β ∈ R, j ∈ {1, ..., m} — www∗ = [www; β]
[Linear] Elastic Net (LEN)
(Zhou & Hastie, 2005)
argmin
www∗
XXX∗www∗ − yyy 2
2
OLS
+ λ1 www 2
2
RR reg.
+ λ2 www 1
Lasso reg.
+++ ‘compromise’ between ridge regression (handles collinear
predictors) and lasso (favours sparsity)
+++ entire reg. path can be explored by modifying LAR
+++ if m > n, number of selected variables not limited to n
−−− may select redundant variables!
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 10/45
10
/45
11. Would a slightly different text regression approach
be more suitable for Social Media content?
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 11/45
11
/45
12. About Twitter (1/2)
Tweet Examples
@PaulLondon: I would strongly support a coalition government. It is the best
thing for our country right now. #electionsUK2010
@JohnsonMP: Socialism is something forgotten in our country #supportLabour
@FarageNOT: Far-right ‘movements’ come along with crises in capitalism #UKIP
@JohnK 1999: RT @HannahB: Stop talking about politics and listen to Justin!!
Bieber rules, peace and love ♥ ♥ ♥
The Twitter basics:
• 140 characters per status (tweet)
• users follow and be followed
• embedded usage of topics (#elections)
• retweets (RT), @replies, @mentions, favourites
• real-time nature
• biased user demographics
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 12/45
12
/45
13. About Twitter (2/2)
Tweet Examples
@PaulLondon: I would strongly support a coalition government. It is the best
thing for our country right now. #electionsUK2010
@JohnsonMP: Socialism is something forgotten in our country #supportLabour
@FarageNOT: Far-right ‘movements’ come along with crises in capitalism #UKIP
@JohnK 1999: RT @HannahB: Stop talking about politics and listen to Justin!!
Bieber rules, peace and love ♥ ♥ ♥
• contains a vast amount of information about various topics
• this information (XXX) can be used to assist predictions (yyy)
(Lampos & Cristianini, 2012; Sakaki et al., 2010; Bollen et al., 2011)
−−− f : XXX → yyy, f usually formulates a linear regression task
−−− XXX represents word frequencies only...
+++ is it possible to incorporate a user contribution somehow?
word selection +++ user selection
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 13/45
13
/45
20. Multi-Task Learning
What
• Instead of learning/optimising a single task (one target variable)
• ... optimise multiple tasks jointly
Why (Caruana, 1997)
• improves generalisation performance exploiting domain-specific
information of related tasks
• a good choice for under-sampled distributions — knowledge
transfer
• application-driven reasons (e.g. explore interplay between political
parties)
How
• Multi-task regularised regression
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 20/45
20
/45
21. The 2,12,12,1-norm regularisation
WWW 2,1 =
m
j=1
WWWj 2 , where WWWj denotes the j-th row
2,12,12,1-norm regularisation
argmin
WWW,βββ
XXXWWW − YYY 2
F
+ λ
m
j=1
WWWj 2
• multi-task learning: instead of www ∈ Rm, learn WWW ∈ Rm×τ , where τ
is the number of tasks
• 2,1-norm regularisation, i.e. the sum of WWW’s row 2-norms (Argyriou
et al., 2008; Liu et al., 2009) extends the notion of group lasso (Yuan &
Lin, 2006)
• group lasso: instead of single variables, selects groups of variables
• ‘groups’ now become the τ-dimensional rows of WWW
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 21/45
21
/45
24. Bilinear Group 2,12,12,1 (BGL) (1/2)
• tasks τ ∈ Z+
• users p ∈ Z+
• observations QQQi ∈ Rp×m, i ∈ {1, ..., n} — XXX
• responses yyyi ∈ Rτ , i ∈ {1, ..., n} — YYY
• weights, bias uuuk,wwwj,βββ ∈ Rτ , k ∈ {1, ..., p} — UUU, WWW, βββ
j ∈ {1, ..., m}
argmin
UUU,WWW,βββ
τ
t=1
n
i=1
uuuT
t QQQiwwwt + βt − yti
2
+ λu
p
k=1
UUUk 2 + λw
m
j=1
WWWj 2
• BGL can be broken into 2 convex tasks: first learn {WWW,βββ}, then
{UUU,βββ} and vv + iterate through this process
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 24/45
24
/45
25. Bilinear Group 2,12,12,1 (BGL) (2/2)
argmin
UUU,WWW,βββ
τ
t=1
n
i=1
uuuT
t QQQiwwwt + βt − yti
2
+ λu
p
k=1
UUUk 2 + λw
m
j=1
WWWj 2
× ×
UUUT QQQi WWW
• a feature (user/word) is selected for all tasks (not just one), but
possibly with different weights
• especially useful in the domain of politics (e.g. user pro party A,
against party B)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 25/45
25
/45
27. Political Opinion/Voting Intention Mining — Brief Recap
Primary papers
• predict the result of an election via Twitter (Tumasjan et al., 2010)
• model socio-political sentiment polls (O’Connor et al., 2010)
• above 2 failed on 2009 US congr. elections (Gayo-Avello, 2011)
• desired properties of such models (Metaxas et al., 2011)
Features
• lexicon-based, e.g. using LIWC (Tausczik & Pennebaker, 2010)
• task-specific keywords (names of parties, politicians)
• tweet volume
reviewed in (Gayo-Avello, 2013)
−−− political descriptors change in time, differ per country
−−− personalised modelling (present in actual polls) missing
−−− multi-task learning?
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 27/45
27
/45
28. Voting Intention Modelling — Data (United Kingdom)
• 42K users distributed proportionally to regional population figures
• 60m tweets from 30/04/2010 to 13/02/2012
• 80, 976 unigrams (word features)
• 240 voting intention polls (YouGov)
• 3 parties: Conservatives (CON), Labour Party (LAB), Liberal
Democrats (LIB)
• main language: English
5 30 55 80 105 130 155 180 205 230
0
5
10
15
20
25
30
35
40
45
VotingIntention%
Time
CON
LAB
LIB
Figure 3 : Voting intention time series for the UK (YouGov)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 28/45
28
/45
29. Voting Intention Modelling — Data (Austria)
• 1.1K users manually selected by Austrian political analysts (SORA)
• 800K tweets from 25/01 to 01/12/2012
• 22, 917 unigrams (word features)
• 98 voting intention polls from various pollsters
• 4 parties: Social Democratic Party (SP¨O), People’s Party (¨OVP),
Freedom Party (FP¨O), Green Alternative Party (GR¨U)
• main language: German
5 20 35 50 65 80 95
0
5
10
15
20
25
30
VotingIntention%
Time
SPÖ
ÖVP
FPÖ
GRÜ
Figure 4 : Voting intention time series for Austria
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 29/45
29
/45
30. Voting Intention Modelling — Evaluation
• 10-fold validation
−−− train a model using data based on a set of contiguous polls A
−−− test on the next D = 5 polls
−−− expand training set to {A ∪ D}, test on the next |D | = 5 polls
• realistic scenario: train on past, predict future polls
• overall we test predictions on 50 polls (in each case study)
Baselines
• Bµµµ: constant prediction based on µ(yyy) in the training set
• Blast: constant prediction based on last(yyy) in the training set
• LEN: (linear) Elastic Net prediction (using word frequencies)
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 30/45
30
/45
31. Voting Intention Modelling — Performance tables
Average RMSEs on the voting intention percentage predictions in
the 10-step validation process
Table 1 : UK case study
CON LAB LIB µµµ
Bµµµ 2.272 1.663 1.136 1.69
Blast 2 2.074 1.095 1.723
LEN 3.845 2.912 2.445 3.067
BEN 1.939 1.644 1.136 1.573
BGL 1.7851.7851.785 1.5951.5951.595 1.0541.0541.054 1.4781.4781.478
Table 2 : Austrian case study
SP¨O ¨OVP FP¨O GR¨U µµµ
Bµµµ 1.535 1.373 3.3 1.197 1.851
Blast 1.1481.1481.148 1.556 1.6391.6391.639 1.536 1.47
LEN 1.291 1.286 2.039 1.1521.1521.152 1.442
BEN 1.392 1.31 2.89 1.205 1.699
BGL 1.619 1.0051.0051.005 1.757 1.374 1.4391.4391.439
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 31/45
31
/45
32. Voting Intention Modelling — Prediction figures
Polls BEN BGL
UK
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
35
40
VotingIntention%
Time
CON
LAB
LIB
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
35
40
VotingIntention%
Time
CON
LAB
LIB
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
35
40
VotingIntention%
Time
CON
LAB
LIB
Austria
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
VotingIntention%
Time
SPÖ
ÖVP
FPÖ
GRÜ
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
VotingIntention%
Time
SPÖ
ÖVP
FPÖ
GRÜ
5 10 15 20 25 30 35 40 45
0
5
10
15
20
25
30
VotingIntention%
Time
SPÖ
ÖVP
FPÖ
GRÜ
Figure 5 : Performance figures for BEN and BGL in the UK/Austria case studies
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 32/45
32
/45
33. Voting Intention Modelling — Qualitative evaluation
Party Tweet Score Author
CON PM in friendly chat with top EU mate, Sweden’s Fredrik Reinfeldt, before
family photo
1.334 Journalist
Have Liberal Democrats broken electoral rules? Blog on Labour com-
plaint to cabinet secretary
−0.991 Journalist
LAB I am so pleased to hear Paul Savage who worked for the Labour group
has been Appointed the Marketing manager for the baths hall GREAT
NEWS
−0.552 Politician
(Labour)
LBD RT @user: Must be awful for TV bosses to keep getting knocked back
by all the women they ask to host election night (via @user)
0.874 LibDem
MP
SP¨O Inflationsrate in ¨O. im Juli leicht gesunken: von 2,2 auf 2,1%. Teurer
wurde Wohnen, Wasser, Energie.
Translation: Inflation rate in Austria slightly down in July from 2,2 to
2,1%. Accommodation, Water, Energy more expensive.
0.745 Journalist
¨OVP kann das buch “res publica” von johannes #voggenhuber wirklich
empfehlen! so zum nachdenken und so... #europa #demokratie
Translation: can really recommend the book “res publica” by johannes
#voggenhuber! Food for thought and so on #europe #democracy
−2.323 User
FP¨O Neue Kampagne der #Krone zur #Wehrpflicht: “GIB BELLO EINE
STIMME!”
Translation: New campaign by the #Krone on #Conscription: “GIVE
WOOFY A VOICE!”
7.44 Political
satire
GR¨U Protestsong gegen die Abschaffung des Bachelor-Studiums Interna-
tionale Entwicklung: <link> #IEbleibt #unibrennt #uniwut
Translation: Protest songs against the closing-down of the bachelor
course of International Development: <link> #IDremains #uniburns
#unirage
1.45 Student
Union
Table 3 : Scored tweet examples from both case studies using BGL
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 33/45
33
/45
35. Socioeconomic Patterns — Data
News Summaries
• Open Europe Think Tank: summaries of news articles on EU or
member countries (focus on politics, perhaps right-wing biased!)
• from February 2006 to mid-November 2013
1913 days or 94 months or 888 years
• involving 435435435 international news outlets
• extracted 8, 413 unigrams and 19, 045 bigrams
Socioeconomic Indicators
• EU Economic Sentiment Indicator (ESI)
→ predictor for future economic developments (Gelper & Croux, 2010)
→ consists of 5 weighted confidence sub-indicators:
◦ industrial (40%), services (30%), consumer (20%)
construction (5%), retail trade (5%)
• EU Unemployment — seasonally adjusted ratio of the non
employed over the entire EU labour force
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 35/45
35
/45
36. Socioeconomic Patterns — Task description
+++ qualitative differences to voting intention modelling
◦ aim is NOT to predict socioeconomic indicators
◦ characterise news by conducting a supervised analysis on them driven
by socioeconomic factors
+++ use predictive performance as an informal guarantee that the
model is reasonable
+++ the better the predictive performance, the more trustful the
extracted patterns should be
Slightly modified BEN
argmin
ooo≥0,www,β
n
i=1
oooT
QQQiwww + β − yi
2
+ λo1 ooo 2
2
+ λo2 ooo 1
+ λw1 www 2
2
+ λw2 www 1
• min (ooo) ≥ 0 to enhance weight interpretability for both news
outlets and n-grams
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 36/45
36
/45
37. Socioeconomic Patterns — Predictive performance
• similar evaluation as in voting intention prediction
• differences: time frame is now a month, train using a moving
window of 64 contiguous months, test on the next 3 months
• make predictions for a total of 30 months
2007 2008 2009 2010 2011 2012 2013
0
50
100
actual
predictions
2007 2008 2009 2010 2011 2012 2013
0
5
10
actual
predictions
Figure 6 : Monthly rates of EU-wide ESI (right) and Unemployment (left)
together with BEN’s predictions for the last 30 months
ESI Unemployment
LEN 9.253 (9.89%) 0.9275 (8.75%)
BEN 8.2098.2098.209 (8.77%) 0.90470.90470.9047 (8.52%)
Table 4 : 10-fold validation average RMSEs (and error rates) for LEN and BEN
on ESI and unemployment rates prediction
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 37/45
37
/45
38. Socioeconomic Patterns — Qualitative analysis (ESI)
Frequency
Word
Outlet
Weight
a a
Polarity
Yes
Yes
+ -
Figure 7 : Visualisation of BEN’s outputs for EU’s ESI in the last fold (i.e. model
trained on 64 months up to August 2013). The word cloud depicts the top-60
positively and negatively weighted n-grams (120) in total together with the top-30
outlets.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 38/45
38
/45
39. Socioeconomic Patterns — Qualitative analysis (Unempl.)
Frequency
Word
Outlet
Weight
a a
Polarity
Yes
Yes
+ -
Figure 8 : Visualisation of BEN’s outputs for EU-Unemployment in the last fold
(i.e. model trained on 64 months up to August 2013). The word cloud depicts the
top-60 positively and negatively weighted n-grams (120) in total together with the
top-30 outlets.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 39/45
39
/45
40. Conclusions
+++ introduced a new class of methods for bilinear text regression
+++ directly applicable to Social Media content
+++ or other types of textual content such as news articles
+++ better predictive performance than the linear alternative (in the
investigated case studies)
+++ extended to bilinear multi-task learning
To do
−−− investigate finer grained modelling settings by applying different
regularisation functions (or different combinations of them)
−−− further understand the properties of bilinear versus linear text
regression, e.g. when and why is it a good choice or how different
combinations of regularisation settings affect performance
−−− task-specific improvements
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 40/45
40
/45
41. In collaboration with
Trevor Cohn, University of Melbourne
Daniel Preot¸iuc-Pietro, University of Pennsylvania
Sina Samangooei, Amazon Research
Douwe Gelling, University of Sheffield
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 41/45
41
/45
42. Thank you
Any questions?
Download the slides from
http://www.lampos.net/research/talks-posters
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 42/45
42
/45
43. References I
Al-Khayyal and Falk. Jointly Constrained Biconvex Programming. MOR, 1983.
Argyriou, Evgeniou and Pontil. Convex multi-task feature learning. Machine Learning,
2008.
Beck and Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear
Inverse Problems. J. Imaging Sci., 2009.
Bermingham and Smeaton. On using Twitter to monitor political sentiment and predict
election results. SAAIP, 2011.
Bollen, Mao and Zeng. Twitter mood predicts the stock market. JCS, 2011.
Caruana. Multitask Learning. Machine Learning, 1997.
Efron, Hastie, Johnstone and Tibshirani. Least Angle Regression. The Annals of
Statistics, 2004.
Gayo-Avello. A Meta-Analysis of State-of-the-Art Electoral Prediction From Twitter
Data. SSCR, 2013.
Gayo-Avello, Metaxas and Mustafaraj. Limits of Electoral Predictions using Twitter.
ICWSM, 2011.
Gelper and Croux. On the construction of the European Economic Sentiment Indicator.
OBES, 2010.
Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. 2009.
Hoerl and Kennard. Ridge regression: Biased estimation for nonorthogonal problems.
Technometrics, 1970.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 43/45
43
/45
44. References II
Horst and Tuy. Global Optimization: Deterministic Approaches. 1996.
Lampos and Cristianini. Tracking the flu pandemic by monitoring the Social Web. CIP,
2010.
Lampos and Cristianini. Nowcasting Events from the Social Web with Statistical
Learning. ACM TIST, 2012.
Lampos, Preot¸iuc-Pietro and Cohn. A user-centric model of voting intention from Social
Media. ACL, 2013.
Lampos, Preot¸iuc-Pietro, Samangooei, Gelling and Cohn. Extracting Socioeconomic
Patterns from the News: Modelling Text and Outlet Importance Jointly. ACL
LACSS, 2014.
Liu, Ji and Ye. Multi-task feature learning via efficient 2,12,12,1-norm minimization. UAI,
2009.
Mairal, Jenatton, Obozinski and Bach. Network Flow Algorithms for Structured Sparsity.
NIPS, 2010.
Metaxas, Mustafaraj and Gayo-Avello. How (not) to predict elections. SocialCom, 2011.
O’Connor, Balasubramanyan, Routledge and Smith. From Tweets to polls: Linking text
sentiment to public opinion time series. ICWSM, 2010.
Pirsiavash, Ramanan and Fowlkes. Bilinear classifiers for visual recognition. NIPS, 2009.
Quesada and Grossmann. A global optimization algorithm for linear fractional and
bilinear programs. JGO, 1995.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 44/45
44
/45
45. References III
Sakaki, Okazaki and Matsuo. Earthquake shakes Twitter users: real-time event
detection by social sensors. WWW, 2010.
Tausczik and Pennebaker. The Psychological Meaning of Words: LIWC and
Computerized Text Analysis Methods. JLSP, 2010.
Tibshirani. Regression Shrinkage and Selection via the LASSO. JRSS, 1996.
Tumasjan, Sprenger, Sandner and Welpe. Predicting elections with Twitter: What 140
characters reveal about political sentiment. ICWSM, 2010.
Yuan and Lin. Model selection and estimation in regression with grouped variables.
JRSS, 2006.
Zhao and Yu. On model selection consistency of LASSO. JMLR, 2006.
Zhou and Hastie. Regularization and variable selection via the elastic net. JRSS, 2005.
v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j 45/45
45
/45