Extracting information from 'messy' social media data
1. Piet Daas, Marco Puts, Ali Hürriyetoglu
Extracting information from
‘messy’ social media data
2. Using Big Data for official statistics
– Can we use Big Data for the production of official statistics, and if so, how?
– Statistics Netherlands produces reliable and consistent statistical information
‐ The official statistics of the country
– These figures are based on target populations
‐ E.g. the country, its inhabitants, and its companies
– We want to use as much data as is (freely) available
‐ Fewer questionnaires; more administrative and Big Data sources
Combining these is challenging!
3. It is important to know that
– Statistics Netherlands was the first organization to produce official statistics based on Big Data
‐ Traffic intensity statistics based on road sensor data
– Statistics Netherlands is the leading organisation in the official statistical world regarding the use of Big Data
– It recently created a ‘Center for Big Data Statistics’
‐ With many partners involved (> 30)
4. Pros and cons of using Big Data
– Positive (2 of the 3 V’s)
‐ A lot of data (Volume)
‐ Readily available (Velocity)
– Negative
‐ Variety (not that stable)
‐ Potentially biased (a selective part of the population)
‐ Most are event-based (e.g. message-oriented, not user-oriented)
‐ Little information is available on the users
‐ It’s a challenging data source for producing high-quality statistics!
5. Big Data studies on social media
– Statistics oriented
‐ Social media sentiment and consumer confidence
‐ Social media based (un)safety monitor
– Population oriented
‐ Users (people, companies, and others)
‐ Determining background characteristics
‐ We use twiqs.nl, Coosto, and the Twitter API
6. Social media in the Netherlands
Map by Eric Fischer (via Fast Company)
7. Social media sentiment
– Studied public Dutch social media messages collected by Coosto
‐ Not only Twitter, but also Facebook, etc.
‐ Looked at the sentiment (+/−/neutral) in these messages
‐ Studied the change in overall sentiment over time
‐ Around 3–4 million messages per day
‐ Overall sentiment = (positive messages − negative messages) / total messages (%)
‐ Computed per day, week, and month
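A minimal sketch of the overall-sentiment computation described above; the daily message counts are invented for illustration:

```python
def sentiment_index(pos: int, neg: int, total: int) -> float:
    """Overall sentiment: (positive - negative) / total messages, as a percentage."""
    if total == 0:
        raise ValueError("no messages in this period")
    return 100.0 * (pos - neg) / total

# Hypothetical counts for one day: positive, negative, and total messages
print(round(sentiment_index(1_200_000, 800_000, 3_500_000), 1))  # → 11.4
```

The same function works per day, week, or month; only the aggregation window of the counts changes.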
10. Table 1. Social media message properties for various platforms and their correlation with consumer confidence

Social media platform    Number of messages¹   % of total   Correlation of monthly sentiment index
                                                            and consumer confidence (r)²
All platforms combined   3,153,002,327         100           0.75    0.78
Facebook                 334,854,088           10.6          0.81*   0.85*
Twitter                  2,526,481,479         80.1          0.68    0.70
Hyves                    45,182,025            1.4           0.50    0.58
News sites               56,027,686            1.8           0.37    0.26
Blogs                    48,600,987            1.5           0.25    0.22
Google+                  644,039               0.02         -0.04   -0.09
LinkedIn                 565,811               0.02         -0.23   -0.25
YouTube                  5,661,274             0.2          -0.37   -0.41
Forums                   134,98,938            4.3          -0.45   -0.49

¹ Period covered: June 2010 until November 2013
² Confirmed by visual inspection of scatterplots and additional checks (see text)
* Cointegrated

Platform-specific results: the combination of Facebook and Twitter is best (r > 0.9); the association continues after that period.
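The r values in the table are plain Pearson correlation coefficients between the monthly sentiment series and the monthly consumer confidence series. A self-contained sketch of how such an r is computed (the two example series are made up):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up monthly sentiment index and consumer confidence values
sentiment  = [-10.0, -8.5, -9.2, -6.0, -4.8, -5.5]
confidence = [-30.0, -27.0, -29.0, -22.0, -18.0, -20.0]
print(round(pearson_r(sentiment, confidence), 2))
```

Note that a high r alone does not establish a stable long-run relationship; that is why the slide additionally checks cointegration for the starred entries.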
11. Overall findings
– Correlation and cointegration
‐ The consumer confidence survey is conducted during the first 2 weeks of a month
‐ Comparing various periods revealed that the best correlation and cointegration is with the last 2 weeks of the previous month plus the first 2 weeks of the current month
• Highest correlation 0.93* (all Facebook combined with filtered Twitter)
– Granger causality
‐ Changes in consumer confidence precede changes in social media sentiment
‐ For all combinations shown!
• However, social media data is available to us sooner!
– Prediction
‐ Slightly better than random chance
‐ Best for the 4th ‘week’ of the month
12. (Un)safety feeling in social media
– Interviewed people and created a list of words associated with feelings of (un)safety (347)
– Checked if these words are used in social media (81)
– Only included the most frequently used words (24)
– First version of indicator
‐ Need to: check the context of the messages included
‐ Need to: compare the height of peaks with the ‘severity’ of the event
13. Unsafety monitor (first version)
Peaks in the indicator coincide with events such as:
‐ Bomb attack, Brussels airport (22-03-2016)
‐ Truck attack, Nice (14-07-2016)
‐ Terrorist attacks, Paris (14-11-2015)
‐ Intruder at NOS (29-01-2015)
‐ Charlie Hebdo, Paris (09-01-2015)
‐ MH17 day of national mourning (23-07-2014)
‐ Spain–Netherlands football match (1–5) (13-06-2014)
14. (Un)safety feeling in social media (2)
– Interviewed people and created a list of words associated with feelings of (un)safety (347)
– Checked if these words are used in social media (81)
– Only included the most frequently used words (24)
– First version of indicator
‐ Need to: check the context of the messages included
‐ Want to: compare the height of peaks with other data
15. Population studies
– Looked at composition of the units active on Twitter
– Type of units
‐ People, companies/organizations, and others
– Tried to determine background characteristics
‐ Not many units provide such information directly
‐ E.g. gender, age, income, level of education etc.
16. Starting point
– Drew a sample of 1,000 user IDs from Twitter
‐ Had a list of 330,000 from a previous study
– It was found that:
‐ 844 still existed
• 691 are persons (82%)
• 119 are companies/organizations (14%)
• 34 are ‘others’ (4%)
‐ Tried to determine their gender
18. Gender findings: 1) First name
• Used the Dutch ‘Voornamenbank’ website (first name database)
• Score between 0 and 1 (female – male); 676 of 844 (80%) names were registered
• Unknown names scored −1 (usually companies/organizations)
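The first-name step can be sketched as a simple table lookup. The tiny name table below is invented for illustration and stands in for the Voornamenbank database used on the slide:

```python
# Hypothetical 0..1 female-male scores per first name (0 = female, 1 = male),
# standing in for the Voornamenbank lookup.
NAME_SCORES = {"jan": 0.99, "piet": 0.98, "anne": 0.15, "maria": 0.01}

def first_name_score(screen_name: str) -> float:
    """Return a 0..1 female-male score, or -1 when the name is not registered."""
    first = screen_name.strip().split()[0].lower()
    return NAME_SCORES.get(first, -1)  # unknown: often a company/organization

print(first_name_score("Piet Daas"))  # → 0.98
print(first_name_score("CBS Media"))  # → -1
```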
19. Gender findings: 2) Short bio
– If a short bio is provided
– Quite a number of people mention their ‘position’ in the family
‐ Mother, father, papa, mama, ‘son of’, etc.
– Need to check both English and Dutch texts
– 155 of 583 (27%) indicated their gender in their short bio
‐ Very precise for women!
20. Gender findings: 3) Tweet content
– In cooperation with the University of Twente (Dong Nguyen)
– Machine learning approach that checks gender-specific writing style
‐ Language specific: messages need to be in Dutch!
‐ 437 of 473 (92%) persons that created tweets could be classified
21. Gender findings: 4) Profile picture
– Used OpenCV to process pictures
– 1) Face recognition
– 2) Standardisation of faces (resize & rotate)
– 3) Classify faces according to gender
– 603 of 804 (75%) profile pictures had one or more faces in them
22. Gender findings: overall results (1)
Diagnostic Odds Ratio = (TP/FN) / (FP/TN)
Random guessing: log(DOR) = 0

                 Diagnostic Odds Ratio (log)
First name       4.33
Short bio        2.70
Tweet content    1.96
Picture (faces)  0.57

‐ Multi-agent findings
• Need ‘clever’ ways to combine these
• Take the processing efficiency of the ‘agent’ into consideration
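The Diagnostic Odds Ratio on the slide follows directly from a 2×2 confusion matrix; a minimal sketch, with made-up confusion counts for a hypothetical gender classifier:

```python
import math

def log_dor(tp: int, fn: int, fp: int, tn: int) -> float:
    """Log of the Diagnostic Odds Ratio: (TP/FN) / (FP/TN).
    A value of 0 means the classifier is no better than random guessing."""
    return math.log((tp / fn) / (fp / tn))

# Hypothetical confusion counts: 400 true/20 false negatives on one class,
# 30 false/300 true on the other
print(round(log_dor(tp=400, fn=20, fp=30, tn=300), 2))  # → 5.3
```

Higher log(DOR) means better discrimination, which is why the first-name agent (4.33) is tried before the picture agent (0.57) in the combination on the next slide.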
23. Gender findings: overall results (2)
Combine results in the best possible way:

Unassigned (%)   Approach used
844 (100%)       1. Use short bio scores (very precise for females)
689 (82%)        2. Use first name scores
153 (18%)        3. Use tweet content
29 (3.4%)        4. Use picture
20 (2.4%)        5. Assign male gender

Final log(DOR) is 7.02, an accuracy of 96.5%!
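The combination above is a cascade: fall through the classifiers in order of reliability and default to ‘male’ for the remainder. A sketch of that control flow, where the individual classifier functions are hypothetical stand-ins that return a gender or None when they cannot decide:

```python
from typing import Callable, Optional

# Hypothetical stand-ins for the four agents on the previous slides.
# Each returns "female"/"male", or None when it cannot decide for this user
# (e.g. no bio provided, first name not registered).
def from_bio(user: dict) -> Optional[str]:
    return user.get("bio_gender")

def from_first_name(user: dict) -> Optional[str]:
    return user.get("name_gender")

def from_tweets(user: dict) -> Optional[str]:
    return user.get("tweet_gender")

def from_picture(user: dict) -> Optional[str]:
    return user.get("picture_gender")

# Ordered by reliability (highest log(DOR) for the bio among deciders first)
CASCADE: list[Callable[[dict], Optional[str]]] = [
    from_bio, from_first_name, from_tweets, from_picture,
]

def classify(user: dict) -> str:
    """Try each agent in order; the first one that decides wins."""
    for agent in CASCADE:
        gender = agent(user)
        if gender is not None:
            return gender
    return "male"  # step 5 on the slide: assign male to the remainder

print(classify({"name_gender": "female"}))  # → female
print(classify({}))                         # → male
```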
24. Conclusions and future studies
– Social media is one of the most challenging data sources for official statistics
– Using it requires that we:
‐ Focus on the information available
‐ Think outside the box (e.g. the sentiment study)
– It is a good source for studying potential ways to correct for the selectivity of Big Data sources
– In future studies we will be looking at:
‐ Sentiment, unsafety, and more; population composition, population dynamics, and other background characteristics