Processing queries to search engine of Yandex:
possibilities of analysis and forecast
AINL FRUCT 2016
Boldyreva Anna (RANEPA-MIPT)
Content
Introduction
Databases
Tasks of analysis
Tasks of forecast
Conclusion
INTRODUCTION
Terminology
Search query is a request that an Internet user submits to a
search engine to obtain information; statistics on search
queries are available from the search engines' own services:
https://www.google.ru/trends/
https://adwords.google.com/
http://wordstat.yandex.ru/.
Descriptor is a word or phrase that forms part of the
search queries entered by users;
Indicators are economic, social, demographic and other
measures that analysts and researchers analyze or forecast;
Top-rated lists of descriptors are the search queries that are
most highly correlated with the selected indicators;
Barometer is the mean value of the normalized dynamics
of the top-rated selection.
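The definitions above can be illustrated with a minimal sketch. It assumes Pearson correlation for ranking descriptors and min-max scaling for the normalization; the slides do not say which normalization was used, and the function names and sample numbers below are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def min_max(xs):
    """Scale a series to [0, 1] (an assumed normalization)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def barometer(descriptors, indicator, top_k):
    """Return the top_k most correlated descriptors and the mean of their normalized series."""
    ranked = sorted(descriptors,
                    key=lambda name: pearson(descriptors[name], indicator),
                    reverse=True)
    top = ranked[:top_k]
    normalized = [min_max(descriptors[name]) for name in top]
    return top, [sum(column) / len(column) for column in zip(*normalized)]

# Hypothetical monthly query counts and indicator values (made-up numbers).
descriptors = {
    "treat": [10, 12, 15, 19, 24],
    "convenient": [5, 6, 6, 8, 9],
    "miss": [30, 27, 25, 20, 18],
}
indicator = [100, 102, 105, 109, 114]
top, series = barometer(descriptors, indicator, top_k=2)
print(top, series)
```

A barometer of negatively correlated descriptors could be built the same way by sorting in ascending order.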
Hypothesis
There is a stable statistical dependence between the intensity of
search queries and real-world events and social processes.
Fig. 1. The dynamics of the descriptor ‘swimsuit’ in
U.S.A.: peaks in February and May-June
Relevance
We can use search queries
• for monitoring the economic situation in regions in real time,
avoiding the difficulties caused by a lack of data;
• for parallel control of official information, which makes it possible
to reveal distortions introduced by official institutions;
• for forecasting economic, demographic and social parameters
during a crisis period;
• for forecasting the dynamics of various socio-economic and
socio-political processes;
• for analysis of other countries, where we do not need to wait for
official data that is published with a delay.
State of the art
2009: Google launched a service showing disease outbreak
hotspots in real time, based on the intensity of queries from
different regions;
2009: H. Choi and H. Varian introduced the first model
predicting fluctuations in business cycles with the help of
search query statistics;
2011: Z. Da, J. Engelberg and P. Gao demonstrated that
analyzing the dynamics of Google searches for companies
gives traders a 10% advantage;
2011: Michael Stolbov (MGIMO University) demonstrated
the feasibility of using Google search statistics to explain the
dynamics of aggregated financial indicators (for example,
deposits of individuals);
2013: Tobias Preis presented the work "Complex
dynamics of our economic life on different scales: insights
from search engine query data".
The work is devoted to the stock market; he analyzed
surges in searches for "Subprime", "Lehman Brothers" and
"Financial Crisis", which were followed by a drop in the S&P 500 index.
Domain-oriented databases of descriptors (SQ = search queries)
• economic terms: 25000 SQ;
• legal terms: 4500 SQ;
• criminal code articles: 365 SQ;
• well-known brands and goods: 3013 SQ;
• emotions:
with positive tonality: 400 SQ;
with negative tonality: 400 SQ;
• slang used in finance, computers and other fields: 3300 SQ;
• medical terms: 1600 SQ.
DATABASES
Technical databases
• lemmas: 18638 SQ;
• n-grams (n = 2, 3, …, 8) of letters and syllables: ~90 000 SQ.
Lemma: the dictionary (base) form of a word
Examples: avant-garde, sauna, drum, dune, velvet, bass, basketball,
battalion commander, comet, compass, icon, contour, piggy, mop,
cordon
n-gram: a sequence of n consecutive letters or syllables
Examples: ев, ег, ед, ее, еж, ти, тк, тл, тм, тн, то, авв, авг, авп, бре, бри,
бро, век, вел, вес, лак, лал, лам, лан, лао, лап, греч, декс, сдел, кром
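As an illustration of how letter n-grams of this kind could be generated from a word, here is a minimal sketch. It covers letters only (syllable n-grams would need a syllable splitter), and the function name and the choice of source word are assumptions; the slides do not describe how the n-gram base was actually built.

```python
def letter_ngrams(word, n_min=2, n_max=8):
    """Return all contiguous letter n-grams of `word` for n_min <= n <= n_max."""
    word = word.lower()
    grams = []
    for n in range(n_min, n_max + 1):
        # A word of length L contains L - n + 1 n-grams of length n.
        for start in range(len(word) - n + 1):
            grams.append(word[start:start + n])
    return grams

print(letter_ngrams("век", 2, 3))  # ['ве', 'ек', 'век']
```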
Emotion words with positive tonality
Examples: good, great, beautiful, holiday, goodness, beauty, super, fun,
cool, happy, dream, luck, well, success, joy, laugh, nice
Emotion words with negative tonality
Examples: chaos, amoral, immoral, sabotage, punishment, violation,
cattle, schmuck, moron, hopeless, useless, helpless
Barometers
Examples of words included in the "barometer" with a direct positive
correlation with the indicator "Consumer Price Index":
"treat": 0.93
"okmarket" (hypermarket): 0.91
"pariet" (ulcer drug): 0.89
"patents": 0.87
"mfbank" (commercial bank): 0.87
"headhunter" (job search site): 0.86
"pediashur" (baby food): 0.86
"convenient": 0.86
"often": 0.85
"close": 0.85
Examples of words included in the "barometer" with a negative
correlation with the indicator "Consumer Price Index":
"chemical" (British musical duo): -0.92
"artofvar" (musical group of war veterans): -0.91
"incest": -0.87
"group": -0.87
"babylon" (Italian clothing brand): -0.87
"young child": -0.86
"diprivan" (a sedative): -0.86
"ilarauto" (van sales): -0.86
"miss": -0.86
Bases of indicators
• Retail trade turnover (million roubles);
• Consumer Price Index;
• Producer Price Index for industrial products;
• Producer Price Index for minerals;
• Unemployment (thousands);
• Sales of new passenger cars and light commercial vehicles (units);
• Per capita income (thousands of roubles);
• Dollar/ruble exchange rate (USDTOM_UTS);
• Brent price (ICE.Brent), USD/barrel;
• Newborns (thousands);
• Marriages (thousands);
• Real activity (thousands);
• Deaths (thousands);
• Registered economic crimes.
Programs
1. A program that collects the dynamics of search queries from
the Yandex statistics service;
2. A program that automatically processes the collected files and
builds an Excel spreadsheet;
3. A program that automatically processes the tables and
selects the top search queries.
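The second step, merging per-descriptor files into a single table, might look like the sketch below. The input layout (one monthly series per descriptor) and all names are hypothetical, and CSV stands in for the Excel output so that only the standard library is needed.

```python
import csv
import io

def build_table(series_by_descriptor):
    """Merge {descriptor: {month: count}} into rows: one row per month, one column per descriptor."""
    months = sorted({m for series in series_by_descriptor.values() for m in series})
    names = sorted(series_by_descriptor)
    rows = [["month"] + names]
    for month in months:
        # Missing months are left empty rather than filled with a guess.
        rows.append([month] + [series_by_descriptor[n].get(month, "") for n in names])
    return rows

# Hypothetical downloads for two descriptors (made-up counts).
downloads = {
    "headhunter": {"2015-01": 120, "2015-02": 135},
    "patents": {"2015-01": 40, "2015-02": 38},
}
table = build_table(downloads)
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
print(buffer.getvalue())
```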
Distribution of positively correlated search queries by their correlation
with the indicator "Retail turnover"
Statistics of queries by regions
ANALYSIS
The vertical axis shows the values of the correlation coefficients.
The horizontal axis shows the number of positively correlated descriptors
with the corresponding level of correlation with the indicator
"Turnover of retail trade".
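A distribution of this kind can be tabulated by counting descriptors per correlation bin. The sketch below is illustrative only: the bin width of 0.05, the function name and the sample correlation values are assumptions, not taken from the chart.

```python
from collections import Counter

def correlation_histogram(correlations, bin_pct=5):
    """Count positively correlated descriptors per bin; keys are bin lower edges."""
    counts = Counter()
    for r in correlations:
        if r > 0:
            # Work in integer hundredths to avoid floating-point bin edges.
            counts[round(r * 100) // bin_pct] += 1
    return {index * bin_pct / 100: n for index, n in sorted(counts.items())}

# Sample correlation values used only for illustration.
sample = [0.93, 0.91, 0.89, 0.87, 0.87, 0.86, 0.86, 0.86, 0.85, 0.85, -0.92]
print(correlation_histogram(sample))  # {0.85: 8, 0.9: 2}
```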
Query statistics on domain-oriented databases
Example: distribution of queries from the "Brands and products" database
across the indicators with which they are highly correlated.
Observation: newlyweds buy more than young parents.
Consumer profiling
Example: distribution of queries highly correlated with the
indicator "Sales of new cars" across the thematic databases.
Observation: active use of slang and a wide variety of products and services.
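Increased mortality in Leningrad Region
Excess frequency of search queries from the database of medical
terms in Leningrad Region compared with the data for Russia
(the data for Russia are taken as 100%)
Excess   Search query
+21%     diaphragm spasm (спазм диафрагмы)
+108%    loss of taste (потеря вкусовых ощущений)
+105%    doctor on duty (дежурный врач)
+91%     full pulse (полный пульс)
+73%     cough with yellow sputum (кашель с желтой мокротой)
+70%     neck numbness (онемение шеи)
+70%     insensitivity (нечувствительность)
+69%     night sweats (ночная потливость)
+67%     sterile bandages (стерильные бинты)
+53%     abstinent (абстинент)
+55%     hospital on duty (дежурная больница)
+54%     vertigo (вертиго)
+52%     bitter taste in the mouth (горький вкус во рту)
+46%     pharmacy phone number (телефон аптеки)
+46%     emergency room (приемный покой)
+45%     manic phase (маниакальная фаза)
+43%     Naphthyzin (нафтизин)
+38%     bleeding from the ears (кровотечение из ушей)
+38%     call a doctor (вызвать врача)
+37%     Efferalgan (эфералган)
+36%     buy medicines (лекарства купить)
+34%     wheezing (свистящее дыхание)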
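One plausible way to obtain such excess figures is to compare a query's share of all queries in the region with its share nationwide. The slides do not give the exact formula, so the sketch below is an assumption, and the counts in it are made up.

```python
def excess_frequency(region_count, region_total, country_count, country_total):
    """Percentage by which a query's share in the region exceeds its national share."""
    region_share = region_count / region_total
    country_share = country_count / country_total
    # The national share is taken as 100%, so the excess is the ratio minus one.
    return (region_share / country_share - 1.0) * 100.0

# Made-up counts: the query takes 0.208% of regional traffic and 0.1% nationwide.
excess = excess_frequency(region_count=208, region_total=100_000,
                          country_count=1_000, country_total=1_000_000)
print(f"{excess:+.0f}%")
```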
Group method of data handling (GMDH)
makes it possible to select the model of optimal complexity in a given
class of models to describe the current set of experimental data
Polynomial class of models:
where x = {x_i | i = 1, …, m} is a set of indicators
and w = (w_i, w_ij, w_ijk, … | i, j, k = 1, …, m) is a weight vector.
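The formula for the polynomial class did not survive in the slide text. A standard form consistent with the weights listed above is the Kolmogorov-Gabor polynomial commonly used with GMDH. The output symbol y and the intercept w_0 are assumptions here, since the weight vector on the slide does not list an intercept:

```latex
y(x, w) = w_0 + \sum_{i=1}^{m} w_i x_i
        + \sum_{i=1}^{m} \sum_{j=1}^{m} w_{ij} x_i x_j
        + \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} w_{ijk} x_i x_j x_k + \dots
```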
FORECAST
Program GMDH Shell
GMDH Shell implements GMDH
Capabilities:
• Approximation
• Extrapolation
• Classification
http://www.gmdhshell.com
Lead developer: A. A. Koshulko, Candidate of Technical Sciences
Error evaluation
1st criterion: MAPE (mean absolute percentage error):
$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \cdot 100\%,$$
where $N$ is the sample size, $y_t$ is the real value for $t$, and $\hat{y}_t$ is the estimated value for $t$;
2nd criterion: P (one-month step forward forecast error):
$$P = \frac{y_{N+1} - \hat{y}_{N+1}}{y_{N+1}} \cdot 100\%.$$
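The two criteria can be computed as in the sketch below. It assumes that P is signed and taken as the real value minus the estimated value, which matches the negative entries in the comparison of one-month forward forecasts; the function names and sample numbers are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over the sample, in percent."""
    n = len(actual)
    return sum(abs(y - y_hat) / abs(y) for y, y_hat in zip(actual, predicted)) / n * 100.0

def one_step_error(actual_next, predicted_next):
    """Signed one-month forward forecast error P, in percent (assumed sign: real minus estimated)."""
    return (actual_next - predicted_next) / actual_next * 100.0

# Made-up values: relative errors of 10%, 5% and 0% average to about 5%.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(actual, predicted))
print(one_step_error(50.0, 51.0))
```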
Forecast settings
Observations are pseudo-randomly mixed;
The validation method is cross-validation with two parts;
The internal criterion is OLS (ordinary least squares);
The external criterion is RMSE (root mean squared error) with a
penalty in the form of the difference between the RMSE
values on the training and examination parts of the sample;
The neuron function is linear;
The maximum number of layers is 6;
The initial layer width is 5.
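The external criterion described above can be sketched as follows. The slides do not give the exact formula used by GMDH Shell, so adding the absolute gap between the two RMSE values to the examination RMSE is an assumption, as are the function names.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error of a forecast."""
    return sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))

def external_criterion(train_actual, train_pred, exam_actual, exam_pred):
    """Examination RMSE plus a penalty for the train/examination RMSE gap (assumed form)."""
    rmse_train = rmse(train_actual, train_pred)
    rmse_exam = rmse(exam_actual, exam_pred)
    return rmse_exam + abs(rmse_exam - rmse_train)

# Made-up data: a perfect fit on the training part, an error of 1 on the examination part.
score = external_criterion([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [4.0, 5.0], [5.0, 6.0])
print(score)  # 2.0
```

A penalty of this kind favors models whose accuracy on unseen data stays close to their accuracy on the training data.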
Forecast of retail turnover
Neural algorithm with linear barometers
MAPE = 1.0%
One-month forward forecast error P = 1.8%
Y1[t] = 64.4813 + Cm3m_pol[t-3]*40.7607 + N2*0.966131
N2[t] = -599.916 - Cm1m_pol[t-1]*410.022 + N3*1.3329
N3[t] = -45.4194 + N12*0.261196 + N4*0.759475
N4[t] = -99.6924 + ORT_PK_otr[t-2]*118.488 + N6*1.02163
N6[t] = 5.48719 - ORT_PK_pol[t-1]*181.667 + N10*1.0246
N10[t] = 1926.58 + Cm1m_pol[t-1]*1209.97 - Cm3m_otr[t-1]*241.064
N12[t] = 2327.24 + Cm3m_pol[t-3]*685.968 - Cm3m_otr[t-3]*581.729
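The neuron equations of the retail turnover model can be evaluated bottom-up, starting from the neurons that depend only on barometers. The sketch below copies the coefficients from the equations; the function name, the argument names (suffix `_lK` means the barometer value at t-K) and the sample inputs are hypothetical.

```python
def retail_turnover_forecast(cm1m_pol_l1, cm3m_pol_l3, cm3m_otr_l1, cm3m_otr_l3,
                             ort_pk_otr_l2, ort_pk_pol_l1):
    """Evaluate the layered model Y1[t] from lagged barometer values."""
    # Innermost neurons depend only on barometers.
    n10 = 1926.58 + cm1m_pol_l1 * 1209.97 - cm3m_otr_l1 * 241.064
    n12 = 2327.24 + cm3m_pol_l3 * 685.968 - cm3m_otr_l3 * 581.729
    # Each following neuron combines one barometer (or neuron) with a deeper neuron.
    n6 = 5.48719 - ort_pk_pol_l1 * 181.667 + n10 * 1.0246
    n4 = -99.6924 + ort_pk_otr_l2 * 118.488 + n6 * 1.02163
    n3 = -45.4194 + n12 * 0.261196 + n4 * 0.759475
    n2 = -599.916 - cm1m_pol_l1 * 410.022 + n3 * 1.3329
    return 64.4813 + cm3m_pol_l3 * 40.7607 + n2 * 0.966131

# Made-up barometer values, used only to show the call.
print(retail_turnover_forecast(0.4, 0.5, 0.3, 0.6, 0.2, 0.7))
```

Because every neuron is linear, the whole network collapses to a single linear function of the six lagged barometers.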
Forecast of USD/ruble exchange rate
Neural algorithm with barometers in square-root form
MAPE = 1.4%
One-month forward forecast error P = -2%
Y1[t] = -2.49737e-10 + N2*1
N2[t] = -769.864 + "Cm2_otr[t-2], sqrt"*522.473 + N3*1.09103
N3[t] = 2267.23 - "Cm3_otr[t-3], sqrt"*1563.91 + N6*0.738373
N6[t] = -5926.55 + ""$_PK_otr"[t-2], sqrt"*4826.58 + N10*1.48421
N10[t] = 8962.17 - "Cm2_otr[t-2], sqrt"*3666.07 - "Cm3_otr[t-4], sqrt"*2607.62
Forecast of economic crimes
Combinatorial algorithm with linear variables
MAPE = 4.5%
One-month forward forecast error P = -2.9%
Y[t] = 5368.54 + Cm2_pol[t-1]*8610.05 + Cm2_pol[t-2]*4452.71
+ Cm2_otr[t-2]*(-11350) + Cm3_pol[t-2]*11285.4
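The combinatorial model for economic crimes uses only lagged barometer values, so it can be evaluated one month beyond the end of the observed series. The sketch below copies the coefficients from the equation; the function name, the list-based series layout and the sample values are assumptions.

```python
def crimes_forecast(cm2_pol, cm2_otr, cm3_pol, t):
    """Evaluate Y[t] from monthly barometer series indexed from 0; needs t >= 2."""
    if t < 2:
        raise ValueError("two months of history are required")
    return (5368.54
            + cm2_pol[t - 1] * 8610.05
            + cm2_pol[t - 2] * 4452.71
            - cm2_otr[t - 2] * 11350
            + cm3_pol[t - 2] * 11285.4)

# Made-up barometer series for three observed months (indices 0..2).
cm2_pol = [0.40, 0.45, 0.50]
cm2_otr = [0.60, 0.55, 0.50]
cm3_pol = [0.30, 0.35, 0.40]
# t = 3 is the first unobserved month; only values at t-1 and t-2 are needed.
print(crimes_forecast(cm2_pol, cm2_otr, cm3_pol, t=3))
```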
Comparison of algorithms. MAPE
Indicator                                                  Neuro,        Neuro,        Combi,        Combi,
                                                           no roots      square roots  no roots      square roots
Retail turnover in Russia                                  1.0%          2.1%          2.3%          1.4%
Unemployment                                               0.8%          0.5%          0.9%          0.6%
Marriages                                                  7.5%          6.9%          9.9%          8.3%
Real activity                                              0.1%          0.1%          0.1%          0.1%
Consumer Price Index                                       0.1%          0.1%          0.1%          0.1%
Producer Price Index for industrial products               0.3%          0.3%          0.3%          0.3%
Producer Price Index for minerals                          1.4%          1.4%          1.4%          1.4%
Dollar/ruble exchange rate                                 0.8%          1.4%          1.1%          1.2%
Newborns                                                   1.2%          1.8%          1.9%          2.3%
Sales of new passenger cars and light commercial vehicles  3.5%          1.6%          3.9%          5.4%
Per capita income                                          2.4%          1.6%          1.9%          1.0%
Economic crimes                                            3.4%          6.4%          4.5%          6.5%
Oil prices                                                 1.0%          1.7%          1.7%          1.6%
Comparison of algorithms. One-month forward forecast deviations with barometers
Indicator                                                  Neuro,        Neuro,        Combi,        Combi,
                                                           no roots      square roots  no roots      square roots
Retail turnover in Russia                                  1.8%          -3.8%         3.8%          -2.9%
Unemployment                                               -2.8%         2.5%          -0.9%         3.1%
Marriages                                                  -33.2%        -20.4%        -52.1%        -20.2%
Real activity                                              -0.3%         -0.2%         -0.3%         -0.2%
Consumer Price Index                                       0.1%          -0.2%         0.1%          0.7%
Producer Price Index for industrial products               2.3%          -0.2%         4.1%          -0.2%
Producer Price Index for minerals                          86.6%         37.0%         1.5%          37.0%
Dollar/ruble exchange rate                                 12.7%         -2.0%         10.6%         17.3%
Newborns                                                   -0.7%         -0.7%         -0.6%         7.6%
Sales of new passenger cars and light commercial vehicles  7.8%          -16.9%        24.9%         -100.0%
Per capita income                                          6.7%          10.8%         13.8%         23.6%
Economic crimes                                            -16.5%        6.6%          -2.9%         6.4%
Oil prices                                                 -5.3%         -7.0%         -14.0%        -7.0%
CONCLUSION
Scientific results – 1 (databases)
We have shown experimentally that it is effective to use:
• several domain-oriented databases instead of a single one;
• bases of n-grams (n = 2, …, 8);
• significantly negatively correlated descriptors along with
significantly positively correlated descriptors.
Scientific results – 2 (analysis)
We suggested an interpretation of the results of statistical analysis:
• in evaluating the reasons for increased
mortality at the beginning of 2015 in the regions of
Russia;
• in evaluating people's reaction to retail
trade turnover;
• in identifying groups of consumers.
Scientific results – 3 (forecast)
We have shown experimentally the high accuracy of GMDH
algorithms, which achieve error levels of
~3%-6% in the best models of crimes;
~1%-4% in the models for economic and social indicators.
Future research
As future work we consider
• proposing a technology to use the mentions of descriptors in
social media;
• developing a procedure for processing queries, including
outliers related to major events;
• developing models for fuzzy forecasting that take into account
the qualitative dynamics of queries.
Thank you!
anna.boldyreva@phystech.edu
+7-916-542-37-64
Notation
X_pol: barometer with a strong positive correlation
with indicator X
X_otr: barometer with a strong negative correlation
with indicator X
Cmim_pol: barometer with a strong positive correlation with
indicator X at a lag of i months
Cmim_otr: barometer with a strong negative correlation with
indicator X at a lag of i months
Latest research papers
• Boldyreva A., Alexandrov M., Koshulko O., Sobolevskiy O.: Queries to Internet
as a tool for analysis of regional police work and forecast of crimes in regions:
Proc. of 15th Mexican Intern. Conf. on Artificial Intelligence, Springer, LNCS,
2016, 12 p. [to be published]
• Boldyreva A., Sobolevskiy O., Alexandrov M., Danilova V.: Creating collections
of descriptors based on Internet queries: Proc. of 15th Mexican Intern. Conf. on
Artificial Intelligence, Springer, LNCS, 11 p. [to be published]
• Boldyreva A.: An integral method for investigating attitudes of Internet users
based on search queries. “Mathematical modeling of social processes”, Proc.
of Sociological Faculty of MSU, Publ. House MSU (Moscow State Lomonosov
Univ.), 2016, vol. 18, pp. 26-34, [rus]
• Boldyreva A.: Building predictive models of economic and social conditions
based on the intensity of search queries to the Internet. “Modern economics:
theory, policy, innovation. Collection of student research papers”, Moscow,
Publ. House RANEPA, 2016, pp. 36-61, [rus]
• Boldyreva A., Alexandrov M., Surkova D.: Words with negative sentiment in
search queries to the Internet as an indicator of per capita income in the
Federal Districts of Russia. Inductive modeling of complex systems, NAS of
Ukraine, Kyiv, 2015, vol. 7, pp. 77-92, [rus]

Vision and reflection on Mining Software Repositories research in 2024
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 

AINL 2016: Boldyreva

  • 1. Processing queries to search engine of Yandex: possibilities of analysis and forecast AINL FRUCT 2016 Boldyreva Anna (RANEPA-MIPT)
  • 2. Contents: Introduction; Databases; Tasks of analysis; Tasks of forecast; Conclusion. INTRODUCTION
  • 3. Terminology. A search query is a request made by an Internet user to obtain information from a search engine; statistics on search queries are obtained from services provided by the search engines: https://www.google.ru/trends/, https://adwords.google.com/, http://wordstat.yandex.ru/. A descriptor is a word or phrase that forms part of the search queries entered by users.
  • 4. Terminology. Indicators are economic, social, demographic and other measures that are analyzed or forecast by analysts and researchers. Top-rated lists of descriptors are the search queries most highly correlated with the selected indicators. A barometer is the mean value of the normalized dynamics (time series) of the top-rated selection.
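The barometer definition can be illustrated with a minimal Python sketch. The slides do not say which normalization is used, so min-max scaling to [0, 1] is an assumption here, and the descriptor names and monthly counts are hypothetical.

```python
from statistics import mean


def min_max_normalize(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros.

    Min-max scaling is an assumption: the slides only say "normalized".
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def barometer(descriptor_series):
    """Month-by-month mean of the normalized descriptor series."""
    normalized = [min_max_normalize(s) for s in descriptor_series.values()]
    return [mean(values) for values in zip(*normalized)]


# Hypothetical monthly query counts for three top-rated descriptors.
top_rated = {
    "descriptor_a": [100, 150, 200, 250],
    "descriptor_b": [10, 30, 20, 50],
    "descriptor_c": [5, 5, 10, 15],
}

print(barometer(top_rated))
```

Normalizing before averaging keeps a single high-volume descriptor from dominating the barometer.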
  • 5. Hypothesis. There is a stable statistical dependence between the intensity of search queries and real-world events and social processes. Fig. 1. The dynamics of the descriptor ‘swimsuit’ in the U.S.A.: peaks in February and May-June.
  • 6. Relevance. We can use search queries: • for monitoring the economic situation in regions in real time, avoiding the difficulties caused by a lack of data; • for parallel control of official information, which makes it possible to reveal distortions introduced by official institutions; • for forecasting economic, demographic and social parameters during a crisis period; • for forecasting the dynamics of various socio-economic and socio-political processes; • for analysis of other countries. Here we do not need official data, which is published with a delay.
  • 7. State of the art. 2009: Google launched a service showing disease outbreak hotspots in real time, based on the intensity of queries from different regions. 2009: H. Choi and H. Varian introduced the first model predicting fluctuations in business cycles with the help of search query statistics. 2011: J. Engelberg, Z. Da and P. Gao demonstrated that analyzing the dynamics of Google searches for companies gives a 10% advantage to traders.
  • 8. State of the art. 2011: Michael Stolbov (MGIMO-University) demonstrated the feasibility of using Google search statistics to explain the dynamics of aggregated financial indicators (for example, deposits of individuals). 2013: Tobias Preis presented the work "Complex dynamics of our economic life on different scales: insights from search engine query data". The work is devoted to the stock market; he analyzed surges in searches for "Subprime", "Lehman Brothers" and "Financial Crisis", which were followed by a drop in the S&P 500 Index.
  • 10. Domain-oriented databases of descriptors (SQ = search queries): • economic terms: 25000 SQ; • legal terms: 4500 SQ; • crime articles: 365 SQ; • well-known brands and goods: 3013 SQ; • emotions: 400 SQ with positive tonality and 400 SQ with negative tonality; • slang used in finance, computers and other fields: 3300 SQ; • medical terms: 1600 SQ. Technical databases: • lemmas: 18638 SQ; • n-grams (n = 2, 3, ..., 8) of letters and syllables: about 90 000 SQ. DATABASES
  • 11. Lemma: the initial (dictionary) form of a word. Examples: avant-garde, sauna, drum, dune, velvet, bass, basketball, battalion commander, comet, compass, icon, contour, piggy, mop, cordon. n-gram examples: ев, ег, ед, ее, еж, ти, тк, тл, тм, тн, то, авв, авг, авп, бре, бри, бро, век, вел, вес, лак, лал, лам, лан, лао, лап, греч, декс, сдел, кром. Emotion words with positive tonality, examples: good, great, beautiful, holiday, goodness, beauty, super, fun, cool, happy, dream, luck, well, success, joy, laugh, nice. Emotion words with negative tonality, examples: chaos, amoral, immoral, sabotage, punishment, violation, cattle, schmuck, moron, hopeless, useless, helpless.
  • 12. Barometers. Examples of words that entered the "barometer" with a direct positive correlation with the indicator "Consumer Price Index": "treat": 0.93; "okmarket" (hypermarket): 0.91; "pariet" (drug for ulcers): 0.89; "patents": 0.87; "mfbank" (commercial bank): 0.87; "headhunter" (job search site): 0.86; "pediashur" (baby food): 0.86; "convenient": 0.86; "often": 0.85; "close": 0.85.
  • 13. Barometers. Examples of words that entered the "barometer" with a direct negative correlation with the indicator "Consumer Price Index": "chemical" (British musical duo): -0.92; "artofvar" (musical group of war veterans): -0.91; "incest": -0.87; "group": -0.87; "babylon" (Italian clothing brand): -0.87; "young child": -0.86; "diprivan" (a sedative): -0.86; "ilarauto" (van sales): -0.86; "miss": -0.86.
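Selecting the positively and negatively correlated descriptors can be sketched as a Pearson-correlation ranking. This is only an illustration: the threshold of 0.85 is a guess suggested by the listed values, and the descriptor names and series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (norm_x * norm_y)


def split_by_correlation(descriptors, indicator, threshold=0.85):
    """Return (positive, negative) lists of (name, r), strongest first.

    The threshold is an assumed cut-off, not a value stated in the slides.
    """
    positive, negative = [], []
    for name, series in descriptors.items():
        r = pearson(series, indicator)
        if r >= threshold:
            positive.append((name, r))
        elif r <= -threshold:
            negative.append((name, r))
    positive.sort(key=lambda item: item[1], reverse=True)
    negative.sort(key=lambda item: item[1])
    return positive, negative


# Hypothetical indicator and descriptor dynamics over five months.
indicator = [1, 2, 3, 4, 5]
descriptors = {
    "rising": [2, 4, 6, 8, 10],
    "falling": [10, 8, 6, 4, 2],
    "noise": [3, 1, 4, 1, 5],
}

positive, negative = split_by_correlation(descriptors, indicator)
print(positive, negative)
```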
  • 14. Bases of indicators: • Retail trade turnover (mln roubles); • Consumer Price Index; • Entrepreneurs Price Index on industrial products; • Entrepreneurs Price Index on minerals; • Unemployment (thousands); • Sales of new passenger cars and light commercial vehicles (units); • Per capita income (thousands of roubles); • The dollar/ruble exchange rate (USDTOM_UTS); • Brent price (ICE.Brent), USD/barrel.
  • 15. Bases of indicators: • newborns (thousands); • marriages (thousands); • real activity (thousands); • deaths (thousands); • registered economic crimes.
  • 16. Programs: 1. A program for collecting the dynamics of search queries from the Yandex statistics service; 2. A program for automatically processing the files and building an Excel spreadsheet; 3. A program for automatically processing the tables and selecting the top search queries.
  • 18. Statistics of queries by regions. Distribution of positively correlated search queries by their correlation with the indicator "Retail turnover". ANALYSIS
  • 19. Statistics of queries by regions. Values of the correlation coefficients are plotted on the ordinate axis; the number of positive descriptors with the corresponding level of correlation with the indicator "Turnover of retail trade" is plotted on the horizontal axis.
  • 20. Query statistics on domain-oriented databases. Example: distribution of queries from the database "Brands and products" across the indicators with which they are highly correlated. Observation: newlyweds buy more than young parents.
  • 21. Consumer profiling. Example: distribution of queries highly correlated with the indicator "Sales of new cars" across the thematic databases. Observation: active use of slang and a wide variety of products/services.
  • 22. Increased mortality in Leningrad Region. Excess frequency of search queries from the database of medical terms in Leningrad Region compared with data for Russia (data for Russia are taken as 100%): diaphragm spasm +21%; vertigo +54%; loss of taste +108%; bitter taste in the mouth +52%; doctor on duty +105%; pharmacy phone number +46%; full pulse +91%; emergency room +46%; cough with yellow sputum +73%; manic phase +45%; neck numbness +70%; Naphthyzin +43%; insensitivity +70%; bleeding from the ears +38%; night sweats +69%; call a doctor +38%; sterile bandages +67%; Efferalgan +37%; abstinent +53%; buy medicines +36%; hospital on duty +55%; wheezing +34%.
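A hedged sketch of how such an excess percentage could be computed. The slide does not spell out the normalization, so comparing the query's share of all regional queries with its share of all queries in Russia is an assumption. The function name and all counts below are hypothetical.

```python
def excess_frequency_percent(region_count, region_total,
                             country_count, country_total):
    """Excess of the regional query share over the national share, in percent.

    The national share is taken as 100%, so a result of +54.0 means the
    query is 54% more frequent in the region than in the country overall.
    """
    region_share = region_count / region_total
    country_share = country_count / country_total
    return (region_share / country_share - 1.0) * 100.0


# Hypothetical monthly counts for one medical query.
excess = excess_frequency_percent(
    region_count=308, region_total=100_000,
    country_count=20_000, country_total=10_000_000,
)
print(f"{excess:+.0f}%")
```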
  • 24. The group method of data handling (GMDH) makes it possible to select a model of optimal complexity within a given class of models to describe the current set of experimental data. The models belong to a polynomial class, where x = {xi | i = 1, … , m} is a set of indicators and w = (wi, wij, wijk, … | i, j, k = 1, … , m) is a weight vector. FORECAST
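For reference, the polynomial class conventionally used in GMDH is the Kolmogorov-Gabor polynomial. Assuming that is the class meant here, it can be written in the notation above as follows; the free term $w_0$ is part of the conventional form and is not listed in the weight vector on the slide.

```latex
y(x, w) = w_0
        + \sum_{i=1}^{m} w_i x_i
        + \sum_{i=1}^{m} \sum_{j=1}^{m} w_{ij} x_i x_j
        + \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} w_{ijk} x_i x_j x_k
        + \dots
```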
  • 25. Program GMDH Shell. GMDH Shell implements GMDH. Capabilities: • approximation; • extrapolation; • classification. http://www.gmdhshell.com Chief designer: Candidate of Technical Sciences Koshulko A.A.
  • 26. Error evaluation. 1st criterion: MAPE (mean absolute percentage error): $\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \cdot 100\%$, where $N$ is the sample size, $y_t$ is the real value for $t$, and $\hat{y}_t$ is the estimated value for $t$. 2nd criterion: P (one-month step forward forecast error): $P = \frac{y_{N+1} - \hat{y}_{N+1}}{y_{N+1}} \cdot 100\%$.
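Both criteria are simple to compute; a minimal Python sketch follows. The sign convention for P (real value minus estimated value, divided by the real value) is an assumption taken from the order of terms in the MAPE definition, and the sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over the sample, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def one_step_error(actual_next, predicted_next):
    """Signed one-month-forward forecast error P, in percent.

    Assumed convention: (real - estimated) / real.
    """
    return 100.0 * (actual_next - predicted_next) / actual_next


# Hypothetical real values and model estimates.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mape(actual, predicted))
print(one_step_error(200.0, 204.0))
```

Unlike MAPE, P keeps its sign, so it shows whether the model overshoots or undershoots the next month.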
  • 27. Forecast settings. Observations are pseudo-randomly mixed; the validation method is cross-validation with two parts; the internal criterion is OLS; the external criterion is RMSE (root mean squared error) with a penalty in the form of the difference between the RMSE values on the training and examination parts of the sample; the neuron function is linear; the maximum number of layers is 6; the initial layer width is 5.
  • 28. Forecast of retail turnover. Neural algorithm with linear barometers: MAPE = 1.0%, one-month forward forecast error P = 1.8%.
  • 29. Forecast of retail turnover.
    Y1[t] = 64.4813 + Cm3m_pol[t-3]*40.7607 + N2*0.966131
    N2[t] = -599.916 - Cm1m_pol[t-1]*410.022 + N3*1.3329
    N3[t] = -45.4194 + N12*0.261196 + N4*0.759475
    N4[t] = -99.6924 + ORT_PK_otr[t-2]*118.488 + N6*1.02163
    N6[t] = 5.48719 - ORT_PK_pol[t-1]*181.667 + N10*1.0246
    N10[t] = 1926.58 + Cm1m_pol[t-1]*1209.97 - Cm3m_otr[t-1]*241.064
    N12[t] = 2327.24 + Cm3m_pol[t-3]*685.968 - Cm3m_otr[t-3]*581.729
  • 30. Forecast of the USD/ruble exchange rate. Neural algorithm with barometers in square-root form: MAPE = 1.4%, one-month forward forecast error P = -2%.
  • 31. Forecast of the USD/ruble exchange rate.
    Y1[t] = -2.49737e-10 + N2*1
    N2[t] = -769.864 + "Cm2_otr[t-2], sqrt"*522.473 + N3*1.09103
    N3[t] = 2267.23 - "Cm3_otr[t-3], sqrt"*1563.91 + N6*0.738373
    N6[t] = -5926.55 + ""$_PK_otr"[t-2], sqrt"*4826.58 + N10*1.48421
    N10[t] = 8962.17 - "Cm2_otr[t-2], sqrt"*3666.07 - "Cm3_otr[t-4], sqrt"*2607.62
  • 32. Forecast of economic crimes. Combinatorial algorithm with linear variables: MAPE = 4.5%, one-month forward forecast error P = -2.9%.
  • 33. Forecast of economic crimes. Y[t] = 5368.54 + Cm2_pol[t-1]*8610.05 + Cm2_pol[t-2]*4452.71 + Cm2_otr[t-2]*(-11350) + Cm3_pol[t-2]*11285.4
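To show how the lagged barometers enter the combinatorial model for economic crimes, here is a minimal Python sketch that evaluates the equation above. The coefficients are copied from the slide. The barometer values, the function name and the month-indexed list layout are hypothetical.

```python
def forecast_economic_crimes(cm2_pol, cm2_otr, cm3_pol, t):
    """Evaluate Y[t] for the economic-crimes model.

    Each barometer is a list indexed by month. Only lagged values are used,
    so t may equal len(series) to forecast one month past the data.
    """
    if t < 2:
        raise ValueError("t must be at least 2 because the model uses lag 2")
    return (5368.54
            + 8610.05 * cm2_pol[t - 1]
            + 4452.71 * cm2_pol[t - 2]
            - 11350.0 * cm2_otr[t - 2]
            + 11285.4 * cm3_pol[t - 2])


# Hypothetical barometer values for months 0, 1 and 2.
cm2_pol = [0.2, 0.4, 0.6]
cm2_otr = [0.5, 0.3, 0.1]
cm3_pol = [0.1, 0.2, 0.3]

# One-month-forward forecast for month 3.
y_next = forecast_economic_crimes(cm2_pol, cm2_otr, cm3_pol, t=3)
print(round(y_next, 3))
```

Because every term is lagged by at least one month, the forecast for the next month needs only barometer values that have already been observed.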
  • 34. Comparison of algorithms: MAPE.
    Indicator | Neuro, no roots | Neuro, with square roots | Combi, no roots | Combi, with square roots
    Retail turnover in Russia | 1.0% | 2.1% | 2.3% | 1.4%
    Unemployment | 0.8% | 0.5% | 0.9% | 0.6%
    Marriages | 7.5% | 6.9% | 9.9% | 8.3%
    Real activity | 0.1% | 0.1% | 0.1% | 0.1%
    Consumer Price Index | 0.1% | 0.1% | 0.1% | 0.1%
    Entrepreneurs Price Index on industrial products | 0.3% | 0.3% | 0.3% | 0.3%
    Entrepreneurs Price Index on minerals | 1.4% | 1.4% | 1.4% | 1.4%
    The dollar/ruble exchange rate | 0.8% | 1.4% | 1.1% | 1.2%
    Newborns | 1.2% | 1.8% | 1.9% | 2.3%
    Sales of new passenger cars and light commercial vehicles | 3.5% | 1.6% | 3.9% | 5.4%
    Per capita income | 2.4% | 1.6% | 1.9% | 1.0%
    Economic crimes | 3.4% | 6.4% | 4.5% | 6.5%
    Oil prices | 1.0% | 1.7% | 1.7% | 1.6%
  • 35. Comparison: one-month forward deviations in forecasts with barometers.
    Indicator | Neuro, no roots | Neuro, with square roots | Combi, no roots | Combi, with square roots
    Retail turnover in Russia | 1.8% | -3.8% | 3.8% | -2.9%
    Unemployment | -2.8% | 2.5% | -0.9% | 3.1%
    Marriages | -33.2% | -20.4% | -52.1% | -20.2%
    Real activity | -0.3% | -0.2% | -0.3% | -0.2%
    Consumer Price Index | 0.1% | -0.2% | 0.1% | 0.7%
    Entrepreneurs Price Index on industrial products | 2.3% | -0.2% | 4.1% | -0.2%
    Entrepreneurs Price Index on minerals | 86.6% | 37.0% | 1.5% | 37.0%
    The dollar/ruble exchange rate | 12.7% | -2.0% | 10.6% | 17.3%
    Newborns | -0.7% | -0.7% | -0.6% | 7.6%
    Sales of new passenger cars and light commercial vehicles | 7.8% | -16.9% | 24.9% | -100.0%
    Per capita income | 6.7% | 10.8% | 13.8% | 23.6%
    Economic crimes | -16.5% | 6.6% | -2.9% | 6.4%
    Oil prices | -5.3% | -7.0% | -14.0% | -7.0%
  • 37. CONCLUSION. Scientific results 1 (databases). We have shown experimentally that it is effective to use: • several domain-oriented databases instead of a single one; • bases of n-grams (n = 2, ..., 8); • significantly negatively correlated descriptors along with significantly positively correlated ones.
  • 38. Scientific results 2 (analysis). We suggested interpretations of the results of statistical analysis: • in evaluating the reasons for the increased mortality at the beginning of 2015 in the regions of Russia; • in evaluating people's reaction to retail trade turnover; • in revealing groups of consumers.
  • 39. Scientific results 3 (forecast). We have shown experimentally the high accuracy of GMDH algorithms, which achieve error levels of about 3%-6% in the best models of crimes and about 1%-4% in the models of economic and social indicators.
  • 40. Future research. As future work we consider: • proposing a technology for using mentions of descriptors in social media; • developing a procedure for processing queries that include outliers related to extraordinary circumstances; • developing models for fuzzy forecasting that take into account the qualitative dynamics of queries.
  • 42. Notation. X_pol: barometer with a strong direct positive correlation with indicator X. X_otr: barometer with a strong direct negative correlation with indicator X. Cmim_pol: barometer with a strong positive correlation at a lag of i months relative to indicator X. Cmim_otr: barometer with a strong negative correlation at a lag of i months relative to indicator X.
  • 43. Latest research papers
    • Boldyreva A., Alexandrov M., Koshulko O., Sobolevskiy O.: Queries to Internet as a tool for analysis of regional police work and forecast of crimes in regions. Proc. of 15th Mexican Intern. Conf. on Artificial Intelligence, Springer, LNCS, 2016, 12 p. [to be published]
    • Boldyreva A., Sobolevskiy O., Alexandrov M., Danilova V.: Creating collections of descriptors based on Internet queries. Proc. of 15th Mexican Intern. Conf. on Artificial Intelligence, Springer, LNCS, 11 p. [to be published]
    • Boldyreva A.: An integral method for investigating attitudes of Internet users based on search queries. "Mathematical modeling of social processes", Proc. of Sociological Faculty of MSU, Publ. House MSU (Moscow State Lomonosov Univ.), 2016, vol. 18, pp. 26-34 [rus]
    • Boldyreva A.: Building predictive models of economic and social conditions based on the intensity of search queries to the Internet. "Modern economics: theory, policy, innovation. Collection of student research papers", Moscow, Publ. House RANEPA, 2016, pp. 36-61 [rus]
    • Boldyreva A., Alexandrov M., Surkova D.: Words with negative sentiment in search queries to the Internet as an indicator of per capita income in the Federal Districts of Russia. Inductive modeling of complex systems, NAS of Ukraine, Kyev, 2015, vol. 7, pp. 77-92 [rus]