Masters Programmes
Assignment Cover Sheet
Submitted by: 1557875
Module Title: Big Data Analytics
Module Code: IB9CSB
Date/Year of Module: Spring Term 2016
Submission Deadline: 29/03/2016
Word Count: 1842
Number of Pages: 7
Topic: Analysis on Retail Sector with Datasets from Online Social Media
“This is to certify that the work I am submitting is my own. All external references and
sources are clearly acknowledged and identified within the contents. I am aware of the
University of Warwick regulation concerning plagiarism and collusion.
No substantial part(s) of the work submitted here has also been submitted by me in
other assessments for accredited courses of study, and I acknowledge that if this has
been done an appropriate reduction in the mark I might otherwise have received will be
made.”
INTRODUCTION
Retail has always played an important role in a country’s economy. It is a sector that
changes constantly with cultures, seasons, festivals, and so on. In other words, movements in
the retail market are closely tied to consumer behaviour, which is affected by many factors
and can be highly complex. From a decision-making perspective, therefore, it is not enough
for information to be accurate; it must also be timely and able to reflect the newest market
trends. Because economic and business research can be very time-consuming, traditional
methodology is unlikely to deliver up-to-date data. A new approach is therefore adopted,
based on the concept of ‘now-casting’.
The word ‘now-casting’ combines two words, now and forecasting. It is the prediction of
the present, and also covers a short period from the recent past to the near future
(Giannone et al. 2008). Now-casting has recently become more important in economic and
business studies because most key statistics are only available after a certain delay. The
idea is to exploit timely information such as surveys or polls to observe current
developments, and so obtain early estimates based on this type of data before the actual
figures are released by the authorities (Bańbura et al. 2013). Meanwhile, the popularity of
online social media is growing faster than ever. Provided that a feasible and reasonable
approach can be defined to quantify the information acquired, social media platforms can
become another channel for gathering information, with the advantages of lower cost, higher
update frequency, agility and accessibility compared with surveys and polls.
From the perspective of statistics and data science, the objective of this analysis is to find
whether there is a positive correlation between an actual indicator of retail performance and
people’s behaviour observed on a particular social media platform, which in this study is
Flickr. The ideal result would be a reliable, positive relationship between these two
datasets, which could be used to complement or adjust the official statistics for better
projections, and in turn to improve the quality of research on consumer behaviour or the
strategy-making process in retail marketing.
The essay is organized as follows. The second and third sections introduce the datasets
chosen for this analysis and the selection criteria. The fourth section presents the results
of the analysis with graphs. Finally, the fifth section offers an extended discussion based
on the findings.
SHOPPING PHOTOS ON FLICKR
Flickr is one of the best-known online platforms for sharing photos and videos. Established
in 2004, it had grown to more than 112 million registered users across 63 countries by the
end of 2015, sharing at least 10 billion photos, with 1 million new uploads per day. Its large
user base and high update frequency make Flickr a desirable data source for research
involving rapidly changing trends or abrupt incidents.
Since Flickr has users in 63 countries, many languages are inevitably found on it. This
means that search results for specific text may differ considerably between languages and
regions, and the output could be less representative in multilingual countries such as
Switzerland. It is therefore advisable to focus on countries with a single dominant official
language, such as France (French) or the UK (English). This essay concentrates on the
United Kingdom, comprising England, Wales, Scotland and Northern Ireland.
Furthermore, instead of analysing the total monthly photo counts retrieved directly from
Flickr, a further processed ratio is adopted. The ratio is calculated by dividing the number
of photos marked with a specific text, which in this research is ‘shopping’, by the number of
all photos shared. This step removes the influence of fluctuations in the number of users and
in the volume of images. The result is a cleaner output that can show the pattern we are
looking for, without the upward trend that a growing user population would most likely
introduce.
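The normalisation described above can be sketched in a few lines of Python. This is only an illustration: the function name `normalise_counts`, the `YYYY-MM` month keys and all of the counts are assumptions made up for the example, not the data or code used in this study.

```python
def normalise_counts(tagged_counts, total_counts):
    """Return {month: tagged / total} for months present in both inputs.

    Both arguments map a 'YYYY-MM' string to a photo count (hypothetical layout).
    Months with no total, or a total of zero, are skipped to avoid dividing by zero.
    """
    ratios = {}
    for month, total in total_counts.items():
        if month in tagged_counts and total > 0:
            ratios[month] = tagged_counts[month] / total
    return ratios


# Made-up counts for illustration only; these are NOT the study's figures.
shopping = {"2006-01": 120, "2006-02": 150, "2015-12": 900}
all_photos = {"2006-01": 400_000, "2006-02": 500_000, "2015-12": 2_000_000}

shopping_normalised = normalise_counts(shopping, all_photos)
for month, ratio in sorted(shopping_normalised.items()):
    print(month, f"{ratio:.5f}")
```

In the made-up numbers, the raw count rises from 120 to 150 between the first two months while the ratio stays the same, which is the effect the normalisation is meant to have: growth in overall uploads does not show up as growth in shopping activity.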
Last but not least, the analysis covers the period from 2006 to 2015, on the grounds that
Flickr might not have been widely used immediately after its launch in 2004.
RETAIL TURNOVER INDEX
The index of retail trade turnover is a monthly business indicator retrieved from Eurostat,
the statistical office of the European Union, which provides a wide range of statistics
covering the EU and some other European countries. It is used to estimate consumers’
spending on retail goods and to present the performance of the retail sector, measured as an
index with 2010 as the base year (2010 = 100). The index is also widely used by both private
and public sector institutions to support informed decision and policy making.
The retail turnover index consists of different subcategories for different purposes,
including food and beverages, clothing and textiles, recreational goods, etc. Since food
products and vehicle fuel are necessities whose consumption is not expected to fluctuate
severely over time, removing these two categories seems more logical for the original
objective. Many studies prefer seasonally or calendar adjusted data, which smooth out the
periodic pattern within a year in order to highlight possible long-term trends. This
analysis, on the contrary, is looking for a pattern that repeats over the years, so
unadjusted data are used. Finally, inflation is a potential factor that might cause a
significant upward gradient, which is not desirable, so a deflated index is preferred.
For these reasons, the deflated, unadjusted retail turnover index excluding food products
and fuel for the United Kingdom is chosen here for comparison with the processed Flickr
data described above.
RESULT
This analysis shows the changes over time in the share of shopping photos and in the
performance of the retail sector by visualising the processed data. We can see that the
changes are not only periodic but also match each other. As the final step, a statistical
test is applied to measure the correlation between the two datasets and to reach a
conclusion on the initial research objective.
In Figure 1, despite fluctuations, we can still observe a clear seasonal pattern with peak
values every December, except for one extremely high value in June 2009. This might be
attributable to some events or sales promotions in that period. Besides the peaks around the
end of each year, the seasonal pattern shown in Figure 2 includes another minor peak that
occurs regularly every July. The graph is also cleaner, with less fluctuation. In addition,
an upward trend from 2012 suggests that more retail goods are actually being purchased and
consumed, since the effect of inflation has already been removed from the selected data. It
could be regarded as one sign of a reviving economy in the UK. The two graphs share a
similar pattern and are most consistent in December, but the remaining months fail to show
the same level of consistency.
Figure 1: ShoppingNormalised is calculated as the ratio of the number of photos tagged with “shopping” to the
number of all photos (shopping photos / all photos) on a monthly basis.
Figure 2: Retail sales of non-food products excluding fuel (deflated turnover index) from 2006 to 2015, with
base year 2010 = 100. A slight upward slope can be observed from 2012.
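One simple way to check a recurring peak such as the December pattern described for Figures 1 and 2 is to find, for each year, the month with the highest value. The sketch below does this on a synthetic series; the helper name `peak_month_by_year`, the `YYYY-MM` key format and the series itself are assumptions for illustration and do not reproduce the data behind the figures.

```python
from collections import defaultdict


def peak_month_by_year(series):
    """Map each year to the month number (1-12) with the highest value.

    `series` maps a 'YYYY-MM' string to a numeric value (hypothetical layout).
    """
    by_year = defaultdict(dict)
    for key, value in series.items():
        year, month = key.split("-")
        by_year[int(year)][int(month)] = value
    return {year: max(months, key=months.get) for year, months in by_year.items()}


# Synthetic series for illustration only: a flat baseline of 100 with a small
# bump in July and a larger one in December. This is NOT the Eurostat index.
synthetic = {}
for year in (2006, 2007):
    for month in range(1, 13):
        value = 100.0
        if month == 7:
            value += 5.0
        if month == 12:
            value += 20.0
        synthetic[f"{year}-{month:02d}"] = value

peaks = peak_month_by_year(synthetic)
print(peaks)
```

Applied to real monthly data, a result in which nearly every year maps to 12 would support the reading of a December peak, while an isolated exception would stand out in the same way as the June 2009 value in Figure 1.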
Finally, the relationship between the shopping photo data and the retail turnover index is
illustrated in Figure 3. The Pearson correlation test is used to analyse how these two
datasets are associated. A p-value lower than 0.05 (p < 0.0004) indicates that they are
significantly correlated. Thus, we are able to reject the null hypothesis that the true
correlation is equal to 0, in favour of the alternative hypothesis that the true correlation
is not equal to 0. The coefficient (r = 0.32) indicates a moderate positive correlation
between the observations on Flickr and actual retail performance.
Figure 3: Comparison of 120 monthly pairs of normalised shopping photo counts and the retail sales index from
2006 to 2015, demonstrating a moderate level of correlation (Pearson correlation coefficient r = 0.323,
df = 118, p < 0.0004). The regression line is shown as a blue dashed line.
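The reported test can be rechecked from the summary figures alone. Under the null hypothesis of zero correlation, the statistic t = r * sqrt((n - 2) / (1 - r^2)) follows a Student t distribution with n - 2 degrees of freedom. The following standard-library Python sketch is not the code used for this analysis; the function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing H0: true correlation = 0, with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def t_pdf(t, df):
    """Density of the Student t distribution (log-gamma avoids overflow)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: integrate the density over [0, |t|] with Simpson's
    rule (steps must be even), then double the remaining upper-tail area."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (0.5 - total * h / 3)


# Summary values as reported in the Figure 3 caption.
r, n = 0.323, 120
t = t_statistic(r, n)
p = two_sided_p(t, n - 2)
print(f"t = {t:.2f}, df = {n - 2}, p = {p:.5f}")
```

With r = 0.323 and n = 120 the statistic comes out at about 3.71 on 118 degrees of freedom, and the resulting two-sided p-value falls below the 0.0004 bound quoted in the caption. `pearson_r` is included only to show how r itself would be computed from the 120 monthly pairs.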
DISCUSSION
In the analysis, similar patterns are found in both graphs, with peak values appearing
regularly in December. The correlation test also shows that the two datasets are
significantly correlated. This is consistent with the original assumption that there is a
relationship between the performance of the retail sector and people’s behaviour observed on
online social media. However, the association is only moderate. A potential explanation lies
in the choice of social media platform. Compared with Instagram, another well-known image
sharing platform, Flickr is more professionally oriented and mostly favoured by amateur or
professional photographers. Instagram, on the other hand, is more widely adopted by the
general public, with more than 400 million users, over three times as many as Flickr.
Instagram may therefore be a better data source in terms of its penetration and popularity.
In addition, although these two platforms were established on the same idea of sharing
images and videos, their characteristics are still quite distinct. Flickr is generally used
for recording activities and things that are memorable or have special meaning, while
Instagram is more like a channel of visual communication, where people interact with each
other through the stories behind pictures. As Eler (2012) states, “Instagram is a community
conducive to likes and comments, whereas Flickr focuses more on displaying collections of
photographs in photo streams, sets and galleries, organized by tags and maps.”
Consequently, different points of view have led to a different mix of photos on the two
platforms. This difference is reflected in most of the content and hence in the texts, tags
and topics of photos. That is, trivial things in daily life are more likely to appear on
Instagram than on Flickr. For instance, when people work out at the gym and take selfies of
their sweaty look, this type of picture has a better chance of being shared through
Instagram, because people are sharing their lives as part of social networking. A similar
outcome might occur for the topic of ‘shopping’. From this point of view, two possible
reasons can be deduced to explain the earlier finding of a similar pattern in the two
datasets. Firstly, Christmas is the most important festival in most western countries,
including the UK. Shopping for the festive season is therefore more meaningful at this time
and worth recording. Secondly, the countless banners and posts that come with promotions or
advertisements also become potential subjects for pictures.
Thus, to further improve the quality of this research, a next step would be to compare data
across various social media platforms. This would give us more clues about how people use
online social media, so that their actual behaviour can be inferred and estimated more
accurately.
REFERENCES
• Bańbura, M. et al. 2013, Now-casting and the Real-time Data Flow, ECB Working Paper
No. 1564, European Central Bank, July 2013, pp. 4-8.
• Eler, A. 2012, How Photographs on Instagram Differ from Flickr, ReadWrite, 27 April 2012.
Available at:
http://readwrite.com/2012/04/27/how_photographs_on_instagram_differ_from_flickr/
• Giannone, D. et al. 2008, Nowcasting: The Real-time Informational Content of
Macroeconomic Data, Journal of Monetary Economics, 55(4), pp. 665–676.
• Ruppert, D. and Matteson, D. S. 2015, Statistics and Data Analysis for Financial
Engineering with R Examples (Second Edition), Springer Science + Business Media, New
York.