Minerva: An interactive Android solution for
enriching the people’s tourism experience
Julio César Carrasquel, Razieh Akbari
Università degli Studi di Roma La Sapienza,
Dipartimento di Ingegneria Informatica, Automatica
e Gestionale Antonio Ruberti (DIAG)
Via Ariosto 25, 00185 Rome, ITALY
{carrasquelgamez.1726154, akbari.1771868}@studenti.uniroma1.it
Abstract.- The paradigm of Pervasive Computing refers to today's rich ecosystem of computer systems and electronic devices (mobile phones, tablets, microcontrollers, systems on a chip, sensors, etc.) that interact with the physical environment in a fully connected and integrated scenario, supported by the plethora of current network protocols and processing technologies. This new era of context-aware computing has positively impacted sectors such as health care, food production and transportation, to name a few; as a result, new work and research areas have appeared under the terms smart buildings and smart cities. The work presented in this article lands specifically within the area of smart tourism. Here we introduce Minerva, an Android-powered application whose purpose is to enrich people's experience while they do tourism, providing an interface in which they can search for landmark information in a useful and interactive way. To achieve this, the user provides a landmark image as input using the Android device camera. Internally, in its first version, Minerva performs the landmark recognition task using the Google Cloud Vision API. The first section of this work introduces the problem scenario and our proposal. The second section presents the User Experience design of our solution. The third section describes the application system design, and finally, the fourth section presents some conclusions and reflections on the experience of having developed this application.
1 Introduction
1.1 The problem scenario
Within the context of Pervasive Computing, several terms such as smart buildings or smart cities have appeared. These terms refer to the deployment of computing systems (a combination of devices with their respective communication protocols) in physical spaces that interact directly or indirectly with users. As a class of these terms, we introduce the notion of smart tourism, which is analogous to that of a smart city but focused specifically on tourism. Many questions can be raised in this area. In which way can we improve people's experience while they are abroad visiting new places? How can pervasive computing, and all the related technologies that lie under this umbrella term, contribute to maximizing and enriching what people gain while they do tourism? As can be seen, the questions are very open. Nowadays there exists a plethora of projects underway within the area of smart tourism, exploiting networks and computing processing capabilities in order to deliver useful solutions based on users' needs.
Usefulness and necessity are two important keywords that determine the success of a project in this area. Taking the necessities of people, in this case tourists, as an input, a project within the area of smart tourism evolves toward a solution whose usefulness satisfies those needs. Indeed, when doing tourism, people face many problems and necessities, from the very first moment of landing at an airport to the moment they walk through the cultural landmarks of a city. Hence, in order to contribute a useful solution in smart tourism, we concentrate on one particular situation and problem in that process.
The problem picked is recognizing a place, technically known as landmark recognition. When tourists walk through a city and visit cultural places, it is very likely that they do not know at first which places they have in front of them. They would like to know the name of the place as quickly as possible, as well as some information about it. As a natural input for landmark recognition, a tourist, hereinafter a user, can provide an image of the landmark to an "oracle" system that delivers the place name as well as the information related to it. Today's computing technologies make it possible to construct such an "oracle" system. As input, the image can be received through a camera. The image can then be sent to a remote engine that processes it using data mining and machine learning techniques, supported as well by some data sources. Finally, this "oracle" sends back to the user the name of the landmark as well as additional information about it.
1.2 Minerva
Minerva [1] is an Android-powered application that addresses the problem of landmark recognition, within the general idea of enriching the experience of users while they do tourism in a city, and more specifically to satisfy the need of learning about specific landmarks. As has been said, the idea is to provide an image of the landmark as input, taken by the camera of the Android device. The application then communicates with a remote engine, namely the Google Cloud Vision API [2], which provides several image analysis capabilities such as landmark recognition, face detection, optical character recognition, etc. Minerva connects specifically with the API's landmark recognition service. Finally, the Cloud Vision API responds to the Minerva application with the place name plus the place coordinates. Having obtained the place name and its coordinates, the Minerva solution can exploit this data by connecting with other API services provided on the Internet, such as the Wikipedia API [3] or the Google Maps services [4]. For example, the Wikipedia service is used to get an extract, if possible, in order to provide the user with historical information about the place. The Google Maps service, on the other hand, is used to place both the user and the landmark on a map in order to give the user a notion of localization.
As can be seen, the development of the Minerva application left a clear lesson: whenever a project is being developed, exploit as much as possible the services currently available on the web instead of reinventing the wheel, leaving to Minerva the task of interacting with the user. For this project, those available services are provided by Google and Wikipedia. Obviously, the use of these third-party services by our application may change according to the growth and continuity of the project. Nonetheless, the way in which the Minerva application has been conceived and developed leaves a stable base for future enhancements and versions of the project.
Section 2 of this article presents the User Experience design, whereas Section 3 describes the application system design more precisely, detailing the interaction with the third-party services from Google and Wikipedia.
2 User Experience Design
This section presents the Minerva graphical user interfaces, as well as the application flow by which the user can access the functionalities of the solution, namely finding the landmark name plus its associated information, given the landmark image as input. The landmark image can be taken using the device camera, or, through an additional functionality, the user can select a previously taken image from the device gallery. The process of coming up with these user interfaces during the User Experience design phase, which involved testing and gathering of needs with real users, is described in the resources found in [1]. The following figures present the resulting application user interfaces for this phase:
Figures 1-4. Figure 1 (upper left) presents the main interface, which is the starting point of the user flow. After taking a picture or selecting an image, the application continues to the result interface shown in Figure 2 (upper right). Through this interface the user is informed of the landmark name plus its information. From this point the user can check directly the Wikipedia article for that specific landmark (Figure 3, lower left) or check the landmark location (Figure 4, lower right).
Apart from the application user interfaces shown in Figures 1-4, there are other user interfaces that are not shown in this article and can be checked in [1]. For example, when the user presses the camera button, a camera interface is launched that provides the user with a camera view for taking the picture; technically speaking, this means embedding the camera preview within the interface. Conversely, if the gallery button is pressed, the application launches a gallery interface in which the device gallery is embedded so that a picture can be chosen.
In addition, there is a settings interface conceived to let the user change some application settings, such as deciding whether or not to save the taken image automatically, or changing the application language. There is also an about-us interface providing information about the team members. Both the settings and the about-us sections can be accessed from the overflow menu in the main interface presented in Figure 1.
In Android development terms, these interfaces correspond to activities. An activity in Android manages the logic of what a user can do within a user interface; internally, activities are implemented as Java classes. The following figure provides a view of the application activities, giving an insight into the application flow.
Figure 5. Application activities sequence flow
First, the MainActivity is launched, whose associated user interface has been provided in Figure 1. The user can then either take a photo of the landmark or choose an image from the gallery. Having chosen the image, the application takes it as input and queries the landmark recognition service of the Google Cloud Vision API. Having received a successful response, the application launches the ResultActivity, whose associated user interface is presented in Figure 2. Here the landmark name is shown, plus related landmark information retrieved from the Wikipedia API. In addition, the user can save the taken image to the gallery in the case where the picture was taken with the camera. Finally, as an added value for this solution prototype, a WikipediaArticleActivity has been developed (with its UI shown in Figure 3) for the case in which the user would like to read the full article on Wikipedia; the other added functionality is a Google Maps-based MapsActivity in which the user can see the position of the landmark on a map (see Figure 4).
A point to be considered is that this project has been conceived in the context of applying the solution to the city of Rome, Italy; thus the application's look & feel has been designed for that purpose. Nevertheless, since the Google Vision API's landmark recognition feature works for any kind of place globally, the look & feel can be adjusted in order to satisfy a global audience.
3 Application System Design
This section introduces the proposed system design and architecture for the Minerva application, especially emphasizing the communication procedure carried out to access the third-party services. It describes how the application connects with the landmark recognition service of the Google Cloud Vision API, as well as how an article extract for a landmark is queried using the Wikipedia API. The following figure presents an insight into the interaction of the application with the back-end elements, which correspond to the third-party services.
Figure 6. Interaction of the application with the third-party services.
On the left side we have the Minerva solution, installed on an Android device. In its first version, the application has been developed to work on Android platforms ranging from API level 17 (version 4.2) up to API level 25 (version 7.1). In addition, the presence of a camera on the device is assumed. The application runs according to the activity flow sequence shown in the previous section. Whenever the application selects an image (either from the camera through the CameraActivity or from the device image gallery through the GalleryActivity), an HTTP POST request message is sent to the Google Cloud Vision API at https://vision.googleapis.com/v1/images:annotate?key=appKey, where appKey refers to the API key given to the application after registering it with the Cloud Vision API. The request message also carries a JSON message in its payload, containing the compressed landmark image as well as an indication of which feature of the Vision API we want to use, which in the application's case is landmark recognition.
The following example provides an insight into the JSON that travels within this request message:
{
  "requests": [
    {
      "image": {
        "content": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z"
      },
      "features": [
        {
          "type": "LANDMARK_DETECTION"
        }
      ]
    }
  ]
}
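As a minimal sketch of how such a request body can be assembled on the client side (outside the actual Android code, with hypothetical names), the image bytes can be base64-encoded with the standard java.util.Base64 encoder and the JSON built as a string:

```java
import java.util.Base64;

public class VisionRequestSketch {
    // Builds the JSON body for a LANDMARK_DETECTION request.
    // In the real application, imageBytes would come from the camera
    // or from the gallery picker; here it is just a placeholder.
    static String buildRequestBody(byte[] imageBytes) {
        String content = Base64.getEncoder().encodeToString(imageBytes);
        return "{ \"requests\": [ { "
             + "\"image\": { \"content\": \"" + content + "\" }, "
             + "\"features\": [ { \"type\": \"LANDMARK_DETECTION\" } ] "
             + "} ] }";
    }

    public static void main(String[] args) {
        byte[] fakeImage = {1, 2, 3}; // stands in for real JPEG bytes
        System.out.println(buildRequestBody(fakeImage));
    }
}
```

A production implementation would use a JSON library rather than string concatenation, but the structure of the message is exactly the one shown above.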
[Figure 6 shows an Android-powered device (API levels 17-25) running the Minerva application, exchanging messages with the Google Cloud Vision API and the Wikipedia API: (1) HTTP POST to https://vision.googleapis.com/v1/images:annotate?key=appKey, with the compressed image plus the feature selection (landmark recognition) JSON-encoded in the HTTP message body; (2) HTTP response (for 200 OK: the landmark name plus coordinates, JSON-encoded); (3) HTTP GET to https://en.wikipedia.org/w/api.php?..titles=landmarkName; (4) HTTP response (for 200 OK: the article extract, JSON-encoded).]
In case the place has been found, the API responds with an HTTP 200 OK message carrying the following response, also in JSON format:
{
  "responses": [
    {
      "landmarkAnnotations": [
        {
          "mid": "/g/1hg4vfsw1",
          "description": "Colosseum",
          "score": 0.87093904,
          "boundingPoly": {
            "vertices": [...]
          },
          "locations": [
            {
              "latLng": {
                "latitude": 37.802900859931917,
                "longitude": -122.447777
              }
            }
          ]
        }
      ]
    }
  ]
}
The description attribute of the JSON object in the response gives the application the landmark name, whereas the locations attribute provides the coordinates of the place. There are other attributes that are not currently used by the application, but that future enhancements could exploit. For example, the score value provides a measurement (ranging between 0 and 1) of how confident the Vision engine is that it has given a correct response. In the tests carried out for this project, this score value was always high; conversely, for very complicated and hard cases (images of very bad quality, or images in which the landmark was very far away in the landscape), the response returned was not this JSON object. Another interesting attribute is boundingPoly, which gives a set of vertices identifying the frame in which the landmark is located within the image. More detailed information about how the call to the Vision API works for the landmark recognition feature can be found in [5].
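To illustrate how the fields above can be pulled out of the response, the following is a naive, self-contained sketch (not the application's actual code, which would use a proper JSON parser) that extracts a named field with a regular expression:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VisionResponseSketch {
    // Naive extraction of one scalar field (quoted string or number)
    // from the Vision API JSON response. Illustration only: a real
    // implementation should use a JSON library instead of regexes.
    static String extract(String json, String field) {
        Matcher m = Pattern
            .compile("\"" + field + "\":\\s*\"?([^\",}]+)\"?")
            .matcher(json);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Abbreviated response, with the sample values from the text above
        String response = "{ \"responses\": [ { \"landmarkAnnotations\": [ { "
            + "\"description\": \"Colosseum\", \"score\": 0.87093904, "
            + "\"locations\": [ { \"latLng\": { "
            + "\"latitude\": 37.802900859931917, "
            + "\"longitude\": -122.447777 } } ] } ] } ] }";
        System.out.println(extract(response, "description")); // Colosseum
        System.out.println(extract(response, "latitude"));
        System.out.println(extract(response, "longitude"));
    }
}
```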
Having received the landmark name plus its coordinates, the application saves them internally in its state and then presents the ResultActivity interface to the user, as described in Section 2. Within this module the service of the Wikipedia API is called, as shown in Figure 6. In this case the request is a simple HTTP GET message in which the request attributes are put in the URL query string. The following string is an example call towards the Wikipedia API:
https://en.wikipedia.org/w/api.php?
format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Colosseum&redirects=1
The base address of the API is https://en.wikipedia.org/w/api.php; the rest of the string contains the parameters used for the call. The format parameter specifies how we expect to receive the data, in this case JSON. The action and prop parameters indicate that this call queries article extracts; this specification is needed because the Wikipedia API does more than query articles, also providing services such as modifying article metadata or even modifying an article itself. The exintro parameter specifies that we are interested just in the introduction of the extract instead of the whole article. The explaintext parameter indicates to the service that we want to receive the article extract in plain-text format (i.e. without HTML or other special tags).
The titles parameter indicates which article extract we are looking for (e.g. the extract about the Colosseum). Finally, the redirects parameter redirects the search to the article extract with the most similar name. This last parameter is very useful since we are working with names that come from different systems (Google and Wikipedia respectively). For instance, if the Google Cloud Vision API gives "Trevi Fountain" as a response, while the Wikipedia article about the Trevi Fountain is instead called "The Fountain of Trevi", the redirects parameter achieves the match between these two strings, thereby assuring a correct response.
The response to this service call will be a JSON object that embeds the article extract in one of its attributes. The parsing is handled by the application in order to show the user the information about the landmark.
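The query string described above can be sketched in plain Java as follows (a minimal illustration, with hypothetical names; the landmark name is URL-encoded so that multi-word names survive the query string):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WikipediaQuerySketch {
    // Assembles the extract-query URL described in the text.
    // landmarkName is the description returned by the Vision API.
    static String buildUrl(String landmarkName) {
        return "https://en.wikipedia.org/w/api.php"
             + "?format=json&action=query&prop=extracts"
             + "&exintro=&explaintext="
             + "&titles=" + URLEncoder.encode(landmarkName, StandardCharsets.UTF_8)
             + "&redirects=1";
    }

    public static void main(String[] args) {
        // Multi-word names are encoded, e.g. titles=Trevi+Fountain
        System.out.println(buildUrl("Trevi Fountain"));
        System.out.println(buildUrl("Colosseum"));
    }
}
```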
Finally, the solution provides two additional activities. If the user would like to access the complete article, the WikipediaArticleActivity can be launched, displaying the article web page for the landmark, for instance the Wikipedia page about the Colosseum. The second added activity is a Google Maps-powered activity in which we place both the user and the landmark on a map. These two added functionalities are just examples of the universe of services we can use once we have obtained the landmark name plus its coordinates. The idea and motivation is that the Minerva application can take advantage of available services on the web in order to enrich the user experience. For deeper details about the system design and how the application communicates with the services, please check the source code, which is freely available at [1].
4 Conclusions: project reflections and further remarks
This article introduced the Minerva application, developed for Android platforms, which attempts to address a specific problem within the wider context of smart tourism. At the very beginning of this article, the current wave of Pervasive Computing was introduced, along with the areas in which it has taken part. Then, landing specifically on the smart tourism scenario, we noted that it attempts to resolve the many problems and needs that tourists can face when they travel to visit a new place. Among all possible needs, Minerva plays its particular role when tourists do sightseeing. The usefulness of the application relies on providing the landmark name as quickly as possible and, starting from that, giving all possible information about the landmark. To accomplish that task, the application is supported by the Google Cloud Vision API and, secondly, by the Wikipedia API for providing additional information about the landmark.
The project has followed a user-centered design philosophy, which stands in contrast with the scientific method. The following items enumerate the phases analyzed within this approach.
• In this case we began with the problem. What are we trying to solve? There is the general problem of smart tourism, and there is the specific problem of landmark recognition.
• Then we came up with the solution. What are the ways to solve this problem? Certainly there are many, and we have chosen the development of an Android application. Even though the concept of Pervasive Systems is more tightly associated with other electronic devices, such as microcontrollers and sensors, than with Android devices, we aim to include our application within the wider context of Pervasive Systems since, in the end, we are also interacting with the user's surroundings through the capture of images. Moreover, today's smartphones and tablets provide other context-aware peripherals that let us interact with the environment (GPS, microphones, temperature sensors, etc.). This enriches the alternatives the application may have in future versions.
• The solution developed led us to the prototype; having it as a minimum viable product (MVP) with its core functionalities, we arrived at the feedback test phase, in which we iteratively refined the product based on users' initial feedback in order to achieve a better outcome and increase the likelihood that the application will be used.
Certainly, we have developed a stable product that accomplishes the goals set at the beginning of its development. Nevertheless, the future challenge will be to test it in a real production environment. If, in such a scenario, the application did not get the expected feedback, then, again following user-centered design, we would have to go back to the previous phases of the methodology and ask ourselves: are we on the correct path to solve the problem? With the user feedback as support, is there an alternative way to solve the specific problem we have set?
Finally, again within the context of the user-centered methodology, there is the vision: having this product as a base, which future can we create with this value? Minerva, as has been said, is a promising base to be enhanced in future versions. Adding more functionalities to the existing core of landmark recognition may add important value. However, this last statement introduces another problem, a technological one: unfortunately, the Google Cloud Vision API is not free. Each thousand calls to the system for a feature request costs $1.50.
Therefore, to keep the application free while at the same time making the project profitable without losses, there will be the challenge of changing the business model. For example, a promising idea would be that whenever the user takes an image of a landmark, some advertisement for a business or store near that landmark is shown as well, and that business would pay for the advertisement. Another solution may be to stop relying on the Google Cloud Vision API and use other freely available services, or to build our own back-end.
In any case, whatever the business model, it has been demonstrated that this first version of the Minerva solution serves as a base for future versions of the project addressing the problem of enriching people's tourism and cultural experience within the context of smart tourism. Furthermore, the development of this solution has been an inexhaustible source of learning for the members of this project.
5 Download
The application is freely available for download through the Google Play Store. In addition, the project source code is completely available for download from the project GitHub page located at [1].
6 References
[1] Minerva. An interactive Android-based solution for enriching the cultural experience in the Eternal City. GitHub
repository. Available at: https://smartrome.github.io/minerva/
[2] Google Cloud Platform: Google Cloud Vision API. https://cloud.google.com/vision/
[3] The MediaWiki Action API. Main page. Available at: https://www.mediawiki.org/wiki/API:Main_page.
[4] Google Maps API. https://developers.google.com/maps/
[5] Google Cloud Vision API Documentation. Detecting landmarks. Available at:
https://cloud.google.com/vision/docs/detecting-landmarks
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Minerva: An interactive Android solution for enriching people's tourism experience

Julio César Carrasquel, Razieh Akbari
Università degli Studi di Roma La Sapienza, Dipartimento di Ingegneria Informatica, Automatica e Gestionale Antonio Ruberti (DIAG)
Via Ariosto 25, 00185 Rome, ITALY
{carrasquelgamez.1726154, akbari.1771868}@studenti.uniroma1.it

Abstract. The paradigm of Pervasive Computing refers to today's rich ecosystem of computer systems and electronic devices (mobiles, tablets, microcontrollers, systems on a chip, sensors, etc.) that interact with the physical environment in a fully connected and integrated scenario, supported by the plethora of current network protocols and computing technologies. This era of context-aware computing has positively impacted sectors such as health care, food production and transportation, to name a few; as a result, new work and research areas have appeared under terms such as smart buildings or smart cities. The work presented in this article lands specifically within the area of smart tourism. We introduce Minerva, an Android application whose purpose is to enrich people's tourism experience by providing an interface through which they can search for landmark information in a useful and interactive way. To achieve this, the user provides a landmark image as input using the Android device's camera. Internally, in its first version, Minerva performs the landmark recognition task using the Google Cloud Vision API. The first section of this work introduces the problem scenario and our proposal. The second section presents the user experience design of our solution. The third section describes the application's system design, and the fourth section presents some conclusions and reflections on the experience of developing this application.
1 Introduction

1.1 The problem scenario

Within the context of Pervasive Computing, several terms have appeared, such as smart buildings or smart cities. These terms refer to the deployment of computing systems (a combination of devices with their respective communication protocols) in physical spaces, which interact directly or indirectly with users. By analogy, we introduce the notion of smart tourism: the same idea as a smart city, but thought specifically for tourism. Many questions can be raised in this area. In which ways can we improve people's experience while they are abroad visiting new places? How can pervasive computing, and all the related technologies that lie under this umbrella term, contribute to maximizing and enriching what people gain while they do tourism? As can be seen, these questions are very open. Indeed, there already exists a plethora of projects in the area of smart tourism; they exploit networks and computing capabilities to deliver useful solutions based on users' needs. Usefulness and necessity are two important keywords that determine the success of a project in this area. Taking the necessities of people, in this case tourists, as input, a project within the area of smart tourism evolves to conceive a solution whose usefulness satisfies those needs. Indeed, when doing tourism
people face many problems and necessities, from the very first moment of landing at an airport to the moment they walk among the cultural landmarks of a city. Within that whole process, in order to contribute a useful smart-tourism solution, we concentrate on one particular problem: recognizing a place, technically known as landmark recognition. When tourists walk through a city and visit cultural places, it is very likely that they do not know at first which places they have in front of them. They would like to know the name of the place as quickly as possible, as well as to obtain some information about it. As a natural input for landmark recognition, a tourist, hereinafter a user, can provide an image of the landmark to an "oracle" system, which will return the place name as well as related information. Today's computing technologies make it possible to construct such an "oracle" system. The image can be captured through a camera and sent to a remote engine, which processes it using data mining and machine learning techniques, supported by various data sources. Finally, this "oracle" sends back to the user the name of the landmark together with additional information about it.

1.2 Minerva

Minerva [1] is an Android application that addresses the problem of landmark recognition, within the general idea of enriching users' experience while they do tourism in a city, and more specifically to satisfy the need of knowing about specific landmarks. As mentioned, the idea is to provide an image of the landmark as input, captured with the Android device's camera. The application then communicates with a remote engine, namely the Google Cloud Vision API [2].
The Google Cloud Vision API provides several image-analysis capabilities such as landmark recognition, face detection, and optical character recognition. Minerva specifically connects to the API's landmark recognition service. The Cloud Vision API responds to the Minerva application with the place name plus the place coordinates. Having obtained the place name and its coordinates, Minerva can exploit this data by connecting to other API services available on the Internet, such as the Wikipedia API [3] or the Google Maps services [4]. For example, the Wikipedia service is used to get an article extract, when possible, providing the user with historical information about the place. The Google Maps service, in turn, is used to place both the user and the landmark on a map, giving the user a sense of location. The development of the Minerva application left a clear lesson: whenever a project is being developed, exploit the services already available on the web as much as possible instead of reinventing the wheel, leaving to Minerva the task of interacting with the user. For this project, those available services are provided by Google and Wikipedia. Naturally, the use of these third-party services by our application may change as the project grows and continues. Nonetheless, the way the Minerva application has been conceived and developed leaves a stable base for future enhancements and versions of the project. Section 2 of this article presents the user experience design, whereas Section 3 describes the application's system design more precisely, including the interaction with the third-party services from Google and Wikipedia.
2 User Experience Design

This section presents the Minerva graphical user interfaces, as well as the application flow associated with them, through which the user accesses the functionality of the solution: finding the landmark name plus its associated information, given the landmark image as input. The landmark image can be taken using the device camera; alternatively, a functionality has been added to let the user pick a previously taken image from the device gallery. The process of coming up with these user interfaces during the user experience design phase is described in the resources found in [1], and included testing and a gathering of needs with real users. The following figures present the resulting application user interfaces for this phase:
Figures 1-4. Figure 1 (upper-left) presents the main interface, which is the starting point of the user flow. After taking a picture or selecting an image, the application continues to the result interface shown in Figure 2 (upper-right). Through this interface the user is informed of the landmark name plus its information. From this point the user can directly check the Wikipedia article for that specific landmark (Figure 3, lower-left) or check the landmark location (Figure 4, lower-right).
Apart from the application user interfaces shown in Figures 1-4, there are other user interfaces that are not shown in this article and can be checked in [1]. For example, when the user presses the camera button, a camera interface is launched that provides the user with a camera view for taking the picture; technically, this means embedding the camera preview within the interface. Conversely, if the gallery button is pressed, the application launches a gallery interface that embeds the device gallery so that a picture can be chosen. In addition, there is a settings interface conceived for letting the user change application settings, such as deciding whether or not to save the taken image automatically, or changing the application language. There is also an about-us interface providing information about the team members. Both the settings and the about-us sections can be accessed from the overflow menu of the main interface presented in Figure 1. In Android development terms, these interfaces correspond to activities. An activity in Android manages the logic of what a user can do within a user interface; internally, activities are implemented as Java classes. The following figure provides a view of the application activities, giving an insight into the application flow.

Figure 5. Application activities sequence flow.

First, the MainActivity is launched, whose associated user interface was shown in Figure 1. The user can either take a photo of the landmark or choose an image from the gallery. Once the image is chosen, the application takes it as input and queries the landmark recognition service of the Google Cloud Vision API. Upon a successful response, the application launches the ResultActivity, whose associated user interface is presented in Figure 2.
Here the landmark name is shown together with related landmark information retrieved from the Wikipedia API. In addition, the user is able to save the taken image in the gallery, for the case in which the picture was taken with the camera. Finally, as an added value for this solution prototype, a WikipediaArticleActivity has been developed (with its UI shown in Figure 3) for the case in which the user would like to read the full article on Wikipedia; the other added functionality is a Google Maps-based MapsActivity in which the user can see the position of the landmark on a map (see Figure 4). A point to be considered is that this project has been conceived in the context of applying the solution to the city of Rome, Italy, and the application's look & feel reflects that purpose. Nevertheless, since the Google Cloud Vision API's landmark recognition feature works for places globally, the look & feel can be adjusted to suit a global audience.
3 Application System Design

This section introduces the proposed system design and architecture of the Minerva application, with particular emphasis on the communication procedures used to access the third-party services. It describes how the application connects to the landmark recognition service of the Google Cloud Vision API, and how it queries an article extract about a landmark using the Wikipedia API. The following figure gives an insight into the interaction of the application with the back-end elements, which correspond to the third-party services.

Figure 6. Interaction of the application with the third-party services.

On the left side we have the Minerva solution, installed on an Android device. In its first version, the application has been developed to work on Android platforms from API level 17 (version 4.2) up to API level 25 (version 7.1). It is also assumed that the device provides a camera. The application runs according to the activity flow shown in the previous section. Whenever the application selects an image (either from the camera through the CameraActivity or from the device image gallery through the GalleryActivity), an HTTP POST request is sent to the Google Cloud Vision API at the address https://vision.googleapis.com/v1/images:annotate?key=appKey, where appKey is the API key given to the application upon registering it with the Cloud Vision internal system. The request carries a JSON message in its payload, containing the compressed landmark image together with an indication of which Vision API feature we want to use, which in the application's case is landmark recognition.
The following example gives an insight into the JSON that travels within this request message:

{
  "requests": [
    {
      "image": {
        "content": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z"
      },
      "features": [
        { "type": "LANDMARK_DETECTION" }
      ]
    }
  ]
}
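For illustration, a request body of this shape can be assembled with the Java standard library alone. The class and method names below (VisionRequestSketch, buildLandmarkRequest) are our own illustrative choices, not part of the Minerva code base, and a production app would build the JSON with a proper library rather than string concatenation:

```java
import java.util.Base64;

public class VisionRequestSketch {

    // Assembles the JSON payload for the images:annotate call shown above.
    // The raw image bytes are base64-encoded, as the API expects them in
    // the "content" field.
    static String buildLandmarkRequest(byte[] imageBytes) {
        String content = Base64.getEncoder().encodeToString(imageBytes);
        return "{ \"requests\": [ { "
             + "\"image\": { \"content\": \"" + content + "\" }, "
             + "\"features\": [ { \"type\": \"LANDMARK_DETECTION\" } ] "
             + "} ] }";
    }

    public static void main(String[] args) {
        // A tiny fake "image" just to show the shape of the payload.
        System.out.println(buildLandmarkRequest(new byte[] {1, 2, 3}));
    }
}
```

The resulting string is then sent as the body of the HTTP POST request to the images:annotate endpoint described above.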
If the place is found, the API responds with an HTTP 200 OK message carrying a response of the following form, also in JSON format:

{
  "responses": [
    {
      "landmarkAnnotations": [
        {
          "mid": "/g/1hg4vfsw1",
          "description": "Colosseum",
          "score": 0.87093904,
          "boundingPoly": { "vertices": [...] },
          "locations": [
            {
              "latLng": {
                "latitude": 37.802900859931917,
                "longitude": -122.447777
              }
            }
          ]
        }
      ]
    }
  ]
}

The description attribute of the JSON object in the response gives the application the landmark name, whereas the locations attribute provides the coordinates of the place. Other attributes are not currently used by the application but could be exploited in future enhancements. For example, the score value provides a measure (between 0 and 1) of how confident the Vision engine is that it has given a correct response. In the tests carried out for this project, this score was always high; conversely, for very hard cases (images of very bad quality, or images in which the landmark was very far away in the landscape) the response did not contain this JSON object at all. Another interesting attribute is boundingPoly, which gives a set of vertices identifying the frame in which the landmark is located within the image. More detailed information about how the call to the Vision API works for the landmark recognition feature can be found in [5]. Having received the landmark name plus its coordinates, the application saves them in its internal state and presents the ResultActivity interface to the user, as described in Section 2. Within this module, the Wikipedia API service is called, as shown in Figure 6. In this case the request is a simple HTTP GET message in which the request attributes are put in the URL query string. The following string is an example call to the Wikipedia API:

https://en.wikipedia.org/w/api.php?
format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Colosseum&redirects=1

The base address of the API is https://en.wikipedia.org/w/api.php; the rest of the string consists of the parameters of the call. The format parameter states how we expect to receive the data, in this case JSON. The action and prop parameters indicate that this call queries article extracts; this specification is needed because the Wikipedia API does more than query articles, offering other services such as modifying article metadata or even modifying an article itself. The exintro parameter specifies that we are interested only in the extract's introduction rather than the whole article. The explaintext parameter tells the service that we want to receive the article extract in plain-text format (i.e., without HTML or other special tags). The titles parameter indicates which article extract we are looking for (e.g., the extract about the Colosseum). Finally, the redirects parameter redirects the search to the article extract whose name is most similar. This last parameter is very useful, since we are working with data and names that come from different systems (Google and Wikipedia, respectively). For instance, if the Google Cloud Vision API responds with "Trevi Fountain" while the corresponding Wikipedia article is instead called "The Fountain of Trevi", the redirects parameter will match these two strings, thereby ensuring a correct response.
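As a small sketch of how such a call can be prepared (the class and method names below are our own, not part of Minerva's source), the query string can be assembled with java.net.URLEncoder, which takes care of landmark names containing spaces or special characters:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class WikipediaUrlSketch {

    // Builds a MediaWiki query URL asking for the plain-text introduction
    // extract of a single article, following redirects.
    static String buildExtractUrl(String landmarkName) {
        try {
            String title = URLEncoder.encode(landmarkName, "UTF-8");
            return "https://en.wikipedia.org/w/api.php"
                 + "?format=json&action=query&prop=extracts"
                 + "&exintro=&explaintext=&redirects=1"
                 + "&titles=" + title;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported on any JVM.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        // The space in "Trevi Fountain" is percent/plus-encoded by URLEncoder.
        System.out.println(buildExtractUrl("Trevi Fountain"));
    }
}
```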
The response to this Wikipedia API call is a JSON object that embeds the article extract in one of its attributes; the application handles the parsing in order to show the landmark information to the user. Finally, the solution provides two additional activities. If the user wants to read the complete article, the WikipediaArticleActivity can be launched, displaying the landmark's article web page, for instance the Wikipedia page about the Colosseum. The second added activity is a Google Maps-powered activity that places both the user and the landmark on a map. These two added functionalities are just examples of the universe of services that can be used once the landmark name and its coordinates are known; the idea and motivation is that the Minerva application can take advantage of services available on the web to enrich the user experience. For deeper details on the system design and on how the application communicates with these services, please see the source code, which is freely available at [1].

4 Conclusions: project reflections and further remarks

This article introduced the Minerva application developed for Android platforms. It addresses a specific problem within the wider context of smart tourism. At the very beginning of this article we introduced the current wave of Pervasive Computing and the areas it has become part of. Smart tourism attempts to resolve the many problems and needs that tourists face whenever they travel to visit a new place; among all possible needs, Minerva plays its particular role when tourists go sightseeing.
The usefulness of the application relies on providing, as quickly as possible, the landmark name, and from there all possible information about that landmark. To accomplish this task the application is supported by the Google Cloud Vision API and, secondly, by the Wikipedia API, which provides additional information about the landmark. The project has followed a user-centered design philosophy, which can be contrasted with the scientific method. The following items enumerate the phases of this approach.

• We began with the problem. What are we trying to solve? There is the general problem of smart tourism, and there is the specific problem of landmark recognition.

• Then we came up with the solution. What are the ways to solve this problem? Certainly there are many, and we chose the development of an Android application. Although the concept of Pervasive Systems is more tightly associated with other electronic devices, such as microcontrollers and sensors, than with Android devices, we place our application within the wider context of Pervasive Systems, since in the end we also interact with the user's surroundings through the capture of images. Moreover, today's smartphones and tablets provide other context-aware peripherals that let us interact with the environment (GPS, microphones, temperature sensors, etc.), which enriches the alternatives the application may have in future versions.

• The solution led us to the prototype; having it as a minimum viable product (MVP) with its core functionalities, we arrived at the feedback-test phase, in which we iteratively refined the product based on users' initial feedback, in order to achieve a better outcome and increase the probability that the application will be used. We have developed a stable product that accomplishes the goals set at the beginning of its development.
Nevertheless, the remaining challenge is a future scenario in which the application is tested in a real production environment. If in such a scenario the application did not receive the expected feedback, then, following user-centered design, we would have to go back to the earlier phases of the methodology and ask ourselves again: are we on the right track to solve the problem? With user feedback as support, is there an alternative way to solve the specific problem we have set?
Finally, again within the context of the user-centered methodology, there is the vision. Having this product as a base, which future can we create with this value? Minerva, as has been said, is a promising base to be enhanced in future versions. Adding functionality on top of the existing core of landmark recognition may add important value. However, this last statement introduces another, technological problem. Unfortunately, the Google Cloud Vision API is not free: each thousand feature requests costs $1.50. Therefore, to keep the application free while making the project profitable without losses, there will be the challenge of changing the business model. For example, a promising idea would be that whenever a user takes an image of a landmark, some advertisement of a business or store near that landmark is shown as well, with that business paying for the publicity. Another solution might instead be to stop relying on the Google Cloud Vision API and use other freely available services, or to build our own back end. In any case, this first version of the Minerva solution has been shown to serve as a base for future versions of the project, addressing the problem of enriching people's tourism and cultural experience within the context of smart tourism. Furthermore, the development of this solution has been an inexhaustible source of learning for the members of the project.

5 Download

The application is freely available for download through the Google Play Store. In addition, the project source code is fully available for download on the project GitHub page located at [1].

6 References

[1] Minerva. An interactive Android-based solution for enriching the cultural experience in the Eternal City. GitHub repository.
Available at: https://smartrome.github.io/minerva/
[2] Google Cloud Platform: Google Cloud Vision API. Available at: https://cloud.google.com/vision/
[3] The MediaWiki Action API. Main page. Available at: https://www.mediawiki.org/wiki/API:Main_page
[4] Google Maps APIs. Available at: https://developers.google.com/maps/
[5] Google Cloud Vision API Documentation. Detecting landmarks. Available at: https://cloud.google.com/vision/docs/detecting-landmarks