Knowledge extraction and incorporation is currently considered beneficial for efficient Big Data analytics. Knowledge can inform workflow design, constraint definition, parameter selection and configuration, human interaction, and decision-making strategies. Here we present BIGOWL, an ontology to support knowledge management in Big Data analytics. BIGOWL is designed to cover a wide vocabulary of terms concerning Big Data analytics workflows, including their components and how they are connected, from data sources to analytics visualization. It also takes into consideration aspects such as parameters, restrictions, and formats. This ontology defines not only the taxonomic relationships between the different concepts, but also instances representing specific individuals to guide users in the design of Big Data analytics workflows. For testing purposes, two case studies are developed: first, real-world stream processing of traffic Open Data with Spark, for route optimization in the urban environment of New York City; and second, data mining classification of an academic dataset on local/cloud platforms. The analytics workflows resulting from the BIGOWL semantic model are validated and successfully evaluated.
This document provides an introduction to big data and analytics. It discusses the topics of data processing, big data, data science, and analytics and optimization. It then provides a historic perspective on data and describes the data processing lifecycle. It discusses aspects of data including metadata and master data. It also discusses different data scenarios and the processing of data in serial versus parallel formats. Finally, it discusses the skills needed for a data scientist including business and domain knowledge, statistical modeling, technology stacks, and more.
This document discusses techniques for pre-processing big data to improve the quality of analysis. It covers exploring and cleaning data by handling missing values, reducing noise, and reducing dimensions. Data transformation techniques are also discussed, such as standardizing, aggregating, and joining data. Finally, the document emphasizes that data preparation is a key factor in model quality and generating insights from trusted data.
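To make the cleaning and transformation steps above concrete, here is a minimal sketch in plain Python: imputing missing values with the mean, then standardizing to z-scores. The function names and toy values are illustrative, not taken from the document.

```python
import statistics

def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]

# A toy column with one missing reading, cleaned and then standardized.
raw = [10.0, None, 14.0, 12.0]
clean = impute_missing(raw)   # [10.0, 12.0, 14.0, 12.0]
scaled = standardize(clean)
```

In a real pipeline these steps would run over columns of a distributed dataset, but the logic per column is the same.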
This document discusses two case studies of organizations that partnered with Synoptek to improve their IT services and operations. The first case study was of a women's healthcare organization that wanted better infrastructure availability, performance, and security. With Synoptek's help, they reduced costs by 20% and improved IT performance, security, and availability. The second case study was of a community college that wanted to transform programs, optimize operations, and engage students. Synoptek helped them deliver on these goals through managed IT services and support.
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology - InfiniteGraph
Join Oracle NoSQL DB and InfiniteGraph development teams in a discussion of the latest trends in Big Data and Graph Technology. Learn what Oracle’s view of Big Data is and how Oracle NoSQL Database technologies enable you to manage vast amounts of real-time key-value data.
Big data technology by Data Sciences Thailand at THE FIRST NIDA BUSINESS A... - BAINIDA
Big data technology by Data Sciences Thailand, presented at THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE, organized by the School of Applied Statistics and DATA SCIENCES THAILAND.
On Big Data Analytics - opportunities and challenges - Petteri Alahuhta
This document discusses big data analytics and its opportunities and challenges. It defines big data and explains the increasing number of "V's" that characterize big data, such as volume, velocity, variety, and veracity. It also outlines some common uses of big data analytics including customer insights, security and risk analysis, and resource optimization. Additionally, it discusses challenges of big data adoption like skills shortages and infrastructure limitations, as well as trends in big data and areas of expertise related to big data that VTT focuses on.
How is data governance like an amusement park? - Denodo
Watch full webinar here: https://bit.ly/3Ab9gYq
Imagine arriving at an amusement park with your family and starting your day without the typical map that lets you plan which shows to see, which rides to visit, and where the kids can and cannot ride... You probably won't get the most out of your day, and you will have missed many things. Some people like to go in unprepared and discover things little by little, but when we talk about business, improvising can be fatal...
In the era of information exploding across multiple sources, data governance is key to guaranteeing the availability, usability, integrity, and security of that information. Likewise, the set of processes, roles, and policies it defines allows organizations to reach their goals while ensuring the efficient use of their data.
Data virtualization, a strategic tool for implementing and optimizing data governance, lets companies build a 360º view of their data and establish security controls and access policies across the entire infrastructure, regardless of data format or location. In this way, it brings together multiple data sources, makes them accessible from a single layer, and provides lineage capabilities to monitor changes in the data.
In this webinar you will learn:
- How to accelerate the integration of data from fragmented sources across internal and external systems and obtain a comprehensive view of the information.
- How to enable a single, protected data-access layer across the entire company.
- How data virtualization provides the foundations for complying with current data protection regulations through data auditing, cataloging, and security.
This document summarizes a webinar on building smart cities. It discusses using semantic technologies like ontologies, taxonomies, and knowledge graphs to build smart city platforms and applications. Speakers from Semantic Web Company and Findwise discuss semantic data integration, case studies of semantic platforms for healthcare information in Australia and smart city data in Gothenburg, and tools for building semantic solutions like the PoolParty Semantic Suite. The webinar covers challenges in building smart cities and how semantic technologies can help with areas like data modeling, integration, and machine learning on city data. It concludes with a Q&A session.
ESSnet Big Data WP8 Methodology (+ Quality, +IT) - Piet J.H. Daas
1. The documents discuss methodology, quality, and IT aspects of big data within the ESSnet Big Data project.
2. Key topics addressed include the big data processing lifecycle, metadata management challenges, and quality aspects like coverage, accuracy, and comparability over time.
3. Common themes that emerged across work packages include the need for a unified framework for data integration and metadata, and the value of shared software and training resources.
Applied Data Science Course Part 2: the data science workflow and basic model... - Dataiku
In the second part of our applied machine learning online course, you'll get an overview of the different steps in the data science workflow as well as a deep dive in 3 basic types of models: linear, tree-based and clustering.
This document provides an overview of big data and big data analytics. It defines big data as large, complex datasets that grow quickly in volume and variety. Big data analytics involves examining these large datasets to find patterns and useful information. The challenges of big data include increased storage needs and handling diverse data formats. Hadoop is a framework that allows distributed processing of big data across clusters of computers. Common big data analytics tools include MapReduce, Spark, HBase and Hive. The benefits of big data analytics include improved decision making, customer service and efficiency.
Big Data Analytics and Hadoop is presented. Key points include:
- Big data is large and complex data that is difficult to process using traditional methods. Domains that produce large datasets include meteorology, physics simulations, and internet search.
- The four V's of big data are volume, velocity, variety, and veracity. Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of computers. Its core components are HDFS for storage and MapReduce for processing.
- Apache Hadoop has gained popularity for big data analytics due to its ability to process large amounts of data in parallel using commodity hardware, its scalability, and automatic failover. A Hadoop ecosystem of related projects has also grown up around these core components.
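The MapReduce model referred to above can be illustrated with a single-process word count. This is a sketch of the programming model only (map, shuffle, reduce), not of Hadoop's distributed runtime:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle pairs by key, then sum each key's counts as a reducer would."""
    shuffled = sorted(pairs, key=itemgetter(0))  # the "shuffle" step
    return {word: sum(count for _, count in group)
            for word, group in groupby(shuffled, key=itemgetter(0))}

docs = ["big data", "big clusters"]
counts = reduce_phase(map_phase(docs))  # {'big': 2, 'clusters': 1, 'data': 1}
```

In Hadoop proper, the map tasks run on the cluster nodes holding each HDFS block, and the framework performs the shuffle across the network before the reducers run.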
Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp... - VMware Tanzu
This document discusses using Dataiku as an end-to-end enterprise AI platform to drive data science at scale using PostgreSQL, Greenplum, and Dataiku. It highlights how Dataiku allows business analysts, data engineers, and data scientists to collaborate on a single platform for data preparation, machine learning modeling, and model deployment. It also provides examples of how customers like a major software company have leveraged Dataiku to automate the deployment of over 12,000 predictive models.
This document outlines Austria's strategy for conquering data through research and technology initiatives (RTI). It discusses:
1) The role of the Austrian Federal Ministry for Transport, Innovation and Technology (bmvit) in funding RTI, including a €450M annual budget and focus areas like ICT, production, energy and mobility.
2) Studies conducted on topics like the technology roadmap for conquering data through intelligent systems.
3) Actions already taken and planned, including developing lead technologies for data integration and fusion, increasing algorithm efficiency, and establishing a data-services ecosystem in Austria.
4) The need for coordination across stakeholders, developing the necessary legal framework, and ensuring resources and competencies
Why Your Data Science Architecture Should Include a Data Virtualization Tool ... - Denodo
Watch full webinar here: https://bit.ly/35FUn32
Presented at CDAO New Zealand
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists.
However, most architectures laid out to enable data scientists miss two key challenges:
- Data scientists spend most of their time looking for the right data and massaging it into a usable format
- Results and algorithms created by data scientists often stay out of the reach of regular data analysts and business users
Watch this session on-demand to understand how data virtualization offers an alternative that addresses these issues and can accelerate data acquisition and massaging. The session also includes a customer story on the use of machine learning with data virtualization.
F.A.I.R. Data Principles with Knowledge Graphs & AI: challenges and opportunities with emerging technologies and the paradigm shift in information management and data governance.
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013) - Mark Rittman
A presentation from ODTUG 2013 on tools other than OBIEE for Exalytics, focusing on analysis of non-traditional data via Endeca, "big data" via Hadoop and statistical analysis / predictive modeling through Oracle R Enterprise, and the benefits of running these tools on Oracle Exalytics
How to crack Big Data and Data Science roles - UpXAcademy
How to crack Big Data and Data Science roles is the flagship event of UpX Academy. This slide deck was used for the event on 10th Sept, which was attended by hundreds of participants globally.
This document provides an overview of Anzo Unstructured, a natural language processing (NLP) platform from Cambridge Semantics. It discusses the core capabilities of Anzo Unstructured, including intake of various file formats, extraction of entities and relationships, and semantic analysis. It also outlines example use cases in pharma and finance. The document demonstrates the configuration and visualization of Anzo Unstructured pipelines and annotations.
1) The document proposes FAME.Q, a formal approach to assess data quality in enterprise linked data. It defines data quality as the degree to which data meets specific requirements.
2) FAME.Q assesses data quality at the instance, schema, and service levels using common quality dimensions and appropriate metrics. It represents data quality levels as percentages to indicate how "fatal" or "perfect" the data is.
3) The approach builds on existing data quality definitions, applies them to semantic web data, and represents the assessments in a formalized schema. It outputs results using the data quality vocabulary.
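A quality dimension reported as a percentage, in the spirit of FAME.Q, might look like this minimal completeness metric. The function, field names, and sample rows are illustrative assumptions, not part of FAME.Q itself:

```python
def completeness(records, required_fields):
    """Share of required fields that are populated, as a percentage.

    0% would correspond to entirely "fatal" data, 100% to "perfect" data.
    """
    total = len(records) * len(required_fields)
    filled = sum(1 for record in records for field in required_fields
                 if record.get(field) not in (None, ""))
    return 100.0 * filled / total

rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.org"},
    {"id": 2, "name": "", "email": None},       # two missing required fields
]
score = completeness(rows, ["id", "name", "email"])  # 66.66...
```

An instance-level assessment like this would be one of several dimensions; schema- and service-level checks would add metrics of their own, and the results would be serialized using the data quality vocabulary.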
Smart Data Slides: Data Science and Business Analysis - A Look at Best Practi... - DATAVERSITY
Google “citizen data scientist” today and you will see about 1M results. That number is data. It may be interesting, but it is meaningless without context. Sometimes it appears that we are drowning in data from systems and sensors but starving for insights. We definitely produce more of the former than the latter, which has created demand for more powerful tools to simplify the process and lower the skills requirement for analysis. As vendors build systems to meet this demand, we hear about the coming “democratization” of big data as more people at varying levels within organizations are empowered to find meaning and improve their own performance with data-driven insights. This is a good thing, but it does require caution.
To paraphrase Col Jessup in A Few Good Men: You want answers? You can’t handle the data.
In this webinar, we will survey emerging approaches to simplifying analysis, and discuss the benefits, dangers, and skills required for individuals and organizations to thrive in the brave new world of analytics everywhere, for everyone.
A Seminar Presentation on Big Data for Students.
Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware.
Hadoop is a Java framework for managing large datasets distributed across clusters of commodity hardware. It allows for the distributed processing of large datasets across clusters of computers using simple programming models. Hadoop features distributed storage and processing of data and is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It provides reliable, scalable, and distributed computing and storage for big data applications.
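The split/apply/combine shape described above can be sketched on a single machine with a thread pool. This is a toy analogue of Hadoop's model, not Hadoop itself; the chunk size and worker count are arbitrary choices for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done independently on one partition of the data."""
    return sum(chunk)

data = list(range(1000))
# Split the dataset into independent partitions of 250 items each.
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

# Each partition is processed in parallel, then the results are combined,
# the same shape Hadoop applies across HDFS blocks on a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
```

On a cluster, the "partitions" are blocks already distributed across machines, so the computation moves to the data rather than the data to the computation.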
Presentation at ISKO-UK Linked Data: The Future of Knowledge Organization on the Web conference. More at http://www.iskouk.org/events/linked_data_sep2010.htm
Retail banks are moving beyond the data warehouse and data lake and are now implementing data fabric architectures to address data discovery and integration challenges.
These are the slides from our webinar "Modern Data Discovery and Integration in Retail Banking" in which we explore the role of the data discovery and integration layer in a data fabric with special focus on evolution from data warehouse to data fabric, semantics and graph data models in data fabric and example use cases in retail banks and B2C financial services.
Watch full webinar on demand here: https://goo.gl/yqzxUP
Market research shows that around 70% of self-service initiatives fare “average” or below. The Denodo 7.0 information self-service tool offers data analysts, business users, and app developers the ability to search and browse data and metadata in a business-friendly manner for self-service exploration and analytics.
Attend this session to learn:
- How business users can use the Denodo Platform's integrated Google-like search across both content and catalog
- How business users can refine queries without SQL knowledge using the web-based query UI
- How tags and business categorization help standardize business/canonical views while decoupling development artifacts from business users
Agenda:
The role of information self-service tool
Product demonstration
Summary & Next Steps
Q&A
Unlock Your Data for ML & AI using Data Virtualization - Denodo
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not substitute data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources:
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
This document provides an introduction and overview of big data technologies. It begins with defining big data and its key characteristics of volume, variety and velocity. It discusses how data has exploded in recent years and examples of large scale data sources. It then covers popular big data tools and technologies like Hadoop and MapReduce. The document discusses how to get started with big data and learning related skills. Finally, it provides examples of big data projects and discusses the objectives and benefits of working with big data.
Concepts, use cases and principles to build big data systems (1)Trieu Nguyen
1) Introduction to the key Big Data concepts
1.1 The Origins of Big Data
1.2 What is Big Data ?
1.3 Why is Big Data So Important ?
1.4 How Is Big Data Used In Practice ?
2) Introduction to the key principles of Big Data Systems
2.1 How to design Data Pipeline in 6 steps
2.2 Using Lambda Architecture for big data processing
3) Practical case study : Chat bot with Video Recommendation Engine
4) FAQ for student
This document discusses big data, defining it as large volumes of diverse data that are growing rapidly and requiring new techniques to capture, curate, manage, and analyze. It covers the key characteristics of big data including volume, velocity, and variety. The document also outlines common sources of big data, tools used to manage and analyze it, applications of big data analytics, risks and benefits, and the future growth of big data.
Watch here: https://bit.ly/3i2iJbu
You will often hear that "data is the new gold". In this context, data management is one of the areas that has received more attention by the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
Join us for an exciting session that will cover:
- The most interesting trends in data management.
- Our predictions on how those trends will change the data management world.
- How these trends are shaping the future of data virtualization and our own software.
Enabling data scientists within an enterprise requires a well-thought out approach from an organization, technology, and business results perspective. In this talk, Tim and Hussain will share common pitfalls to data science enablement in the enterprise and provide their recommendations to avoid them. Taking an example, actionable use case from the financial services industry, they will focus on how Anaconda plays a pivotal role in setting up big data infrastructure, integrating data science experimentation and production environments, and deploying insights to production. Along the way, they will highlight opportunities for leveraging open source and unleashing data science teams while meeting regulatory and compliance challenges.
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
Joe Caserta, President at Caserta Concepts addressed the challenges of Business Intelligence in the Big Data world at the Third Annual Great Lakes BI Summit in Detroit, MI on Thursday, March 26. His talk "Architecting for Big Data: Trends, Tips and Deployment Options," focused on how to supplement your data warehousing and business intelligence environments with big data technologies.
For more information on this presentation or the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/.
This document provides an overview of big data, including its definition, characteristics, sources, tools used, applications, benefits, and impact on IT. Big data is a term used to describe the large volumes of data, both structured and unstructured, that are so large they are difficult to process using traditional database and software techniques. It is characterized by high volume, velocity, variety, and veracity. Common sources of big data include mobile devices, sensors, social media, and software/application logs. Tools like Hadoop, MongoDB, and MapReduce are used to store, process, and analyze big data. Key applications areas include homeland security, healthcare, manufacturing, and financial trading. Benefits include better decision making, cost reductions
This document discusses trends in data analytics. It begins by defining big data and how it differs from traditional data approaches in terms of size, techniques, and ability to solve new problems. It then provides examples of big data applications across various industries like retail, automotive, healthcare, and insurance. Specifically, it outlines how big data is used for predictive analytics, personalization, fraud detection, and risk adjustment. Finally, it discusses some risks of big data like privacy issues and ensuring the right problems are addressed.
Analytical Innovation: How to Build the Next Generation Data PlatformVMware Tanzu
There was a time when the Enterprise Data Warehouse (EDW) was the only way to provide a 360-degree analytical view of the business. In recent years many organizations have deployed disparate analytics alternatives to the EDW, including: cloud data warehouses, machine learning frameworks, graph databases, geospatial tools, and other technologies. Often these new deployments have resulted in the creation of analytical silos that are too complex to integrate, seriously limiting global insights and innovation.
Join guest speaker, 451 Research’s Jim Curtis and Pivotal’s Jacque Istok for an interactive discussion about some of the overarching trends affecting the data warehousing market, as well as how to build a next generation data platform to accelerate business innovation. During this webinar you will learn:
- The significance of a multi-cloud, infrastructure-agnostic analytics
- What is working and what isn’t, when it comes to analytics integration
- The importance of seamlessly integrating all your analytics in one platform
- How to innovate faster, taking advantage of open source and agile software
Speakers: James Curtis, Senior Analyst, Data Platforms & Analytics, 451 Research & Jacque Istok, Head of Data, Pivotal
This document summarizes a talk on using big data driven solutions to combat COVID-19. It discusses how big data preparation involves ingesting, cleansing, and enriching data from various sources. It also describes common big data technologies used for storage, mining, analytics and visualization including Hadoop, Presto, Kafka and Tableau. Finally, it provides examples of research projects applying big data and AI to track COVID-19 cases, model disease spread, and optimize health resource utilization.
In this slidedeck, Infochimps Director of Product, Tim Gasper, discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days; sometimes in just hours. Tim unlocks how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with impeccable speed.
Watch this webinar in full here: https://buff.ly/2MVTKqL
Self-Service BI promises to remove the bottleneck that exists between IT and business users. The truth is, if data is handed over to a wide range of data consumers without proper guardrails in place, it can result in data anarchy.
Attend this session to learn why data virtualization:
• Is a must for implementing the right self-service BI
• Makes self-service BI useful for every business user
• Accelerates any self-service BI initiative
This document discusses big data and semantic web technologies in manufacturing. It outlines how big data is being used in manufacturing for applications like product quality tracking, supply chain management, and forecasting. Semantic web technologies like ontologies and semantic layers are discussed as ways to add meaning to manufacturing data and integrate information from different systems. A case study on using an ontology called ECOS and a text mining tool called TEXT2RDF to extract semantics from manufacturing documents is presented.
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Denodo
Watch full webinar here: https://bit.ly/3Y2TBXB
Two of the most talked about topics in data management today are Data Fabric and Data Mesh. However, there is a lot of confusion around them. Are they alternative options, or are they complementary? Many organizations are struggling with these questions when trying to modernize their data architecture. Mike Ferguson, Managing Director of Intelligent Business Strategies, will help clear up the confusion by looking at what Data Fabric and Data Mesh are and how they can best be used to help shorten time to value in companies seeking to become data-driven enterprises.
Mike will help address many of your questions, including:
- What is a Data Fabric and Data Mesh, and the business value of each?
- What are the key concepts and capabilities of each, and what do they make possible?
- The implications of decentralizing data engineering, and how do you co-ordinate data product development?
- How can a Data Fabric help in building a Data Mesh?
Following Mike's presentation, we will be joined by Kevin Bohan of Denodo, who will discuss the foundational capabilities you should be putting in place if you are planning on adopting a Data Mesh strategy.
Webinar: Faster Big Data Analytics with MongoDBMongoDB
Learn how to leverage MongoDB and Big Data technologies to derive rich business insight and build high performance business intelligence platforms. This presentation includes:
- Uncovering Opportunities with Big Data analytics
- Challenges of real-time data processing
- Best practices for performance optimization
- Real world case study
This presentation was given in partnership with CIGNEX Datamatics.
The document provides an overview of big data analytics. It defines big data as high-volume, high-velocity, and high-variety information assets that require cost-effective and innovative forms of processing for insights and decision making. Big data is characterized by the 3Vs - volume, velocity, and variety. The emergence of big data is driven by the massive amount of data now being generated and stored, availability of open source tools, and commodity hardware. The course will cover Apache Hadoop, Apache Spark, streaming analytics, visualization, linked data analysis, and big data systems and AI solutions.
Slides from a recent Big Data Warehousing Meetup titled, Big Data Analytics with Microsoft.
See Power Pivot/ Power Query/ Power View/ Power Maps and Azure Machine Learning be used to analyze Big Data.
One challenge of dealing with Big Data project is to acquire both structured and instructed information in order to find the right correlation. During the event, we explained all the steps to build your model and enhance your existing data through Microsoft's Power BI.
We had an in-depth discussion about the innovations built into the latest stack of Microsoft Business Intelligence, and practical tips from Technology Specialist’s from Microsoft.
The session also featured demos to help you see the technology as an end-to-end solution.
For more information, visit www.casertaconcepts.com
Watch full webinar here: https://bit.ly/3mdj9i7
You will often hear that "data is the new gold"? In this context, data management is one of the areas that has received more attention from the software community in recent years. From Artificial Intelligence and Machine Learning to new ways to store and process data, the landscape for data management is in constant evolution. From the privileged perspective of an enterprise middleware platform, we at Denodo have the advantage of seeing many of these changes happen.
In this webinar, we will discuss the technology trends that will drive the enterprise data strategies in the years to come. Don't miss it if you want to keep yourself informed about how to convert your data to strategic assets in order to complete the data-driven transformation in your company.
Watch this on-demand webinar as we cover:
- The most interesting trends in data management
- How to build a data fabric architecture?
- How to manage your data integration strategy in the new hybrid world
- Our predictions on how those trends will change the data management world
- How can companies monetize the data through data-as-a-service infrastructure?
- What is the role of voice computing in future data analytic
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
1. IATECH MÁLAGA
BIGOWL: Using Semantics to Develop Big Data Analytics Solutions
José Manuel García Nieto
jnieto@lcc.uma.es
2. IATECH MÁLAGA
Outline
• Introduction
• Concepts and Background
• Current practices in Big Data analytics
• Semantic modelling
• Overall approach
• Validation: Case studies
• Discussions
• Conclusions
BIGOWL: Using Semantics to Develop Big Data Analytics Solutions
3. IATECH MÁLAGA
Introduction
• Motivation
• Gartner’s report: an emerging challenge in Big Data is to construct data-driven intelligent applications that capture and inject domain knowledge into the analytical processes, including context, using a standardized format
• Context refers to all the relevant (meta-)information that supports the analysis and helps interpret its results
• This will facilitate integration (in a standardized way) with third parties’ data, algorithms, business intelligence (BI) and visualization services
4. IATECH MÁLAGA
Introduction
• Motivation
• The use of semantics as contextual information will enhance the analytical power of the algorithms, as well as the reuse of single components in data analytics workflows (Ristoski & Paulheim, 2016)
• Ways to make the domain knowledge explicit and usable are needed to improve data processing and analysis tasks
• Semantic Web technology can be used to annotate not only the knowledge domain of the data, but also the analytics’ meta-data (Keet, Ławrynowicz et al., 2015)
9. IATECH MÁLAGA
Introduction
• Motivation
• Companies have already realised this
• The Administration (European Commission) too
https://www.big-data-europe.eu/ http://www.bigdataocean.eu/
http://www.semagrow.eu/
10. IATECH MÁLAGA
Introduction
• Motivation
• Companies have already realised this
• The Administration (European Commission) too
• In academia, we aim to go one step further
11. IATECH MÁLAGA
Introduction
• Motivation
• In academia, we aim to go one step further
• Semantic Web technologies can be used to annotate not only the knowledge domain of the data, but also the analytics’ meta-data, including: algorithms’ parameters, input variables, tuning experiences, expected behaviours and taxonomies
• This will facilitate the proper reuse and composition of Big Data analytics
• It will also enhance the quality of consumed and produced data
12. IATECH MÁLAGA
Introduction
• Hypothesis:
The semantic annotation of Big Data sources, components and algorithms can act as a link to capture and incorporate the domain knowledge that guides and enhances the analytical processes
• In addition, the semantic annotation can provide the background for reasoning methods based on axiomatic and rule-logic recommendations
13. IATECH MÁLAGA
Introduction
• Proposal:
• Semantic model: an ontology-driven approach to support knowledge management in Big Data analytics workflows
• The proposed ontology is called BIGOWL (BIG data analytics OWL 2 ontology), which acts as a formal schema for the representation and consolidation of knowledge in Big Data analytics
• Knowledge incorporation is in turn beneficial for efficient algorithmic performance, by taking part in operator design, parameter selection, and human-interactive and decision-making strategies
15. IATECH MÁLAGA
Concepts and Background
• Different sites and people will talk about everything from artificial intelligence to natural language processing to linked data and the Semantic Web
• What are they all?
• How do they relate to each other?
• How do they relate to you?
The Semantic Web, Web 3.0, the Linked Data Web, the Web of Data… whatever you call it, the Semantic Web represents the next major evolution in connecting information
16. IATECH MÁLAGA
Concepts and Background
• How is the “Semantic Web” different?
• The word semantic itself implies meaning or understanding
• The Semantic Web is concerned with the meaning of data, not its structure (as in relational databases or the World Wide Web itself)
17. IATECH MÁLAGA
Concepts and Background
• What standards apply to the Semantic Web?
• Mainly 4 technical standards:
• An Ontology provides a formal representation of the real world: an explicit description of the concepts in a domain of discourse (classes), the properties of each concept describing its features and attributes, and restrictions on those properties
• RDF (Resource Description Framework): the data modelling language of the Semantic Web. All Semantic Web information is stored and represented in RDF
• SPARQL (SPARQL Protocol and RDF Query Language): the query language of the Semantic Web, specifically designed to query data across various systems
• OWL (Web Ontology Language): the schema language, or knowledge representation (KR) language, of the Semantic Web. OWL enables you to define concepts composably so that they can be reused as much and as often as possible
18. IATECH MÁLAGA
Concepts and Background
• What standards apply to the Semantic Web?
• Imagine two relational tables in two different databases: movies and cinema rooms
• Imagine a program that can automatically query your Web site and any other site that has movie scheduling information, in order to show a complete view in one place
The goal of Linked Data is to publish structured data in such a way that it can be easily consumed and combined with other Linked Data
19. IATECH MÁLAGA
Concepts and Background
• How is the “Semantic Web” different?
• Linked Data is the Semantic Web realized via four best-practice principles:
• Use URIs as names for things (an example of a URI is any URL)
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information, using standards such as RDF and SPARQL
• Include links to other URIs so that they can discover more things
20. IATECH MÁLAGA
Concepts and Background
• How is the “Semantic Web” different?
• Once all the rows of our tables have been uniquely identified, made dereferenceable through HTTP, and described with RDF, the last step is providing links between rows across different tables
• The main aim here is to make explicit the links that were implicit before shifting to the Linked Data approach. In our example, movies would be linked to the theatres in which they are playing
21. IATECH MÁLAGA
Concepts and Background
• How is the “Semantic Web” different?
• Once our tables have been published this way, the Linked Data rules do their magic: people across the Web can easily start referencing and consuming the data in our rows
• If we go further and link our movies to popular external data sets such as Wikipedia and IMDb, we make it even easier for people and computers to consume our data and combine it with other data
• This provides our data with context
22. IATECH MÁLAGA
Current practices in Big Data analytics
• In current Big Data technology ecosystems, when facing a specific data analytics task, it is usual to rely on already existing tools
23. IATECH MÁLAGA
Current practices in Big Data analytics
• Besides technological or commercial aspects, current Big Data platforms still follow the common procedure when facing data analytics tasks (ACM-SIGKDD, 2014), which comprises the typical steps of classical KDD:
• data collection,
• data transformation,
• data mining,
• pattern evaluation, and
• knowledge presentation (visualization)
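The five classical KDD steps above can be sketched as a toy pipeline of plain Python functions; the records and the "model" (a mean speed) are trivial stand-ins, not part of any real platform.

```python
def collect():
    # stand-in for data collection, e.g. reading from a sensor feed
    return [{"speed": 42.0}, {"speed": None}, {"speed": 58.5}]

def transform(records):
    # data transformation: drop records with missing values
    return [r for r in records if r["speed"] is not None]

def mine(records):
    # data mining: a trivial "model", the mean speed
    return sum(r["speed"] for r in records) / len(records)

def evaluate(pattern):
    # pattern evaluation: accept the pattern only if it is plausible
    return 0.0 < pattern < 200.0

def present(pattern):
    # knowledge presentation (visualization stand-in)
    return f"mean speed: {pattern:.2f}"

data = transform(collect())
pattern = mine(data)
if evaluate(pattern):
    print(present(pattern))
```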
24. IATECH MÁLAGA
Semantic modelling
• The semantic model: BIGOWL
• An ontological scheme driving the whole process of Big Data analytics
• It is the terminological box (TBox) that defines the vocabulary, with concepts and properties (relationships), in the domain of Big Data analysis
28. IATECH MÁLAGA
Validation: Case studies
• Case study 1: streaming processing of New York City traffic open data
• A dynamic version of the bi-objective Traveling Salesman Problem (TSP): minimize the travel time and the distance to cover certain routing points in an urban area
• Open Data API provided by the New York City Department of Transportation
• Updates traffic information several times per minute
29. IATECH MÁLAGA
Validation: Case studies
• Case study 1: streaming processing of New York City traffic open data
• Analyser: the multi-objective metaheuristic NSGA-II provided in jMetalSP, which allows parallel processing of evaluation functions in an Apache Spark environment
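The bi-objective evaluation itself can be sketched in plain Python (jMetalSP is a Java library, so this is only an illustration of the objectives, not its API); the per-edge distance and time matrices below are hypothetical stand-ins for the values that the streaming traffic feed would refresh.

```python
# Hypothetical distances (km) and travel times (min) between 4 routing points
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
time = [[0, 5, 20, 25], [5, 0, 15, 9], [20, 15, 0, 7], [25, 9, 7, 0]]

def evaluate(route):
    """Return the two objectives (total_time, total_distance) for a closed tour."""
    legs = list(zip(route, route[1:] + route[:1]))  # consecutive legs, back to start
    return (sum(time[a][b] for a, b in legs),
            sum(dist[a][b] for a, b in legs))

print(evaluate([0, 1, 2, 3]))  # → (52, 21): the tour 0→1→2→3→0
```

An algorithm such as NSGA-II searches for routes that trade off these two objectives, re-evaluating them as the streaming data updates the matrices.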
30. IATECH MÁLAGA
Validation: Case studies
• Case study 1: Streaming processing of New York City traffic open-data
• Workflow
31. IATECH MÁLAGA
Validation: Case studies
• Case study 1: Streaming processing of New York City traffic open-data
• Ontology definition of this workflow
32. IATECH MÁLAGA
Validation: Case studies
• Case study 1: Streaming processing of New York City traffic open-data
• Workflow
33. IATECH MÁLAGA
Validation: Case studies
• Case study 1: streaming processing of New York City traffic open data
• Semantic annotation and querying
34. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• Classification algorithm: the J48 decision tree
• UCI Repository
35. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• For materialization, two different approaches have been used in this case:
• the well-known data mining library Weka, and
• the BigML SaaS API for analysis in the cloud
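For a rough local equivalent of this classification step, scikit-learn's `DecisionTreeClassifier` can stand in for J48; note that scikit-learn implements CART rather than C4.5, so this only approximates the Weka/BigML setup described in the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# The classic UCI Iris dataset, bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a decision tree and evaluate on the held-out split
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {tree.score(X_te, y_te):.2f}")
```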
36. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• Ontology definition of this workflow
37. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• Analytic workflow
38. IATECH MÁLAGA
Validation: Case studies
• Case study 2: the academic problem of Iris flower classification
• Semantic annotation and querying
39. IATECH MÁLAGA
Validation: Case studies
• Case study 3: reasoning
• SWRL rules perform semantic reasoning jobs mainly devoted to checking the correctness of workflows, i.e., to discover components and tasks with (non-)compatible connectivity of inputs/outputs, execution orders, data domains, data formats, data types, etc.
• The SWRL rules are then evaluated by the reasoner after classifying the Big Data components in accordance with the axioms
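The kind of check such a rule encodes can be illustrated in plain Python; a real SWRL rule would be evaluated by an OWL reasoner, and the component names and formats below are hypothetical.

```python
# Hypothetical workflow components with declared I/O formats
components = {
    "csvReader": {"inputs": None, "outputs": "CSV"},
    "j48Task": {"inputs": "ARFF", "outputs": "Model"},
}
connections = [("csvReader", "j48Task")]

def incompatible(conns, comps):
    """Yield connections whose output format does not match the consumer's input."""
    for src, dst in conns:
        if comps[src]["outputs"] != comps[dst]["inputs"]:
            yield (src, dst)

# The CSV→ARFF mismatch is flagged, just as an SWRL rule would flag it
print(list(incompatible(connections, components)))
```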
40. IATECH MÁLAGA
Validation: Case studies
• Case study 3: Reasoning
• SWRL rules to check correctness of workflows
41. IATECH MÁLAGA
Conclusions
• Experience in the case studies revealed that the BIGOWL approach is useful for integrating domain knowledge concerning a specific analytics problem
• Consequently, the integrated knowledge is used to guide the design of Big Data analytics workflows, by recommending the next components to be linked and supporting final validation
42. IATECH MÁLAGA
Research agenda
• A first phase: provide automatic facilities for ontology population, hence enriching the semantic approach
• Generate new and heterogeneous use cases of analytics workflows, which would lead us to find and solve possible new deficiencies, as well as to enrich the knowledge base
43. IATECH MÁLAGA
BIGOWL: Using Semantics to Develop Big Data Analytics Solutions
José Manuel García Nieto
jnieto@lcc.uma.es