Identifying semantics characteristics of user’s
interactions datasets through an application of a data
analysis
Fernando de Assis RODRIGUES, Ph. D.
Pedro Henrique Santos BISI
Ricardo César Gonçalves SANT’ANA, Ph. D.
Graduate Program of Information Science
UNESP - São Paulo State University (Brazil)
Using data as part of the decision-making process is a reality in professional environments.
“We need to decide [something].”
“Sure! Let’s use data to support it.”
The analyzed fact needs to receive inputs from multiple data sources: the collected data are structured, integrated, stored, and processed into an output that supports a better understanding of the fact, allowing new dimensions or perspectives of analysis.
The use of data as part of the decision-making process in several areas:
→ Science
→ Education
→ Industry
→ Management
→ Services
→ ...
[Figure: in each area, data (D) feed decision-making]
Data warehouse
[Figure: Data → Data warehouse → Data Analytics → New Data]
“Look! After the data analysis, new data became available!”
“Sounds good! But don’t forget: everything depends on the data sources.”
The <entity, attribute, value> condition implies the use of aggregated information about these elements to ensure the minimal semantics needed to understand what is available, notably in the steps of obtaining data collections from data sources.
Goal
To identify the semantic characteristics of data attributes at the moment of collection, from the dataset structures found in the data export interfaces of user interaction analysis tools, Internet communication channels, and web analytics tools involved in the management of a scientific journal, by applying a data analysis process and data modeling techniques.
Methodology and Method
About the observed phenomenon
Investigation of datasets, entities, and attributes related to the interaction between users and the communication channels of a scientific journal.
Nature
Qualitative-quantitative research.
Purpose
Exploratory analysis to identify how data are made available and structured in these data resources.
Path adopted in the investigation
Exploratory research on data export interfaces to collect information about the available data and metadata.
About methods
(i) Data extraction and spreadsheet data handling with the Python 3 programming language.
(ii) Application of the Entity-Relationship model to the data generated by the analytics tools.
(iii) Use of data structures oriented to decision-making processing (OLAP).
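Step (i) can be illustrated with a minimal sketch. It assumes an export delivered as CSV text and only uses the Python 3 standard library. The sample content, the dataset name `sample_export.csv`, and the helper names `list_attributes` and `guess_type` are hypothetical and are not taken from the study's scripts.

```python
import csv
import io

# Hypothetical export: columns and values are invented for illustration only.
SAMPLE_EXPORT = """Date,Impressions,Clicks
2017-09-01,120,7
2017-09-02,98,4
"""


def guess_type(values):
    """Guess a coarse data type from the text values of one column."""
    for caster, name in ((int, "integer"), (float, "decimal")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "text"


def list_attributes(csv_text, dataset_name):
    """Return one (dataset, attribute label, guessed data type) row per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    result = []
    for label in reader.fieldnames:
        values = [row[label] for row in rows]
        result.append((dataset_name, label, guess_type(values)))
    return result


attributes = list_attributes(SAMPLE_EXPORT, "sample_export.csv")
for row in attributes:
    print(row)
```

The type guess is deliberately coarse. A real collection would probably need per-source rules, since the export interfaces may format numbers and dates differently.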
Data universe
User interaction data from:
→ Internal data sources
→ External data sources
Data Sample
User interaction data* from RECoDAF - Electronic Journal Digital Skills for Family Farming
→ Internal data sources: Open Journal Systems
→ External data sources: Facebook Insights, Google Analytics, Google Search Console, and Twitter Analytics
* Data collected in September 2017. Available at http://dadosabertos.info/data/collection_recodaf_2017
Results | Discussion
255 exportable datasets, using 5 file formats.
Information sought in the data sources:
→ Services
→ Resources
→ Datasets
→ Attributes
→ File Formats
→ Data types
[Figure: diagram of the Entity-Relationship model developed for data collection]
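The entities listed above could be sketched as Python data classes. The diagram itself is not reproduced in the text, so the relationships used here (a service offers resources, a resource exports datasets, a dataset has file formats and attributes, an attribute has a data type) are an assumption, and the dataset name and file format in the sample are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    label: str
    data_type: str


@dataclass
class Dataset:
    name: str
    file_formats: List[str] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Resource:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Service:
    name: str
    resources: List[Resource] = field(default_factory=list)


def count_datasets(services):
    """Count every dataset reachable from the given services."""
    return sum(len(r.datasets) for s in services for r in s.resources)


def file_formats_in_use(services):
    """Return the distinct file formats found across all datasets, sorted."""
    return sorted(
        {fmt for s in services for r in s.resources
         for d in r.datasets for fmt in d.file_formats}
    )


# Hypothetical sample: dataset name, file format and data type are placeholders.
services = [
    Service("Google Search Console", [
        Resource("Search Analytics", [
            Dataset("example_export", ["csv"],
                    [Attribute("Impressions", "integer")]),
        ]),
    ]),
]

print(count_datasets(services), file_formats_in_use(services))
```

Counting over such a structure is one way to arrive at summary figures like the number of exportable datasets and file formats.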
An entity (ex) may have two attributes (ax and ay) that share the same semantics (S), even when the attributes show distinct text labels during data collection.
Example
ex = Page data, from Facebook Insights
ax = “Lifetime Likes by City - Tupã, SP, Brazil”
ay = “Lifetime Likes by City - Bauru, SP, Brazil”
The attributes ax and ay have different text labels but share the same semantics: both report lifetime likes by city and differ only in the city named in the label.
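A collector could group such attributes by the part of the label they have in common. The sketch below assumes that the separator " - " splits a label into a shared base and a qualifier. This is a heuristic chosen for this example, not a rule documented by Facebook Insights, and the function name `group_by_base_label` is hypothetical.

```python
from collections import defaultdict


def group_by_base_label(labels, separator=" - "):
    """Group labels by the text before the first separator (assumed heuristic)."""
    groups = defaultdict(list)
    for label in labels:
        base, _, qualifier = label.partition(separator)
        groups[base].append(qualifier)
    return dict(groups)


labels = [
    "Lifetime Likes by City - Tupã, SP, Brazil",
    "Lifetime Likes by City - Bauru, SP, Brazil",
]
groups = group_by_base_label(labels)
print(groups)
```

Both labels fall under the single base "Lifetime Likes by City", with the city kept as a qualifier. Labels that contain no separator would each form their own group with an empty qualifier.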
When text labels are the only semantic information available, equal labels (ax) on attributes of two distinct entities (ex and ey) do not ensure that those attributes share the same formal semantics (S) when external agents collect the data.
Example
ax = “Impressions”
ex = Search Analytics, from Google Search Console
ey = Tweet Activity, from Twitter Analytics
Both attributes have the same text label, but this coincidence does not ensure that they have the same semantics. In this example, each entity applied a different data type to the attribute, so their values are not directly comparable.
“It’s an average!”
“It’s a total amount!”
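A simple check for this situation is to look for labels that occur in more than one entity with different data types. In the sketch below, the services, entities, and the label "Impressions" come from the example above. The data types are placeholders, because the slides do not state which tool uses which type, and the second Twitter row is invented to show a label without a conflict.

```python
from collections import defaultdict

# (service, entity, attribute label, data type)
# Data types are ASSUMED placeholders; "Example metric" is a hypothetical label.
catalog = [
    ("Google Search Console", "Search Analytics", "Impressions", "decimal"),
    ("Twitter Analytics", "Tweet Activity", "Impressions", "integer"),
    ("Twitter Analytics", "Tweet Activity", "Example metric", "integer"),
]


def find_label_conflicts(rows):
    """Return labels shared by several entities that disagree on data type."""
    types_by_label = defaultdict(set)
    entities_by_label = defaultdict(set)
    for service, entity, label, data_type in rows:
        types_by_label[label].add(data_type)
        entities_by_label[label].add((service, entity))
    return sorted(
        label for label in types_by_label
        if len(entities_by_label[label]) > 1 and len(types_by_label[label]) > 1
    )


conflicts = find_label_conflicts(catalog)
print(conflicts)
```

A data type mismatch is only a signal. Two attributes with the same label and the same type could still differ in meaning, for instance an average versus a total, which is why richer descriptive elements are needed.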
Final considerations
The data analysis made it possible:
→ To identify the critical points of descriptive elements in those datasets.
→ To observe the lack of descriptive elements in the data collection process when it is triggered through the available export interfaces.
To reduce this dissonance between attributes, export interfaces can provide more semantic information bound to the datasets.
→ This information is important for interpreting data available from different sources, for example text labeling rules, controlled vocabularies, and restriction clauses.
The semantic dissonances in these entities may interfere with the development of relationships between attributes from different datasets, decreasing the potential for interoperability.
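As an illustration of semantic information bound to a dataset, the sketch below models a per-attribute descriptor that an export interface could ship alongside the data. The descriptor keys, the vocabulary `AGGREGATION_VOCABULARY`, and both helper functions are hypothetical. None of the analyzed tools is known to provide such a structure.

```python
# Assumed controlled vocabulary for how a value was aggregated.
AGGREGATION_VOCABULARY = {"total", "average"}


def descriptor_problems(descriptor):
    """List what is missing or invalid in one attribute descriptor."""
    problems = []
    for key in ("label", "data_type", "aggregation"):
        if not descriptor.get(key):
            problems.append(f"missing {key}")
    aggregation = descriptor.get("aggregation")
    if aggregation and aggregation not in AGGREGATION_VOCABULARY:
        problems.append(f"unknown aggregation: {aggregation}")
    return problems


def value_is_allowed(descriptor, value):
    """Apply the descriptor's restriction clause (here only a minimum)."""
    minimum = descriptor.get("minimum")
    return minimum is None or value >= minimum


# Hypothetical descriptor for the attribute discussed in the example slides.
impressions = {
    "label": "Impressions",
    "data_type": "integer",
    "aggregation": "total",  # term from the controlled vocabulary
    "minimum": 0,            # restriction clause
}

print(descriptor_problems(impressions))
print(descriptor_problems({"label": "Impressions"}))
print(value_is_allowed(impressions, -1))
```

With the aggregation stated explicitly, two attributes labeled "Impressions" could be told apart as a total or an average before any attempt to relate them.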
References
20
Berg, O. (2015). Collaborating in a social era: ideas, insights and models
that inspire new ways of thinking about collaboration. Göteborg:
Intranätverk.
Cornell, P. (2005). A complete guide to PivotTables: a visual approach.
Berkeley, CA : New York: Apress ; Distributed to the Book trade in the
United States by Springer-Verlag.
Date, C. J. (2016). The new relational database dictionary: a
comprehensive glossary of concepts arising in connection with the
relational model of data, with definitions and illustrative examples:
[terms, concepts, and examples]. Sebastopol, CA: O´Reilly.
Goodwin, P., & Wright, G. (2014). Decision analysis for management
judgment (5th Edition). Hoboken, New Jersey: Wiley.
Gray, J., Bosworth, A., Lyaman, A., & Pirahesh, H. (1996). Data cube: a
relational aggregation operator generalizing GROUP-BY, CROSS-TAB,
and SUB-TOTALS (pp. 152–159). IEEE Comput. Soc. Press.
https://doi.org/10.1109/ICDE.1996.492099
Ikemoto, G. S., & Marsh, J. A. (2007). Cutting Through the “Data-
Driven” Mantra: Different Conceptions of Data-Driven Decision Making.
Yearbook of the National Society for the Study of Education, 106(1),
105–131. https://doi.org/10.1111/j.1744-7984.2007.00099.x
Inmon, W. H. (2005). Building the data warehouse (4th ed). Indianapolis,
Ind: Wiley.
Kimball, R., & Ross, M. (2011). The Data Warehouse Toolkit The
Complete Guide to Dimensional Modeling. New York, United States of
America: John Wiley & Sons. Retrieved from http://nbn-
resolving.de/urn:nbn:de:101:1-2014122311140
Lebo, T., & Williams, G. T. (2010). Converting governmental datasets
into linked data. In Proceedings of the 6th International Conference on
Semantic Systems. Graz, Austria: ACM Press.
https://doi.org/10.1145/1839707.1839755
Rathod, A. (2006). A messaging system to handle semantic dissonance
(Thesis). Rochester Institute of Technology, New York. Retrieved from
http://scholarworks.rit.edu/cgi/viewcontent.cgi?article=1668&context=the
ses
Reddy, G. S., Srinivasu, R., Rao, M. P. C., & Rikkula, S. R. (2010). Data
warehousing, data mining, OLAP, OLTP technologies are essential
elements to support decision-making process in industries. International
Journal on Computer Science and Engineering, 2(9), 2865–2873.
Ross Parry, Nick Poole, & Jon Pratty. (2008). Semantic Dissonance: Do
We Need (And Do We Understand) The Semantic Web? In Toronto:
Archives & Museum Informatics. Retrieved from
http://www.archimuse.com/mw2008/papers/ parry/parry.html
Sant’Ana, R. C. G. (2016). Ciclo de vida dos dados: uma perspectiva a
partir da ciência da informação. Informação & Informação, 21(2), 116.
https://doi.org/10.5433/1981-8920.2016v21n2p116
Santos, P. L. V. A. da C., & Sant’Ana, R. C. G. (2015). Dado e
Granularidade na perspectiva da Informação e Tecnologia: uma
interpretação pela Ciência da Informação. Ciência da Informação, 42(2),
11.
References
21
Shafranovich, Y. (2005). Common Format and MIME Type for Comma-
Separated Values (CSV) Files. The Internet Society. Retrieved from
https://tools.ietf.org/html/rfc4180
Silberschatz, A., Korth, H. F., & Sudarshan, S. (2011). Database system
concepts (6th edition). New York: McGraw-Hill.
Tennison, J., Kellogg, G., & Herman, I. (2015, December 17). Model for
Tabular Data and Metadata on the Web. (J. Tennison & G. Kellogg,
Eds.). World Wide Web Consortium. Retrieved from
https://www.w3.org/TR/tabular-data-model/
Turban, E., Aronson, J. E., & Liang, T.-P. (2004). Decision Support
Systems and Intelligent Systems (7th Edition). Upper Saddle River, NJ,
USA: Prentice-Hall, Inc.
fernando (at) rodrigues.pro.br
phbisi (at) gmail.com
ricardosantana (at) marilia.unesp.br
http://dadosabertos.info

More Related Content

What's hot

Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
IJERA Editor
 
Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Towards Knowledge Graph based Representation, Augmentation and Exploration of...Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Sören Auer
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
Neo4j
 

What's hot (20)

Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
 
Massive scale analytics with Stratosphere using R
Massive scale analytics with Stratosphere using RMassive scale analytics with Stratosphere using R
Massive scale analytics with Stratosphere using R
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
International Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data ScienceInternational Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data Science
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
Data Mining
Data MiningData Mining
Data Mining
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligence
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Preprocessing
PreprocessingPreprocessing
Preprocessing
 
Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Towards Knowledge Graph based Representation, Augmentation and Exploration of...Towards Knowledge Graph based Representation, Augmentation and Exploration of...
Towards Knowledge Graph based Representation, Augmentation and Exploration of...
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
 
Self adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation ofSelf adaptive based natural language interface for disambiguation of
Self adaptive based natural language interface for disambiguation of
 
11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining
 
Query Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph DatabasesQuery Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph Databases
 
18231979 Data Mining
18231979 Data Mining18231979 Data Mining
18231979 Data Mining
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
 

Similar to Identifying semantics characteristics of user’s interactions datasets through an application of a data analysis

Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
IJERA Editor
 
PDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.pptPDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.ppt
ssuser52a19e
 
Singapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxSingapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docx
jennifer822
 
Singapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxSingapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docx
edgar6wallace88877
 
2 days agoRajani Sade Data examination Identifying physical.docx
2 days agoRajani Sade Data examination Identifying physical.docx2 days agoRajani Sade Data examination Identifying physical.docx
2 days agoRajani Sade Data examination Identifying physical.docx
felicidaddinwoodie
 

Similar to Identifying semantics characteristics of user’s interactions datasets through an application of a data analysis (20)

The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
 
The web of data: how are we doing so far?
The web of data: how are we doing so far?The web of data: how are we doing so far?
The web of data: how are we doing so far?
 
Data science innovations
Data science innovations Data science innovations
Data science innovations
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impact
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
 
Towards reproducibility and maximally-open data
Towards reproducibility and maximally-open dataTowards reproducibility and maximally-open data
Towards reproducibility and maximally-open data
 
Curadoria digital e dados abertos conectados
Curadoria digital e dados abertos conectadosCuradoria digital e dados abertos conectados
Curadoria digital e dados abertos conectados
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
 
A metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposalA metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposal
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
 
PDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.pptPDS Unit - 1 Introdiction to DS.ppt
PDS Unit - 1 Introdiction to DS.ppt
 
Building Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVFBuilding Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVF
 
Singapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxSingapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docx
 
Singapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxSingapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docx
 
Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...
 
Lecture_1_Intro_toDS&AI.pptx
Lecture_1_Intro_toDS&AI.pptxLecture_1_Intro_toDS&AI.pptx
Lecture_1_Intro_toDS&AI.pptx
 
Data Innovation Lens: A New Way to Approach Data Design as Value Creation
Data Innovation Lens: A New Way to Approach Data Design as Value CreationData Innovation Lens: A New Way to Approach Data Design as Value Creation
Data Innovation Lens: A New Way to Approach Data Design as Value Creation
 
2 days agoRajani Sade Data examination Identifying physical.docx
2 days agoRajani Sade Data examination Identifying physical.docx2 days agoRajani Sade Data examination Identifying physical.docx
2 days agoRajani Sade Data examination Identifying physical.docx
 
BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
 
Cognitive Computing and Education and Learning
Cognitive Computing and Education and LearningCognitive Computing and Education and Learning
Cognitive Computing and Education and Learning
 

More from Fernando de Assis Rodrigues

More from Fernando de Assis Rodrigues (20)

Perspectivas e impasses na salvaguarda e preservação documental pós Medida Pr...
Perspectivas e impasses na salvaguarda e preservação documental pós Medida Pr...Perspectivas e impasses na salvaguarda e preservação documental pós Medida Pr...
Perspectivas e impasses na salvaguarda e preservação documental pós Medida Pr...
 
Serviços de Redes Sociais On-line e a Comunicação Científica: visibilidade de...
Serviços de Redes Sociais On-line e a Comunicação Científica: visibilidade de...Serviços de Redes Sociais On-line e a Comunicação Científica: visibilidade de...
Serviços de Redes Sociais On-line e a Comunicação Científica: visibilidade de...
 
Ficção Científica e Realidade da Coleta de Dados em Redes Sociais Online: aná...
Ficção Científica e Realidade da Coleta de Dados em Redes Sociais Online: aná...Ficção Científica e Realidade da Coleta de Dados em Redes Sociais Online: aná...
Ficção Científica e Realidade da Coleta de Dados em Redes Sociais Online: aná...
 
Interseções entre Coleta de Dados e Redes Sociais Online
Interseções entre Coleta de Dados e Redes Sociais OnlineInterseções entre Coleta de Dados e Redes Sociais Online
Interseções entre Coleta de Dados e Redes Sociais Online
 
Ficção Científica e Realidade da Coleta de Dados em Redes Sociais Online: aná...
Ficção Científica e Realidade da Coleta de Dados em Redes Sociais Online: aná...Ficção Científica e Realidade da Coleta de Dados em Redes Sociais Online: aná...
Ficção Científica e Realidade da Coleta de Dados em Redes Sociais Online: aná...
 
2018 uel-apresentacao-coleta redes-sociais_online
2018 uel-apresentacao-coleta redes-sociais_online2018 uel-apresentacao-coleta redes-sociais_online
2018 uel-apresentacao-coleta redes-sociais_online
 
Processo de Acesso a Dados e suas fases
Processo de Acesso a Dados e suas fasesProcesso de Acesso a Dados e suas fases
Processo de Acesso a Dados e suas fases
 
Fundamentos teóricos para coleta de dados de redes sociais online
Fundamentos teóricos para coleta de dados de redes sociais onlineFundamentos teóricos para coleta de dados de redes sociais online
Fundamentos teóricos para coleta de dados de redes sociais online
 
Open Source e Open Platform: potenciais catalizadores para uso de Internet da...
Open Source e Open Platform: potenciais catalizadores para uso de Internet da...Open Source e Open Platform: potenciais catalizadores para uso de Internet da...
Open Source e Open Platform: potenciais catalizadores para uso de Internet da...
 
Coleta de Dados em Redes Sociais
Coleta de Dados em Redes SociaisColeta de Dados em Redes Sociais
Coleta de Dados em Redes Sociais
 
Metadados em objetos digitais: conceitos e indexação na Web
Metadados em objetos digitais: conceitos e indexação na WebMetadados em objetos digitais: conceitos e indexação na Web
Metadados em objetos digitais: conceitos e indexação na Web
 
Metadados e Interoperabilidade
Metadados e InteroperabilidadeMetadados e Interoperabilidade
Metadados e Interoperabilidade
 
Aplicações da Teoria dos Grafos em coletas de dados
Aplicações da Teoria dos Grafos em coletas de dadosAplicações da Teoria dos Grafos em coletas de dados
Aplicações da Teoria dos Grafos em coletas de dados
 
Raspagem de dados em websites governamentais
Raspagem de dados em websites governamentaisRaspagem de dados em websites governamentais
Raspagem de dados em websites governamentais
 
Contextualização de conceitos teóricos no processo de coleta de dados de Rede...
Contextualização de conceitos teóricos no processo de coleta de dados de Rede...Contextualização de conceitos teóricos no processo de coleta de dados de Rede...
Contextualização de conceitos teóricos no processo de coleta de dados de Rede...
 
Pontos de contato entre a Esfera Pública e Instituições: reflexões sobre pote...
Pontos de contato entre a Esfera Pública e Instituições: reflexões sobre pote...Pontos de contato entre a Esfera Pública e Instituições: reflexões sobre pote...
Pontos de contato entre a Esfera Pública e Instituições: reflexões sobre pote...
 
Categorização de elementos de privacidade identificados nos termos de uso de ...
Categorização de elementos de privacidade identificados nos termos de uso de ...Categorização de elementos de privacidade identificados nos termos de uso de ...
Categorização de elementos de privacidade identificados nos termos de uso de ...
 
ANÁLISE DA COLETA DE DADOS EM REDES SOCIAIS: aspectos de privacidade de dados...
ANÁLISE DA COLETA DE DADOS EM REDES SOCIAIS: aspectos de privacidade de dados...ANÁLISE DA COLETA DE DADOS EM REDES SOCIAIS: aspectos de privacidade de dados...
ANÁLISE DA COLETA DE DADOS EM REDES SOCIAIS: aspectos de privacidade de dados...
 
ACESSO ÀS INFORMAÇÕES SOBRE AGRICULTURA FAMILIAR NA WEB
ACESSO ÀS INFORMAÇÕES SOBRE AGRICULTURA FAMILIAR NA WEBACESSO ÀS INFORMAÇÕES SOBRE AGRICULTURA FAMILIAR NA WEB
ACESSO ÀS INFORMAÇÕES SOBRE AGRICULTURA FAMILIAR NA WEB
 
O USO DE DADOS PÚBLICOS PARA O ACOMPANHAMENTO DA ATIVIDADE PARLAMENTAR
O USO DE DADOS PÚBLICOS PARA O ACOMPANHAMENTO DA ATIVIDADE PARLAMENTARO USO DE DADOS PÚBLICOS PARA O ACOMPANHAMENTO DA ATIVIDADE PARLAMENTAR
O USO DE DADOS PÚBLICOS PARA O ACOMPANHAMENTO DA ATIVIDADE PARLAMENTAR
 

Recently uploaded

Tuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesTuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notes
jyothisaisri
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptx
GOWTHAMIM22
 

Recently uploaded (20)

PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)
 
Tuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesTuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notes
 
ERTHROPOIESIS: Dr. E. Muralinath & R. Gnana Lahari
ERTHROPOIESIS: Dr. E. Muralinath & R. Gnana LahariERTHROPOIESIS: Dr. E. Muralinath & R. Gnana Lahari
ERTHROPOIESIS: Dr. E. Muralinath & R. Gnana Lahari
 
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptx
 
MSCII_ FCT UNIT 5 TOXICOLOGY.pdf
MSCII_              FCT UNIT 5 TOXICOLOGY.pdfMSCII_              FCT UNIT 5 TOXICOLOGY.pdf
MSCII_ FCT UNIT 5 TOXICOLOGY.pdf
 
Mining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptxMining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptx
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdf
 
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptxBiochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...
 
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptxPlasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
 
In-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptxIn-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptx
 
Factor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandFactor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary Gland
 
Triploidy ...............................pptx
Triploidy ...............................pptxTriploidy ...............................pptx
Triploidy ...............................pptx
 
Film Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfFilm Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdf
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
 
GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday LifeGBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
 
Lubrication System in forced feed system
Lubrication System in forced feed systemLubrication System in forced feed system
Lubrication System in forced feed system
 
MSC IV_Forensic medicine -sexual offence.pdf
MSC IV_Forensic medicine -sexual offence.pdfMSC IV_Forensic medicine -sexual offence.pdf
MSC IV_Forensic medicine -sexual offence.pdf
 
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
 

Identifying semantics characteristics of user’s interactions datasets through an application of a data analysis

  • 1. Identifying semantics characteristics of user’s interactions datasets through an application of a data analysis Fernando de Assis RODRIGUES, Ph. D. Pedro Henrique Santos BISI Ricardo César Gonçalves SANT’ANA, Ph. D. Graduate Program of Information Science UNESP - São Paulo State University (Brazil)
  • 2. 2 Using data as part of the decision-making process is a reality in professional environments. We need to decide [something]. Sure! Let’s use data to support. Data Data Data Data Data Data
  • 3. The analyzed fact need to receive inputs from multiple data sources – structuring, integrating, storing, and processing the collected data into an output that supports a better understanding of the fact from data, allowing new dimensions or perspectives of analysis 3
  • 4. 4 The use of data as part of the decision-making process in several areas. Science Decision- mankingData D D D D D D Education Decision- mankingData D D D D D D Industry Decision- mankingData D D D D D D Management Decision- mankingData D D D D D D Services Decision- mankingData D D D D D D ... Decision- mankingData D D D D D D
  • 5. Data warehouse 5 DataDataDataDataDataDataDataDataData Data Analytics DataDataNew Data Look! After the data analysis, new data became available! Sounds good! But don’t forget: from data sources, all things depend.
  • 6. The <entity, attribute, and value> condition implies a use of aggregated information on these elements to assure a minimal semantic to understand what is available, notably related with steps of obtaining data collections from data sources 6
  • 7. Goal 7 To identify the semantics characteristics of data attributes at the moment of collecting, from dataset's structures found on data export interfaces on user’s interactions analysis tools, on Internet communication channels, and on web analytics data tools involved in a scientific journal management, through an application of a process of data analysis and data modeling techniques.
  • 8. Methodology and Method. About the observed phenomenon: investigation of datasets, entities, and attributes related to the interaction between users and the communication channels of a scientific journal. Nature: qualitative-quantitative research. Purpose: exploratory analysis to identify characteristics of how data are made available and structured in these data resources.
  • 9. Methodology and Method. The path adopted in the investigation: an exploratory study of data export interfaces to collect information about available data and metadata. About the methods: (i) data extraction and spreadsheet data handling with the Python 3 programming language; (ii) application of the Entity-Relationship model to the data generated from analytics; (iii) use of data structures oriented to decision-making processing (OLAP).
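As an illustration of method (i), the following is a minimal sketch of how attribute labels and coarse data types might be read from an exported CSV file with Python 3. The slides do not show the authors' scripts, so the function names (`infer_type`, `describe_csv`), the type names, and the sample export are all assumptions made for this example.

```python
import csv
import io


def infer_type(values):
    """Return a coarse data type name for a column of strings."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    # Try the strictest type first, then fall back to a looser one.
    for caster, name in ((int, "integer"), (float, "decimal")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "text"


def describe_csv(text):
    """Map each attribute label (column header) to an inferred data type."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body)) if body else [()] * len(header)
    return {label: infer_type(col) for label, col in zip(header, columns)}


# Hypothetical export, invented for illustration only.
sample = "Date,Impressions,CTR\n2017-09-01,120,0.05\n2017-09-02,98,0.07\n"
attributes = describe_csv(sample)
print(attributes)
```

Only the labels and the inferred types are recovered this way, which is the point made later in the deck: an export of this kind carries little semantic description beyond the text labels.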
  • 10. Methodology and Method. Data universe: user’s interactions data from internal data sources and external data sources.
  • 11. Methodology and Method. Data sample: user’s interactions data* from RECoDAF - Electronic Journal Digital Skills for Family Farming. Internal data sources: Open Journal Systems. External data sources: Facebook Insights, Google Analytics, Google Search Console, and Twitter Analytics. * Data collected in September 2017. Available at http://dadosabertos.info/data/collection_recodaf_2017
  • 12. Results | Discussion. 255 exportable datasets, using 5 file formats.
  • 13. Results | Discussion. Information sought in the data sources: Services, Resources, Datasets, Attributes, File Formats, and Data types. [Figure: diagram of the Entity-Relationship model developed for data collection.]
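The entities listed on this slide could be modeled along the following lines. This is only a sketch: the diagram itself is not recoverable from the transcript, so the relationships chosen here (a service has resources, a resource has datasets, a dataset has one file format and many attributes, an attribute has one data type) are assumptions, as are the dataset name and the data type in the usage example.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    label: str       # text label as it appears in the exported file
    data_type: str   # e.g. "integer", "decimal", "text"


@dataclass
class Dataset:
    name: str
    file_format: str                          # e.g. "csv"
    attributes: list = field(default_factory=list)


@dataclass
class Resource:
    name: str
    datasets: list = field(default_factory=list)


@dataclass
class Service:
    name: str
    resources: list = field(default_factory=list)


def flatten(service):
    """Yield one row per attribute, carrying its full path in the model."""
    for resource in service.resources:
        for dataset in resource.datasets:
            for attribute in dataset.attributes:
                yield (service.name, resource.name, dataset.name,
                       dataset.file_format, attribute.label, attribute.data_type)


# Hypothetical dataset name and data type, for illustration only.
service = Service("Twitter Analytics", [
    Resource("Tweet Activity", [
        Dataset("tweet_activity_export", "csv",
                [Attribute("Impressions", "integer")]),
    ]),
])
rows = list(flatten(service))
print(rows)
```

Flattening to one row per attribute gives a table that can be loaded into a spreadsheet or an OLAP-style structure for counting datasets, formats, and data types per service.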
  • 14. Results | Discussion. An entity (e_x) may have two attributes (a_x and a_y) that share the same semantics (S), even when the two attributes show distinct text labels at data collection.
  • 15. Results | Discussion. Example: e_x = Page data, from Facebook Insights; a_x = “Lifetime Likes by City - Tupã, SP, Brazil”; a_y = “Lifetime Likes by City - Bauru, SP, Brazil”. The two attributes have different text labels but share the same semantics.
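One way to make this shared semantics explicit is to split each label into a base measure and a dimension value. The sketch below assumes that " - " is the separator between the two parts, which holds for the two labels quoted on the slide but is not a documented rule of the export; `split_label` is a hypothetical helper.

```python
from collections import defaultdict


def split_label(label, separator=" - "):
    """Split a label into (base measure, dimension value) at the first separator."""
    base, sep, value = label.partition(separator)
    return (base, value) if sep else (label, None)


labels = [
    "Lifetime Likes by City - Tupã, SP, Brazil",
    "Lifetime Likes by City - Bauru, SP, Brazil",
]

# Group labels that reduce to the same base measure.
groups = defaultdict(list)
for label in labels:
    base, value = split_label(label)
    groups[base].append(value)

print(dict(groups))
```

Both labels fall under the single base measure "Lifetime Likes by City", with the city kept as a value rather than as part of the attribute name.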
  • 16. Results | Discussion. When text labels are the only semantic information available, nothing ensures that attributes of two distinct entities (e_x and e_y) that share the same label (a_x) also share the same formal semantics (S) when the data are collected by external agents.
  • 17. Results | Discussion. Example: a_x = “Impressions”; e_x = Search Analytics, from Google Search Console; e_y = Tweet Activity, from Twitter Analytics. Both attributes share the same text label, but this coincidence does not ensure that they have the same semantics. In this example, each entity applied a different data type to the attribute, resulting in a mismatch between their values. “It’s an average!” “It’s a total amount!”
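A catalog of collected attributes can be checked for this kind of collision with a few lines of Python. In the sketch below the services, entities, and the "Impressions" label come from the slide, while the "Clicks" row and every data type are invented for illustration; in particular, the slide does not say which entity uses which type.

```python
from collections import defaultdict

# Rows of (service, entity, attribute label, data type).
# Data types are illustrative assumptions, not values taken from the slides.
catalog = [
    ("Google Search Console", "Search Analytics", "Impressions", "decimal"),
    ("Twitter Analytics", "Tweet Activity", "Impressions", "integer"),
    ("Google Search Console", "Search Analytics", "Clicks", "integer"),
]


def find_label_conflicts(rows):
    """Return labels that appear with more than one data type across entities."""
    types_by_label = defaultdict(set)
    for _service, _entity, label, data_type in rows:
        types_by_label[label].add(data_type)
    return {label: sorted(types)
            for label, types in types_by_label.items()
            if len(types) > 1}


conflicts = find_label_conflicts(catalog)
print(conflicts)
```

A differing data type is only a hint of a semantic mismatch. Two attributes with the same label and the same type can still mean different things, which is why the deck argues for richer descriptions in the export itself.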
  • 18. Final considerations. The data analysis made it possible to identify: → the critical points of the descriptive elements in those datasets; → the lack of descriptive elements in the data collection process when it is triggered through the available export interfaces.
  • 19. Final considerations. The semantic dissonances in these entities may interfere with the process of developing relationships between attributes from different datasets, decreasing the potential for interoperability. To reduce this dissonance between attributes, export interfaces can bring more semantic information bound to the datasets. → This information is important for interpreting data available from different sources: for example, text labeling rules, controlled vocabularies, and restriction clauses.
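To make the three kinds of semantic information concrete, here is a minimal sketch of a descriptor that an export interface could attach to an attribute, together with a check against it. The descriptor layout, its key names, the vocabulary, and the `check` function are all hypothetical; no existing export interface or metadata standard is being described.

```python
import re

# Hypothetical controlled vocabulary for how a value was aggregated.
AGGREGATIONS = {"total", "average"}

# Hypothetical descriptor bound to one attribute of a dataset.
descriptor = {
    "label_rule": r"^Lifetime Likes by City - .+$",  # text labeling rule
    "aggregation": "total",                          # term from the vocabulary
    "restriction": {"minimum": 0},                   # restriction clause
}


def check(label, value, descriptor):
    """Return a list of problems found for one labeled value."""
    problems = []
    if not re.match(descriptor["label_rule"], label):
        problems.append("label does not follow the labeling rule")
    if descriptor["aggregation"] not in AGGREGATIONS:
        problems.append("aggregation is outside the controlled vocabulary")
    if value < descriptor["restriction"]["minimum"]:
        problems.append("value violates the restriction clause")
    return problems


ok = check("Lifetime Likes by City - Tupã, SP, Brazil", 42, descriptor)
bad = check("Likes", -1, descriptor)
print(ok, bad)
```

With a declared aggregation term, a consumer could tell an average from a total without relying on the text label alone.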
  • 21. Contact: fernando (at) rodrigues.pro.br; phbisi (at) gmail.com; ricardosantana (at) marilia.unesp.br. http://dadosabertos.info