This document discusses data quality and data integration in the context of big data. It presents two possible workflows for linking relational data to semantic data on the web. The first workflow performs record linkage and transformation together, while the second performs transformation and semantic enrichment before matching. The second workflow is able to improve completeness by leveraging external semantic knowledge from DBpedia. An example is provided to illustrate how the workflows may be applied to link sample relational data to semantic web resources.
DataGraft: Data-as-a-Service for Open Datadapaasproject
Tutorial at "The Data Matters Series – Transforming Service Industry via Big Data Analytics", May 4, 2016, Cyberjaya, Malaysia
http://www.eventbrite.com/e/the-data-matters-series-transforming-service-industry-via-big-data-analytics-tickets-24617911837
February 18 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
Linking Scientific Metadata (presented at DC2010)Jian Qin
Linked entity data in metadata records builds a foundation for semantic web. Even though metadata records contain rich entity data, there is no linking between associated entities such as persons, datasets, projects, publications, or organizations. We conducted a small experiment using the dataset collection from the Hubbard Brook Ecosystem Study (HBES), in which we converted the entities and their relationships into RDF triples and linked the URIs contained in RDF triples to the corresponding entities in the Ecological Metadata Language (EML) records. Through the transformation program written in XML Stylesheet Language (XSL), we turned a plain EML record display into an interlinked semantic web of ecological datasets. The experiment suggests a methodological feasibility in incorporating linked entity data into metadata records. The paper also argues for the need of changing the scientific as well as general metadata paradigm.
The Most Effective Method For Selecting Data Science ProjectsGramener
Ganes Kesari, Gramener's Head of Analytics & Co-Founder gives his insights on how to craft a data science roadmap that maximizes ROI.
The biggest reason why 80% of analytics projects fail is that they don’t solve the right problem. Asking analytics or data-related question is the worst way to initiate a data analytics project.
This webinar will walk you through how to get started in the most efficient way possible. You'll discover a straightforward step-by-step strategy to unlocking corporate value through industry examples.
Things you will learn from this webinar:
-The most common reasons for the failure of data science initiatives
-Identifying projects and prioritizing them
-Building a data science strategy in three easy steps
-Real-life examples are used to explain the approach
Watch this full webinar on: https://info.gramener.com/data-science-roadmap
To know more from our industry experts book a free demo at: https://gramener.com/demorequest/
DataGraft: Data-as-a-Service for Open Datadapaasproject
Tutorial at "The Data Matters Series – Transforming Service Industry via Big Data Analytics", May 4, 2016, Cyberjaya, Malaysia
http://www.eventbrite.com/e/the-data-matters-series-transforming-service-industry-via-big-data-analytics-tickets-24617911837
February 18 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
Linking Scientific Metadata (presented at DC2010)Jian Qin
Linked entity data in metadata records builds a foundation for semantic web. Even though metadata records contain rich entity data, there is no linking between associated entities such as persons, datasets, projects, publications, or organizations. We conducted a small experiment using the dataset collection from the Hubbard Brook Ecosystem Study (HBES), in which we converted the entities and their relationships into RDF triples and linked the URIs contained in RDF triples to the corresponding entities in the Ecological Metadata Language (EML) records. Through the transformation program written in XML Stylesheet Language (XSL), we turned a plain EML record display into an interlinked semantic web of ecological datasets. The experiment suggests a methodological feasibility in incorporating linked entity data into metadata records. The paper also argues for the need of changing the scientific as well as general metadata paradigm.
The Most Effective Method For Selecting Data Science ProjectsGramener
Ganes Kesari, Gramener's Head of Analytics & Co-Founder gives his insights on how to craft a data science roadmap that maximizes ROI.
The biggest reason why 80% of analytics projects fail is that they don’t solve the right problem. Asking analytics or data-related question is the worst way to initiate a data analytics project.
This webinar will walk you through how to get started in the most efficient way possible. You'll discover a straightforward step-by-step strategy to unlocking corporate value through industry examples.
Things you will learn from this webinar:
-The most common reasons for the failure of data science initiatives
-Identifying projects and prioritizing them
-Building a data science strategy in three easy steps
-Real-life examples are used to explain the approach
Watch this full webinar on: https://info.gramener.com/data-science-roadmap
To know more from our industry experts book a free demo at: https://gramener.com/demorequest/
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
Executive Directors Chat Leveraging AI for Diversity, Equity, and InclusionTechSoup
Let’s explore the intersection of technology and equity in the final session of our DEI series. Discover how AI tools, like ChatGPT, can be used to support and enhance your nonprofit's DEI initiatives. Participants will gain insights into practical AI applications and get tips for leveraging technology to advance their DEI goals.
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Normal Labour/ Stages of Labour/ Mechanism of LabourWasim Ak
Normal labor is also termed spontaneous labor, defined as the natural physiological process through which the fetus, placenta, and membranes are expelled from the uterus through the birth canal at term (37 to 42 weeks
2024 State of Marketing Report – by HubspotMarius Sescu
https://www.hubspot.com/state-of-marketing
· Scaling relationships and proving ROI
· Social media is the place for search, sales, and service
· Authentic influencer partnerships fuel brand growth
· The strongest connections happen via call, click, chat, and camera.
· Time saved with AI leads to more creative work
· Seeking: A single source of truth
· TLDR; Get on social, try AI, and align your systems.
· More human marketing, powered by robots
ChatGPT is a revolutionary addition to the world since its introduction in 2022. A big shift in the sector of information gathering and processing happened because of this chatbot. What is the story of ChatGPT? How is the bot responding to prompts and generating contents? Swipe through these slides prepared by Expeed Software, a web development company regarding the development and technical intricacies of ChatGPT!
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
Executive Directors Chat Leveraging AI for Diversity, Equity, and InclusionTechSoup
Let’s explore the intersection of technology and equity in the final session of our DEI series. Discover how AI tools, like ChatGPT, can be used to support and enhance your nonprofit's DEI initiatives. Participants will gain insights into practical AI applications and get tips for leveraging technology to advance their DEI goals.
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Normal Labour/ Stages of Labour/ Mechanism of LabourWasim Ak
Normal labor is also termed spontaneous labor, defined as the natural physiological process through which the fetus, placenta, and membranes are expelled from the uterus through the birth canal at term (37 to 42 weeks
2024 State of Marketing Report – by HubspotMarius Sescu
https://www.hubspot.com/state-of-marketing
· Scaling relationships and proving ROI
· Social media is the place for search, sales, and service
· Authentic influencer partnerships fuel brand growth
· The strongest connections happen via call, click, chat, and camera.
· Time saved with AI leads to more creative work
· Seeking: A single source of truth
· TLDR; Get on social, try AI, and align your systems.
· More human marketing, powered by robots
ChatGPT is a revolutionary addition to the world since its introduction in 2022. A big shift in the sector of information gathering and processing happened because of this chatbot. What is the story of ChatGPT? How is the bot responding to prompts and generating contents? Swipe through these slides prepared by Expeed Software, a web development company regarding the development and technical intricacies of ChatGPT!
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
The realm of product design is a constantly changing environment where technology and style intersect. Every year introduces fresh challenges and exciting trends that mold the future of this captivating art form. In this piece, we delve into the significant trends set to influence the look and functionality of product design in the year 2024.
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
Mental health has been in the news quite a bit lately. Dozens of U.S. states are currently suing Meta for contributing to the youth mental health crisis by inserting addictive features into their products, while the U.S. Surgeon General is touring the nation to bring awareness to the growing epidemic of loneliness and isolation. The country has endured periods of low national morale, such as in the 1970s when high inflation and the energy crisis worsened public sentiment following the Vietnam War. The current mood, however, feels different. Gallup recently reported that national mental health is at an all-time low, with few bright spots to lift spirits.
To better understand how Americans are feeling and their attitudes towards mental health in general, ThinkNow conducted a nationally representative quantitative survey of 1,500 respondents and found some interesting differences among ethnic, age and gender groups.
Technology
For example, 52% agree that technology and social media have a negative impact on mental health, but when broken out by race, 61% of Whites felt technology had a negative effect, and only 48% of Hispanics thought it did.
While technology has helped us keep in touch with friends and family in faraway places, it appears to have degraded our ability to connect in person. Staying connected online is a double-edged sword since the same news feed that brings us pictures of the grandkids and fluffy kittens also feeds us news about the wars in Israel and Ukraine, the dysfunction in Washington, the latest mass shooting and the climate crisis.
Hispanics may have a built-in defense against the isolation technology breeds, owing to their large, multigenerational households, strong social support systems, and tendency to use social media to stay connected with relatives abroad.
Age and Gender
When asked how individuals rate their mental health, men rate it higher than women by 11 percentage points, and Baby Boomers rank it highest at 83%, saying it’s good or excellent vs. 57% of Gen Z saying the same.
Gen Z spends the most amount of time on social media, so the notion that social media negatively affects mental health appears to be correlated. Unfortunately, Gen Z is also the generation that’s least comfortable discussing mental health concerns with healthcare professionals. Only 40% of them state they’re comfortable discussing their issues with a professional compared to 60% of Millennials and 65% of Boomers.
Race Affects Attitudes
As seen in previous research conducted by ThinkNow, Asian Americans lag other groups when it comes to awareness of mental health issues. Twenty-four percent of Asian Americans believe that having a mental health issue is a sign of weakness compared to the 16% average for all groups. Asians are also considerably less likely to be aware of mental health services in their communities (42% vs. 55%) and most likely to seek out information on social media (51% vs. 35%).
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
This article is all about what AI trends will emerge in the field of creative operations in 2024. All the marketers and brand builders should be aware of these trends for their further use and save themselves some time!
A report by thenetworkone and Kurio.
The contributing experts and agencies are (in an alphabetical order): Sylwia Rytel, Social Media Supervisor, 180heartbeats + JUNG v MATT (PL), Sharlene Jenner, Vice President - Director of Engagement Strategy, Abelson Taylor (USA), Alex Casanovas, Digital Director, Atrevia (ES), Dora Beilin, Senior Social Strategist, Barrett Hoffher (USA), Min Seo, Campaign Director, Brand New Agency (KR), Deshé M. Gully, Associate Strategist, Day One Agency (USA), Francesca Trevisan, Strategist, Different (IT), Trevor Crossman, CX and Digital Transformation Director; Olivia Hussey, Strategic Planner; Simi Srinarula, Social Media Manager, The Hallway (AUS), James Hebbert, Managing Director, Hylink (CN / UK), Mundy Álvarez, Planning Director; Pedro Rojas, Social Media Manager; Pancho González, CCO, Inbrax (CH), Oana Oprea, Head of Digital Planning, Jam Session Agency (RO), Amy Bottrill, Social Account Director, Launch (UK), Gaby Arriaga, Founder, Leonardo1452 (MX), Shantesh S Row, Creative Director, Liwa (UAE), Rajesh Mehta, Chief Strategy Officer; Dhruv Gaur, Digital Planning Lead; Leonie Mergulhao, Account Supervisor - Social Media & PR, Medulla (IN), Aurelija Plioplytė, Head of Digital & Social, Not Perfect (LI), Daiana Khaidargaliyeva, Account Manager, Osaka Labs (UK / USA), Stefanie Söhnchen, Vice President Digital, PIABO Communications (DE), Elisabeth Winiartati, Managing Consultant, Head of Global Integrated Communications; Lydia Aprina, Account Manager, Integrated Marketing and Communications; Nita Prabowo, Account Manager, Integrated Marketing and Communications; Okhi, Web Developer, PNTR Group (ID), Kei Obusan, Insights Director; Daffi Ranandi, Insights Manager, Radarr (SG), Gautam Reghunath, Co-founder & CEO, Talented (IN), Donagh Humphreys, Head of Social and Digital Innovation, THINKHOUSE (IRE), Sarah Yim, Strategy Director, Zulu Alpha Kilo (CA).
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
The search marketing landscape is evolving rapidly with new technologies, and professionals, like you, rely on innovative paid search strategies to meet changing demands.
It’s important that you’re ready to implement new strategies in 2024.
Check this out and learn the top trends in paid search advertising that are expected to gain traction, so you can drive higher ROI more efficiently in 2024.
You’ll learn:
- The latest trends in AI and automation, and what this means for an evolving paid search ecosystem.
- New developments in privacy and data regulation.
- Emerging ad formats that are expected to make an impact next year.
Watch Sreekant Lanka from iQuanti and Irina Klein from OneMain Financial as they dive into the future of paid search and explore the trends, strategies, and technologies that will shape the search marketing landscape.
If you’re looking to assess your paid search strategy and design an industry-aligned plan for 2024, then this webinar is for you.
5 Public speaking tips from TED - Visualized summarySpeakerHub
From their humble beginnings in 1984, TED has grown into the world’s most powerful amplifier for speakers and thought-leaders to share their ideas. They have over 2,400 filmed talks (not including the 30,000+ TEDx videos) freely available online, and have hosted over 17,500 events around the world.
With over one billion views in a year, it’s no wonder that so many speakers are looking to TED for ideas on how to share their message more effectively.
The article “5 Public-Speaking Tips TED Gives Its Speakers”, by Carmine Gallo for Forbes, gives speakers five practical ways to connect with their audience, and effectively share their ideas on stage.
Whether you are gearing up to get on a TED stage yourself, or just want to master the skills that so many of their speakers possess, these tips and quotes from Chris Anderson, the TED Talks Curator, will encourage you to make the most impactful impression on your audience.
See the full article and more summaries like this on SpeakerHub here: https://speakerhub.com/blog/5-presentation-tips-ted-gives-its-speakers
See the original article on Forbes here:
http://www.forbes.com/forbes/welcome/?toURL=http://www.forbes.com/sites/carminegallo/2016/05/06/5-public-speaking-tips-ted-gives-its-speakers/&refURL=&referrer=#5c07a8221d9b
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
Everyone is in agreement that ChatGPT (and other generative AI tools) will shape the future of work. Yet there is little consensus on exactly how, when, and to what extent this technology will change our world.
Businesses that extract maximum value from ChatGPT will use it as a collaborative tool for everything from brainstorming to technical maintenance.
For individuals, now is the time to pinpoint the skills the future professional will need to thrive in the AI age.
Check out this presentation to understand what ChatGPT is, how it will shape the future of work, and how you can prepare to take advantage.
A brief introduction to DataScience with explaining of the concepts, algorithms, machine learning, supervised and unsupervised learning, clustering, statistics, data preprocessing, real-world applications etc.
It's part of a Data Science Corner Campaign where I will be discussing the fundamentals of DataScience, AIML, Statistics etc.
Time Management & Productivity - Best PracticesVit Horky
Here's my presentation on by proven best practices how to manage your work time effectively and how to improve your productivity. It includes practical tips and how to use tools such as Slack, Google Apps, Hubspot, Google Calendar, Gmail and others.
The six step guide to practical project managementMindGenius
The six step guide to practical project management
If you think managing projects is too difficult, think again.
We’ve stripped back project management processes to the
basics – to make it quicker and easier, without sacrificing
the vital ingredients for success.
“If you’re looking for some real-world guidance, then The Six Step Guide to Practical Project Management will help.”
Dr Andrew Makar, Tactical Project Management
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
From Data Quality to Big Data Quality: A Data Integration Scenario
1. From Data Quality to Big Data Quality:
A Data Integration Scenario
Carlo Batini*, Anisa Rula+
*University of Milano-Bicocca
+Department of Information Engineering,
University of Brescia
2. Data Quality and Data Integration in a 3-dimensional
Data Space
2
- Relational data
- Traditional information systems
Data Types
Big Data
lifecycle
Data Quality Clusters
1
2
3
3. Big Data Lifecycle
Source selection
Discovery
Retrieval
Browsing
Internet of Things Design
Extraction
Sampling
Modeling
Filtering
Creation
Management
Interpretation
Modeling
Semantic enrichement
Profiling
Metadata management
Persistence
Representation transformation
Matching
Entity resolution (or record linkage)
Integration
Semantic Alignement
Fusion
Abstraction
Refinement
Restructuring
Quality assessment
Elementary data
Statistics
Quality improvement
Elementary data
Statistics
Learning
Supervised
Unsupervised
Reinforcement
Interpretation
Validation
Hypothesis testing
Statistical analysis
Sensitivity analysis
Model validation
Value driven analysis
Correlation
Mining
Discovery
Learning
Supervised
Unsupervised
Reinforcement
Prediction (predictive modeling)
Forecasting
Visualization
Optimization
Acquisition
Analysis
1
4. Data Quality and Data Integration among the 3Vs
• Volume
• Velocity
• Variety
5. Data Quality Dimension Clusters*
1. Accuracy, correctness, validity and precision focus on the adherence of data to a given reality of interest. Including time
related issues
2. Completeness, pertinence and relevance refer to the capability of representing all and only the relevant aspects of the
reality of interest
3. Redundancy, minimality, compactness and conciseness refer to the capability of representing the reality of interest
with the minimal use of informative resources
4. Readability, comprehensibility, clarity and simplicity refer to ease of understanding of data by users.
5. Accessibility and availability are related to the ability of the user to access data from his or her culture, physical
status/functions, and technologies available
6. Consistency, cohesion and coherence refer to the capability of data to comply without contradictions to all properties
of the reality of interest, as specified in terms of integrity constraints, data edits, business rules and other formalisms
7. Trust, including believability, reliability and reputation, catching how much data derive from an authoritative source
Relational data
Traditional information systems
DQ Clusters &
cluster dimensions
2
* From Data Quality to Big Data Quality. J. Database Manag. 2015. C. Batini, A. Rula, M. Scannapieco, G. Viscusi
6. Role of Structural Characteristics – Linked Data
How do data quality dimensions and metrics evolve in the transformation
process from tabular to linked data?
Data type Structural characteristics
Linked Data
• Dereferenceable Resource
• SPARQL Endpoint
• RDF dump
• Interlinking
• Licensing
Relational data
Types of Big Data
3
Enables us managing the huge variability of methods and techniques
needed to manage data quality in linked data
8. Toward a Cost Effectiveness Model
Management
Interpretation
Modeling
Semantic enrichement
Profiling
Metadata management
Persistence
Representation transformation
Matching
Entity resolution (or record linkage)
Integration
Semantic Alignement
Fusion
Abstraction
Refinement
Restructuring
Quality assessment
Elementary data
Statistics
Quality improvement
Elementary data
Statistics
…
From relational to RDF
Consider the four steps together
10. Strategy of Workflow 1
10
• Exploit syntactic distance between concepts
• Differentiate distance:
–for alphanumeric strings use edit distance
–for lists of items use Jaccard distance
P…. G….
alppg ttygb
gnnn null
P… G….
alpbg ttygt
gnggr eryuui
Record
linkage
Transfor-
mation &
Alignment
P…. G….
alppg ttygb
gnnn null
kkkyyt vvvy
Quality management
11. Running Example – Relational Data
11
Id Denomination Istat Code Type
126759 Torgian 54053 Historic Place
125007 Saint Peter Abbey 246 Building
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint Peter null Religious Building
Tourist attraction
Place
12. Running Example – Attributes Selection
12
Id Denomination Istat Code Type
126759 Torgian 54053 Historic Place
125007 Saint Peter Abbey 246 Building
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint Peter null Religious Building
Tourist attraction
Place
13. First pair – Edit Distance
13
Id Denomination Istat Code Type
126759 Torgian 54053 Historic Place
125007 Saint Peter Abbey 246 Building
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint Peter null Religious Building
Tourist attraction
Place
Edit-d = 1
YES!
14. Second Pair – Jaccard Distance
14
Id Denomination Istat Code Type
126759 Torgian 54053 Historic Place
125007 Saint Peter Abbey 246 Building
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint Peter null Religious Building
Tourist attraction
Place
Jaccard =
1 - 2/3 =
1/3
YES!
15. Third Pair – Jaccard Distance
15
Id Denomination Istat Code Type
126759 Torgian 54053 Historic Place
125007 Saint Peter Abbey 246 Building
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint Peter null Religious Building
Tourist attraction
Place
Jaccard =
1 - 2/5 =
3/5
NO!
16. Workflow 1 – Outcome
16
Id Denomination Istat Code Type
126759 Torgian 54053 Historic Place
125007 Saint Peter Abbey 246 Building
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint Peter null Religious Building
Tourist attraction
Place
Final outcome:
1 pair true positive
1 pair false positive
1 pair false negative
Completeness
assessment:
1 unsolvable null
17. Strategy of Workflow 2
17
• Exploit semantic distance between concepts
• Improve completeness through DBpedia
P…. G….
alppg ttygb
gnnn null
P… G….
alpbg ttygt
gnggr eryuui
Transformation/
Semantic
Enrichment
Alignment/
Matching
http://dbpedia.org/resource/St._Peter
“Saint Peter”
label
location
http://dbpedia.org/resource/Sicily
37
lat
14
long
http://example/resource/ 125970
246 “Saint Peter Abbey”
istatCode label
Match/
Non-Match
18. Id Denomination Istat Code Type
126759 Torgian 54053 Historic
Place
125007 Saint Peter Abbey 246 Building
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint
Peter
null Religious
Building
Running Example – Model Transformation & Alignment 1
18
http://example/resource/126759
“Torgian”
54053
type
istatCode
denomination
HistoricPlace
PopulatedPlace
http://dbpedia.org/resource/Torgiano
“Torgiano”
type
label
Tourist attraction
Place
21. Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint
Peter
null Religious
Building
Running Example – Model Transformation & Alignment 2
21
Id Denomination Istat Code Type
126759 Torgian 54053 Historic
Place
125007 Saint Peter Abbey 246 Building
http://example/resource/ 125970
246 “Saint Peter Abbey”
istatCode
label
http://dbpedia.org/resource/St._Peter_Basilica
“Basilica of Saint Peter”
label
Tourist attraction
Place
http://dbpedia.org/resource/St._Peter
“Saint Peter”
label
location
http://dbpedia.org/resource/Sicily
37
lat
14
long
22. http://dbpedia.org/resource/St._Peter
http://example/resource/ 125970
246 “Saint Peter Abbey”
istatCode
label
“Saint Peter”
label
ArchitecturalStructure HistoricPlace
Monument
PopulatedPlace
Building
ReligiousBuilding
Place
…
Region
AdministrativeRegion HistoricRegion
HistoricBuilding
DBpedia Ontology
type
Running Example – Typing
22
location
http://dbpedia.org/resource/Sicily
http://dbpedia.org/resource/St._Peter_Basilica
“Basilica of Saint Peter”
label
http://dbpedia.org/resource/Vatican_City
location
41
lat
12
long
37
lat
14
long
type
type
25. Solving Missing Data
25
http://dbpedia.org/resource/St._Peter_Basilica
“Basilica of Saint Peter”
label
http://dbpedia.org/resource/Vatican_City
location
41
lat
12
long
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint
Peter
null Religious Building
Place
Id Label Location Type
1 Torgiano Torgiano Populated Place
2 Saint Peter Sicily Historic Region
3 Basilica of Saint
Peter
Vatican City Religious Building
26. Conclusions and Futur Works
26
• Integration followed by transformation if no domain knowledge is needed
• Transformation followed by integration is better when:
– Aligning attributes by using existing vocabularies
– Distance between concepts is defined
– Enriching the dataset with the missing values
• Data integration of may depend on how good is the transformation
outcome
• Define it as an optimization problem
- Define functions for measuring the cost and the gain
- The cost and gain are different for different integration models