SlideShare a Scribd company logo
Open Data Convergence
WHITE PAPER

International Institute of Information Technology, Bangalore
Table of Contents

Contents
Abstract _____________________________________________________________ 1
Introduction __________________________________________________________ 2
Problem Definition_____________________________________________________ 3
Solution _____________________________________________________________ 4
Benefits _____________________________________________________________ 7
Conclusion __________________________________________________________ 8
Contact Information____________________________________________________ 9
Open Data Convergence | White Paper

Pg. 01

Abstract
Open Data initiative by data.gov.in has provide a great platform for sharing the
datasets belonging to 55+ Departments consist of around 5000 datasets till date.
These datasets are available in various format such as excel, xml, csv, etc.
But the consumer of these datasets are facing few major issue, first one is Data
Integration, currently there is no mean to check what all dataset are linked with each
other semantically. Second issue is Data Quality, there is no uniform data
representation format, no quality check metric to show missing/invalid data.
In this white paper we are proposing a concept termed as ‘Data Convergence’ which
provides unified view of data collected from different sources (or departments) in
different formats. To demonstrate this we built a software application which will
provide unified view by means of HTTP API in JSON/XML formats.
Main objective is to maintain loose coupling between underline storage structure and
consumer client.
Open Data Convergence | White Paper

Pg. 02

“Make each bit
count – by
creating network
of datasets”

Introduction
Major shortcoming in most of open data portal (such as http://data.gov.in ) is that they
do not provide any API to their datasets. Some of portal like http://data.worldbank.org/,
http://data.gov provide API but they are limited to produce output for a particular
dataset only, yes they do have custom API builder but no way to merge/combine two
or more different dataset which are having one or more common attribute/dimension.
Just having a lot of data is useless unless we can derive some useful information from
it. For example from the current representation of open data, It is not obvious to
identify effect of weather on number of tourist visited as both of this dataset belongs to
two different department earth science and tourism respectively. And consumer may
not be aware of existence of dependent dataset which could have worked as catalyst
in his/her analysis. It would have been much better to have an API which will identify
how datasets are connected / linked on which attribute/dimension and produce a
unified view of multiple datasets.
Linked Dataset is a basement of complete Data Convergence system which consists
of large number of graphs where a node represents dataset and edge between them
denotes the existence of relationship. Linked Data helps to identify correlation among
different parameters in different dataset.
Common flaws such as different file format, lack of consistent representation, missing/
invalid values can also be solved to certain extent in preprocessing phase of data
convergence.
Pg. 03

Open Data Convergence | White Paper

Problem Definition
In most of Open data portal consumer has to browse through a collection of datasets
and select one dataset at a time. User has no mean to know relationship/linking
among different datasets. Unless one goes through all dataset manually he/she won’t
get a clear idea about dependency, connectivity among datasets on certain parameter
which could have served as more aggregated information.
There should be provision where consumer can provide what all data he/she wants, in
which format as well as quantity and system should be able to identify how to process
this query and display relevant information.
There should also be a mechanism for searching a dataset not only by its name /
department, but also by its content / field / attribute / dimension.
System should be capable of displaying all linked datasets for a given dataset, where
user can visualize a graph of linked datasets, also it must be able to converge this
datasets and produce a unified view.
Open Data Convergence | White Paper

Pg. 04

“Leveraging open
data platform
with the help
of converged
datasets”

Solution
Overview
Data convergence system at a high level takes three inputs from consumer viz., what
all datasets user wants to converge? In which format? How many records per page?
Later it processes the input and generates API which will give unified view of
datasets.

Description
A typical use case in Data convergence system is stated below
1. Data convergence system has a Smart API builder which provides two kind of
user interface to select a collection of datasets.
1.1 Navigational
a. Step by step navigation to select datasets provided with filter such as
country, state, district, etc.,
b. User has to first select department, then master dataset and finally choose
as many to select all data.
1.2 Search based
a. User can search by data, dataset name, dataset description, and dataset
field/attribute/dimension name in open data repository.
b. Select any number of datasets which are displayed in query result.
2. From a given collection of datasets system identifies connected components.
For example- Assume user selects dataset of Authorize Travel agency, tourism
statistics, rainfall, storm and temperature. Then two connected component are
identified viz.,{[ Authorize Travel agency, tourism statistics], [rainfall, storm,
Pg. 05

Open Data Convergence | White Paper
temperature]} depending upon what relationship is specified by dataset
producer while uploading into open data repository
3. API is generated for each connected components which provide flexibility to
modify number of records per page, offset and output format json/xml
json/xml.
Sample API:
http://opendata.com/api.jsp?key=fbd7939d674997cdb4692d34de8633c4&
offset=1&limit=10& format=json
4. Whenever user perform GET on API using key our data convergence system
will retrieve metadata of API query create a temporary unified view and return
result set in JSON/XML format
5. JSON/XML are standard data exchange format, they can be easily integrated
into any application.
Data Convergence system also provides visualization of all connected dataset up to
3rd level (extensible Few more functionalities like filtering attributes, records in unified
extensible).
view / result set can be added.

(fig. Data Convergence System Block Diagram)
Pg. 06

Open Data Convergence | White Paper
Following image show converged output of two different dataset Road Accident and
Person Injured for a state Arunachal Pradesh in two different format json and xml.

(fig. API response in JSON and XML format)
Open Data Convergence | White Paper

Pg. 07

Benefits
Data Convergence System will provide a single uniform HTTP API access to a unified
view of dataset. It also provides user a facility to visualize the network of connected
datasets.
Key Highlights
 Easy Access to converged data through API,
 Standard data exchange format (JSON and XML) are supported
 Flexible (select dimension as required)
 Unified view support real time data
 Loose coupling
 Notifications generated with change data capture (CDC)

(fig. Visualization of Datasets relationship)
Pg. 08

Open Data Convergence | White Paper

Conclusion
An attempt is been made to introduce a concept which will provide an unified view for
Datasets present in open data portal based on the concept of Data Convergence
which will leverage the way of accessing open data in this document.
Consumer of Open Data can now have a much more idea and knowledge of Datasets
and their relationships, eventually it will stimulate their data analysis process.
Open Data Convergence | White Paper

Pg. 09

Contact Information
Prof. Chandrashekhar Ramanathan
Tel: +91 80 4140 7777
rc@iiitb.ac.in

Bisen Vikrantsingh M.
Tel: +91 8792708719
Bisenvikrantsingh.mohansingh@iiitb.org

Institute
IIIT Bangalore
IIIT-Bangalore,26/C, Electronics
City, Hosur Road, Bangalore,
560100
Tel+91 80 4140 7777 / 2852 7627
http://www.iiitb.ac.in

Kodamasimham Pridhvi
Tel: +91 8123160887
Pridhvi.kodamasimham @iiitb.org

More Related Content

What's hot

Annotating search results from web databases-IEEE Transaction Paper 2013
Annotating search results from web databases-IEEE Transaction Paper 2013Annotating search results from web databases-IEEE Transaction Paper 2013
Annotating search results from web databases-IEEE Transaction Paper 2013
Yadhu Kiran
 
Annotating Search Results from Web Databases
Annotating Search Results from Web DatabasesAnnotating Search Results from Web Databases
Annotating Search Results from Web Databases
SWAMI06
 
B131626
B131626B131626
B131626
IJRES Journal
 
Annotating search results from web databases
Annotating search results from web databasesAnnotating search results from web databases
Annotating search results from web databases
IEEEFINALYEARPROJECTS
 
OODM-object oriented data model
OODM-object oriented data modelOODM-object oriented data model
OODM-object oriented data model
AnilPokhrel7
 
Paper id 25201463
Paper id 25201463Paper id 25201463
Paper id 25201463IJRAT
 
Efficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl FrameworkEfficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl Framework
IOSR Journals
 
At33264269
At33264269At33264269
At33264269
IJERA Editor
 
Week 1 Before the Advent of Database Systems & Fundamental Concepts
Week 1 Before the Advent of Database Systems & Fundamental ConceptsWeek 1 Before the Advent of Database Systems & Fundamental Concepts
Week 1 Before the Advent of Database Systems & Fundamental Concepts
oudesign
 
The three level of data modeling
The three level of data modelingThe three level of data modeling
The three level of data modelingsharmila_yusof
 
Towards a New Data Modelling Architecture - Part 1
Towards a New Data Modelling Architecture - Part 1Towards a New Data Modelling Architecture - Part 1
Towards a New Data Modelling Architecture - Part 1JEAN-MICHEL LETENNIER
 
Bca examination 2017 dbms
Bca examination 2017 dbmsBca examination 2017 dbms
Bca examination 2017 dbms
Anjaan Gajendra
 
Week 1 Lab Directions
Week 1 Lab DirectionsWeek 1 Lab Directions
Week 1 Lab Directions
oudesign
 
Vision Based Deep Web data Extraction on Nested Query Result Records
Vision Based Deep Web data Extraction on Nested Query Result RecordsVision Based Deep Web data Extraction on Nested Query Result Records
Vision Based Deep Web data Extraction on Nested Query Result Records
IJMER
 
Identity Resolution across Different Social Networks using Similarity Analysis
Identity Resolution across Different Social Networks using Similarity AnalysisIdentity Resolution across Different Social Networks using Similarity Analysis
Identity Resolution across Different Social Networks using Similarity Analysis
rahulmonikasharma
 

What's hot (18)

Annotating search results from web databases-IEEE Transaction Paper 2013
Annotating search results from web databases-IEEE Transaction Paper 2013Annotating search results from web databases-IEEE Transaction Paper 2013
Annotating search results from web databases-IEEE Transaction Paper 2013
 
Annotating Search Results from Web Databases
Annotating Search Results from Web DatabasesAnnotating Search Results from Web Databases
Annotating Search Results from Web Databases
 
B131626
B131626B131626
B131626
 
Database model
Database modelDatabase model
Database model
 
Annotating search results from web databases
Annotating search results from web databasesAnnotating search results from web databases
Annotating search results from web databases
 
OODM-object oriented data model
OODM-object oriented data modelOODM-object oriented data model
OODM-object oriented data model
 
AtomiDB Dr Ashis Banerjee reviews
AtomiDB Dr Ashis Banerjee reviewsAtomiDB Dr Ashis Banerjee reviews
AtomiDB Dr Ashis Banerjee reviews
 
Paper id 25201463
Paper id 25201463Paper id 25201463
Paper id 25201463
 
Efficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl FrameworkEfficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl Framework
 
At33264269
At33264269At33264269
At33264269
 
Week 1 Before the Advent of Database Systems & Fundamental Concepts
Week 1 Before the Advent of Database Systems & Fundamental ConceptsWeek 1 Before the Advent of Database Systems & Fundamental Concepts
Week 1 Before the Advent of Database Systems & Fundamental Concepts
 
Ijetcas14 347
Ijetcas14 347Ijetcas14 347
Ijetcas14 347
 
The three level of data modeling
The three level of data modelingThe three level of data modeling
The three level of data modeling
 
Towards a New Data Modelling Architecture - Part 1
Towards a New Data Modelling Architecture - Part 1Towards a New Data Modelling Architecture - Part 1
Towards a New Data Modelling Architecture - Part 1
 
Bca examination 2017 dbms
Bca examination 2017 dbmsBca examination 2017 dbms
Bca examination 2017 dbms
 
Week 1 Lab Directions
Week 1 Lab DirectionsWeek 1 Lab Directions
Week 1 Lab Directions
 
Vision Based Deep Web data Extraction on Nested Query Result Records
Vision Based Deep Web data Extraction on Nested Query Result RecordsVision Based Deep Web data Extraction on Nested Query Result Records
Vision Based Deep Web data Extraction on Nested Query Result Records
 
Identity Resolution across Different Social Networks using Similarity Analysis
Identity Resolution across Different Social Networks using Similarity AnalysisIdentity Resolution across Different Social Networks using Similarity Analysis
Identity Resolution across Different Social Networks using Similarity Analysis
 

Viewers also liked

C2i2e powerpoint
C2i2e powerpointC2i2e powerpoint
C2i2e powerpoint
domenjoz
 
Pre-Assessment Material
Pre-Assessment MaterialPre-Assessment Material
Pre-Assessment Materialshakerra_davis
 
Adp l11 practice_template
Adp l11 practice_templateAdp l11 practice_template
Adp l11 practice_template
Chiho Yoshida
 
On Target Recruitment Brochure
On Target Recruitment BrochureOn Target Recruitment Brochure
On Target Recruitment Brochureguymarshall
 
Gerencia de proyectos mapa conceptual
Gerencia de  proyectos mapa conceptualGerencia de  proyectos mapa conceptual
Gerencia de proyectos mapa conceptual
Maryory Perea
 
Web applications tools for learning and teaching
Web applications tools for learning and teachingWeb applications tools for learning and teaching
Web applications tools for learning and teachingnorazilamat
 
C2i2e powerpoint
C2i2e powerpointC2i2e powerpoint
C2i2e powerpoint
domenjoz
 
Kenya trip – horizons week
Kenya trip – horizons weekKenya trip – horizons week
Kenya trip – horizons week13websterkm2
 
Imbunatateste-ti calitatile de leader - aplica tehnicile Lean
Imbunatateste-ti calitatile de leader - aplica tehnicile LeanImbunatateste-ti calitatile de leader - aplica tehnicile Lean
Imbunatateste-ti calitatile de leader - aplica tehnicile Lean
Adrian Oprea
 
Implementarea 5S - primul pas spre imbunatatirea continua
Implementarea 5S - primul pas spre imbunatatirea continuaImplementarea 5S - primul pas spre imbunatatirea continua
Implementarea 5S - primul pas spre imbunatatirea continua
Adrian Oprea
 
Thermodynamics - Heat and Temperature
Thermodynamics - Heat and TemperatureThermodynamics - Heat and Temperature
Thermodynamics - Heat and TemperatureRa Jay
 
Mawlid
MawlidMawlid
Teaching Writing through Genre Studies
Teaching Writing through Genre StudiesTeaching Writing through Genre Studies
Teaching Writing through Genre Studies
sehclark
 
5S Implementation - The first step to continuous improvement
5S Implementation - The first step to continuous improvement5S Implementation - The first step to continuous improvement
5S Implementation - The first step to continuous improvement
Adrian Oprea
 
Innovación impulsada por la comunidad @ CityCampBA
Innovación impulsada por la comunidad @ CityCampBAInnovación impulsada por la comunidad @ CityCampBA
Innovación impulsada por la comunidad @ CityCampBA
Valentín Muro
 
Comisión federal de electricidad
Comisión federal de electricidadComisión federal de electricidad
Comisión federal de electricidad
EX ARTHUR MEXICO
 
ศิลปินในดวงใจ
ศิลปินในดวงใจศิลปินในดวงใจ
ศิลปินในดวงใจMinni Minnie
 
The woodlands home sales rpt january 2016
The woodlands home sales rpt   january 2016The woodlands home sales rpt   january 2016
The woodlands home sales rpt january 2016
Deb Stirling
 

Viewers also liked (20)

C2i2e powerpoint
C2i2e powerpointC2i2e powerpoint
C2i2e powerpoint
 
Pre-Assessment Material
Pre-Assessment MaterialPre-Assessment Material
Pre-Assessment Material
 
Adp l11 practice_template
Adp l11 practice_templateAdp l11 practice_template
Adp l11 practice_template
 
On Target Recruitment Brochure
On Target Recruitment BrochureOn Target Recruitment Brochure
On Target Recruitment Brochure
 
Gerencia de proyectos mapa conceptual
Gerencia de  proyectos mapa conceptualGerencia de  proyectos mapa conceptual
Gerencia de proyectos mapa conceptual
 
Interview tips
Interview tips Interview tips
Interview tips
 
Web applications tools for learning and teaching
Web applications tools for learning and teachingWeb applications tools for learning and teaching
Web applications tools for learning and teaching
 
C2i2e powerpoint
C2i2e powerpointC2i2e powerpoint
C2i2e powerpoint
 
Kenya trip – horizons week
Kenya trip – horizons weekKenya trip – horizons week
Kenya trip – horizons week
 
Imbunatateste-ti calitatile de leader - aplica tehnicile Lean
Imbunatateste-ti calitatile de leader - aplica tehnicile LeanImbunatateste-ti calitatile de leader - aplica tehnicile Lean
Imbunatateste-ti calitatile de leader - aplica tehnicile Lean
 
Implementarea 5S - primul pas spre imbunatatirea continua
Implementarea 5S - primul pas spre imbunatatirea continuaImplementarea 5S - primul pas spre imbunatatirea continua
Implementarea 5S - primul pas spre imbunatatirea continua
 
Thermodynamics - Heat and Temperature
Thermodynamics - Heat and TemperatureThermodynamics - Heat and Temperature
Thermodynamics - Heat and Temperature
 
Mawlid
MawlidMawlid
Mawlid
 
Teaching Writing through Genre Studies
Teaching Writing through Genre StudiesTeaching Writing through Genre Studies
Teaching Writing through Genre Studies
 
5S Implementation - The first step to continuous improvement
5S Implementation - The first step to continuous improvement5S Implementation - The first step to continuous improvement
5S Implementation - The first step to continuous improvement
 
Innovación impulsada por la comunidad @ CityCampBA
Innovación impulsada por la comunidad @ CityCampBAInnovación impulsada por la comunidad @ CityCampBA
Innovación impulsada por la comunidad @ CityCampBA
 
Comisión federal de electricidad
Comisión federal de electricidadComisión federal de electricidad
Comisión federal de electricidad
 
NASA PDR Technical Report
NASA PDR Technical ReportNASA PDR Technical Report
NASA PDR Technical Report
 
ศิลปินในดวงใจ
ศิลปินในดวงใจศิลปินในดวงใจ
ศิลปินในดวงใจ
 
The woodlands home sales rpt january 2016
The woodlands home sales rpt   january 2016The woodlands home sales rpt   january 2016
The woodlands home sales rpt january 2016
 

Similar to Data Convergence White Paper

Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data Fusion
IRJET Journal
 
Open data Websmatch
Open data WebsmatchOpen data Websmatch
Open data Websmatchdata publica
 
CoDe Modeling of Graph Composition for Data Warehouse Report Visualization
CoDe Modeling of Graph Composition for Data Warehouse Report VisualizationCoDe Modeling of Graph Composition for Data Warehouse Report Visualization
CoDe Modeling of Graph Composition for Data Warehouse Report Visualization
KaashivInfoTech Company
 
A relational model of data for large shared data banks
A relational model of data for large shared data banksA relational model of data for large shared data banks
A relational model of data for large shared data banks
Sammy Alvarez
 
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
rahulmonikasharma
 
Week 2 Characteristics & Benefits of a Database & Types of Data Models
Week 2 Characteristics & Benefits of a Database & Types of Data ModelsWeek 2 Characteristics & Benefits of a Database & Types of Data Models
Week 2 Characteristics & Benefits of a Database & Types of Data Models
oudesign
 
Study on Theoretical Aspects of Virtual Data Integration and its Applications
Study on Theoretical Aspects of Virtual Data Integration and its ApplicationsStudy on Theoretical Aspects of Virtual Data Integration and its Applications
Study on Theoretical Aspects of Virtual Data Integration and its Applications
IJERA Editor
 
Authentic and Anonymous Data Sharing with Data Partitioning in Big Data
Authentic and Anonymous Data Sharing with Data Partitioning in Big DataAuthentic and Anonymous Data Sharing with Data Partitioning in Big Data
Authentic and Anonymous Data Sharing with Data Partitioning in Big Data
rahulmonikasharma
 
Implementation of Matching Tree Technique for Online Record Linkage
Implementation of Matching Tree Technique for Online Record LinkageImplementation of Matching Tree Technique for Online Record Linkage
Implementation of Matching Tree Technique for Online Record Linkage
IOSR Journals
 
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Editor IJAIEM
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient Algorithm
IOSR Journals
 
H017124652
H017124652H017124652
H017124652
IOSR Journals
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Rinke Hoekstra
 
An effective adaptive approach for joining data in data
An effective adaptive approach for joining data in dataAn effective adaptive approach for joining data in data
An effective adaptive approach for joining data in data
eSAT Publishing House
 
Business Analytics 1 Module 2.pdf
Business Analytics 1 Module 2.pdfBusiness Analytics 1 Module 2.pdf
Business Analytics 1 Module 2.pdf
Jayanti Pande
 
Service Level Comparison for Online Shopping using Data Mining
Service Level Comparison for Online Shopping using Data MiningService Level Comparison for Online Shopping using Data Mining
Service Level Comparison for Online Shopping using Data Mining
IIRindia
 
Semantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data MiningSemantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data MiningEditor IJCATR
 
Chemread – a chemical informant
Chemread – a chemical informantChemread – a chemical informant
Chemread – a chemical informant
eSAT Publishing House
 
Cal Essay
Cal EssayCal Essay
Cal Essay
Sherry Bailey
 

Similar to Data Convergence White Paper (20)

Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data Fusion
 
Open data Websmatch
Open data WebsmatchOpen data Websmatch
Open data Websmatch
 
CoDe Modeling of Graph Composition for Data Warehouse Report Visualization
CoDe Modeling of Graph Composition for Data Warehouse Report VisualizationCoDe Modeling of Graph Composition for Data Warehouse Report Visualization
CoDe Modeling of Graph Composition for Data Warehouse Report Visualization
 
A relational model of data for large shared data banks
A relational model of data for large shared data banksA relational model of data for large shared data banks
A relational model of data for large shared data banks
 
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
 
Week 2 Characteristics & Benefits of a Database & Types of Data Models
Week 2 Characteristics & Benefits of a Database & Types of Data ModelsWeek 2 Characteristics & Benefits of a Database & Types of Data Models
Week 2 Characteristics & Benefits of a Database & Types of Data Models
 
Study on Theoretical Aspects of Virtual Data Integration and its Applications
Study on Theoretical Aspects of Virtual Data Integration and its ApplicationsStudy on Theoretical Aspects of Virtual Data Integration and its Applications
Study on Theoretical Aspects of Virtual Data Integration and its Applications
 
Authentic and Anonymous Data Sharing with Data Partitioning in Big Data
Authentic and Anonymous Data Sharing with Data Partitioning in Big DataAuthentic and Anonymous Data Sharing with Data Partitioning in Big Data
Authentic and Anonymous Data Sharing with Data Partitioning in Big Data
 
Implementation of Matching Tree Technique for Online Record Linkage
Implementation of Matching Tree Technique for Online Record LinkageImplementation of Matching Tree Technique for Online Record Linkage
Implementation of Matching Tree Technique for Online Record Linkage
 
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient Algorithm
 
H017124652
H017124652H017124652
H017124652
 
rscript_paper-1
rscript_paper-1rscript_paper-1
rscript_paper-1
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
 
An effective adaptive approach for joining data in data
An effective adaptive approach for joining data in dataAn effective adaptive approach for joining data in data
An effective adaptive approach for joining data in data
 
Business Analytics 1 Module 2.pdf
Business Analytics 1 Module 2.pdfBusiness Analytics 1 Module 2.pdf
Business Analytics 1 Module 2.pdf
 
Service Level Comparison for Online Shopping using Data Mining
Service Level Comparison for Online Shopping using Data MiningService Level Comparison for Online Shopping using Data Mining
Service Level Comparison for Online Shopping using Data Mining
 
Semantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data MiningSemantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data Mining
 
Chemread – a chemical informant
Chemread – a chemical informantChemread – a chemical informant
Chemread – a chemical informant
 
Cal Essay
Cal EssayCal Essay
Cal Essay
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

Data Convergence White Paper

  • 1. Open Data Convergence WHITE PAPER International Institute of Information Technology, Bangalore
  • 2. Table of Contents Contents Abstract _____________________________________________________________ 1 Introduction __________________________________________________________ 2 Problem Definition_____________________________________________________ 3 Solution _____________________________________________________________ 4 Benefits _____________________________________________________________ 7 Conclusion __________________________________________________________ 8 Contact Information____________________________________________________ 9
  • 3. Open Data Convergence | White Paper Pg. 01 Abstract Open Data initiative by data.gov.in has provide a great platform for sharing the datasets belonging to 55+ Departments consist of around 5000 datasets till date. These datasets are available in various format such as excel, xml, csv, etc. But the consumer of these datasets are facing few major issue, first one is Data Integration, currently there is no mean to check what all dataset are linked with each other semantically. Second issue is Data Quality, there is no uniform data representation format, no quality check metric to show missing/invalid data. In this white paper we are proposing a concept termed as ‘Data Convergence’ which provides unified view of data collected from different sources (or departments) in different formats. To demonstrate this we built a software application which will provide unified view by means of HTTP API in JSON/XML formats. Main objective is to maintain loose coupling between underline storage structure and consumer client.
  • 4. Open Data Convergence | White Paper Pg. 02 “Make each bit count – by creating network of datasets” Introduction Major shortcoming in most of open data portal (such as http://data.gov.in ) is that they do not provide any API to their datasets. Some of portal like http://data.worldbank.org/, http://data.gov provide API but they are limited to produce output for a particular dataset only, yes they do have custom API builder but no way to merge/combine two or more different dataset which are having one or more common attribute/dimension. Just having a lot of data is useless unless we can derive some useful information from it. For example from the current representation of open data, It is not obvious to identify effect of weather on number of tourist visited as both of this dataset belongs to two different department earth science and tourism respectively. And consumer may not be aware of existence of dependent dataset which could have worked as catalyst in his/her analysis. It would have been much better to have an API which will identify how datasets are connected / linked on which attribute/dimension and produce a unified view of multiple datasets. Linked Dataset is a basement of complete Data Convergence system which consists of large number of graphs where a node represents dataset and edge between them denotes the existence of relationship. Linked Data helps to identify correlation among different parameters in different dataset. Common flaws such as different file format, lack of consistent representation, missing/ invalid values can also be solved to certain extent in preprocessing phase of data convergence.
  • 5. Pg. 03 Open Data Convergence | White Paper Problem Definition In most of Open data portal consumer has to browse through a collection of datasets and select one dataset at a time. User has no mean to know relationship/linking among different datasets. Unless one goes through all dataset manually he/she won’t get a clear idea about dependency, connectivity among datasets on certain parameter which could have served as more aggregated information. There should be provision where consumer can provide what all data he/she wants, in which format as well as quantity and system should be able to identify how to process this query and display relevant information. There should also be a mechanism for searching a dataset not only by its name / department, but also by its content / field / attribute / dimension. System should be capable of displaying all linked datasets for a given dataset, where user can visualize a graph of linked datasets, also it must be able to converge this datasets and produce a unified view.
  • 6. Open Data Convergence | White Paper Pg. 04 “Leveraging open data platform with the help of converged datasets” Solution Overview Data convergence system at a high level takes three inputs from consumer viz., what all datasets user wants to converge? In which format? How many records per page? Later it processes the input and generates API which will give unified view of datasets. Description A typical use case in Data convergence system is stated below 1. Data convergence system has a Smart API builder which provides two kind of user interface to select a collection of datasets. 1.1 Navigational a. Step by step navigation to select datasets provided with filter such as country, state, district, etc., b. User has to first select department, then master dataset and finally choose as many to select all data. 1.2 Search based a. User can search by data, dataset name, dataset description, and dataset field/attribute/dimension name in open data repository. b. Select any number of datasets which are displayed in query result. 2. From a given collection of datasets system identifies connected components. For example- Assume user selects dataset of Authorize Travel agency, tourism statistics, rainfall, storm and temperature. Then two connected component are identified viz.,{[ Authorize Travel agency, tourism statistics], [rainfall, storm,
  • 7. Pg. 05 Open Data Convergence | White Paper temperature]} depending upon what relationship is specified by dataset producer while uploading into open data repository 3. API is generated for each connected components which provide flexibility to modify number of records per page, offset and output format json/xml json/xml. Sample API: http://opendata.com/api.jsp?key=fbd7939d674997cdb4692d34de8633c4& offset=1&limit=10& format=json 4. Whenever user perform GET on API using key our data convergence system will retrieve metadata of API query create a temporary unified view and return result set in JSON/XML format 5. JSON/XML are standard data exchange format, they can be easily integrated into any application. Data Convergence system also provides visualization of all connected dataset up to 3rd level (extensible Few more functionalities like filtering attributes, records in unified extensible). view / result set can be added. (fig. Data Convergence System Block Diagram)
  • 8. Pg. 06 Open Data Convergence | White Paper Following image show converged output of two different dataset Road Accident and Person Injured for a state Arunachal Pradesh in two different format json and xml. (fig. API response in JSON and XML format)
  • 9. Open Data Convergence | White Paper Pg. 07 Benefits Data Convergence System will provide a single uniform HTTP API access to a unified view of dataset. It also provides user a facility to visualize the network of connected datasets. Key Highlights  Easy Access to converged data through API,  Standard data exchange format (JSON and XML) are supported  Flexible (select dimension as required)  Unified view support real time data  Loose coupling  Notifications generated with change data capture (CDC) (fig. Visualization of Datasets relationship)
  • 10. Pg. 08 Open Data Convergence | White Paper Conclusion An attempt is been made to introduce a concept which will provide an unified view for Datasets present in open data portal based on the concept of Data Convergence which will leverage the way of accessing open data in this document. Consumer of Open Data can now have a much more idea and knowledge of Datasets and their relationships, eventually it will stimulate their data analysis process.
  • 11. Open Data Convergence | White Paper Pg. 09 Contact Information Prof. Chandrashekhar Ramanathan Tel: +91 80 4140 7777 rc@iiitb.ac.in Bisen Vikrantsingh M. Tel: +91 8792708719 Bisenvikrantsingh.mohansingh@iiitb.org Institute IIIT Bangalore IIIT-Bangalore,26/C, Electronics City, Hosur Road, Bangalore, 560100 Tel+91 80 4140 7777 / 2852 7627 http://www.iiitb.ac.in Kodamasimham Pridhvi Tel: +91 8123160887 Pridhvi.kodamasimham @iiitb.org