Quantifying the Value of
Federated Datasets in Earth
Observation Information
Mining and Analytics
P.G. Marchetti (*), M. Iapaolo (**)
* European Space Agency (ESA/ESRIN)
EOP Research and Ground Segment Technology Section
** Randstad Italia c/o ESA/ESRIN
Image Information Mining Conference: The Sentinels Era

05/03/2014
Outline

1. Introduction
2. EO Datasets Value
3. Representation Capacity and Information Content for EO Datasets
4. Initial Results
5. Towards “Big Data”
6. Future work and perspectives

Introduction
1. The volume of EO data systematically collected, processed and stored is continuously increasing
2. It is becoming increasingly difficult to evaluate its “value”
3. Datasets are made available by different institutions (agencies, commercial providers, etc.) → Federation of EO datasets
4. The value of a dataset is a vague concept: it may refer to its inherent information content, its possible exploitation, its relation to users’ application needs, etc.

How can the value of an EO dataset be evaluated in the typical scenario of a network of federated datasets?
EO Datasets Value
Communication networks: the value of a network (its growth potential) grows as a quadratic function ($n^2$) of the number of network nodes n (Metcalfe’s Law)
Generic concept of value (importance), applicable to a wide range of natural phenomena (occurrences of words in a text, population sizes of large cities, etc.): the k-th ranked item has a value (frequency, size) of about 1/k of the first one (Zipf’s Law)
Sum of the decreasing 1/k values over all the n items:
$$\sum_{k=1}^{n} \frac{1}{k} \approx \log(n)$$
Applying this to all n nodes:
$$\text{Total Value} \approx n \log(n)$$
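The approximation can be checked numerically. The sketch below assumes natural logarithms and takes the value of the k-th ranked item as exactly 1/k (the slide gives neither the base of the logarithm nor a proportionality constant); the sum is then the harmonic number H_n, which exceeds ln(n) by roughly Euler's constant (about 0.577), so the approximation holds up to an additive constant.

```python
import math


def zipf_node_value(n: int) -> float:
    """Sum of the Zipf-ranked values 1/1 + 1/2 + ... + 1/n (the harmonic number H_n)."""
    return sum(1.0 / k for k in range(1, n + 1))


def zipf_network_value(n: int) -> float:
    """Per-node Zipf value applied to all n nodes: n * H_n, which grows like n log(n)."""
    return n * zipf_node_value(n)


for n in (10, 100, 1_000, 10_000):
    h = zipf_node_value(n)
    # The gap H_n - ln(n) settles near Euler's constant as n grows.
    print(f"n={n:>6}  H_n={h:8.4f}  ln(n)={math.log(n):8.4f}  "
          f"H_n - ln(n)={h - math.log(n):6.4f}  n*H_n={zipf_network_value(n):12.1f}")
```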
EO Datasets Value

Plot of the n log(n) growth function, compared with the linear and quadratic ones
The origin is set at n = 1.

The crossover point is reached at a larger n under Zipf’s law than under Metcalfe’s law
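The three curves on the plot can be tabulated with a small sketch. It assumes the natural logarithm (so n log(n) equals 0 at n = 1, where the slide sets the origin) and only compares the growth laws; it does not compute the crossover point, since the cost model behind it is not given on the slide.

```python
import math


def growth_models(n: int) -> dict[str, float]:
    """Network value for n nodes under the three growth laws compared on the slide."""
    return {
        "linear (n)": float(n),
        "Zipf (n log n)": n * math.log(n),  # natural log assumed; equals 0 at n = 1
        "Metcalfe (n^2)": float(n * n),
    }


for n in (1, 10, 100, 1_000):
    values = growth_models(n)
    print(n, {name: round(v, 1) for name, v in values.items()})
```

For large n the ratio n log(n) / n² = log(n) / n tends to zero, which is why value estimates based on Zipf’s law are far more conservative than those based on Metcalfe’s law.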
EO Datasets Value and Information Content
1. In the EO context, it is of paramount importance to assess the value of datasets from the point of view of their information content (rather than their growth potential or market value)

2. The actual exploitation of federated datasets is mainly based on their information content, extracted through time series analysis, image information mining techniques and analytics
3. The relative value (i.e. the information content) of an EO dataset makes it possible to:
• estimate the number of EO products (or samples) to be used
• select which datasets are relevant for an analysis

A theoretical framework is needed to assess the value (information content) of a federation of EO datasets
Representation Capacity
Given a family of n non-overlapping datasets in a federation, $D = \{D_1, D_2, \dots, D_n\}$,
select from D a sample $S = \{S_1, S_2, \dots, S_n\}$, where each $S_h \subseteq D_h$ ($h = 1, 2, \dots, n$).
Our aim here is to assess and quantify how representative S is of D, and how it can characterise the value of D
The Representation Capacity of D, K(D), is a measure of the degree of arbitrariness in choosing the sample S from D
K(D) should be a non-decreasing function f(x), where x is the size of the set from which the images are extracted
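One function meeting this requirement is a Hartley-style measure, K = log2(number of possible samples). The sketch below assumes that measure (not necessarily the derivation used by the authors) and, hypothetically, that each S_h is a subset of fixed size m_h drawn independently from D_h. Because the datasets do not overlap, the number of possible samples is a product, so its logarithm is a sum of per-dataset terms.

```python
import math
from typing import Optional, Sequence


def representation_capacity(sizes: Sequence[int],
                            sample_sizes: Optional[Sequence[int]] = None) -> float:
    """Hartley-style capacity in bits: log2 of the number of distinct samples S.

    sizes[h] is |D_h|; sample_sizes[h] is the (hypothetical) fixed size m_h of S_h.
    By default one product is drawn from each dataset (m_h = 1).
    """
    if sample_sizes is None:
        sample_sizes = [1] * len(sizes)
    if len(sample_sizes) != len(sizes):
        raise ValueError("need one sample size per dataset")
    if any(not 1 <= m <= d for d, m in zip(sizes, sample_sizes)):
        raise ValueError("each sample size must satisfy 1 <= m_h <= |D_h|")
    # Non-overlapping datasets: total choices = product of math.comb(|D_h|, m_h),
    # so log2 of that product is the sum of the per-dataset log2 terms.
    return sum(math.log2(math.comb(d, m)) for d, m in zip(sizes, sample_sizes))


print(representation_capacity([8, 16]))   # 7.0 (3 + 4 bits)
print(representation_capacity([8 * 16]))  # 7.0: same as one dataset of 128 items
print(representation_capacity([100]) <= representation_capacity([200]))  # True: non-decreasing
```

The equality of the first two results illustrates the additivity f(xy) = f(x) + f(y); by a classical result, a non-decreasing function with this property is a constant multiple of the logarithm, consistent with the log-of-cardinality result stated later in the slides.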
Representation Capacity
[Slide content (equations/figures) not preserved in the extracted text]

Information Content
[Slide content (equations/figures) not preserved in the extracted text]
Representation Capacity
1. The Representation Capacity of an EO product dataset D is proportional to the log of the cardinality of the dataset
2. The value of a federation of datasets should take the Representation Capacity into account, and therefore grows with the log of the size of the individual datasets

3. To evaluate and compare different datasets in a federation for further processing, a general methodology that preserves their relative information content has been defined
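The slides do not spell out that methodology. As a purely hypothetical illustration of how log-cardinality weights could be used to decide how many products to draw from each dataset, the sketch below splits a fixed sample budget in proportion to log2(|D_h|), using largest-remainder rounding (an arbitrary choice made here). The dataset names and sizes in the example are invented.

```python
import math


def allocate_samples(sizes: dict[str, int], budget: int) -> dict[str, int]:
    """Split `budget` samples across datasets in proportion to log2(|D_h|).

    Hypothetical illustration: weights follow the log-cardinality rule stated on
    the slide; largest-remainder rounding keeps the total equal to `budget`.
    """
    if any(n < 2 for n in sizes.values()):
        raise ValueError("each dataset needs at least 2 products (log2(1) = 0)")
    weights = {name: math.log2(n) for name, n in sizes.items()}
    total = sum(weights.values())
    exact = {name: budget * w / total for name, w in weights.items()}
    alloc = {name: math.floor(x) for name, x in exact.items()}
    # Give the samples lost to rounding down to the largest fractional remainders.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Invented example: dataset sizes differ by four orders of magnitude.
print(allocate_samples({"dataset_A": 1_000_000, "dataset_B": 1_000, "dataset_C": 100}, budget=100))
```

With these weights a dataset 10,000 times larger than another receives only about three times as many samples (55 versus 18), which is the practical effect of valuing datasets by the log of their size.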

Comments
1. Additional constraints could be imposed by further processing, image mining, time series analysis and statistics/analytics objectives and requirements
2. The simplified approach presented in this paper could make it possible to assess the value (information content) of a federation of EO datasets within Shannon’s theoretical framework
3. This approach should complement the one derived from Zipf’s law, which is based on the number n of datasets in the federation, to help decision makers evaluate the wealth of available information.
Information Content
[Slide content (equations/figures) not preserved in the extracted text]
Initial results 1-3
1. A general approach for assessing the value (in terms of information content) of a federation of EO datasets
2. Interpretation of the results within Shannon’s information-theoretic framework:
o The information content of a dataset is proportional to its cardinality
o Considering a sample of data extracted from the whole dataset, the Representation Capacity of the dataset is proportional to the log of its cardinality
o As a consequence, the value (information content) of a federation of EO datasets grows with the log of the size of the individual datasets
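The contrast between the first two points can be made concrete: doubling a dataset doubles a content measure that is linear in cardinality, but adds only one bit to a log2-based Representation Capacity. In the sketch below both proportionality constants are set to 1, an assumption made purely for illustration.

```python
import math


def information_content(cardinality: int) -> float:
    """Information content taken as proportional to cardinality (constant assumed = 1)."""
    return float(cardinality)


def representation_capacity(cardinality: int) -> float:
    """Representation Capacity taken as proportional to log2(cardinality) (constant assumed = 1)."""
    return math.log2(cardinality)


for n in (1_024, 2_048, 4_096):
    print(f"|D| = {n:5d}  content = {information_content(n):7.0f}  "
          f"capacity = {representation_capacity(n):4.1f} bits")
```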

Initial results 2-3
Oops, if we have a look at the papers…
[Chart: number of papers published on IEEE, search performed on 14.02.2014; y-axis from 0 to 3000]
Initial results 3-3
The identification of a general method for evaluating, comparing and selecting different datasets cannot ignore other information elements, such as:
• the papers published and their quality, content, relevance, citations and impact factors (e.g. the h-index, see Hirsch [1])
• the papers published and related parameters: mission, sensor, area, …
• the web pages published (see PageRank [2])
• social media
• …
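Both indicators cited above are straightforward to compute. The sketch below gives textbook versions of the h-index [1] and of PageRank by power iteration [2]; the damping factor 0.85 is a common default, and spreading the rank of pages without outgoing links uniformly over all pages is one standard convention. How such scores would be attached to EO datasets (for example, via the papers or web pages that mention a mission) is not specified in the slides, and the link graph in the example is hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Hirsch's h-index: the largest h such that h papers have >= h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages without outgoing
    links (dangling pages) spread their rank uniformly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


print(h_index([10, 8, 5, 4, 3]))  # 4
# Hypothetical link graph between three web pages.
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

In the h-index example, four papers have at least four citations each, but there are not five papers with at least five, so h = 4.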

Future Work, towards “Big Data”
1. New models for research and service support are emerging in the Earth Observation context, and data availability from forthcoming missions will increase rapidly
2. Facilities for EO dissemination and processing services, geographically distributed in a federated domain, highly scalable and with reliable Quality of Service, are urgently needed
3. Federated domains shall federate both computing and storage resources. The federation is valued and sustained by the underpinning Earth Observation datasets and their information content
4. To value dataset federations in wider contexts (e.g. Big Data, Web 2.0), R&D activities are needed to fully exploit the information they contain
5. A programmatic framework to sustain such R&D activities must be set up to cover the various aspects involved (IIM, TS analysis, EO data analytics, multi-dimensional databases, semantic web, visual analytics, etc.)
Future Work, towards “Big Data”
1. The programmatic framework should span a time frame of 5-10 years
2. It should include a strong user validation step (possibly involving hundreds of users and laboratories)
3. It should be extended to include other domains (not only EO!): Earth and Space Science, Engineering … see the announced “Big Data from Space” Conference!
4. Recent work (Mazzucato) demonstrates the benefits of funding large and strongly supported research programmes (venture capital and the market will follow, exploiting earlier consistent investments by state-funded institutions)
5. Research on value-enhanced search for EO data may help add value and is needed to exploit the great variety of data that will be made available!

References

[1] J.E. Hirsch, An index to quantify an individual's scientific research output, Proceedings of the National Academy of Sciences 102(46):16569-16572, 2005
[2] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Technical Report, Stanford InfoLab, 1999

Thank you!!

