Díaz, P., Masó, J., Sevillano, E., Ninyerola, M., Zabala, A., Serral, I., & Pons, X. (2012). Analysis of quality metadata in the GEOSS Clearinghouse. International Journal of Spatial Data Infrastructures Research, 7, 352–377.
Metadata synchronisation with GeoNetwork - a user's perspective (ARDC)
Metadata synchronisation with GeoNetwork - a user's perspective: making metadata great again.
Presented at the ANDS-facilitated GeoNetwork Community of Practice on April 3rd, 2017 in Canberra.
Validating statistical Index Data represented in RDF using SPARQL Queries: Co... (Jose Emilio Labra Gayo)
This document describes validating statistical index data represented in RDF using SPARQL queries. It discusses the WebIndex project, which measures the impact of the web in different countries. Raw data from over 60 countries and 85 indicators is converted to RDF and stored in a triplestore. A three-level validation process using SPARQL queries ensures the integrity of the RDF data based on the RDF Data Cube vocabulary and a Computex profile defining statistical computations. While expressive, SPARQL has some limitations for complex validation tasks. RDF profiles provide a declarative way to specify dataset constraints and validation rules.
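The integrity checks described above can be illustrated with a toy example. The sketch below is not the WebIndex validation suite; it mirrors the SPARQL `FILTER NOT EXISTS` pattern in plain Python over a hypothetical set of triples (`qb:Observation` follows the RDF Data Cube vocabulary, while `cex:value` is an invented stand-in for a Computex-style computed value).

```python
# A toy in-memory triple store; the real system stores RDF in a triplestore
# and queries it with SPARQL, but the shape of the integrity rule is the same.
triples = {
    ("ex:obs1", "rdf:type", "qb:Observation"),
    ("ex:obs1", "cex:value", "42.0"),
    ("ex:obs2", "rdf:type", "qb:Observation"),   # missing its computed value
}

def observations(ts):
    return {s for (s, p, o) in ts if p == "rdf:type" and o == "qb:Observation"}

def has_value(ts, subj):
    return any(s == subj and p == "cex:value" for (s, p, o) in ts)

# Integrity rule: every observation must carry a computed value
# (in SPARQL: FILTER NOT EXISTS { ?obs cex:value ?v }).
violations = sorted(o for o in observations(triples) if not has_value(triples, o))
print(violations)  # ['ex:obs2']
```

The three-level process in the paper chains many such rules; each one reduces to finding resources for which some required pattern does not exist.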
The document discusses topological data analysis (TDA) and its use cases. It first describes how TDA can be used for data visualization. It then provides three examples of using TDA for insight discovery in Titanic passenger data, energy consumption data, and high-dimensional customer data. Finally, it discusses how different methods, including TDA, PCA, and MRMR, can be evaluated on an energy consumption modeling task using metrics like MAPE and F1 score. TDA was able to achieve high reduction rates while maintaining good performance.
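As a small aside on the evaluation metric mentioned above, MAPE (mean absolute percentage error) has a simple generic definition; the snippet below is that textbook definition, not code from the document, and the sample numbers are invented.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    assert len(actual) == len(predicted) and actual
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Each prediction is off by 10% of the true value, so MAPE is 10%.
print(round(mape([100.0, 200.0], [110.0, 180.0]), 2))  # 10.0
```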
This document presents an approach for semantically annotating RESTful web services. The approach involves (1) obtaining syntactic descriptions of services, (2) semantically enriching parameters using ontologies like DBpedia and GeoNames, and (3) checking annotations by executing services. The approach was tested on 60 services, annotating over 1500 parameters total by matching to ontologies or using suggestion/synonym services. Evaluation showed the approach improved recall over initial parameter matches. Future work aims to improve annotation standards and integration.
This document discusses the LDAP Synchronization Connector (LSC), an open-source tool for synchronizing data between different data sources like LDAP directories, SQL databases, and files. It provides an overview of LSC's features like connectors for various data sources, synchronization rules, logging capabilities, and support for Active Directory. The document also describes how to configure LSC to synchronize between an OpenLDAP directory and Active Directory, including handling passwords and attribute mapping between the different schemas.
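As a rough illustration of what such attribute mapping involves (LSC itself is configured through XML; the mapping table below is invented for this sketch, not taken from LSC's documentation):

```python
# Illustrative OpenLDAP -> Active Directory attribute mapping.
# sAMAccountName is the AD login attribute corresponding to OpenLDAP's uid.
OPENLDAP_TO_AD = {
    "uid": "sAMAccountName",
    "cn": "cn",
    "sn": "sn",
    "mail": "mail",
}

def map_entry(openldap_entry):
    """Translate an OpenLDAP entry's attributes into their AD names."""
    return {ad: openldap_entry[ol]
            for ol, ad in OPENLDAP_TO_AD.items() if ol in openldap_entry}

entry = {"uid": "jdoe", "cn": "John Doe", "mail": "jdoe@example.org"}
print(map_entry(entry))
```

A real synchronization run also has to deal with attributes that need transformation rather than renaming (password hashes being the classic case the document covers).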
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St... (NLJUG)
The document discusses the benefits of combining modularity through domain-driven design and bounded contexts with database migration using Liquibase. This allows independent migration of database schemas for each domain module. It also addresses challenges around cross-domain transactions, search, and migration of multiple module versions. Overall, the approach aims to make systems more resilient to change by containing the impact within loosely coupled domain modules.
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with... (Spark Summit)
Realtime analytics over large datasets has become an increasingly widespread demand. Over the past several years the Hadoop ecosystem has been continuously evolving: even complex queries over large datasets can now be answered interactively with distributed processing frameworks like Apache Spark, and new paradigms of efficient storage have been introduced to support them. Apache Parquet and ORC provide fast scans over columnar data formats, while Apache HBase offers fast ingest and millisecond-scale random access.
In this talk, we will outline Apache Carbondata, a new addition to the open-source Hadoop ecosystem: an indexed columnar file format aimed at bridging the gap to fully enable real-time analytics. It has been deeply integrated with Spark SQL and enables dramatic acceleration of query processing by leveraging efficient encoding/compression and effective predicate push-down through Carbondata's multi-level index technique.
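The effect of a min/max block index on predicate push-down can be sketched generically. This is an illustration of the pruning idea only, not CarbonData's actual data structures; block layout and numbers are invented.

```python
# Each block stores min/max statistics for a column, so an equality
# predicate can skip blocks whose range cannot contain the target.
blocks = [
    {"id": 0, "min": 1,  "max": 20,  "rows": list(range(1, 21))},
    {"id": 1, "min": 21, "max": 60,  "rows": list(range(21, 61))},
    {"id": 2, "min": 61, "max": 100, "rows": list(range(61, 101))},
]

def scan_equals(blocks, target):
    """Read only blocks whose [min, max] range can contain `target`."""
    scanned, hits = [], []
    for b in blocks:
        if b["min"] <= target <= b["max"]:   # push-down: prune here
            scanned.append(b["id"])
            hits.extend(r for r in b["rows"] if r == target)
    return scanned, hits

scanned, hits = scan_equals(blocks, 57)
print(scanned, hits)  # only block 1 is read: [1] [57]
```

With a multi-level index the same pruning is applied hierarchically, so large fractions of the file are never touched for selective queries.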
This document provides an overview of machine learning with Azure. It discusses various machine learning concepts like classification, regression, clustering and more. It outlines an agenda for a workshop on the topic that includes experiments in Azure ML Studio, publishing models as web services, and using various Azure data sources. The document encourages participants to clone a GitHub repo for sample code and data and to sign up for an Azure ML Studio account.
This document summarizes a scientific data cataloging framework developed by Datamedici. The framework uses a central Solr server to maintain metadata from distributed agents that parse scientific data products. It allows for flexible querying and supports both static and dynamic metadata fields. The solution architecture includes the Solr server, distributed agents to extract and send metadata, and a query interface for users to search metadata and locate data products.
The document describes ArcGIS add-ins developed by David Eliseo Martinez Castellanos for the Salvadoran Ministry of Environment and Natural Resources to assist in quality control of LIDAR data for El Salvador. Tools include loaders to import LAS tiles into a PostgreSQL database with quality metrics, toolbars to evaluate tiles against criteria, and editors to mark defects on products. The tools were created with Python and C# for use in ArcGIS and integrate with the database for automated and manual quality reviews.
How to Leverage Big Data to Deliver Smart Logistics (Alibaba Cloud)
See Webinar Recording at https://resource.alibabacloud.com/webinar/detail.htm?webinarId=13
Gain an introduction to how Big Data and AI are currently used in industry to deliver smart logistics. This webinar covers practical components including vehicle-cargo matching, route planning, and delivery optimization, as well as a case study on China's major food delivery platform Ele.me.
The webinar also includes a segment on how data science teams integrate big data contests with their real-world AI applications as well as an introduction to Alibaba Cloud's own Tianchi Big Data Contest.
This webinar is ideally suited for IT managers from large enterprises who wish to improve their understanding of Big Data and AI technology, as well as researchers and developers who are interested in solving real-world machine learning challenges.
More Webinars: https://resource.alibabacloud.com/webinar/index.htm
Tianchi: https://tianchi.aliyun.com/
This document is a dissertation submitted by Theofylaktos Papapanagiotou for the degree of Master of Science. It evaluates different grid performance monitoring tools and information services for distributing monitoring data in a multi-level architecture. It describes how tools like Ganglia, Nagios, BDII, and WSRF can be used to monitor load averages on grid nodes, aggregate the data, and present performance visualizations. The dissertation aims to understand how standards like the GLUE schema are used to organize information in the services and evaluate which approach better supports the multi-level monitoring model.
Implementing a VO archive for datacubes of galaxies (Jose Enrique Ruiz)
The document describes implementing a VO archive for galaxy datacubes. It details collections of FITS files containing 2D spatial and spectral data on galaxies from two telescopes. A MySQL database stores metadata on the datasets extracted from FITS headers using IPython notebooks. The web interface allows discovering, viewing metadata, and accessing the data through use cases like moment maps and channel maps. The archive aims to provide characterization of emission lines and provenance to better understand the radio interferometric data.
Apache CarbonData+Spark to realize data convergence and Unified high performa... (Tech Triveni)
Challenges in Data Analytics:
Different application scenarios need different storage solutions: HBase is ideal for point-query scenarios but unsuitable for multi-dimensional queries. MPP is suitable for data warehouse scenarios, but the engine and data are coupled together, which hampers scalability. OLAP stores used in BI applications perform best for aggregate queries, but full-scan queries perform sub-optimally; moreover, they are not suitable for real-time analysis. These distinct systems lead to low resource sharing and require different pipelines for data and application management.
Optimizing Your Supply Chain with the Neo4j Graph (Neo4j)
With the world’s supply chain system in crisis, it’s clear that better solutions are needed. Digital twins built on knowledge graph technology allow you to achieve an end-to-end view of the process, supporting real-time monitoring of critical assets.
Can ISO 19157 support current NASA data quality metadata? (Ted Habermann)
ISO 19157 provides a powerful framework for describing quality of Earth science datasets. As NASA migrates towards using that standard, it is important to understand whether and how existing data quality content fits into the ISO 19157 model. This talk demonstrates that fit and concludes that ISO 19157 can include all existing content and also includes new capabilities that can be very useful for all kinds of NASA data users.
The document describes a data science project conducted on streaming log data from Cloudera Movies, an online streaming video service. The goals of the project were to understand which user accounts are used most by younger viewers, segment user sessions to improve site usability, and build a recommendation engine. Key steps included exploring and cleaning the data, classifying users as children or adults using a SimRank approach, clustering user sessions to identify behavior patterns, and predicting user ratings through user-user and item-item similarity models to build a recommendation system. Accuracy of 99.64% was achieved in classifying users.
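The item-item similarity step of such a recommender can be sketched with a toy ratings matrix. The data and numbers below are invented for illustration; the project's actual models and SimRank classifier are not reproduced here.

```python
import math

# Toy user -> {item: rating} matrix; carol's rating for item C is unknown.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5},
    "carol": {"A": 1, "B": 5},
}

def cosine(i, j):
    """Cosine similarity between two items over users who rated both."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    ni = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    nj = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (ni * nj)

def predict(user, item):
    """Similarity-weighted average of the user's ratings for other items."""
    sims = [(cosine(item, other), r) for other, r in ratings[user].items()]
    den = sum(abs(s) for s, _ in sims)
    return sum(s * r for s, r in sims) / den if den else 0.0

p = predict("carol", "C")
print(round(p, 2))
```

A user-user model is the mirror image: similarities are computed between users over the items they both rated.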
Predicting query performance and explaining results to assist Linked Data con... (Rakebul Hasan)
This document presents research on generating explanations for SPARQL query results over linked data. It describes developing a framework for predicting query performance using machine learning models trained on query characteristics. It also explains generating provenance-based explanations for query results by computing why-provenance without annotation. Finally, it discusses representing explanation metadata as linked data and evaluating the impact of explanations through a user study showing explanations improve understanding.
This document discusses various diagnostic tools and query tuning techniques in PostgreSQL, including:
- PostgreSQL workload monitoring tools like Mamonsu and Zabbix Agent for collecting metrics.
- Extensions for tracking resource-intensive queries like pg_stat_statements, pg_stat_kcache, and auto_explain.
- Using pg_profile to detect resource-consuming queries and view statistics on execution time, shared blocks fetched, and I/O waiting time.
- Query tuning techniques like replacing DISTINCT and window functions with LIMIT, optimizing queries using GROUP BY, and creating appropriate indexes.
In the document, questions are asked about various aspects of Pega rules and architecture. Key points covered include:
- The AssignTo parameter is needed to make a router flow shape available.
- The Show-Step-Page method views step page info in XML format for debugging.
- Decision and Ticket flow shapes can be used to call a map value and cancel work processing.
- The prefix "ps" indicates a property value can be directly updated by a worker.
- Standard work party classes include Data-Party-Person, Data-Party-Org, and Data-Party-Operator.
- The Requestor page contains access roles, ruleset lists, and HTTP parameters.
This document provides an overview of how SQL Server processes queries. It discusses the key components like the query processor, parser, algebrizer, optimizer and executor. The query processor breaks queries into logical and physical representations. The optimizer chooses the most efficient execution plan. The executor then runs the query. It also touches on topics like parameter sniffing, locking, deadlocks and the thread pool model.
The document summarizes an agenda for a meetup on data science, design, and technology. The agenda includes presentations on inventory optimization using algorithms and graph structures, and an introduction to graph databases and their applications. The inventory optimization presentation discusses slow-moving items, forecasting, and formulating inventory management as a Markov decision process. The graph databases presentation provides an overview of graph databases and their use cases, popular graph database options, and results from experimenting with Neo4j and JanusGraph on cloud infrastructure.
Optimization Algorithms & Graph Structures: Tools for Better Decision Making
The first topic will present a mathematical approach to make optimal inventory decisions. The second topic will focus on how graph databases can be used when a decision process relies on data relationships.
Inventory management, a new look on a common problem
Inventory management is a problem that every retailer must tackle. Usually this problem is solved using common statistical tools such as Poisson and Normal distributions. However, a large part of these inventories is poorly handled due to their nature: many items see very little customer demand, perhaps once a week or even less, but need to be stocked nonetheless for a variety of reasons. This gives rise to a variety of challenges but also opportunities for more flexible algorithmic tools. We will give an overview and rationale of the different models involved, starting from probabilistic forecasting as an input to an inventory control policy optimization with Markov Decision Processes. Even though this can get very technical, we promise to keep the presentation light, accessible and free from equations!
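The pipeline described above, a probabilistic (here Poisson) demand forecast feeding a Markov Decision Process, can be sketched in miniature. All parameters below (costs, demand rate, capacities, discount factor) are invented for illustration and are not taken from the talk.

```python
import math

MAX_STOCK, MAX_ORDER = 5, 5              # state and action spaces
HOLDING, STOCKOUT, UNIT_COST = 1.0, 10.0, 2.0
LAM, GAMMA = 1.0, 0.95                   # Poisson demand rate, discount

def poisson_pmf(k, lam=LAM):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def q_value(stock, order, V):
    """Expected cost of ordering `order` units at `stock`, plus future value."""
    total = UNIT_COST * order
    level = min(stock + order, MAX_STOCK)
    for d in range(20):                  # truncate the negligible demand tail
        p = poisson_pmf(d)
        sold = min(d, level)
        nxt = level - sold
        total += p * (HOLDING * nxt + STOCKOUT * (d - sold) + GAMMA * V[nxt])
    return total

# Value iteration: V[s] converges to the minimal expected discounted cost.
V = [0.0] * (MAX_STOCK + 1)
for _ in range(300):
    V = [min(q_value(s, a, V) for a in range(MAX_ORDER + 1))
         for s in range(MAX_STOCK + 1)]

# Greedy policy: how many units to order at each on-hand stock level.
policy = [min(range(MAX_ORDER + 1), key=lambda a: q_value(s, a, V))
          for s in range(MAX_STOCK + 1)]
print(policy)  # ordering shrinks as on-hand stock grows
```

For slow movers, the Poisson mass at zero demand is large, which is exactly why the expected-cost formulation beats a simple average-demand rule.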
Graph databases: When data relationships really matter
Graph databases have gained popularity in recent years as a powerful technology for understanding relationships between data records. We have explored some popular graph databases on the market, such as Neo4j and JanusGraph running on top of Cassandra and HBase, to determine their usability in a production-ready cloud environment. In this talk, we will be sharing our findings and lessons learned. We will also show you a concrete example of using a graph database to address a specific business problem.
Workshop: «Trade-offs der Schweizer Energiewende» ("Trade-offs of the Swiss Energy Transition"), ETH Zürich 2017 (Paula Díaz)
The document presents the results of a study that analyzed scenarios to meet future electricity demand through 2035. It evaluated the technological implications and potential increases to electricity prices under three scenarios: (A) Gas intensive, (B) Gas limited, and (C) 100% renewable energy. Charts and graphs show the electricity generation mix, installed capacity, and levelized cost of electricity for each scenario. The results indicate that a 100% renewable scenario in 2035 would rely heavily on solar, wind, hydropower, and imports but lead to higher electricity costs compared to scenarios using more natural gas.
Poster at the 18th Swiss Global Change Day (Paula Díaz)
Hydropower is a key actor in the Swiss energy strategy.
What do stakeholders involved in the decision process of a small hydropower plant think?
Here you will find their perspectives in the context of the Swiss energy transition to renewables.
Similar to Analysis of quality metadata in the GEOSS Clearinghouse
Q method conference 2016: Do stakeholders' perspectives pose a risk to energy...Paula Díaz
This document summarizes the findings of a Q study on stakeholder perspectives regarding renewable energy decision processes and policy implementation in Switzerland. Four main viewpoints were identified: 1) local production and empowerment, 2) national savings and nature protection, 3) balancing interests, and 4) liberal production. While stakeholders agreed on involving all in decisions, they differed in priorities around economic rationale, energy saving, nature protection and technologies. The perspectives correlated with administrative levels of local, cantonal and national, showing the canton's role in enacting strategy but not prioritizing national interests like efficiency and environment.
This document discusses the resilience of electricity transmission grids in the face of climate change and increasing extreme weather events. It outlines several drivers that are influencing grid development over the next decades, including slow demand growth, high costs of building new lines, and tight capacity margins. Climate change is expected to increase the likelihood of weather-related outages and potential migrations that could impact demand. The document evaluates four pillars of grid resilience: redundant links, isolating outages, restoring services, and repairing/rebuilding infrastructure. It also discusses challenges to expanding grids to improve resilience.
Modeling the energy future of Switzerland after the phase out of nuclear powe...Paula Díaz
In September 2013, the Swiss Federal Office of Energy (SFOE) published the final report of the proposed measures in the context of the Energy Strategy 2050 (ES2050). The ES2050 draws an energy scenario where the nuclear must be substituted by alternative sources. This implies a fundamental change in the energy system that has already been questioned by experts. Therefore, we must analyse in depth the technical implications of change in the Swiss energy mix from a robust baseload power such as nuclear, to an electricity mix where intermittent sources account for higher rates.
Exchanging the Status between Clients of Geospatial Web Services and GIS appl...Paula Díaz
Masó, J., Díaz, P., Riverola, A., Díaz. D. and Pons., X. (2013). Exchanging the Status between Clients of Geospatial Web Services and GIS applications using Atom. In Castillo, O., Douglas, c., Dagan Feng, D. and Lee., J. (Eds.), in proceedings of the International Multi Conference of Engineers and Computer Scientists 2013, Vol I. IMECS, March 2013, Hong Kong. ISBN: 978-988-19251-8-3.
Análisis crítico de los metadatos distribuidos por la IDEC presentacionPaula Díaz
Diaz, P., (2009). Análisis comparativo de los metadatos distribuidos por la IDEC, en: Treballs del Màster en Teledetecció i Sistemes d’Informació Geogràfica, 10ª edició. Universitat Autònoma de Barcelona y CREAF. Bellaterra, Septiembre 2009.
Impact of user concurrency in commonly used OGC map server implementationsPaula Díaz
Masó, J., Díaz, P., Pons, X., Monteagudo, J.L., Serra, J., Aulí, F., (2011). Impact of user concurrency in commonly used OGC map server implementations, en: Proceedings of INFOCOMP. Barcelona, October 2011. ISBN: 978-1-61208-161-8.
Performance of standardized web map servers for remote sensing ImageryPaula Díaz
Masó J., Díaz, P., Pons, X. (2011). Performance of standardized web map servers for remote sensing Imagery, en: Proceedings of Data Flow: From Space to Earth. Applications and interoperability Conference, March 2011, Venice. Corila -Consorzio per la Gestione del Centro di Coordinamento delle Attività di Ricerca Inerenti il Sistema Lagunare di Venezia, pp.64-64. ISBN:9788889405154.
Analysis of quality metadata in the GEOSS Clearinghouse - PosterPaula Díaz
This document analyzes the quality metadata from 97,203 records harvested from the GEOSS clearinghouse. It finds that 19.66% of records contain quality indicators, with the most common being positional accuracy (37.19%) and completeness (35.71%). Lineage is described in 9.53% of records. The analysis demonstrates that while documentation of quality indicators and lineage is still limited, the current status is sufficient to start developing tools to exploit quality information in catalogs.
The importance of geospatial data to calculate the optimal distribution of re...Paula Díaz
Díaz, P., Masó, J. (2013). The importance of geospatial data to calculate the optimal distribution of renewable energies.
Poster in EGU General Assembly 2013, Session ERE – Energy, Resources and the Environment, Vienna, April 2013.
Mapping the evolution of renewable resources and their relation with EROI and...Paula Díaz
Díaz, P., Miao, B., Masó, J. (2013). Mapping the evolution of renewable resources and their relation with EROI and energy policies. In proceedings of the International Symposium on Remote Sensing of Environment (ISRSE35), Beijing, April 2013.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Analysis insight about a Flyball dog competition team's performance
Analysis of quality metadata in the GEOSS Clearinghouse
1. Analysis of the Quality Metadata in GEOSS Clearinghouse
QUAlity aware VIsualisation for the Global Earth Observation system of systems
SEVILLANO Eva1, DÍAZ Paula2, NINYEROLA Miquel1, MASÓ Joan2, ZABALA Alaitz1, PONS Xavier1
1 UAB Universitat Autònoma de Barcelona.
2 CREAF Centre for Ecological Research and Forestry Applications.
2. Objectives
• To get a first analysis of the data quality in the Clearinghouse
• To analyse the quality information contained in the metadata (ISO 19115)
– Quality indicators
– Lineage
– Usage
• To start building components for the GEO Portal
– Quality broker
– Quality searcher
– Quality visualisation
www.geoviqua.org
5. Overall Results
• Total metadata records in the Clearinghouse: 97,203
• Total number of quality indicators: 52,187
• Metadata records with quality indicators: 19,107
• Metadata records with lineage: 10,899 (9,261 process, 3,771 source)
• Metadata records with usage: 1,226
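As a sanity check, the percentages quoted on the following slides follow directly from these counts. A quick sketch (all numbers are taken from the slides themselves):

```python
# Verify that the deck's percentages follow from the raw counts on this slide.
TOTAL_RECORDS = 97203       # records harvested from the Clearinghouse
WITH_QUALITY = 19107        # records carrying quality indicators
QUALITY_INDICATORS = 52187  # total quality indicators found
WITH_USAGE = 1226           # records documenting usage

def pct(part: int, whole: int = TOTAL_RECORDS) -> float:
    """Share of records, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)

print(pct(WITH_QUALITY))                            # share of records with quality indicators
print(round(QUALITY_INDICATORS / WITH_QUALITY, 1))  # indicators per such record
print(pct(WITH_USAGE))                              # share of records with usage
```

This reproduces the 19.66% quality-indicator share, the 2.7 indicators per record, and the 1.26% usage share reported later in the deck.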
6. Quality Scope
• 19.66% of metadata records have quality indicators
– 2.7 quality indicators per such record, on average
14. Quality indicators - Qualitative
[Bar chart: "Quality elements - Conformance measures"; y-axis: number of quality elements (0 to 1,400); series: conformance to specification, declared conformance, conformance type]
15. Coverage result (ISO 19115-2 extension)
• Clearinghouse record IDs: 273234, 273232, 273233, 273235, 273236
• Only 5 records use this: bad news for visualising data together with quality maps
• Title: OMNO2e:OMI Column Amount NO2:ColumnAmountNO2CS30
<gmd:DQ_QuantitativeAttributeAccuracy>
  <gmd:measureDescription>
    <gco:CharacterString>The 'version 003' product is the second public release.
      It is based on improved radiance calibration. For details, please see document:
      http://disc.sci.gsfc.nasa.gov/Aura/OMI/OMTO3e_v003.shtml</gco:CharacterString>
  </gmd:measureDescription>
  <gmd:result>
    <gmi:QE_CoverageResult>
      <gmi:spatialRepresentationType>
        <gmd:MD_SpatialRepresentationTypeCode
            codeList="./resources/codeList.xml#MD_SpatialRepresentationTypeCode"
            codeListValue="grid">grid</gmd:MD_SpatialRepresentationTypeCode>
      </gmi:spatialRepresentationType>
      <gmi:resultFile gco:nilReason="missing" />
      <gmi:resultFormat>
        <gmd:MD_Format>
          <gmd:name><gco:CharacterString>CF-netCDF</gco:CharacterString></gmd:name>
        </gmd:MD_Format>
      </gmi:resultFormat>
    </gmi:QE_CoverageResult>
  </gmd:result>
</gmd:DQ_QuantitativeAttributeAccuracy>
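Records like this can be mined programmatically. A minimal sketch, using the standard ISO 19139 namespace URIs and a simplified copy of the fragment above, that pulls the measure description and the result format out of such an element:

```python
# Sketch: extract quality fields from a (simplified) ISO 19139 fragment.
# Namespace URIs are the standard ISO 19139 ones (gmd, gmi, gco).
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gco": "http://www.isotc211.org/2005/gco",
}

record = """<gmd:DQ_QuantitativeAttributeAccuracy
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:measureDescription>
    <gco:CharacterString>Improved radiance calibration.</gco:CharacterString>
  </gmd:measureDescription>
  <gmd:result>
    <gmi:QE_CoverageResult>
      <gmi:resultFormat>
        <gmd:MD_Format>
          <gmd:name><gco:CharacterString>CF-netCDF</gco:CharacterString></gmd:name>
        </gmd:MD_Format>
      </gmi:resultFormat>
    </gmi:QE_CoverageResult>
  </gmd:result>
</gmd:DQ_QuantitativeAttributeAccuracy>"""

root = ET.fromstring(record)
desc = root.findtext("gmd:measureDescription/gco:CharacterString", namespaces=NS)
fmt = root.findtext(
    "gmd:result/gmi:QE_CoverageResult/gmi:resultFormat/"
    "gmd:MD_Format/gmd:name/gco:CharacterString",
    namespaces=NS,
)
print(desc, fmt)
```

The same path expressions, run over all harvested records, are how counts like those on slide 5 can be produced.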
19. LI_ProcessStep with LI_Source
Example: Clearinghouse record ID 131007 (simplified)
• Compile survey input data from the best and most current survey records.
– BLM database of the index to all official (microfilm, CD, other) BLM survey records.
– USFS survey records.
– Private land surveyor records.
– GCDB Data Collection Attribute Definitions Version 2.0, Appendix A, 2/14/1991. Survey records used - source abbreviations.
• Compile listings of known locations of PLSS corners.
– USGS topographic quadrangles and other sources.
– USC&GS published coordinate data.
– NGS published coordinate data.
– BLM global positioning data.
– USFS global positioning data.
• Coordinates of control stations are entered into a control database with associated reliabilities.
• Topologically correct GIS coverages are modified to use FGDC-compliant naming conventions and then loaded into the LSI database. These layers can then be downloaded as shapefiles through the LSI website.
• GCDB data was downloaded for Kiowa and Cheyenne Counties, Colorado.
– C:fgis_datasandzippedkiowatwnshp.shp.xml
• Metadata imported and data exported from regions format to shapefile format.
• Dataset copied.
– C:fgis_datasanddatabasedataplssck_gcdb_region_township
• Source contribution: survey data in the form of official (microfilm, CD, other) survey and BLM, abstracted into a vector digital format (online).
• Source contribution: survey and control data from the Cartographic Feature File (CFF) data set (disc).
• Source contribution: digitized control data from standard topological quadrangle sheets (disc).
20. LI_Lineage: LI_Source
• 6.02% of metadata records (5,851) contain a direct list of the data sources.
– 1.85% (1,798) with temporal extent
• Gives credit (attribution, and eventually some trust in the sources).
• If quality indicators are not provided for the dataset, the quality indicators of the sources can be a clue.

[UML diagram: MD_Metadata --resourceLineage 0..*--> LI_Lineage --source 0..*--> LI_Source]

LI_Lineage
+ statement : CharacterString [0..1]
+ scope : DQ_Scope [0..*]
constraints:
{"source" role is mandatory if LI_Lineage.statement and "processStep" role are not documented}
{"processStep" role is mandatory if LI_Lineage.statement and "source" role are not documented}

LI_Source
+ description : CharacterString [0..1]
+ sourceSpatialResolution : MD_Resolution [0..1]
+ sourceReferenceSystem : MD_ReferenceSystem [0..1]
+ sourceCitation : CI_Citation [0..1]
+ sourceMetadata : CI_Citation [0..*]
+ scope : DQ_Scope [0..*]
constraints:
{"description" is mandatory if "scope" is not documented}
{"scope" is mandatory if "description" is not documented}
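The two LI_Lineage constraints quoted above jointly require that at least one of statement, source, or processStep be documented. A hypothetical validator sketch (the class and helper names are ours, not from ISO 19115):

```python
# Hypothetical check of the LI_Lineage constraints quoted on this slide:
# "source" is mandatory if statement and processStep are not documented, and
# "processStep" is mandatory if statement and source are not documented —
# together: at least one of the three must be present.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LILineage:
    statement: Optional[str] = None
    sources: List[str] = field(default_factory=list)        # stand-in for LI_Source
    process_steps: List[str] = field(default_factory=list)  # stand-in for LI_ProcessStep

def lineage_is_valid(lin: LILineage) -> bool:
    """True iff the mutual-mandatoriness constraints are satisfied."""
    return bool(lin.statement or lin.sources or lin.process_steps)

assert lineage_is_valid(LILineage(statement="Derived from BLM survey records"))
assert not lineage_is_valid(LILineage())  # nothing documented at all
```

A check of this shape is one way a catalogue could flag lineage sections that are formally present but empty.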
21. LI_Lineage: LI_ProcessStep
• 8.26% of metadata records (8,035) contain the direct list of processes without sources.
– 292 (0.30%) contain a date
• With the order of these processes.
• If quality indicators are not provided for the dataset, it is difficult to infer resource quality from a process list alone.

[UML diagram: MD_Metadata --resourceLineage 0..*--> LI_Lineage --processStep 0..*--> LI_ProcessStep]

LI_Lineage
+ statement : CharacterString [0..1]
+ scope : DQ_Scope [0..*]
constraints:
{"source" role is mandatory if LI_Lineage.statement and "processStep" role are not documented}
{"processStep" role is mandatory if LI_Lineage.statement and "source" role are not documented}

LI_ProcessStep
+ description : CharacterString
+ rationale : CharacterString [0..1]
+ stepDateTime : TM_Primitive [0..*]
+ processor : CI_ResponsiblePartyInfo [0..*]
+ reference : CI_Citation [0..*]
+ scope : DQ_Scope [0..*]
22. Complete Provenance: LI_ProcessStep with LI_Source
• 1.26% of metadata records (1,226) have a more complete provenance process.
• How and when the data sources were used.
• If quality indicators are not provided for the dataset, we can deduce which sources have more influence on the quality of the final result.

[UML diagram: MD_Metadata --resourceLineage 0..*--> LI_Lineage; LI_Lineage --processStep 0..*--> LI_ProcessStep --source 0..*--> LI_Source]

LI_Lineage
+ statement : CharacterString [0..1]
+ scope : DQ_Scope [0..*]
constraints:
{"source" role is mandatory if LI_Lineage.statement and "processStep" role are not documented}
{"processStep" role is mandatory if LI_Lineage.statement and "source" role are not documented}

LI_ProcessStep
+ description : CharacterString
+ rationale : CharacterString [0..1]
+ stepDateTime : TM_Primitive [0..*]
+ processor : CI_ResponsiblePartyInfo [0..*]
+ reference : CI_Citation [0..*]
+ scope : DQ_Scope [0..*]

LI_Source
+ description : CharacterString [0..1]
+ sourceSpatialResolution : MD_Resolution [0..1]
+ sourceReferenceSystem : MD_ReferenceSystem [0..1]
+ sourceCitation : CI_Citation [0..1]
+ sourceMetadata : CI_Citation [0..*]
+ scope : DQ_Scope [0..*]
constraints:
{"description" is mandatory if "scope" is not documented}
{"scope" is mandatory if "description" is not documented}
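With process steps linked to their sources, provenance becomes a traversable structure. A toy sketch (our own stand-in types, not the ISO classes) that walks a lineage and lists every source feeding the final result:

```python
# Toy provenance walk (stand-in types, not the ISO 19115 classes): each
# process step records which sources it consumed, so everything upstream
# of the final product can be listed in step order.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Source:
    citation: str

@dataclass
class ProcessStep:
    description: str
    sources: List[Source] = field(default_factory=list)

def upstream_sources(steps: List[ProcessStep]) -> List[str]:
    """All source citations consumed anywhere in the lineage, in step order."""
    return [src.citation for step in steps for src in step.sources]

lineage = [
    ProcessStep("Compile survey input data",
                [Source("BLM survey records"), Source("USFS survey records")]),
    ProcessStep("Compile PLSS corner locations",
                [Source("USGS topographic quadrangles")]),
]
print(upstream_sources(lineage))
```

This is the kind of traversal the slide's argument relies on: once each step cites its sources, the influence of each source on the final result can be traced.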
23. Complete provenance in ISO 19115-2
• LE_ProcessStep includes an LE_Processing that has a runTimeParameters attribute, which allows us to describe the exact list of parameters used in the execution.
• There is a citation of the algorithm used (LE_Algorithm).
• All these extensions were made for the benefit of EO gridded data, but they are not in the Clearinghouse.
• We can completely evaluate the quality of the resulting product if we know the uncertainties that the sources have in their metadata (sourceMetadata citation in LI_Source).

[UML diagram, from ISO 19115-2:2009, shown for informative purposes only. The base LI_Lineage / LI_ProcessStep / LI_Source structure is as on the previous slides; the imagery extensions are: LE_ProcessStep --report 0..*--> LE_ProcessStepReport; LE_ProcessStep --output 0..*--> LE_Source; LE_ProcessStep --processingInformation 0..1--> LE_Processing --algorithm 0..*--> LE_Algorithm]

Data quality information - Imagery::LE_ProcessStepReport
+ name : CharacterString
+ description : CharacterString [0..1]
+ fileType : CharacterString [0..1]

Data quality information - Imagery::LE_Source
+ processedLevel : MD_Identifier [0..1]
+ resolution : LE_NominalResolution [0..1]
constraints:
{If "LE_NominalResolution.scanningResolution" is used then "LE_Source.scaleDenominator" is required}
{"description" is mandatory if "sourceExtent" is not documented}
{"sourceExtent" is mandatory if "description" is not documented}

Data quality information - Imagery::LE_Processing
+ identifier : MD_Identifier
+ softwareReference : CI_Citation [0..1]
+ procedureDescription : CharacterString [0..1]
+ documentation : CI_Citation [0..*]
+ runTimeParameters : CharacterString [0..1]

Data quality information - Imagery::LE_Algorithm
+ citation : CI_Citation
+ description : CharacterString

«Union» Data quality information - Imagery::LE_NominalResolution
+ scanningResolution : Distance
+ groundResolution : Distance
24. 3. Usage - User feedback
• There is one small entry for user feedback in the current ISO 19115: MD_Usage
– Brief description of ways in which the resource is currently or has been used
25. MD_Usage - User feedback
• There are 1.2% (1,133) entries
– with specificUsage and userContactInfo only
• All made by the same institution:
– Landesvermessung und Geobasisinformation Brandenburg (LGB)
– Tel. +49-331-8844-123, Fax +49-331-8844-16123
– Heinrich-Mann-Allee 103, Potsdam, Brandenburg 14473, Deutschland
– kundenservice@geobasis-bb.de
– http://www.geobasis-bb.de
26. Conclusions
• There are many different kinds of quality indicators.
– There is a lack of a complete description of the values provided (no units, missing measure name, missing evaluation method).
• Quality coverage results (per pixel) are almost nonexistent, and the link is not there.
• Lineage information is rich in many records, some with more than 100 entries in sources or process steps.
• We have usage examples -> feedback.
• Current data is enough to demonstrate search and visualisation with some limitations. Good for GeoViQua.
• Next steps:
– Assess the quality of the quality metadata?
– Extend this analysis to other capacity catalogues integrated in the EuroGEOSS Broker.