Presentation held at the "Workshop on Knowledge Evolution and Ontology Dynamics", co-located with ISWC 2011. Related to the paper http://ceur-ws.org/Vol-784/evodyn1.pdf
More Related Content
Similar to Statistical Analysis of Web of Data Usage
This document provides a survey of web usage mining systems and technologies. It discusses the five major functions of a web usage mining system: 1) data gathering through web logs, 2) preparing raw log data, 3) discovering navigation patterns, 4) analyzing and visualizing patterns, and 5) applying patterns. Each function is explained in detail along with related technologies. Major research systems concerning web usage mining are also listed.
Crushing, Blending, and Stretching Transactional Data (Ray Schwartz)
The document discusses extracting transactional data from library systems like the integrated library system (ILS), interlibrary loan (ILL) software, and other vendor services. It describes setting up an application server to store this extracted data in a database for reporting and analysis. The goal is to mine this data to determine which patron groups, like academic majors and departments, are accessing different library services and resources.
Geo-referenced human-activity-data; access, processing and knowledge extraction (Conor Mc Elhinney)
This document discusses methods for accessing, processing, and extracting knowledge from geo-referenced human activity data. It describes challenges in modeling geospatial data from different sources and accessing data through spatial hierarchy models. It also covers processing paradigms for knowledge extraction, including spatial workflow patterns and temporal dynamics in communities from data sources like tweets. Visualization and interaction techniques are discussed, including moving toward 3D web-based visualization using technologies like WebGL. Feature extraction from data is highlighted as informing risk assessment knowledge.
The document discusses using usage analysis to improve ontology engineering. It describes analyzing query logs over datasets like DBpedia to identify frequently queried triples and patterns. This can reveal missing or inconsistent data and suggest new links between entities. The analysis helps increase data quality and acquire new knowledge that benefits both the dataset and Web of Data as a whole. While complete automation may not be needed, supporting usage analysis and endpoint access allows publishers to play a role in maintaining datasets and the Web of Data.
This document proposes using feedback and k-means clustering to refine web data. The k-means clustering algorithm is used to initially cluster web usage data. Then, a genetic algorithm is applied to the clusters to improve their quality based on feedback from users on the usefulness of different web pages. This combined approach of initial k-means clustering followed by genetic algorithm refinement aims to better organize web data according to user preferences and eliminate unwanted websites.
The document discusses agINFRA, a multilingual infrastructure project that aims to improve access to information on agricultural innovation. It notes the growing and distributed nature of agricultural data and the need for linking vocabularies and open data to enable value-added services, knowledge inference, and collaboration. The agINFRA project is working to develop vocabulary servers, routing maps, data production tools, data rendering tools, and cloud storage to help organize and link agricultural data across languages and systems.
Johannes Keizer from FAO presented on agINFRA, a multilingual infrastructure for information on agricultural innovation. The project aims to create a linked open data environment to improve access to growing information on agriculture. This will be done by developing vocabularies, tools for data production and rendering, and a routing map (RING) to connect data sources. The goal is to enable real-time sharing of research data across languages and better organize collaboration in agriculture.
Exploratory Search upon Semantically Described Web Data Sources: Service regi... (Marco Brambilla)
This presentation was given at the SSW workshop, collocated with VLDB 2012.
Exploratory search applications over structured Web content are becoming one of the main information-seeking paradigms for users. This is mainly due to the move towards mobile and pervasive Web access and to the ever tighter intertwining between everyday life and information seeking.
Structured data is typically distributed on the Web and accessible through a service-oriented paradigm. This paper proposes a vision of: (1) a semantically enabled service registration framework for describing Web data services in a convenient way; and (2) a design method for applications that exploit such a model using a design-pattern-based approach.
In this video from the ISC Big Data'14 Conference, Ted Willke from Intel presents: The Analytics Frontier of the Hadoop Eco-System.
"The Hadoop MapReduce framework grew out of an effort to make it easy to express and parallelize simple computations that were routinely performed at Google. It wasn’t long before libraries, like Apache Mahout, were developed to enable matrix factorization, clustering, regression, and other more complex analyses on Hadoop. Now, many of these libraries and their workloads are migrating to Apache Spark because it supports a wider class of applications than MapReduce and is more appropriate for iterative algorithms, interactive processing, and streaming applications. What’s next beyond Spark? Where is big data analytics processing headed? How will data scientists program these systems? In this talk, we will explore the current analytics frontier, the popular debates, and discuss some potentially clever additions. We will also share the emergent data science applications and collaborative university research that inform our thinking."
Learn more:
http://www.isc-events.com/bigdata14/schedule.html
and
http://www.intel.com/content/www/us/en/software/intel-graph-solutions.html
Watch the video presentation: https://www.youtube.com/watch?v=qlfx495Ekw0
1. Find all frequent itemsets of length 1 by scanning the database to count item occurrences.
2. Iteratively generate candidate itemsets of length k from frequent itemsets of length k-1, and prune unpromising candidates using the Apriori property.
3. Scan the database to determine truly frequent itemsets.
4. Generate association rules from frequent itemsets by adding items to the antecedent and consequent of rules if they meet minimum confidence.
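A minimal sketch of these four steps in Python (the function names, parameters, and toy transactions are illustrative, not taken from the summarized document):

```python
from itertools import combinations

def apriori(transactions, min_support=2):
    # Step 1: scan the database and count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c >= min_support}
    all_frequent = {s: counts[s] for s in frequent}

    k = 2
    while frequent:
        # Step 2: join frequent (k-1)-itemsets into k-candidates and prune
        # those with an infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Step 3: scan the database to keep the truly frequent candidates.
        counts = {c: sum(1 for t in transactions if c <= set(t)) for c in candidates}
        frequent = {c for c, n in counts.items() if n >= min_support}
        all_frequent.update({c: counts[c] for c in frequent})
        k += 1
    return all_frequent

def rules(freq, min_conf=0.6):
    # Step 4: emit rules A -> B whose confidence meets the threshold.
    out = []
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = support / freq[antecedent]
                if conf >= min_conf:
                    out.append((set(antecedent), set(itemset - antecedent), conf))
    return out

freq = apriori([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]])
print(rules(freq))
```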
Actions speak louder than words: Analyzing large-scale query logs to improve ... (Raman Chandrasekar)
Analyzing anonymized query and click-through logs leads to a better understanding of user behaviors and intentions and provides great opportunities to respond to users with an improved search experience. As a large-scale provider of SaaS services, Serials Solutions is uniquely positioned to learn from the dataset of queries aggregated from the Summon service, generated by millions of users at hundreds of libraries around the world.
In this session, we will describe our Relevance Metrics Framework and provide examples of insights gained during its development and implementation. We will also cover recent product changes inspired by these insights. Chandra and Susan, from the Summon dev team, will share insights and outcomes from this ongoing process and highlight how analysis of large-scale query logs helps improve the academic research experience.
Siddhi: A Second Look at Complex Event Processing Implementations (Srinath Perera)
Today, a huge amount of data is available from sources like sensors (RFIDs, Near Field Communication), web activities, transactions, social networks, etc. Making sense of this avalanche of data requires efficient and fast processing. Processing high volumes of events to derive higher-level information is a vital part of taking critical decisions, and Complex Event Processing (CEP) has become one of the most rapidly emerging fields in data processing. e-Science use cases, business applications, financial trading applications, operational analytics applications and business activity monitoring applications are some use cases that directly use CEP. This paper discusses different design decisions associated with CEP engines, and proposes some approaches to improve CEP performance by using more stream-processing-style pipelines. Furthermore, the paper discusses Siddhi, a CEP engine that implements those suggestions. We present a performance study showing that the resulting CEP engine, Siddhi, has significantly improved performance. The primary contributions of this paper are a critical analysis of CEP engine design, identifying suggestions for improvement, implementing those improvements in Siddhi, and demonstrating the soundness of those suggestions through empirical evidence.
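The stream-pipeline style mentioned in the abstract can be illustrated with a generic sliding-window operator. This is a toy Python sketch of the idea, not Siddhi's actual API (Siddhi itself is a Java engine):

```python
from collections import deque

class SlidingTimeWindow:
    """Keep the events of the last `span` seconds and emit an aggregate
    (here: a moving average) whenever a new event arrives."""
    def __init__(self, span):
        self.span = span
        self.events = deque()

    def on_event(self, timestamp, value):
        self.events.append((timestamp, value))
        # Expire events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.span:
            self.events.popleft()
        values = [v for _, v in self.events]
        return sum(values) / len(values)

window = SlidingTimeWindow(span=60)
for t, price in [(0, 10.0), (30, 12.0), (90, 11.0)]:
    print(t, window.on_event(t, price))   # 10.0, 11.0, 11.0
```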
The document describes the Social Informatics Data Grid (SIDGrid), which aims to:
1) integrate heterogeneous datasets over time, place, and type through a shared data and service interface and common problems/theories; and
2) develop tools for collecting, storing, retrieving, annotating, and analyzing synchronized multi-modal data on computational grids.
The SIDGrid architecture allows streaming of video, audio and time series data across distributed datasets using time alignment, database, and grid computing standards. It provides search and analysis tools to browse over 4,000 projects containing various media files.
This document discusses the challenges of collecting, storing, and analyzing large volumes of internet measurement data. It examines issues such as distributed and resilient data collection, handling multi-timescale and heterogeneous data from various sources, and developing standardized tools and formats. The paper proposes the "datapository" - an internet data repository designed to address these challenges through a collaborative framework for data sharing, storage, and analysis tools. The goal is to help both network operators and researchers more effectively harness the wealth of data available.
IOUG93 - Technical Architecture for the Data Warehouse - Presentation (David Walker)
The document outlines a technical architecture for implementing a data warehouse. It discusses business analysis, database schema design, project management, data acquisition, building a transaction repository, data aggregation, data marts, metadata and security, middleware and presentation layers. The goal is to help users find the information they need from the data warehouse. Contact information is provided at the end.
Life Science Database Cross Search and Metadata (Maori Ito)
Life science databases are sometimes difficult to understand due to lack of information. I'd like to add metadata into databases and improve search results.
"Data Provenance: Principles and Why it matters for BioMedical Applications"Pinar Alper
Tutorial given at Informatics for HEalth 2017 COnference These slides are for the second part of the tutorial describing provenance capture and management tools.
Software Analytics: Data Analytics for Software Engineering (Tao Xie)
This document summarizes a presentation on software analytics and its achievements and opportunities. It begins by noting how software, and how it is built and operated, are changing, with data becoming more pervasive and development more distributed. It then defines software analytics as enabling analysis of software data to obtain insights and make informed decisions. It outlines research topics covering different areas of the software domain throughout the development cycle. It describes target audiences of software practitioners and outputs of insightful and actionable information. Selected projects demonstrating software analytics are then summarized, including StackMine for performance debugging at scale, XIAO for scalable code clone analysis, and others.
Functional and Architectural Requirements for Metadata: Supporting Discovery... (Jian Qin)
The tremendous growth in digital data has led to an increase in metadata initiatives for different types of scientific data, as evident in Ball’s survey (2009). Although individual communities have specific needs, there are shared goals that need to be recognized if systems are to effectively support data sharing within and across all domains. This paper considers this need, and explores systems requirements that are essential for metadata supporting the discovery and management of scientific data. The paper begins with an introduction and a review of selected research specific to metadata modeling in the sciences. Next, the paper’s goals are stated, followed by the presentation of valuable systems requirements. The results include a base model with three chief principles: principle of least effort, infrastructure service, and portability. The principles are intended to support “data user” tasks. Results also include a set of defined user tasks and functions, and application scenarios.
Not re-decentralizing the Web is not only a missed opportunity, it is irrespo... (Markus Luczak-Rösch)
Slides of a public "Spotlight Lecture" given at Victoria University of Wellington on Tuesday, 17th April 2018. The purpose of the lecture was to inform the general public and policy makers about the recent Facebook-Cambridge Analytica case and discuss possible ways out of the dilemma where large data monopolies accumulate and sell personal data at scale.
Analysing literature through the lens of information theory and network science (Markus Luczak-Rösch)
This document introduces a new tool for interactively analyzing literature that constructs temporally ordered networks of character occurrences. It uses natural language processing to automatically detect characters and relationships in texts. Researchers used the tool to produce annotated graphs of novels that identified important narrative episodes and sequences. Their annotations provided insights into how digital tools can cultivate new collaborative interpretive practices for literary analysis.
A talk given at the annual Computer Science for High School Teachers event at Victoria University of Wellington. I presented some basics of the World Wide Web and why it is worth preserving, our work on non-expert tools to populate semantically enriched content, a current project to identify NZ native birds based on their calls that involves citizen science and contemporary deep learning using TensorFlow, a project that investigates the impact of online citizen science on the development of science capabilities of primary school children, and my collaboration with Adam Grener from the School of English, Film, Theater and Media Studies at VUW, with whom I am working on computational tools for literary studies.
Overview of how data on the Web of Data can be consumed (first and foremost Linked Data) and implications for the development of usage mining approaches.
References:
Elbedweihy, K., Mazumdar, S., Cano, A. E., Wrigley, S. N., & Ciravegna, F. (2011). Identifying Information Needs by Modelling Collective Query Patterns. COLD, 782.
Elbedweihy, K., Wrigley, S. N., & Ciravegna, F. (2012). Improving Semantic Search Using Query Log Analysis. Interacting with Linked Data (ILD 2012), 61.
Raghuveer, A. (2012). Characterizing machine agent behavior through SPARQL query mining. In Proceedings of the International Workshop on Usage Analysis and the Web of Data, Lyon, France.
Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. (2011). An empirical study of real-world SPARQL queries. arXiv preprint arXiv:1103.5043.
Hartig, O., Bizer, C., & Freytag, J. C. (2009). Executing SPARQL queries over the web of linked data (pp. 293-309). Springer Berlin Heidelberg.
Verborgh, R., Hartig, O., De Meester, B., Haesendonck, G., De Vocht, L., Vander Sande, M., ... & Van de Walle, R. (2014). Querying datasets on the web with high availability. In The Semantic Web–ISWC 2014 (pp. 180-196). Springer International Publishing.
Verborgh, R., Vander Sande, M., Colpaert, P., Coppens, S., Mannens, E., & Van de Walle, R. (2014, April). Web-Scale Querying through Linked Data Fragments. In LDOW.
Luczak-Rösch, M., & Bischoff, M. (2011). Statistical analysis of web of data usage. In Joint Workshop on Knowledge Evolution and Ontology Dynamics (EvoDyn2011), CEUR WS.
Luczak-Rösch, M. (2014). Usage-dependent maintenance of structured Web data sets (Doctoral dissertation, Freie Universität Berlin, Germany), http://edocs.fu-berlin.de/diss/receive/FUDISS_thesis_000000096138.
The document discusses using information cascades to analyze relationships between online resources over time. It proposes a model of "transcendental information cascades" to represent how content identifiers are shared across resources. The model defines cascades as networks of nodes representing resources linked by shared identifiers. Applying different matching functions to identify patterns in content can generate different cascade representations from the same data. An example visualization of a hashtag-based cascade is shown. The properties of cascades, such as their size and entropy of identifiers used, can provide insights into the system's behavior.
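A minimal sketch of the hashtag-based cascade construction described above, assuming a time-ordered stream of (resource id, text) pairs; all names and the toy data are illustrative:

```python
import re
import networkx as nx

def hashtags(text):
    # One possible matching function: hashtags as content identifiers.
    return set(re.findall(r"#\w+", text.lower()))

def build_cascade(stream, match=hashtags):
    """Link each resource to the most recent earlier resource that
    carried at least one of the same identifiers."""
    g = nx.DiGraph()
    last_seen = {}  # identifier -> node that used it last
    for node_id, text in stream:  # stream must be ordered by time
        g.add_node(node_id)
        for ident in match(text):
            if ident in last_seen:
                g.add_edge(last_seen[ident], node_id, identifier=ident)
            last_seen[ident] = node_id
    return g

g = build_cascade([
    ("t1", "spotted a #kiwi tonight"),
    ("t2", "great #kiwi photo from #wellington"),
    ("t3", "#wellington is windy"),
])
print(list(g.edges(data=True)))  # t1->t2 via #kiwi, t2->t3 via #wellington
```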
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti... (Markus Luczak-Rösch)
Invited talk given at the QUEST (Qualitative Expertise at Southampton, http://www.quest.soton.ac.uk/) group event (http://www.quest.soton.ac.uk/training/) on Qualitative Methods and Big Data.
Context-free data analysis with Transcendental Information Cascades (Markus Luczak-Rösch)
In order to discover hidden relationships and patterns in data streams from multiple heterogeneous sources, we work on a method for exploratory data analysis. We disregard any system-specific context to generate generic networks of information co-occurrence. These networks allow for more informed sampling and filtering. Case-specific context can be added once these networks have been created to support accurate decision making.
From coincidence to purposeful flow? Properties of transcendental information... (Markus Luczak-Rösch)
Invited talk given on 12-06-2015 at the University of Oxford, Oxford e-Research Centre.
The talk introduces our notion of socio-technical computation as the implicit purposeful collective action of human collectives on the Web and transcendental information cascades as a means to capture this.
Relevant references:
[1] Luczak-Roesch, M., Tinati, R., Simperl, E., Van Kleek, M., Shadbolt, N., & Simpson, R. (2014). Why won't aliens talk to us? Content and community dynamics in online citizen science. Proceedings of the Eighth AAAI Conference on Weblogs and Social Media, {ICWSM} 2014, Ann Arbor, Michigan, USA, June 1-4, 2014.
[2] Markus Luczak-Roesch, Ramine Tinati, Kieron O'Hara, and Nigel Shadbolt. 2015. Socio-technical Computation. In Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing (CSCW'15 Companion). ACM, New York, NY, USA, 139-142. http://doi.acm.org/10.1145/2685553.2698991
[3] Markus Luczak-Roesch, Ramine Tinati, and Nigel Shadbolt. 2015. When Resources Collide: Towards a Theory of Coincidence in Information Spaces. To appear in WWW’15 Companion, May 18–22, 2015, Florence, Italy. http://dx.doi.org/10.1145/2740908.2743973
When resources collide: Towards a theory of coincidence in information spaces... (Markus Luczak-Rösch)
We work on a theory to facilitate data analysis for discovery of meaningful collective information sharing patterns in very large data streams. By focusing only on coincidence of information occurrence, we can capture and analyse emergent collective action across system boundaries and independent from social network contexts.
References:
Markus Luczak-Roesch, Ramine Tinati, Kieron O'Hara, and Nigel Shadbolt. 2015. Socio-technical Computation. In Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing (CSCW'15 Companion). ACM, New York, NY, USA, 139-142. http://doi.acm.org/10.1145/2685553.2698991
Markus Luczak-Roesch, Ramine Tinati, and Nigel Shadbolt. 2015. When Resources Collide: Towards a Theory of Coincidence in Information Spaces. To appear in WWW’15 Companion, May 18–22, 2015, Florence, Italy. http://dx.doi.org/10.1145/2740908.2743973
Talk given at the Southampton Data Analysis Workshop (18.-19. September 2014) about the analysis of the Zooniverse citizen science platform and WikiProjects as a subset of Wikipedia.
- The document analyzes data from multiple citizen science projects on the Zooniverse platform to understand how participants interact and contribute.
- It finds that while most participants only classify images, around 1% contribute a large portion (72%) of discussions in project talk pages.
- The language used in talk pages evolves slightly over time within each project as domain-specific vocabulary emerges, but classification vocabulary remains stable.
loomp is an RDFa editor and a content model which decomposes a conventional document into atomic parts that represent the smallest entities an author wants to delimit (paragraphs, sentences, words, ...). In the end a document consists of a mashup of several semantically annotated elements (for now only text elements). Elements can appear in multiple documents. The whole content network is stored as an RDF graph in a triple store. Thus loomp creates one content network that consists of documents which are interlinked because of the elements they share, and a second content network which is based on annotations that are shared by elements or documents. The RDF graph may also be served separately via a Linked Data endpoint.
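A toy rdflib sketch of that content model; the loomp namespace and property names below are assumptions for illustration, not loomp's actual vocabulary:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

LOOMP = Namespace("http://example.org/loomp/")  # hypothetical namespace
g = Graph()

doc = URIRef("http://example.org/docs/1")
elem = URIRef("http://example.org/elements/42")

g.add((doc, RDF.type, LOOMP.Document))
g.add((elem, RDF.type, LOOMP.Element))
g.add((elem, LOOMP.content, Literal("The Beatles formed in Liverpool.")))
g.add((elem, LOOMP.annotatedWith, URIRef("http://dbpedia.org/resource/The_Beatles")))
# The same element may be part of several documents, interlinking them.
g.add((doc, LOOMP.hasElement, elem))

print(g.serialize(format="turtle"))
```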
1. Statistical Analysis of Web of Data Usage: Towards (Visual) Maintenance Support for Dataset Publishers
Markus Luczak-Rösch, Markus Bischoff
Freie Universität Berlin, Networked Information Systems (www.ag-nbi.de)
2. Who is addressed?
• rather small/simple ontologies
– minimal effort for ontology engineering (OE)
– “under-engineered”
• unknown user requirements
3. We propose: A Usage-dependent Life Cycle
(Cycle diagram, driven by USAGE)
• Initial release: RDB2RDF, crawling & transformation, …
• Requests and queries: SELECT * WHERE ?t a:madeOf a:Plastic, SELECT * WHERE ?t b:madeOf b:Wood, …
• Negotiate understanding
• Re-engineering, re-population, …
4. (Very) Quick Example
• What instruments does the band The Beatles consist of?
• Are The Beatles a “Big Band”?
• What are “british” bands?
5.
6. • Is it what the user expected to see?
• Did you know that this happens, and do you know what to do now?
7. Survey covering approx. 25% of all cloud datasets
• size
• complexity
• engineering methodology
• …
Publishers of most of the datasets do not have any (structured) idea how to maintain their data. (Survey ran in October 2010; not yet officially published.)
8. Role of the dataset publisher (more general)
(Diagram: effort distribution between publisher and consumer, ranging from “publisher provides links” to “consumer generates/mines links”, with links serving as hints)
• use common vocabularies
• provide RDF links to other data resources
• provide schema mappings
Source: Christian Bizer, “Pay-as-you-go Data Integration” (21/9/2010)
9. Role of the dataset publisher (more specific)*
• Reliability: Is the data valid and complete?
• Peak-load: Temporal profiles of important data?
• Performance: Are caches and indexes optimal?
• Usefulness: What do people find and use frequently?
• Attacks: Is the data threatened by spam?
* w.r.t. Möller et al.: Learning from Linked Open Data Usage: Patterns & Metrics.
11. How do people access resources on the Web of Data?
xxx.xxx.xxx.xxx - - [21/Sep/2009:00:00:00 -0600] "GET /page/Jeroen_Simaeys HTTP/1.1" 200 26777 "" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
xxx.xxx.xxx.xxx - - [21/Sep/2009:00:00:00 -0600] "GET /resource/Guano_Apes HTTP/1.1" 303 0 "" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
xxx.xxx.xxx.xxx - - [21/Sep/2009:00:00:01 -0600] "GET /sparql?query=PREFIX+rdfs%3A+%..." 200 1844 "" ""
What do they get?
• RDF graphs
• SPARQL Query Results XML Format
• …, HTML, JSON, … serializations of results
• …, HTML, JSON, … serializations of no results
A 204 status code would be great, but for now the usage mining process should respect this.
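A sketch of how such log lines can be parsed and classified for usage mining. The regular expression follows the Apache combined log format of the excerpt, and the /page/, /resource/ and /sparql path conventions mirror the DBpedia setup shown above; the IP address is a placeholder, since the logged addresses are anonymized:

```python
import re
from urllib.parse import urlparse, parse_qs

# host, ident, user, [timestamp], "request", status, size (referer/agent ignored)
LOG = re.compile(r'(\S+) \S+ \S+ \[(.*?)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

def classify(line):
    m = LOG.match(line)
    if not m:
        return None
    host, ts, method, path, status, size = m.groups()
    kind, query = "other", None
    if path.startswith("/sparql"):
        kind = "sparql"
        # parse_qs also percent-decodes the query string
        query = parse_qs(urlparse(path).query).get("query", [""])[0]
    elif path.startswith("/resource/"):
        kind = "resource"   # URI lookup, answered with a 303 redirect
    elif path.startswith("/page/"):
        kind = "page"       # HTML rendering of the resource
    return {"host": host, "time": ts, "kind": kind,
            "status": int(status), "query": query}

line = ('203.0.113.7 - - [21/Sep/2009:00:00:01 -0600] '
        '"GET /sparql?query=SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D HTTP/1.1" '
        '200 1844 "" ""')
print(classify(line))
```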
12. Adapted from Myra Spiliopoulou: “Web usage mining for Web site evaluation”, 2000, Commun. ACM
(Process diagram) Preparation phase: a preparation tool turns the log file into prepared log data (queries, patterns, triples, filters, sessions and sequences). Mining phase: usage mining methods, steered by mining queries and instructions, produce mining results and result patterns (access methods and patterns, navigation patterns, statistics), which are explored with a visualization tool.
14. Usage Analysis
• queries
• patterns
• triples
• primitives (e.g. ns1:A, rdf:type, ns2:B)
Reference for details: M. Luczak-Rösch and H. Mühleisen, "Log File Analysis for Web of Data Endpoints," in Proc. of the 8th Extended Semantic Web Conference (ESWC) Poster Session, 2011.
15. Metrics
• Ontology heat map: the amount a class or a predicate is used in queries
• Resource usage: triple combinations in which a resource is used
• Primitive usage: position in triples; triple combinations
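A deliberately naive sketch of the ontology heat map metric: count how often each non-variable term occurs in which triple-pattern position across decoded queries. A real implementation would use a proper SPARQL parser; the regular expressions and example queries are illustrative only:

```python
import re
from collections import Counter

TERM = r'(<[^>]+>|\w+:\w+|\?\w+|"[^"]*")'          # URI, prefixed name, variable, literal
TRIPLE = re.compile(TERM + r'\s+' + TERM + r'\s+' + TERM)

def heat_map(queries):
    usage = Counter()
    for q in queries:
        body = q[q.find("{") + 1 : q.rfind("}")]    # ignore prefixes and modifiers
        for s, p, o in TRIPLE.findall(body):
            for position, term in (("s", s), ("p", p), ("o", o)):
                if not term.startswith("?"):        # skip variables
                    usage[(term, position)] += 1
    return usage

queries = [
    "SELECT * WHERE { ?b rdf:type dbo:Band . ?b dbo:genre ?g }",
    "SELECT * WHERE { ?b dbo:genre dbr:Rock_music }",
]
for (term, position), n in heat_map(queries).most_common():
    print(term, position, n)
```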
16. Metrics
• Time statistics: hourly accesses
• Error statistics: triple patterns that contradict the schema but succeeded; triple patterns that fail due to the modelling
• Host statistics: hourly accesses per host; primitives and triple patterns requested by host
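A sketch of the hourly time and host statistics, assuming request records shaped like the output of the classify() sketch shown after slide 11:

```python
from collections import Counter

def hourly_stats(records):
    """Hourly accesses overall and per host, truncating timestamps such as
    '21/Sep/2009:00:00:01 -0600' to the hour."""
    by_hour, by_host_hour = Counter(), Counter()
    for r in records:
        hour = r["time"][:14]          # '21/Sep/2009:00'
        by_hour[hour] += 1
        by_host_hour[(r["host"], hour)] += 1
    return by_hour, by_host_hour

records = [{"host": "203.0.113.7", "time": "21/Sep/2009:00:00:01 -0600"},
           {"host": "203.0.113.7", "time": "21/Sep/2009:00:59:59 -0600"}]
print(hourly_stats(records))
```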
17. Visualizations
• network overview: weighted nodes and edges (depending on the applied metric) represent the amount of usage
• zoom in and see details
18. Evaluation Dataset
• DBpedia 3.3 log files
– 1,700,000 requests from two randomly chosen days (07/2009)
– analysis against a mirror of the 3.3 dataset (inconsistent dataset)
– performance issues of dynamic network visualization and reprocessing of queries limited the number of analyzed logs
24. Inconsistencies & Weaknesses
Inconsistent data:
• ns:Band ns:instrument ?x
• ns:Band ns:genre ?y
• ns:Band ns:associatedBand ?z
Missing facts:
• ns:Band ns:knownFor ?x
• ns:Band ns:nationality ?y
• …
Complete analysis can be found at http://page.mi.fu-berlin.de/mluczak/pub/visual-analysis-of-web-of-data-usage-dbpedia33/
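One way a publisher might check whether heavy query traffic on a predicate points to missing facts is an ASK probe against the endpoint. A sketch using SPARQLWrapper; the endpoint URL and predicate URI are illustrative:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://dbpedia.org/sparql"

def predicate_has_facts(predicate_uri):
    """ASK whether a frequently queried predicate occurs in the data at all;
    if it never does, the traffic on it suggests missing facts."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery("ASK { ?s <%s> ?o }" % predicate_uri)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["boolean"]

print(predicate_has_facts("http://dbpedia.org/ontology/knownFor"))
```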
25. What to learn from usage analysis?
• ontology maintenance
– schema evolution
– instance population
– ontology modularization
– error detection
Image source http://mrg.bz/GgaxPB
26. What else to learn?
• performance scaling
– index generation
– store architecture based on frequent SPARQL patterns
– hardware scaling at peak times
– modularization of data for different hosts
27. This is OK for the beginning, but… SONIVIS can do more: evaluate (with users!) various network visualizations and find the best one for a specific context.
28. More for the Future
• Generic patterns for the metrics + resolution/evolution patterns
• Common sense of statistics + quality-of-dataset index
• Temporal analysis
• Network metrics (degree, …)
• Visualize the effects of change
Central conclusion: calculate statistics, weaknesses and inconsistencies first and do visual editing afterwards!
Image source: http://mrg.bz/8Co9lA
29. Take away
• usage-dependent life cycle support for LOD vocabularies and the populated instances
• (visual) usage analysis can help to plan and perform maintenance activities
• this is a benefit for the dataset publisher and the Web of Data as a whole
Markus Luczak-Rösch (luczak@inf.fu-berlin.de)
Freie Universität Berlin, Networked Information Systems (www.ag-nbi.de)
Image source: http://mrg.bz/jlObbL
Editor's Notes
This is not an approach for all kinds of domains, but within LOD we find characteristic ontologies and vocabularies. Dataset hosts do not necessarily know the requirements of the dataset users.
Roughly 25 per cent of all datasets were covered by the survey. That relates to the absolute number of datasets and not to the amount of triples served. Some of the bigger ones replied, such as DBpedia and Bio2RDF.