The New York Times is the largest metropolitan newspaper and the third-largest newspaper in the United States. The Times website, nytimes.com, is ranked as the most popular newspaper website in the United States and is an important source of advertising revenue for the company. The NYT has a rich history of curating its articles, and its 100-year-old curated repository ultimately positioned it as one of the first players in the emerging Web of Data.
Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance.
E. Curry, A. Freitas, and S. O’Riáin, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25-47.
Wikipedia (DBpedia): Crowdsourced Data Curation (Edward Curry)
Wikipedia is an open-source encyclopedia, built collaboratively by a large community of web editors. The success of Wikipedia as one of the most important sources of information available today still challenges existing models of content creation. Although the term 'curation' is not commonly used by Wikipedia's contributors, digital curation is the central activity of Wikipedia editors, who are responsible for upholding information quality standards.
Wikipedia is already widely used as a collaborative environment inside organizations.
The investigation of the collaboration dynamics behind Wikipedia highlights important features and good practices that can be applied in other organizations. Our analysis focuses on the curation perspective and covers two important dimensions: social organization; and the artifacts, tools, and processes for coordinating cooperative work. These are key enablers that support the creation of high-quality information products in Wikipedia's decentralized environment.
Developing a Sustainable IT Capability: Lessons from Intel's Journey (Edward Curry)
Intel Corporation set itself a goal to reduce its global-warming greenhouse gas footprint by 20% by 2012 from 2007 levels. Through the use of sustainable IT, the Intel IT organization has become recognized as a significant contributor to the company's sustainability strategy, transforming both its own IT operations and Intel's overall operations. This article describes how Intel has achieved IT sustainability benefits thus far by developing four key capabilities. These capabilities have been incorporated into the Sustainable ICT Capability Maturity Framework (SICT-CMF), a model developed by an industry consortium in which the authors were key participants. The article ends with lessons learned from Intel's experiences that can be applied by business and IT executives in other enterprises.
Challenges Ahead for Converging Financial Data (Edward Curry)
Consumers of financial information come in many guises, from personal investors looking for that value-for-money share, to government regulators investigating corporate fraud, to business executives seeking competitive advantage over their competition. While the particular analysis performed by each of these information consumers will vary, they all have to deal with the explosion of information available from multiple sources, including SEC filings, corporate press releases, market press coverage, and expert commentary. Recent economic events have begun to bring sharp focus on the activities and actions of financial markets, institutions and, not least, regulatory authorities. Calls for enhanced scrutiny will bring increased regulation and information transparency. While extracting information from individual filings is relatively easy when a machine-readable format is utilized (for example, XBRL, the eXtensible Business Reporting Language), cross-comparison of extracted financial information can be problematic as descriptions and accounting terms vary across companies and jurisdictions. Across multiple sources the problem becomes the classical data integration problem, where a common data abstraction is necessary before functional data use can begin. Within this paper we discuss the challenges in converging financial data from multiple sources. We concentrate on integrating data from multiple sources in terms of the abstraction, linking, and consolidation activities needed to consolidate data before more sophisticated analysis algorithms can examine it for the objectives of particular information consumers (e.g., competitive analysis, regulatory compliance, or investor analysis). We base our discussion on several years of researching and deploying data integration systems in both web and enterprise environments.
E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging Financial Data,” in Proceedings of the XBRL/W3C Workshop on Improving Access to Financial Data on the Web, 2009.
Approximate Semantic Matching of Heterogeneous Events (Edward Curry)
Event-based systems have loose coupling within space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, event-based systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain event-based systems. In order to address semantic coupling within event-based systems, we propose vocabulary-free subscriptions together with the use of approximate semantic matching of events. This paper examines the requirement for semantic decoupling of events and discusses approximate semantic event matching and its consequences for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauri-based and distributional-semantics-based similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precision-recall F1 score of 75.89% on average across all experiments, with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single-similarity-measure approach.
Hasan S, O'Riain S, Curry E. Approximate Semantic Matching of Heterogeneous Events. In: 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012).
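The hybrid-matching idea can be sketched as follows. This is an illustrative toy, not the authors' implementation: a tiny hand-built thesaurus stands in for a thesauri-based measure, a few invented vectors stand in for distributional semantics, and the weight and threshold are arbitrary example values.

```python
# Illustrative sketch of approximate semantic event matching with a hybrid
# similarity measure. The thesaurus, vectors, weight, and threshold below are
# all invented for demonstration purposes.

# A tiny "thesaurus": sets of terms treated as near-synonyms.
THESAURUS = [
    {"car", "automobile", "vehicle"},
    {"rain", "precipitation"},
]

# Toy distributional vectors (in practice derived from a large corpus).
VECTORS = {
    "car":        [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.0],
    "traffic":    [0.7, 0.2, 0.1],
    "rain":       [0.0, 0.9, 0.1],
}

def thesaurus_sim(a, b):
    """1.0 if the terms are identical or share a synonym set, else 0.0."""
    if a == b:
        return 1.0
    return 1.0 if any({a, b} <= s for s in THESAURUS) else 0.0

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(y * y for y in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def distributional_sim(a, b):
    if a in VECTORS and b in VECTORS:
        return cosine(VECTORS[a], VECTORS[b])
    return 0.0

def hybrid_sim(a, b, w=0.5):
    """Weighted combination of the two similarity measures."""
    return w * thesaurus_sim(a, b) + (1 - w) * distributional_sim(a, b)

def matches(event_terms, subscription_terms, threshold=0.6):
    """Approximate match: every subscription term must be similar enough
    to at least one event term."""
    return all(
        max(hybrid_sim(e, s) for e in event_terms) >= threshold
        for s in subscription_terms
    )

# A subscription for "automobile" matches an event that says "car",
# even though producer and subscriber use different vocabularies.
print(matches({"car", "rain"}, {"automobile"}))  # True
```

The point of the sketch is the decoupling: the subscriber never has to know the producer's exact schema terms, only terms that are semantically close under the combined measure.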
An Environmental Chargeback for Data Center and Cloud Computing Consumers (Edward Curry)
Government, business, and the general public increasingly agree that the polluter should pay. Carbon dioxide and environmental damage are considered viable chargeable commodities. The net effect of this for data center and cloud computing operators is that they should look to “chargeback” the environmental impacts of their services to the consuming end-users. An environmental chargeback model can have a positive effect on environmental impacts by linking consumers to the indirect impacts of their usage, facilitating clearer understanding of the impact of their actions. In this paper we motivate the need for environmental chargeback mechanisms. The environmental chargeback model is described including requirements, methodology for definition, and environmental impact allocation strategies. The paper details a proof-of-concept within an operational data center together with discussion on experiences gained and future research directions.
Curry, E.; Hasan, S.; White, M.; and Melvin, H. 2012. An Environmental Chargeback for Data Center and Cloud Computing Consumers. In Huusko, J.; de Meer, H.; Klingert, S.; and Somov, A., eds., First International Workshop on Energy-Efficient Data Centers. Madrid, Spain: Springer Berlin / Heidelberg.
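One possible allocation strategy can be sketched as follows; this is a hypothetical example (proportional allocation of a measured footprint by metered usage), with invented tenant names and figures, not the paper's actual model.

```python
# Hypothetical sketch of an environmental-impact allocation strategy:
# apportion a data center's measured CO2 footprint to consumers in
# proportion to their metered resource usage. All figures are invented.

def allocate_footprint(total_co2_kg, usage_by_consumer):
    """Split total emissions across consumers proportionally to usage
    (e.g. CPU-hours or kWh attributed to each consumer's services)."""
    total_usage = sum(usage_by_consumer.values())
    if total_usage == 0:
        return {c: 0.0 for c in usage_by_consumer}
    return {
        consumer: total_co2_kg * usage / total_usage
        for consumer, usage in usage_by_consumer.items()
    }

# 1000 kg CO2 for the billing period, split over three fictional tenants.
usage = {"tenant_a": 300, "tenant_b": 500, "tenant_c": 200}  # CPU-hours
bill = allocate_footprint(1000.0, usage)
print(bill)  # {'tenant_a': 300.0, 'tenant_b': 500.0, 'tenant_c': 200.0}
```

The chargeback then links each consumer to the indirect impact of their own usage, which is the behavioural lever the paper motivates.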
The Role of Community-Driven Data Curation for Enterprises (Edward Curry)
With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data, and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, the Protein Data Bank, and ChemSpider, upon which best practices for both the social and technical aspects of community-driven data curation are described.
E. Curry, A. Freitas, and S. O’Riáin, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25-47.
Dealing with Semantic Heterogeneity in Real-Time Information (Edward Curry)
Tutorial at the EarthBiAs 2014 Summer School on Dealing with Semantic Heterogeneity in Real-Time Information
Part I: Large Scale Open Environments
Part II: Computational Paradigms
Part III: RDF Event Processing
Part IV: Theory of Event Exchange
Part V: Approaches to Semantic Decoupling
Part VI: Example Application: Linked Energy Intelligence
Building Optimisation using Scenario Modeling and Linked Data (Edward Curry)
As buildings become more complex, it becomes more difficult to manage and operate them effectively. The holistic management and maintenance of facilities is a multi-domain problem encompassing financial accounting, building maintenance, human resources, asset management, and code compliance, affecting different stakeholders in different ways. One technique, called scenario modeling, customises data-driven decision support for building managers during building operation. However, current implementations of scenario modeling have been limited to data from Building Management Systems, with little interaction with other relevant data sources due to interoperability issues. Linked data helps to overcome these interoperability challenges, enabling data from multiple domains to be merged into holistic scenario models for the building's different stakeholders. The approach is demonstrated using an owner-occupied office building.
Collaborative Data Management: How Crowdsourcing Can Help to Manage Data (Edward Curry)
Data management efforts such as master data management (MDM) are a popular approach to high-quality enterprise data. However, MDM can be heavily centralized and labour-intensive, and its cost and effort can become prohibitively high. The concentration of data management and stewardship in a few highly skilled individuals, such as developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in collaborative data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how collaborative data management can be applied within an enterprise context using platforms such as Amazon Mechanical Turk, Mobile Works, and internal enterprise human computation platforms.
Topics covered include:
- Introduction to Crowdsourcing and Human Computation for Data Management
- Crowds vs. communities: when to use them and why
- Push vs. Pull methods of crowdsourcing data management
- Setting up and running a collaborative data management process
- Modelling the expertise of communities
Querying Heterogeneous Datasets on the Linked Data Web (Edward Curry)
The growing number of datasets published on the Web as linked data brings both opportunities for high data availability and challenges inherent to querying data in a semantically heterogeneous and distributed environment. Approaches used for querying siloed databases fail at Web scale because users don't have an a priori understanding of all the available datasets. This article investigates the main challenges in constructing a query and search solution for linked data and analyzes existing approaches and trends.
System of Systems Information Interoperability using a Linked Dataspace (Edward Curry)
Systems of Systems pose significant technical challenges in terms of information interoperability, requiring both conceptual barriers (syntactic and semantic) and technological barriers to be overcome. This paper presents an approach to System of Systems information interoperability based on the Dataspace data-management abstraction and the Linked Data approach to sharing information on the web. The paper describes the fundamentals of the approach and demonstrates the concept with a System of Systems for enterprise energy management.
Curry E. System of Systems Information Interoperability using a Linked Dataspace. In: IEEE 7th International Conference on System of Systems Engineering (SOSE 2012)
Further Reading:
http://www.edwardcurry.org/publications/Curry_LinkedDataspaceForSOS_SOSE.pdf
Within the operational phase, buildings now produce more data than ever before: energy usage, utility information, occupancy patterns, weather data, and more. In order to manage a building holistically, it is important to use knowledge from across these information sources. However, many barriers to their interoperability exist, and there is little interaction between these islands of information. As building data moves to the cloud, there is a critical need to reflect on how cloud-based data services are designed from an interoperability perspective. If new cloud data services are designed in the same manner as traditional building management systems, they will suffer from the same data interoperability problems. Linked data technology leverages the existing open protocols and W3C standards of the Web architecture for sharing structured data on the web. In this paper we propose the use of linked data as an enabling technology for cloud-based building data services. The objective of linking building data in the cloud is to create an integrated, well-connected graph of relevant information for managing a building. This paper describes the fundamentals of the approach and demonstrates the concept within a Small and Medium-sized Enterprise (SME) with an owner-occupied office building.
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence (Edward Curry)
Energy Intelligence platforms can help organizations manage power consumption more efficiently by providing a functional view of the entire organization so that the energy consumption of business activities can be understood, changed, and reinvented to better support sustainable practices. Significant technical challenges exist in terms of information management, cross-domain data integration, leveraging real-time data, and assisting users to interpret the information to optimize energy usage. This paper presents an architectural approach to overcome these challenges using a Dataspace, Linked Data, and Complex Event Processing. The paper describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Observatory.
E. Curry, S. Hasan, and S. O’Riáin, “Enterprise Energy Management using a Linked Dataspace for Energy Intelligence,” in The Second IFIP Conference on Sustainable Internet and ICT for Sustainability (SustainIT 2012), 2012.
Sustainable IT for Energy Management: Approaches, Challenges, and Trends (Edward Curry)
An invited talk to the Galway-Mayo Institute of Technology on the current state of the art in Sustainable IT for energy management, the challenges, and the emerging trends.
Transforming the European Data Economy: A Strategic Research and Innovation Agenda (Edward Curry)
Keynote at European Data Forum 2016
Prof. Dr. Milan Petković, Vice President BDVA, Philips
Dr. Edward Curry, Vice President BDVA, Insight
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web of Things, and Social Media (Edward Curry)
Cyber-Physical Energy Systems (CPES) exploit the potential of information technology to boost energy efficiency while minimising environmental impacts. CPES can help manage energy more efficiently by providing a functional view of the entire energy system so that energy activities can be understood, changed, and reinvented to better support sustainable practices. CPES can be applied at different scales, from Smart Grids and Smart Cities to Smart Enterprises and Smart Buildings. Significant technical challenges exist in terms of information management, leveraging real-time sensor data, and coordination of the various stakeholders to optimize energy usage.

In this talk I describe an approach to overcoming these challenges by reusing Web standards to quickly connect the required systems within a CPES. The resulting lightweight architecture leverages Web technologies including Linked Data, the Web of Things, and Social Media. The talk describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Management scenario in a smart building.
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing (Edward Curry)
Recent advances in web technologies allow people to help solve complex problems by performing online tasks in return for money, learning, or fun. At present, human contribution is limited to the tasks defined on individual crowdsourcing platforms. Furthermore, there is a lack of tools and technologies that support the matching of tasks with appropriate users across multiple systems. A more explicit capture of the semantics of crowdsourcing tasks could enable the design and development of matchmaking services between users and tasks. The paper presents the SLUA ontology, which aims to model users and tasks in crowdsourcing systems in terms of the relevant actions, capabilities, and rewards. This model describes different types of human tasks that help in solving complex problems using crowds. The paper provides examples of describing users and tasks in some real-world systems with the SLUA ontology.
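A minimal sketch of the matchmaking idea: tasks require capabilities, users offer them, and a service pairs the two. The classes, field names, and example data below are simplified stand-ins for illustration, not the actual SLUA ontology terms.

```python
# Illustrative sketch of user-task matchmaking in the spirit of SLUA.
# Class and field names are invented simplifications, not ontology terms.

from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    capabilities: set = field(default_factory=set)

@dataclass
class Task:
    title: str
    required: set = field(default_factory=set)
    reward: str = "money"  # e.g. money, learning, fun

def eligible_users(task, users):
    """Users whose capabilities cover everything the task requires."""
    return [u.name for u in users if task.required <= u.capabilities]

users = [
    User("alice", {"translation", "proofreading"}),
    User("bob", {"image-labelling"}),
]
task = Task("Translate abstract", required={"translation"}, reward="learning")
print(eligible_users(task, users))  # ['alice']
```

An explicit, shared model of capabilities and rewards is what would let a matchmaking service like this work across crowdsourcing platforms rather than within a single one.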
A Capability Maturity Framework for Sustainable ICT (Edward Curry)
Researchers estimate that information and communication technology (ICT) is responsible for at least 2 percent of global greenhouse gas (GHG) emissions. Furthermore, in any individual business, ICT is responsible for a much higher percentage of that business's GHG footprint. Yet researchers also estimate that ICT can provide business solutions to reduce its GHG footprint fivefold. However, because the field is new and evolving, few guidelines and best practices are available. To address this issue, a consortium of leading organizations from industry, the nonprofit sector, and academia has developed and tested a framework for systematically assessing and improving sustainable ICT (SICT) capabilities. The Innovation Value Institute (IVI; http://ivi.nuim.ie) consortium used an open-innovation model of collaboration, engaging academia and industry in scholarly work to create the SICT-Capability Maturity Framework (SICT-CMF), which is discussed in this paper.
B. Donnellan, C. Sheridan, and E. Curry, "A Capability Maturity Framework for Sustainable Information and Communication Technology," IEEE IT Professional, vol. 13, no. 1, pp. 33-40, Jan. 2011.
Key Technology Trends for Big Data in Europe (Edward Curry)
In this presentation we will discuss some of the results of the BIG project, including analysis of foundational Big Data research technologies, technology and strategy roadmaps to enable businesses to understand the potential of Big Data technologies across different sectors, and the necessary collaboration and dissemination infrastructure to link technology suppliers, integrators, and leading user organizations.
Edward Curry is leading the Technical Working Group of the BIG Project with over 30 committed experts along the big data value chain (Acquisition, Analysis, Curation, Storage, Usage). With the help of the other technical leads, he will elaborate on the key technology trends identified in the BIG Project and how they bring data-driven value to industrial sectors.
Linked Water Data for Water Information Management (Edward Curry)
The management of water consumption is hindered by low general awareness and the absence of precise historical and contextual information. Effective and efficient management of water resources requires a holistic approach considering all the stages of water usage. A decision support tool for water management services requires access to a number of different data domains and different data providers. The design of next-generation water information management systems poses significant technical challenges in terms of information management, integration of heterogeneous data, and real-time processing of dynamic data. Linked Data is a set of web technologies that enables the integration of different data sources. This work investigates the usage of Linked Data technologies in the water management domain, describes the fundamental concepts of the approach, details an architecture, and discusses possible water management applications.
Interactive Water Services: The Waternomics Approach (Edward Curry)
WATERNOMICS focuses on the development of ICT as an enabling technology to manage water as a resource, increase end-user conservation awareness, and effect behavioural changes. Unique aspects of WATERNOMICS include personalized feedback about end-user water consumption, the development of systematic and standards-based water resource management systems, new sensor hardware developments, and the introduction of forecasting and fault detection and diagnosis to the analysis of water consumption data. These services will be bundled into the WATERNOMICS Water Information Services Platform. This paper presents the overall architectural approach to WATERNOMICS and details the potential interactive services possible based on this novel platform.
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent Systems (Edward Curry)
The Real-time Linked Dataspace (RLD) is an enabling platform for data management for intelligent systems within smart environments that combines the pay-as-you-go paradigm of dataspaces, linked data, and knowledge graphs with entity-centric real-time query capabilities.
The RLD contains all the relevant information within a data ecosystem including things, sensors, and data sources and has the responsibility for managing the relationships among these participants.
It manages sources without presuming a pre-existing semantic integration among them, using specialised dataspace support services designed for loose administrative proximity and the semantic integration of event and stream systems. Support services leverage approximate and best-effort techniques and operate under a 5-star model for "pay-as-you-go" incremental data management.
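The incremental, pay-as-you-go idea can be sketched roughly as follows; the tier descriptions below are illustrative paraphrases invented for this sketch, not the RLD's exact 5-star definitions.

```python
# Hypothetical sketch of a tiered "pay-as-you-go" integration model:
# every data source joins a dataspace at a minimal level of integration
# and is upgraded one tier at a time as curation effort is invested.
# Tier wording is an invented approximation.

TIERS = [
    "1: registered in the dataspace catalog",
    "2: schema and entities partially described",
    "3: linked to shared vocabularies",
    "4: queryable with best-effort joins",
    "5: fully integrated, entity-centric real-time queries",
]

class DataSource:
    def __init__(self, name):
        self.name = name
        self.level = 1  # every source joins at the lowest tier

    def invest(self):
        """Spend incremental effort: move up one tier, capped at the top."""
        if self.level < len(TIERS):
            self.level += 1
        return TIERS[self.level - 1]

src = DataSource("building-energy-meter")
src.invest()
print(src.level)  # 2
```

The contrast with up-front integration is the point: sources are useful (if only barely) from tier 1, and effort is spent only where the ecosystem needs richer queries.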
Crowdsourcing Approaches for Smart City Open Data Management - Edward Curry
A wide-scale bottom-up approach to the creation and management of open data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. This talk explores how to involve a wide community of users in the collaborative management of open data within a Smart City. The talk discusses how crowdsourcing techniques can be applied within a Smart City context using crowdsourcing and human computation platforms such as Amazon Mechanical Turk, MobileWorks, and CrowdFlower.
Big Data and Big Data Management (BDM) with Current Technologies: Review - IJERA Editor
The emerging phenomenon called "Big Data" is driving numerous changes in businesses and many other organizations, domains, and fields. Many of them are struggling just to manage the massive data sets. Big data management is about two things, "big data" and "data management", and these terms work together to achieve business and technology goals. In the past few years data generation has increased tremendously due to the digitization of data, and new tools and technologies for transmitting data among computers over the Internet appear day by day. Its relevance and importance for decision making and performance improvement have grown rapidly across all areas. Big data management also faces numerous challenges; common complexities include low organizational maturity relative to big data, weak business support, and the need to learn new technology approaches. This paper discusses the impacts of Big Data and the issues related to managing it with current technologies.
Open Data Innovation in Smart Cities: Challenges and Trends - Edward Curry
Open Data initiatives are increasingly considered defining elements of emerging smart cities. However, few studies have attempted to provide a better understanding of the nature of this convergence and its impact on both domains. This talk examines the challenges and trends of open data initiatives using a socio-technical perspective on smart cities. The talk presents findings from a detailed study of 18 open data initiatives across five smart cities to identify emerging best practice. Three distinct waves of open data innovation for smart cities are discussed. The talk details the specific impacts of open data innovation on the different smart city domains, the governance of the cities, and the nature of the datasets available in the open data ecosystem within smart cities.
Building Optimisation using Scenario Modeling and Linked Data - Edward Curry
As buildings become more complex, it becomes more difficult to manage and operate them effectively. The holistic management and maintenance of facilities is a multi-domain problem encompassing financial accounting, building maintenance, human resources, asset management and code compliance, affecting different stakeholders in different ways. One technique, called scenario modelling, customises data-driven decision support for building managers during building operation. However, current implementations of scenario modelling have been limited to data from Building Management Systems, with little interaction with other relevant data sources due to interoperability issues. Linked data helps to overcome interoperability challenges to enable data from multiple domains to be merged into holistic scenario models for different stakeholders of the building. The approach is demonstrated using an owner-occupied office building.
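The kind of cross-domain scenario model described above can be illustrated with a toy metric combining building management, finance, and HR data; all figures and field names are invented:

```python
# Illustrative cross-domain scenario model: combining BMS readings with
# finance and HR data to produce a stakeholder-specific metric.
# All figures and field names are invented for this sketch.

bms = {"energy_kwh": 1200.0}        # building management system
finance = {"price_per_kwh": 0.18}   # financial accounting
hr = {"occupants": 60}              # human resources

def energy_cost_per_occupant(bms, finance, hr):
    """Scenario metric: monthly energy spend divided by headcount."""
    cost = bms["energy_kwh"] * finance["price_per_kwh"]
    return cost / hr["occupants"]

metric = energy_cost_per_occupant(bms, finance, hr)
```

Each input lives in a different domain system; the scenario model only becomes possible once the three sources can be joined, which is the interoperability gap the linked data approach targets.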
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data - Edward Curry
Data management efforts such as Master Data Management (MDM) are a popular approach to high quality enterprise data. However, MDM can be heavily centralized and labour intensive, and the cost and effort can become prohibitively high. The concentration of data management and stewardship onto a few highly skilled individuals, such as developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in collaborative data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how collaborative data management can be applied within an enterprise context using platforms such as Amazon Mechanical Turk, MobileWorks, and internal enterprise human computation platforms.
Topics covered include:
- Introduction to Crowdsourcing and Human Computation for Data Management
- Crowds vs. Communities, When to use them and why
- Push vs. Pull methods of crowdsourcing data management
- Setting up and running a collaborative data management process
- Modelling the expertise of communities
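The push vs. pull distinction from the topic list above can be sketched minimally: push routes a task to a chosen worker, while pull posts it to an open queue that workers draw from. Worker names and tasks are hypothetical:

```python
# Minimal sketch of push vs. pull crowdsourced data management.
# Push: the system assigns a task to a specific worker's inbox.
# Pull: the system posts tasks to an open queue; workers claim them.
# All names and tasks here are hypothetical.

from collections import deque

open_queue = deque()                    # pull: shared task queue
inboxes = {"alice": [], "bob": []}      # push: per-worker inboxes

def push_task(task, worker):
    """Route a task directly to a chosen worker."""
    inboxes[worker].append(task)

def pull_task(task):
    """Publish a task for any worker to claim."""
    open_queue.append(task)

def claim():
    """A worker claims the next open task, if any."""
    return open_queue.popleft() if open_queue else None

push_task("verify address record", "alice")
pull_task("dedupe customer list")
```

Push suits tasks needing specific expertise (hence the expertise modelling topic above); pull suits high-volume tasks any community member can handle.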
Querying Heterogeneous Datasets on the Linked Data Web - Edward Curry
The growing number of datasets published on the Web as linked data brings both opportunities for high data availability and challenges inherent to querying data in a semantically heterogeneous and distributed environment. Approaches used for querying siloed databases fail at Web-scale because users don't have an a priori understanding of all the available datasets. This article investigates the main challenges in constructing a query and search solution for linked data and analyzes existing approaches and trends.
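The heterogeneity challenge above can be made concrete with a toy example in which two datasets name the same entity differently and an owl:sameAs-style link bridges them. The URIs and figures are invented for illustration:

```python
# Toy illustration of semantic heterogeneity on the linked data web:
# two datasets use different identifiers for the same entity, and a
# sameAs-style link lets a query merge their descriptions.
# All URIs and values are invented.

dataset_a = {("ex:Dublin", "population", 544107)}
dataset_b = {("dbp:Dublin_City", "country", "Ireland")}
links = {("ex:Dublin", "sameAs", "dbp:Dublin_City")}

graph = dataset_a | dataset_b | links

def same_as(graph, uri):
    """All identifiers co-referent with uri (one hop of sameAs links)."""
    ids = {uri}
    for s, p, o in graph:
        if p == "sameAs" and (s == uri or o == uri):
            ids |= {s, o}
    return ids

def describe(graph, uri):
    """Merge properties asserted under any co-referent identifier."""
    ids = same_as(graph, uri)
    return {p: o for s, p, o in graph if s in ids and p != "sameAs"}

d = describe(graph, "ex:Dublin")
```

Without the link, a query against either dataset alone sees only half the description; real systems face the same problem at the scale of thousands of datasets and vocabularies.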
System of Systems Information Interoperability using a Linked Dataspace - Edward Curry
Systems of Systems pose significant technical challenges in terms of information interoperability that require overcoming conceptual barriers (both syntactic and semantic) and technological barriers. This paper presents an approach to System of Systems information interoperability based on the Dataspace data management abstraction and the Linked Data approach to sharing information on the web. The paper describes the fundamentals of the approach and demonstrates the concept with a System of Systems for enterprise energy management.
Curry E. System of Systems Information Interoperability using a Linked Dataspace. In: IEEE 7th International Conference on System of Systems Engineering (SOSE 2012)
Further Reading:
http://www.edwardcurry.org/publications/Curry_LinkedDataspaceForSOS_SOSE.pdf
Within the operational phase, buildings now produce more data than ever before: energy usage, utility information, occupancy patterns, weather data, and more. In order to manage a building holistically it is important to use knowledge from across these information sources. However, many barriers exist to their interoperability and there is little interaction between these islands of information. As part of moving building data to the cloud, there is a critical need to reflect on the design of cloud-based data services and how they are designed from an interoperability perspective. If new cloud data services are designed in the same manner as traditional building management systems, they will suffer from the same data interoperability problems. Linked data technology leverages the existing open protocols and W3C standards of the Web architecture for sharing structured data on the web. In this paper we propose the use of linked data as an enabling technology for cloud-based building data services. The objective of linking building data in the cloud is to create an integrated, well-connected graph of relevant information for managing a building. This paper describes the fundamentals of the approach and demonstrates the concept within a Small and Medium-sized Enterprise (SME) with an owner-occupied office building.
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence - Edward Curry
Energy Intelligence platforms can help organizations manage power consumption more efficiently by providing a functional view of the entire organization so that the energy consumption of business activities can be understood, changed, and reinvented to better support sustainable practices. Significant technical challenges exist in terms of information management, cross-domain data integration, leveraging real-time data, and assisting users to interpret the information to optimize energy usage. This paper presents an architectural approach to overcome these challenges using a Dataspace, Linked Data, and Complex Event Processing. The paper describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Observatory.
E. Curry, S. Hasan, and S. O’Riáin, “Enterprise Energy Management using a Linked Dataspace for Energy Intelligence,” in The Second IFIP Conference on Sustainable Internet and ICT for Sustainability (SustainIT 2012), 2012.
Sustainable IT for Energy Management: Approaches, Challenges, and Trends - Edward Curry
An invited talk to the Galway-Mayo Institute of Technology on the current state of the art in Sustainable IT for energy management, the challenges, and the emerging trends.
Transforming the European Data Economy: A Strategic Research and Innovation A... - Edward Curry
Transforming the European Data Economy: A Strategic Research and Innovation Agenda
Keynote at European Data Forum 2016
Prof. Dr. Milan Petković, Vice President BDVA, Philips
Dr. Edward Curry, Vice President BDVA, Insight
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ... - Edward Curry
Cyber-Physical Energy Systems (CPES) exploit the potential of information technology to boost energy efficiency while minimising environmental impacts. CPES can help manage energy more efficiently by providing a functional view of the entire energy system so that energy activities can be understood, changed, and reinvented to better support sustainable practices. CPES can be applied at different scales from Smart Grids and Smart Cities to Smart Enterprises and Smart Buildings. Significant technical challenges exist in terms of information management, leveraging real-time sensor data, and coordination of the various stakeholders to optimize energy usage.
In this talk I describe an approach to overcome these challenges by re-using Web standards to quickly connect the required systems within a CPES. The resulting lightweight architecture leverages Web technologies including Linked Data, the Web of Things, and Social Media. The talk describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Management scenario in a smart building.
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing - Edward Curry
Recent advances in web technologies allow people to help solve complex problems by performing online tasks in return for money, learning, or fun. At present, human contribution is limited to the tasks defined on individual crowdsourcing platforms. Furthermore, there is a lack of tools and technologies that support the matching of tasks with appropriate users across multiple systems. A more explicit capture of the semantics of crowdsourcing tasks could enable the design and development of matchmaking services between users and tasks. The paper presents the SLUA ontology, which aims to model users and tasks in crowdsourcing systems in terms of the relevant actions, capabilities, and rewards. This model describes different types of human tasks that help in solving complex problems using crowds. The paper provides examples of describing users and tasks in some real-world systems with the SLUA ontology.
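As a rough illustration of the kind of descriptions SLUA targets, the sketch below encodes a user and a task as plain triples and matches them on capabilities and rewards. The property names here are illustrative, not the ontology's actual terms:

```python
# Illustrative user/task descriptions in the spirit of SLUA (actions,
# capabilities, rewards), written as plain Python triples rather than
# RDF. Property and class names are invented, not SLUA's real terms.

user = [
    ("ex:user/ann", "a", "slua:User"),
    ("ex:user/ann", "hasCapability", "ex:cap/french-translation"),
    ("ex:user/ann", "seeksReward", "slua:Money"),
]
task = [
    ("ex:task/17", "a", "slua:Task"),
    ("ex:task/17", "requiresCapability", "ex:cap/french-translation"),
    ("ex:task/17", "offersReward", "slua:Money"),
]

def matches(user, task):
    """A user matches a task if capabilities cover the requirements
    and at least one offered reward is one the user seeks."""
    caps = {o for s, p, o in user if p == "hasCapability"}
    needs = {o for s, p, o in task if p == "requiresCapability"}
    wants = {o for s, p, o in user if p == "seeksReward"}
    offers = {o for s, p, o in task if p == "offersReward"}
    return needs <= caps and bool(wants & offers)
```

This is the matchmaking the paper motivates: once tasks and users carry explicit semantics, routing across platforms reduces to queries like `matches`.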
A Capability Maturity Framework for Sustainable ICT - Edward Curry
Researchers estimate that information and communication technology (ICT) is responsible for at least 2 percent of global greenhouse gas (GHG) emissions. Furthermore, in any individual business, ICT is responsible for a much higher percentage of that business's GHG footprint. Yet researchers also estimate that ICT can provide business solutions to reduce its GHG footprint fivefold. However, because the field is new and evolving, few guidelines and best practices are available. To address this issue, a consortium of leading organizations from industry, the nonprofit sector, and academia has developed and tested a framework for systematically assessing and improving SICT capabilities. The Innovation Value Institute (IVI; http://ivi.nuim.ie) consortium used an open-innovation model of collaboration, engaging academia and industry in scholarly work to create the SICT-Capability Maturity Framework (SICT-CMF), which is discussed in this paper.
B. Donnellan, C. Sheridan, and E. Curry, "A Capability Maturity Framework for Sustainable Information and Communication Technology," IEEE IT Professional, vol. 13, no. 1, pp. 33-40, Jan. 2011.
Key Technology Trends for Big Data in Europe - Edward Curry
In this presentation we will discuss some of the results of the BIG project including analysis of foundational Big Data research technologies, technology and strategy roadmaps to enable business to understand the potential of Big Data technologies across different sectors, and the necessary collaboration and dissemination infrastructure to link technology suppliers, integrators and leading user organizations.
Edward Curry is leading the Technical Working Group of the BIG Project with over 30 committed experts along the big data value chain (Acquisition, Analysis, Curation, Storage, Usage). With the help of the other technical leads, he will elaborate on the key technology trends identified in the BIG Project and how they bring data-driven value to industrial sectors.
Towards a BIG Data Public Private Partnership - Edward Curry
Building an industrial community around Big Data in Europe is the priority of the BIG: Big Data Public Private Forum project. In this workshop we will present the work of the project including analysis of foundational Big Data research technologies, technology and strategy roadmaps to enable business to understand the potential of Big Data technologies, and the necessary collaboration and dissemination infrastructure to link technology suppliers, integrators and leading user organizations. BIG is working towards the definition and implementation of a clear strategy that tackles the necessary efforts in terms of Big Data research and innovation, while also providing a major boost for technology adoption and supporting actions for the successful implementation of the Big Data economy.
Improving Policy Coherence and Accessibility through Semantic Web Technologie... - Edward Curry
The complexity, volume and diversity of government policies and regulations places a significant burden on both the complying parties and government itself. On the one hand, businesses, civil organizations and other societal entities are required to simultaneously comply with and interpret different and possibly conflicting or inconsistent regulations. On the other hand, government as a whole must ensure policy and regulatory coherence across its various policy domains. While the recent wave of open government initiatives has led to significantly more public access to these documents, features for cross-referencing related documents, or for linking to less formal documents and commentary on other media that are more understandable and accessible to the public, are rarely if ever available today. As a solution to this challenge, we propose an Open Government-wide Policy and Regulation Information Space consisting of documents that are “semantically” annotated and cross-linked to other documents in the information space as well as to external resources such as interpretations, comments and blogs on the social web.
Our approach is three-fold. First, we identify the requirements for the infrastructure. Second, we elaborate a Reference Architecture identifying the various elements needed within the infrastructure. Third, we show how such an infrastructure may be realised as a linked data portal where policies and regulations are published as linked open data. Finally, we present a case study involving environmental policy and regulations, discuss the potential impact of such an infrastructure on the coherency and accessibility of policies and regulations, and conclude with the challenges associated with provisioning a linked open policy and regulatory information infrastructure.
Designing Next Generation Smart City Initiatives: Harnessing Findings And Les... - Edward Curry
The proliferation of “Smart Cities” initiatives around the world is part of the strategic response by governments to the challenges and opportunities of increasing urbanization and the rise of cities as the nexus of societal development. As a framework for urban transformation, Smart City initiatives aim to harness Information and Communication Technologies and Knowledge Infrastructures for economic regeneration, social cohesion, better city administration and infrastructure management. However, experiences from earlier Smart City initiatives have revealed several technical, management and governance challenges arising from the inherent nature of a Smart City as a complex “Socio-technical System of Systems”. While these early lessons are informing modest objectives for planned Smart Cities programs, no rigorously developed framework based on careful analysis of existing initiatives is available to guide policymakers, practitioners, and other Smart City stakeholders. In response to this need, this paper presents a “Smart City Initiative Design (SCID) Framework” grounded in the findings from the analysis of ten major Smart Cities programs from the Netherlands, Sweden, Malta, the United Arab Emirates, Portugal, Singapore, Brazil, South Korea, China and Japan. The findings provide a design space for the objectives, implementation options, strategies, and the enabling institutional and governance mechanisms for Smart City initiatives.
Citizen Actuation For Lightweight Energy Management - Edward Curry
In this work, we aim to utilise the concept of citizen sensors but also introduce the theory of citizen actuation. Citizen sensors observe, report, and collect data; we propose that by supporting these citizen sensors with methods to affect their surroundings, we enable them to become citizen actuators. We outline a use case for citizen actuation in the Energy Management domain, propose an architecture (a Cyber-Physical Social System) built on previous work in Energy Management with Twitter integration and the use of Complex Event Processing (CEP), and perform an experiment to test this theory. We motivate the need for citizen actuation in Building Management Systems given the high cost of actuation systems. We define the concept of citizen actuation and outline an experiment that shows a reduction in average energy usage of 24%. The experiment supports the use of citizen actuation to improve energy usage within the experimental environment, and we discuss future research directions in this area.
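The citizen-actuation loop can be sketched as a simple event rule that, instead of driving hardware, asks a nearby occupant to act. The thresholds, zones, and message format below are invented; the paper's implementation delivered such requests via Twitter:

```python
# Sketch of a citizen-actuation loop: a minimal CEP-style rule detects
# anomalous energy readings and, rather than actuating hardware,
# messages a nearby occupant to act. All values are invented.

def detect(events, threshold):
    """Minimal event rule: flag readings above a power threshold."""
    return [e for e in events if e["kw"] > threshold]

def request_actuation(event):
    # In the paper's setup this was a Twitter notification; here we
    # just format the request a citizen actuator would receive.
    return f"High usage in {event['zone']}: please switch off unused devices."

events = [{"zone": "lab-2", "kw": 4.2}, {"zone": "office-1", "kw": 0.8}]
requests = [request_actuation(e) for e in detect(events, threshold=2.0)]
```

The design choice the paper motivates is visible even in this toy: the detection side is a conventional event-processing rule, while the actuation side is replaced by a human in the loop, avoiding the cost of physical actuation systems.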
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup - Edward Curry
Data management efforts such as Master Data Management and Data Curation are a popular approach to high quality enterprise data. However, Data Curation can be heavily centralised and labour intensive, where the cost and effort can become prohibitively high. The concentration of data management and stewardship onto a few highly skilled individuals, such as developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in big data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how crowdsourcing data management techniques can be applied within an enterprise context.
Topics covered include:
- Data Quality And Data Curation
- Crowdsourcing
- Case Studies on Crowdsourced Data Curation
- Setting up a Crowdsourced Data Curation Process
- Linked Open Data Example
- Future Research Challenges
Towards Unified and Native Enrichment in Event Processing Systems - Edward Curry
Events are encapsulated pieces of information that flow from one event agent to another. In order to process an event, additional information that is external to the event is often needed. This is achieved using a process called event enrichment. Current approaches to event enrichment are external to event processing engines and are handled by specialized agents. Within large-scale environments with high heterogeneity among events, the enrichment process may become difficult to maintain. This paper examines event enrichment in terms of information completeness and presents a unified model for event enrichment that takes place natively within the event processing engine. The paper describes the requirements of event enrichment and highlights its challenges such as finding enrichment sources, retrieval of information items, finding complementary information and its fusion with events. It then details an instantiation of the model using Semantic Web and Linked Data technologies. Enrichment is realised by dynamically guiding a spreading activation algorithm in a Linked Data graph. Multiple spreading activation strategies have been evaluated on a set of Wikipedia events and experimentation shows the viability of the approach.
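A minimal spreading-activation pass over a small linked-data-style graph, in the spirit of the enrichment approach above; the graph, decay factor, and threshold are invented for illustration and do not reproduce the paper's strategies:

```python
# Minimal spreading activation over a small graph: activation starts at
# a seed (the event's entity) and propagates outward with decay, so
# nearby nodes score higher as candidate enrichment material.
# Graph, decay, and threshold are invented for illustration.

def spread(graph, seed, decay=0.5, threshold=0.1):
    """Propagate activation from a seed node outward with decay."""
    activation = {seed: 1.0}
    frontier = [seed]
    while frontier:
        node = frontier.pop()
        for neighbour in graph.get(node, []):
            a = activation[node] * decay
            if a > threshold and a > activation.get(neighbour, 0.0):
                activation[neighbour] = a
                frontier.append(neighbour)
    return activation

graph = {
    "event:match": ["ex:Stadium", "ex:TeamA"],
    "ex:Stadium": ["ex:City"],
    "ex:City": ["ex:Country"],
}
scores = spread(graph, "event:match")
```

The paper's contribution is in *guiding* this spreading dynamically within the engine and evaluating several strategies; the sketch only shows the base mechanism of distance-decayed relevance.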
Metadata Standards and Organizational Resource Allocation: A Case for the Eff... - Camille Mathieu
Metadata Standards and Organizational Resource Allocation: A Case for the Effective Management of Digital Assets (draft) is the draft for my Master's portfolio defense to occur in about 6 months. This presentation summarizes common deficiencies in enterprise content management and links these deficiencies to increases in organizational inefficiency. The standardization of metadata across repositories and across enterprises is advocated as a solution to many content management and information retrieval woes experienced by organizations. Any feedback greatly appreciated!
ABSTRACT: The management of digital intellectual assets has become a crucial governance challenge for many organizations. Investments in metadata standardization would greatly increase an organization’s ability to store, retrieve, and manipulate these assets most effectively. With their reliance on manageable digital assets for resource allocation and internal search skyrocketing, organizations should prioritize the development and implementation of consistent metadata standards.
Towards Expertise Modelling for Routing Data Cleaning Tasks within a Communit... - Umair ul Hassan
https://www.insight-centre.org/content/towards-expertise-modelling-routing-data-cleaning-tasks-within-community-knowledge-workers
Presented at the ICIQ 2012
ABSTRACT:
Applications consuming data have to deal with a variety of data quality issues, such as missing values, duplication, and incorrect values. Although automatic approaches can be utilized for data cleaning, their results can remain uncertain; therefore, updates suggested by automatic data cleaning algorithms require further human verification. This paper presents an approach for generating tasks for uncertain updates and routing these tasks to appropriate workers based on their expertise. Specifically, the paper tackles the problem of modelling the expertise of knowledge workers for the purpose of routing tasks within collaborative data quality management. The proposed expertise model represents a worker's profile against a set of concepts describing the data. A simple routing algorithm leverages the expertise profiles to match data cleaning tasks with workers. The proposed approach is evaluated on a real-world dataset using human workers. The results demonstrate the effectiveness of using concepts to model expertise, in terms of the likelihood of receiving responses to tasks routed to workers.
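The routing idea can be sketched as follows: workers are profiled over concepts, and a cleaning task goes to the worker whose profile best covers the task's concepts. The profiles and scores are invented, and the paper's actual model and algorithm may differ:

```python
# Sketch of concept-based task routing: each worker has an expertise
# score per concept, and a data cleaning task is routed to the worker
# whose summed expertise over the task's concepts is highest.
# Profiles and scores are invented for illustration.

profiles = {
    "w1": {"address": 0.9, "phone": 0.2},
    "w2": {"address": 0.3, "product": 0.8},
}

def route(task_concepts, profiles):
    """Pick the worker with the highest summed expertise on the task."""
    def score(worker):
        return sum(profiles[worker].get(c, 0.0) for c in task_concepts)
    return max(profiles, key=score)

# A cleaning task about address and phone fields goes to w1
# (score 1.1 vs. w2's 0.3).
best = route({"address", "phone"}, profiles)
```

In a full system the profiles themselves would be learned from past task responses, which is the expertise-modelling problem the paper evaluates.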
Envisioning a discussion dashboard for collective intelligence of web convers... - jodischneider
Can we use Walton's discussion types to provide context for discussions?
Presentation for a CSCW2012 workshop on Collective Intelligence as Community Discourse and Action, focusing on the *why* of discussions, and detecting that with textmining.
See the position paper, Envisioning A Discussion Dashboard for Collective Intelligence of Web Conversations, at
http://events.kmi.open.ac.uk/cscw-ci2012/wp-content/uploads/2012/02/SchneiderPassant-cscw12-w33.pdf
Data2030 Summit Data Megatrends Turner Sept 2022.pptx - Matt Turner
The next challenge in data is rapidly becoming clear: how can we scale data value and bring data-driven decision making to everyone? We’ve made tremendous progress in bringing data together. The megatrends in data - data mesh, data fabric, the modern data stack - are all about crossing the last mile to get data to everyone, not just the data experts. How can we empower everyone to better use data? Are the megatrends the road to actually scaling data value? And what does that mean for the data teams and data engineers creating systems and delivering DataOps?
Down to Business: Taking Action Quickly with Linked Data Services - Inside Analysis
The Briefing Room with Krish Krishnan and Denodo
Live Webcast 5-28-2013
Rapid time-to-insight makes analysts happy, but rapid time-to-action is what executives want most. Being able to respond quickly to market changes, new opportunities or customer requests is increasingly a must-have in today's competitive landscape. The key ingredient for this kind of organizational flexibility? Data! Companies that can quickly pull together a variety of data sources have a significant advantage over those that cannot.
Register for this episode of The Briefing Room to hear Analyst Krish Krishnan of Sixth Sense explain how linked data services can provide the necessary foundation for an agile enterprise. He'll be briefed by Suresh Chandrasekaran of Denodo Technologies who will showcase his company's mature data virtualization platform. He'll demonstrate how a point-and-click interface can be used to quickly assemble a wide range of data sets, thus enabling companies to build business solutions that address very specific enterprise needs.
Visit: http://www.insideanalysis.com
The US Department of Health and Human Services (HHS) began publishing Linked Data in 2011 as part of an ongoing effort to inform the public and stimulate new health care applications.
The Digital Enterprise Research Institute (DERI) is recognized as one of the leading international web science research institutes interlinking technologies, information and people to advance business and benefit society.
In the US, the President's Council of Advisors on Science and Technology (PCAST) published a report on Health IT that imagines new scenarios and recommends new capabilities for interacting with health data.
At DERI, innovative ontology and software implementations demonstrate how users can create and manage fine-grained privacy preferences that restrict or grant access to their Linked Data
This session will give an overview of the HHS/DERI collaboration to implement 'data element access services' towards the realization of patient controlled privacy.
• US Department of Health and Human Services
• PCAST Health Information Technology Report
• Digital Enterprise Research Institute
• Privacy Preference Ontology and Manager
• Puelia and Linked Data API
http://semtechbizsf2012.semanticweb.com/sessionPop.cfm?confid=65&proposalid=4539
Presented at Opening Up Government Data in DERI, NUI Galway, Ireland as part of Irish Open Data Week.
For more information on Irish Open Data, check out the workspace at http://workspace.opendata.ie
Presented at the Workshop on the Potential of Social Media Tools and Data in the News Media Industry (SocMedNews) of the 6th International Conference on Weblogs and Social Media (ICWSM 12).
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Accelerate your Kubernetes clusters with Varnish Caching
Data Curation at the New York Times
1. Digital Enterprise Research Institute www.deri.ie
Data Curation at the
New York Times
Edward Curry, Andre Freitas, Seán O'Riain
ed.curry@deri.org
http://www.deri.org/
http://www.EdwardCurry.org/
Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
2. Speaker Profile
Research Scientist at the Digital Enterprise Research
Institute (DERI)
Leading international web science research organization
Researching how the web of data is changing the way
businesses work and interact with information
Projects include studies of enterprise linked data, community-based
data curation, semantic data analytics, and semantic search
Investigate utilization within the pharmaceutical, oil & gas,
financial, advertising, media, manufacturing, health care, ICT,
and automotive industries
Invited speaker at the 2010 MIT Sloan CIO Symposium
to an audience of more than 600 CIOs
3. Overview
Curation Background
The Business Need for Curated Data
What is Data Curation?
Data Quality and Curation
How to Curate Data
New York Times Case Study
Best Practices from Case Study Learning
4. The Business Need
Knowledge workers need:
Access to the right information
Confidence in that information
Working with incomplete, inaccurate, or wrong
information can have disastrous consequences
5. The Problems with Data
Flawed Data
Affects 25% of critical data in the world's top companies
(Gartner)
Data Quality
Recent banking crisis (Economist, Dec '09)
Inaccurate figures made it difficult to manage operations
(investment exposure and risk)
– "assets are defined differently in different programs"
– "numbers did not always add up"
– "departments do not trust each other's figures"
– "figures … not worth the pixels they were made of"
6. What is Data Curation?
Digital Curation
Selection, preservation, maintenance, collection, and
archiving of digital assets
Data Curation
Active management of data over its life-cycle
Data Curators
Ensure data is trustworthy, discoverable, accessible,
reusable, and fit for use
– Museum cataloguers of the Internet age
7. What is Data Curation?
Data Governance
Convergence of data quality, data management,
business process management, and risk
management
Data Curation is a complementary activity
Part of overall data governance strategy for
organization
Data Curator = Data Steward ??
Overlapping terms between communities
8. Data Quality and Curation
What is Data Quality?
Desirable characteristics of an information resource
Described as a series of quality dimensions
– Discoverability, Accessibility, Timeliness, Completeness,
Interpretation, Accuracy, Consistency, Provenance &
Reputation
Data curation can be used to improve these
quality dimensions
9. Data Quality and Curation
Discoverability & Accessibility
Curate to streamline search by storing and classifying
in appropriate and consistent manner
Accuracy
Curate to ensure data correctly represents the
"real-world" values it models
Consistency
Curate to ensure data created and maintained using
standardized definitions, calculations, terms, and
identifiers
10. Data Quality and Curation
Provenance & Reputation
Curate to track source of data and determine reputation
Curate to include the objectivity of the source/producer
– Is the information unbiased, unprejudiced, and impartial?
– Or does it come from a reputable but partisan source?
Other dimensions discussed in chapter
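The quality dimensions above can be made concrete with simple automated checks. This is an illustrative sketch, not from the chapter: the record fields, required-field list, and controlled vocabulary are invented examples of how completeness and consistency might be measured for article metadata.

```python
# Illustrative quality checks for two dimensions discussed above:
# completeness and consistency. Fields and vocabulary are hypothetical.

CONTROLLED_VOCABULARY = {"politics", "sports", "business"}
REQUIRED_FIELDS = ("title", "date", "subject")

def completeness(record):
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if record.get(f))
    return present / len(REQUIRED_FIELDS)

def is_consistent(record):
    """True if the subject tag comes from the controlled vocabulary."""
    return record.get("subject") in CONTROLLED_VOCABULARY

article = {"title": "Example", "date": "2010-06-01", "subject": "politics"}
print(completeness(article))   # 1.0
print(is_consistent(article))  # True
```

Checks like these can flag records for curator attention; they complement, rather than replace, the human judgement the deck describes.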
11. How to Curate Data
Data Curation is a large field with sophisticated
techniques and processes
Section provides high-level overview on:
Should you curate data?
Types of Curation
Setting up a curation process
Additional detail and references available in book
chapter
12. Should You Curate Data?
Curation can have multiple motivations
Improving accessibility, quality, consistency,…
Will the data benefit from curation?
Identify business case
Determine if the potential return supports the investment
Not all enterprise data should be curated
Suits knowledge-centric data rather than transactional
operations data
13. Types of Data Curation
Multiple approaches to curate data, no single
correct way
Who?
– Individual Curators
– Curation Departments
– Community-based Curation
How?
– Manual Curation
– (Semi-)Automated
– Sheer Curation
14. Types of Data Curation – Who?
Individual Data Curators
Suitable for infrequently changing small quantity of
data
– (<1,000 records)
– Minimal curation effort (minutes per record)
15. Types of Data Curation – Who?
Curation Departments
Curation experts working with subject matter experts
to curate data within formal process
– Can deal with large curation efforts (thousands of records)
Limitations
Scalability: Can struggle with large quantities of
dynamic data (>million records)
Availability: Post-hoc nature creates delay in curated
data availability
16. Types of Data Curation - Who?
Community-Based Data Curation
Decentralized approach to data curation
Crowd-sourcing the curation process
– Leverages community of users to curate data
Wisdom of the community (crowd)
Can scale to millions of records
17. Types of Data Curation – How?
Manual Curation
Curators directly manipulate data
Can tie users up with low-value add activities
(Semi-)Automated Curation
Algorithms can (semi-)automate curation activities
such as data cleansing, record de-duplication, and
classification
Can be supervised or approved by human curators
18. Types of Data Curation – How?
Sheer curation, or Curation at Source
Curation activities integrated in normal workflow of
those creating and managing data
Can be as simple as vetting or “rating” the results of a
curation algorithm
Results can be available immediately
Blended Approaches: Best of Both
Sheer curation + post hoc curation department
Allows immediate access to curated data
Ensures quality control with expert curation
19. Setting up a Curation Process
5 Steps to setup a curation process:
1 - Identify what data you need to curate
2 - Identify who will curate the data
3 - Define the curation workflow
4 - Identify appropriate data-in & data-out formats
5 - Identify the artifacts, tools, and processes needed to
support the curation process
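The five setup steps above can be captured as a simple workflow definition. A minimal sketch with assumed names (none of these fields or values come from the chapter), showing how each step maps onto a configuration record:

```python
# Hypothetical data structure mirroring the five setup steps above.
from dataclasses import dataclass, field

@dataclass
class CurationProcess:
    data_to_curate: str                                 # step 1: what data
    curators: list = field(default_factory=list)        # step 2: who curates
    workflow_steps: list = field(default_factory=list)  # step 3: the workflow
    data_in_format: str = "csv"                         # step 4: data-in
    data_out_format: str = "rdf"                        # step 4: data-out
    tools: list = field(default_factory=list)           # step 5: supporting tools

process = CurationProcess(
    data_to_curate="news article metadata",
    curators=["editorial staff", "Index Department"],
    workflow_steps=["suggest tags", "select tags", "review", "publish"],
    tools=["rule-based extractor", "taxonomy manager UI"],
)
print(process.data_out_format)  # rdf
```

Writing the process down this explicitly makes the later steps (tooling, formats) reviewable before any curation effort is spent.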
20. The New York Times
100 Years of Expert Data Curation
21. The New York Times
Largest metropolitan and third largest
newspaper in the United States
nytimes.com
Most popular newspaper
website in US
100 year old curated
repository defining its
participation in the
emerging Web of Data
22. The New York Times
Data curation dates back to 1913
Publisher/owner Adolph S. Ochs decided to provide a
set of additions to the newspaper
New York Times Index
Organized catalog of article titles and summaries
– Containing issue, date and column of article
– Categorized by subject and names
– Introduced on a quarterly, then annual, basis
Transitory content of newspaper became
important source of searchable historical data
Often used to settle historical debates
23. The New York Times
Index Department was created in 1913
Curation and cataloguing of NYT resources
– Since 1851 the NYT had a low-quality index for internal use
Developed a comprehensive catalog using a
controlled vocabulary
Covering subjects, personal names, organizations,
geographic locations and titles of creative works
(books, movies, etc), linked to articles and their
summaries
Current Index Dept. has ~15 people
24. The New York Times
Challenges with consistently and accurately
classifying news articles over time
Keywords expressing subjects may show some
variance due to cultural or legal constraints
Identities of some entities, such as organizations and
places, changed over time
Controlled vocabulary grew to hundreds of
thousands of categories
Adding complexity to classification process
25. The New York Times
Increased importance of Web drove need to
improve categorization of online content
Curation carried out by Index Department
Library-time (days to weeks)
Print edition can handle next-day index
Not suitable for real-time online publishing
nytimes.com needed a same-day index
26. The New York Times
Introduced two stage curation process
Editorial staff performed best-effort semi-automated
sheer curation at the point of online publication
– Several hundred journalists
Index Department follow up with long-term accurate
classification and archiving
Benefits:
Non-expert journalist curators provide instant
accessibility to online users
Index Department provides long-term high-quality
curation in a “trust but verify” approach
27. NYT Curation Workflow
Curation starts when an article comes out of the newsroom
28. NYT Curation Workflow
Member of editorial staff submits the article to a web-based,
rule-based information extraction system (SAS Teragram)
29. NYT Curation Workflow
Teragram uses linguistic extraction rules based on a subset of
the Index Dept's controlled vocabulary
30. NYT Curation Workflow
Teragram suggests tags based on the Index vocabulary that
can potentially describe the content of article
31. NYT Curation Workflow
Editorial staff member selects terms that best describe the
contents and inserts new tags if necessary
32. NYT Curation Workflow
Reviewed by the taxonomy managers with feedback to
editorial staff on classification process
33. NYT Curation Workflow
Article is published online at nytimes.com
34. NYT Curation Workflow
At a later stage the article receives second-level curation by the
Index Dept., adding Index tags and a summary
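The workflow above pairs a rule-based extractor (SAS Teragram) with human selection. Teragram itself is proprietary, so this is only a toy stand-in: the vocabulary and patterns are invented, but the shape matches the deck's description of suggesting controlled-vocabulary tags for an editor to accept or reject.

```python
# Toy rule-based tag suggester, standing in for the Teragram step above.
# Vocabulary and rules are hypothetical examples.
import re

VOCABULARY = {
    "Elections": [r"\belection\b", r"\bballot\b"],
    "Baseball":  [r"\bbaseball\b", r"\byankees\b"],
}

def suggest_tags(article_text):
    """Suggest controlled-vocabulary tags whose rules match the text."""
    text = article_text.lower()
    return sorted(
        tag for tag, patterns in VOCABULARY.items()
        if any(re.search(p, text) for p in patterns)
    )

suggested = suggest_tags("The election results surprised Yankees fans.")
print(suggested)  # ['Baseball', 'Elections']
# An editor would then select from `suggested` (and add new tags if needed)
# before the Index Department's second-level review.
```

Keeping the suggester's output advisory preserves the "trust but verify" split between instant editorial tagging and expert follow-up curation.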
36. The New York Times
Early adopter of Linked Open Data (June '09)
37. The New York Times
Linked Open Data @ data.nytimes.com
Subset of 10,000 tags from index vocabulary
Dataset of people, organizations & locations
– Complemented by search services to consume data
about articles, movies, best sellers, Congress votes,
real estate,…
Benefits
Improves traffic by third party data usage
Lowers development cost of new applications for
different verticals inside the website
– E.g. movies, travel, sports, books
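Linked Open Data like this is typically published as RDF. A sketch of consuming it: the triples below are illustrative (in the spirit of data.nytimes.com, which published SKOS concepts with owl:sameAs links to other datasets), read with a deliberately naive stdlib-only parser that handles only simple `<uri> <uri> <uri>` or `<uri> <uri> "literal"` lines.

```python
# Naive N-Triples reader over invented example triples resembling the
# NYT Linked Open Data (real data would need a proper RDF library).
import re

NTRIPLES = """\
<http://data.nytimes.com/N1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Obama, Barack" .
<http://data.nytimes.com/N1> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/Barack_Obama> .
"""

TRIPLE = re.compile(r'<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.')

def parse(nt):
    """Yield (subject, predicate, object) for each well-formed line."""
    for line in nt.splitlines():
        m = TRIPLE.match(line)
        if m:
            s, p, o_uri, o_lit = m.groups()
            yield (s, p, o_uri if o_uri is not None else o_lit)

triples = list(parse(NTRIPLES))
same_as = [o for s, p, o in triples if p.endswith("sameAs")]
print(same_as)  # ['http://dbpedia.org/resource/Barack_Obama']
```

The owl:sameAs links are what make the dataset "linked": third parties can join NYT tags against external datasets, which is exactly the third-party usage the slide credits for improved traffic.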
38. Overview
Curation Background
The Business Need for Curated Data
What is Data Curation?
Data Quality and Curation
How to Curate Data
New York Times Case Study
Best Practices from Case Study Learning
39. Best Practices from Case Study Learning
Social Best Practices
Participation
Engagement
Incentives
Community Governance Models
Technical Best Practices
Data Representation
Human and Automated Curation
Track Provenance
40. Social Best Practices
Participation
Stakeholder involvement for data producers and
consumers must occur early in the project
– Provides insight into basic questions of what they want
to do, for whom, and what it will provide
White papers are effective means to present these
ideas, and solicit opinion from community
– Can be used to establish informal 'social contract' for
community
41. Social Best Practices
Engagement
Outreach activities essential for promotion and
feedback
Typical consumers-to-contributors ratios of less than
5%
Social communication and networking forums are
useful
– Majority of community may not communicate using
these media
– Communication by email still remains important
42. Social Best Practices
Incentives
Sheer curation needs a line of sight from the data curation
activity to tangible exploitation benefits
Lack of awareness of value proposition will slow
emergence of collaborative contributions
Recognizing contributing curators through a formal
feedback mechanism
– Reinforces contribution culture
– Directly increases output quality
43. Social Best Practices
Community Governance Models
Effective governance structure is vital to ensure
success of community
Internal communities and consortium perform well
when they leverage traditional corporate and
democratic governance models
Open communities need to engage the community
within the governance process
– Follow less orthodox approaches using meritocratic
and autocratic principles
44. Technical Best Practices
Data Representation
Must be robust and standardized to encourage
community usage and tools development
Support for legacy data formats and ability to
translate data forward to support new technology and
standards
Human & Automated Curation
Balancing will improve data quality
Automated curation should always defer to, and never
override, human curation edits
– Automate validating data deposition and entry
– Target community at focused curation tasks
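The "defer to human" rule above can be made mechanical. A minimal sketch, with assumed data shapes: tag decisions are merged so an automated pass never overrides what a human curator has already set.

```python
# Hypothetical merge of tag decisions illustrating the rule above:
# automated curation defers to, and never overrides, human edits.

def merge_tags(automated, human):
    """Combine tag decisions; human verdicts win on any conflict."""
    merged = dict(automated)   # start from the automated pass
    merged.update(human)       # human edits override automated ones
    return merged

automated = {"Elections": "keep", "Baseball": "keep", "Weather": "drop"}
human     = {"Baseball": "drop"}  # curator rejects one suggestion
print(merge_tags(automated, human))
# {'Elections': 'keep', 'Baseball': 'drop', 'Weather': 'drop'}
```

Ordering the update this way encodes the precedence once, so no automated re-run can silently undo a curator's decision.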
45. Technical Best Practices
Track Provenance
All curation activities should be recorded and
maintained as part of the data provenance effort
– Especially where human curators are involved
Users can have different perspectives on provenance
– A scientist may need to evaluate the fine-grained
experiment description behind the data
– For a business analyst the 'brand' of the data provider
can be sufficient for determining quality
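Recording every curation activity as a structured entry supports both perspectives: the fine-grained trail and the coarse "who touched this" view. An illustrative sketch; the field names are assumptions, not from the chapter.

```python
# Hypothetical provenance log for curation activities.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ProvenanceEntry:
    record_id: str    # which data item was curated
    activity: str     # e.g. "tag-added", "tag-removed", "summary-edited"
    agent: str        # human curator or algorithm name
    agent_type: str   # "human" or "automated"
    timestamp: str

def log_activity(log, record_id, activity, agent, agent_type):
    """Append a timestamped provenance entry to the log."""
    entry = ProvenanceEntry(record_id, activity, agent, agent_type,
                            datetime.now(timezone.utc).isoformat())
    log.append(entry)
    return entry

log = []
log_activity(log, "article-42", "tag-added", "teragram", "automated")
log_activity(log, "article-42", "tag-removed", "j.smith", "human")
# A business analyst might only need the agents (the "brand"):
print(sorted({e.agent for e in log}))  # ['j.smith', 'teragram']
```

The scientist's view is the full entry list for a record; the analyst's view is the derived summary of agents, both from the same log.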
46. Conclusions
Data curation can ensure the quality of data and
its fitness for use
Pre-competitive data can be shared without
conferring a commercial advantage
Pre-competitive data communities
Common curation tasks carried out once in public
domain
Reduces cost, increases quantity and quality
47. Acknowledgements
Collaborators Andre Freitas & Seán O'Riain
Insight from Thought Leaders
Evan Sandhaus (Semantic Technologist), Rob Larson (Vice President Product
Development and Management), and Gregg Fenton (Director Emerging Platforms)
from the New York Times
Krista Thomas (Vice President, Marketing & Communications), Tom Tague
(OpenCalais initiative Lead) from Thomson Reuters
Antony Williams (VP of Strategic Development ) from ChemSpider
Helen Berman (Director), John Westbrook (Product Development) from the Protein
Data Bank
Nick Lynch (Architect with AstraZeneca) from the Pistoia Alliance.
The work presented has been funded by Science
Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).
48. Further Information
The Role of Community-Driven
Data Curation for Enterprises
Edward Curry, Andre Freitas, & Seán O'Riain
In David Wood (ed.),
Linking Enterprise Data Springer, 2010.
Available Free at:
http://3roundstones.com/led_book/led-curry-et-al.html