With the increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data, and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, the Protein Data Bank, and ChemSpider, from which best practices for both the social and technical aspects of community-driven data curation are described.
E. Curry, A. Freitas, and S. O’Riáin, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25-47.
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data (Edward Curry)
Data management efforts such as Master Data Management (MDM) are a popular approach to high-quality enterprise data. However, MDM can be heavily centralized and labour intensive, and its cost and effort can become prohibitively high. The concentration of data management and stewardship in a few highly skilled individuals, such as developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in collaborative data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how collaborative data management can be applied within an enterprise context using platforms such as Amazon Mechanical Turk, Mobile Works, and internal enterprise human computation platforms. A minimal sketch of one such curation step follows the topic list below.
Topics covered include:
- Introduction to Crowdsourcing and Human Computation for Data Management
- Crowds vs. Communities: when to use them and why
- Push vs. Pull methods of crowdsourcing data management
- Setting up and running a collaborative data management process
- Modelling the expertise of communities
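As a concrete illustration of the collaborative curation steps listed above, the sketch below aggregates redundant crowd judgements on a data record by majority vote, escalating to an expert data steward when the crowd disagrees. The task wording, answer labels, and agreement threshold are illustrative assumptions, not any specific platform's API.

```python
from collections import Counter

def aggregate_votes(answers, min_agreement=0.6):
    """Majority-vote aggregation of redundant crowd answers.

    Returns (winning_answer, agreement), or (None, agreement) when the
    crowd is too divided, so the record can be escalated to an expert
    data steward -- a typical fallback in collaborative curation.
    """
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    agreement = votes / len(answers)
    return (winner if agreement >= min_agreement else None), agreement

# Three workers check whether a master-data record is a duplicate.
answers = ["duplicate", "duplicate", "not-duplicate"]
label, agreement = aggregate_votes(answers)
print(label, round(agreement, 2))  # duplicate 0.67
```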
Developing a Sustainable IT Capability: Lessons From Intel's Journey (Edward Curry)
Intel Corporation set itself a goal to reduce its global-warming greenhouse gas footprint by 20% by 2012, from 2007 levels. Through the use of sustainable IT, the Intel IT organization is recognized as a significant contributor to the company’s sustainability strategy, transforming both its own IT operations and Intel’s wider operations. This article describes how Intel has achieved IT sustainability benefits thus far by developing four key capabilities. These capabilities have been incorporated into the Sustainable ICT Capability Maturity Framework (SICT-CMF), a model developed by an industry consortium in which the authors were key participants. The article ends with lessons learned from Intel’s experiences that can be applied by business and IT executives in other enterprises.
Wikipedia (DBpedia): Crowdsourced Data Curation (Edward Curry)
Wikipedia is an openly editable encyclopedia, built collaboratively by a large community of web editors. The success of Wikipedia as one of the most important sources of information available today still challenges existing models of content creation. Although the term ‘curation’ is not commonly used by Wikipedia’s contributors, digital curation is the central activity of Wikipedia editors, who are responsible for its information quality standards.
Wikipedia is already widely used as a collaborative environment inside organizations.
The investigation of the collaboration dynamics behind Wikipedia highlights important features and good practices that can be applied in other organizations. Our analysis focuses on the curation perspective and covers two important dimensions: social organization, and the artifacts, tools and processes for coordinating cooperative work. These are the key enablers that support the creation of high-quality information products in Wikipedia’s decentralized environment.
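The structured face of this community curation is DBpedia, which republishes Wikipedia's curated infobox data as linked data. As a minimal sketch, assuming the SPARQLWrapper Python library and the public DBpedia endpoint are available, one can query the community-curated abstract of a topic:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query DBpedia, the linked-data extraction of Wikipedia's curated content.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?abstract WHERE {
      <http://dbpedia.org/resource/Data_curation>
          <http://dbpedia.org/ontology/abstract> ?abstract .
      FILTER (lang(?abstract) = "en")
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["abstract"]["value"][:200])
```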
Dealing with Semantic Heterogeneity in Real-Time Information (Edward Curry)
Tutorial at the EarthBiAs 2014 Summer School on Dealing with Semantic Heterogeneity in Real-Time Information
Part I: Large Scale Open Environments
Part II: Computational Paradigms
Part III: RDF Event Processing
Part IV: Theory of Event Exchange
Part V: Approaches to Semantic Decoupling
Part VI: Example Application: Linked Energy Intelligence
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup (Edward Curry)
Data management efforts such as Master Data Management and Data Curation are popular approaches to high-quality enterprise data. However, data curation can be heavily centralised and labour intensive, and its cost and effort can become prohibitively high. The concentration of data management and stewardship in a few highly skilled individuals, such as developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in big data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how crowdsourcing data management techniques can be applied within an enterprise context.
Topics covered include:
- Data Quality and Data Curation
- Crowdsourcing
- Case Studies on Crowdsourced Data Curation
- Setting up a Crowdsourced Data Curation Process
- Linked Open Data Example
- Future Research Challenges
Big Data Analytics: A New Business Opportunity (Edward Curry)
This talk introduces Big Data analytics and how they can be used to deliver value within organisations. The talk will cover the transformational potential of creating data value chains between different sectors. Developing a Big Data analytics capability will be discussed in addition to the challenges facing the emerging data economy.
Towards a BIG Data Public Private Partnership (Edward Curry)
Building an industrial community around Big Data in Europe is the priority of the BIG: Big Data Public Private Forum project. In this workshop we will present the work of the project, including analysis of foundational Big Data research technologies, technology and strategy roadmaps that enable businesses to understand the potential of Big Data technologies, and the collaboration and dissemination infrastructure needed to link technology suppliers, integrators, and leading user organizations. BIG is working towards the definition and implementation of a clear strategy that tackles the necessary efforts in Big Data research and innovation, while also providing a major boost for technology adoption and supporting actions for the successful implementation of the Big Data economy.
Machine learning techniques to improve data management and data quality: this presentation by Prof. Christine Legner and Martin Fadler summarizes research conducted in the Competence Center Corporate Data Quality (CC CDQ). It was presented on February 13, 2019 at the DSAG Technologietage in Bonn.
Advanced Analytics and Machine Learning with Data Virtualization (Denodo)
Watch full webinar here: https://bit.ly/3aXysas
Advanced data science techniques, like machine learning, have proven extremely useful for deriving valuable insights from your data, and data science platforms have become more approachable and user friendly. Yet despite all these advancements, data scientists still spend most of their time massaging and manipulating data into a usable asset. How can we empower the data scientist? How can we make data more accessible and foster a data-sharing culture?
Join us, and we will show you how Data Virtualization can do just that, with an agile and AI/ML laced data management platform. It can empower your organization, foster a data sharing culture, and simplify the life of the data scientist.
Watch this webinar to learn:
- How data virtualization simplifies the life of the data scientist, by overcoming data access and manipulation hurdles.
- How the integrated Denodo Data Science notebook provides a unified environment
- How Denodo uses AI/ML internally to drive the value of the data and expose insights
- How customers have used Data Virtualization in their Data Science initiatives.
Metadata Standards and Organizational Resource Allocation: A Case for the Eff... (Camille Mathieu)
Metadata Standards and Organizational Resource Allocation: A Case for the Effective Management of Digital Assets (draft) is the draft for my Master's portfolio defense to occur in about 6 months. This presentation summarizes common deficiencies in enterprise content management and links these deficiencies to increases in organizational inefficiency. The standardization of metadata across repositories and across enterprises is advocated as a solution to many content management and information retrieval woes experienced by organizations. Any feedback greatly appreciated!
ABSTRACT: The management of digital intellectual assets has become a crucial governance challenge for many organizations. Investments in metadata standardization would greatly increase an organization’s ability to store, retrieve, and manipulate these assets most effectively. With their reliance on manageable digital assets for resource allocation and internal search skyrocketing, organizations should prioritize the development and implementation of consistent metadata standards.
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights... (DATAVERSITY)
Do you wonder how to process huge amounts of data in a short amount of time? If yes, this session is for you! You will learn why Apache Hadoop and Streams are core frameworks that enable storing, managing, and analyzing vast amounts of data. You will learn the idea behind Hadoop's famous map-reduce algorithm and why it is at the heart of solutions that process massive amounts of data with flexible workloads and software-based scaling. We explore how to go beyond Hadoop with both real-time and batch analytics, usability, and manageability. For practical examples, we will use IBM InfoSphere BigInsights and Streams, which build on top of open source tooling when going beyond the basics and scaling up and out is needed.
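To make the map-reduce idea mentioned above concrete, here is a minimal pure-Python simulation of the classic word-count pattern; real Hadoop distributes the map, shuffle, and reduce phases across a cluster, which this single-process sketch only imitates.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs hadoop", "hadoop processes big data"]
pairs = (pair for doc in docs for pair in map_phase(doc))
print(reduce_phase(shuffle(pairs)))
# {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'processes': 1}
```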
With many organisations considering getting on the Hadoop bandwagon, this document provides an overview of the planned use cases for Hadoop, an illustration of some of the common technology components, suggestions on when Hadoop is worth considering, some of the challenges organisations are experiencing, cost considerations, and finally, how an organisation should position itself for a Big Data initiative. Any organisation considering a Big Data initiative with Hadoop should thoroughly consider each of these areas before embarking on a course of action.
A Technical Introduction to Big Data Analytics (Pethuru Raj PhD)
This presentation details the sources of big data, the value of big data, what to do with big data, and the platforms, infrastructures, and architectures for big data analytics.
How COVID-19 is Accelerating Digital Transformation in Health and Social Care? (NUS-ISS)
Without a doubt, COVID-19 has become the unexpected driver for digital transformation. It is accelerating the transformation, especially in the health and social care space, as we are forced to adapt to the new norm brought about by the crisis. Join us as we discuss the trends and what might be the new health and social care landscape in Singapore after 2020.
Analytics 3.0: Measurable business impact from analytics & big data (Microsoft)
Presentation from the Harvard Business Review event on Analytics and Big Data
(15 October 2013)
"Featuring analytics expert Tom Davenport, author of Competing on Analytics, Analytics at Work, and the just-released Keeping Up with the Quants" 
Analytics: The Real-world Use of Big Data (David Pittman)
UPDATE: Register now to participate in the 2013 survey at http://ibm.com/2013bigdatasurvey. IBM’s Institute for Business Value (IBV) and the University of Oxford released their information-rich and insightful report “Analytics: The real-world use of big data.” Based on a survey of over 1000 professionals from 100 countries across 25+ industries, the report provides insights into organizations’ top business objectives, where they are in their big data journey, and how they are advancing their big data efforts. It also provides a pragmatic set of recommendations to organizations as they proceed down the path of big data. For additional information, including links to a podcast with one of the lead researchers and a link to download the full report, visit http://ibm.co/RB14V0
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado... (Edureka!)
This Edureka Big Data tutorial helps you understand Big Data in detail. It discusses the evolution of Big Data, the factors associated with it, and the different opportunities it presents. It then covers the problems associated with Big Data and how Hadoop emerged as a solution. Below are the topics covered in this tutorial:
1) Evolution of Data
2) What is Big Data?
3) Big Data as an Opportunity
4) Problems in Encasing Big Data Opportunity
5) Hadoop as a Solution
6) Hadoop Ecosystem
7) Edureka Big Data & Hadoop Training
Big Data: From HindSight to Insight to Foresight (Sunil Ranka)
When it comes to analytics and reporting, there is a fine line between hindsight, insight, and foresight. With the evolution of Big Data technology, there is a need to derive value from larger datasets that were not available in the past. Even before we start using the shiny new technologies, we need to understand what is categorized as reporting, business intelligence, or Big Data and analytics. Based on my experience, people struggle to distinguish between reporting, analytics, and business intelligence.
Challenges Ahead for Converging Financial Data (Edward Curry)
Consumers of financial information come in many guises, from personal investors looking for that value-for-money share, to government regulators investigating corporate fraud, to business executives seeking advantage over their competition. While the particular analysis performed by each of these information consumers will vary, they all have to deal with the explosion of information available from multiple sources, including SEC filings, corporate press releases, market press coverage, and expert commentary. Recent economic events have begun to bring sharp focus on the activities and actions of financial markets, institutions, and not least regulatory authorities. Calls for enhanced scrutiny will bring increased regulation and information transparency. While extracting information from individual filings is relatively easy when a machine-readable format is utilized (for example, XBRL, the eXtensible Business Reporting Language), cross-comparison of extracted financial information can be problematic, as descriptions and accounting terms vary across companies and jurisdictions. Across multiple sources the problem becomes the classical data integration problem, where a common data abstraction is necessary before functional data use can begin. Within this paper we discuss the challenges in converging financial data from multiple sources. We concentrate on integrating data from multiple sources in terms of the abstraction, linking, and consolidation activities needed to consolidate data before more sophisticated analysis algorithms can examine it for the objectives of particular information consumers (e.g., competitive analysis, regulatory compliance, or investor analysis). We base our discussion on several years of researching and deploying data integration systems in both web and enterprise environments.
E. Curry, A. Harth, and S. O’Riain, “Challenges Ahead for Converging Financial Data,” in Proceedings of the XBRL/W3C Workshop on Improving Access to Financial Data on the Web, 2009.
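A toy illustration of the abstraction-and-consolidation step the paper describes: normalising company names that are reported differently across filings before linking their figures. The normalisation rules and sample records below are illustrative assumptions, not the paper's actual method.

```python
import re

# Legal-form suffixes that commonly vary across filings and jurisdictions.
SUFFIXES = re.compile(r"\b(inc|corp|corporation|ltd|plc|co)\.?$", re.I)

def canonical_name(name):
    """Crude abstraction step: strip punctuation and legal suffixes."""
    name = re.sub(r"[.,]", "", name).strip().lower()
    return SUFFIXES.sub("", name).strip()

# The same issuer reported under different labels in two sources.
filings = [("Acme Corp.", "10-K", 1.2e9), ("ACME Corporation", "press", 1.2e9)]
consolidated = {}
for name, source, revenue in filings:
    consolidated.setdefault(canonical_name(name), []).append((source, revenue))
print(consolidated)  # both records linked under the key 'acme'
```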
A Capability Maturity Framework for Sustainable ICT (Edward Curry)
Researchers estimate that information and communication technology (ICT) is responsible for at least 2 percent of global greenhouse gas (GHG) emissions. Furthermore, in any individual business, ICT is responsible for a much higher percentage of that business's GHG footprint. Yet researchers also estimate that ICT can provide business solutions that deliver GHG reductions five times the size of ICT's own footprint. However, because the field is new and evolving, few guidelines and best practices are available. To address this issue, a consortium of leading organizations from industry, the nonprofit sector, and academia has developed and tested a framework for systematically assessing and improving sustainable ICT (SICT) capabilities. The Innovation Value Institute (IVI; http://ivi.nuim.ie) consortium used an open-innovation model of collaboration, engaging academia and industry in scholarly work to create the SICT Capability Maturity Framework (SICT-CMF), which is discussed in this paper.
B. Donnellan, C. Sheridan, and E. Curry, "A Capability Maturity Framework for Sustainable Information and Communication Technology," IEEE IT Professional, vol. 13, no. 1, pp. 33-40, Jan. 2011.
The New York Times is the largest metropolitan newspaper and the third-largest newspaper in the United States. The Times website, nytimes.com, is ranked as the most popular newspaper website in the United States and is an important source of advertising revenue for the company. The NYT has a rich history of curating its articles, and its 100-year-old curated repository ultimately defined its participation as one of the first players in the emerging Web of Data.
Querying Heterogeneous Datasets on the Linked Data Web (Edward Curry)
The growing number of datasets published on the Web as linked data brings both opportunities for high data availability and challenges inherent to querying data in a semantically heterogeneous and distributed environment. Approaches used for querying siloed databases fail at Web-scale because users don't have an a priori understanding of all the available datasets. This article investigates the main challenges in constructing a query and search solution for linked data and analyzes existing approaches and trends.
Designing Next Generation Smart City Initiatives: Harnessing Findings And Les... (Edward Curry)
The proliferation of “Smart City” initiatives around the world is part of the strategic response by governments to the challenges and opportunities of increasing urbanization and the rise of cities as the nexus of societal development. As a framework for urban transformation, Smart City initiatives aim to harness information and communication technologies and knowledge infrastructures for economic regeneration, social cohesion, better city administration, and infrastructure management. However, experiences from earlier Smart City initiatives have revealed several technical, management, and governance challenges arising from the inherent nature of a Smart City as a complex socio-technical “system of systems”. While these early lessons are informing more modest objectives for planned Smart City programs, no rigorously developed framework based on careful analysis of existing initiatives is available to guide policymakers, practitioners, and other Smart City stakeholders. In response to this need, this paper presents a “Smart City Initiative Design (SCID) Framework” grounded in findings from the analysis of ten major Smart City programs from the Netherlands, Sweden, Malta, the United Arab Emirates, Portugal, Singapore, Brazil, South Korea, China, and Japan. The findings provide a design space for the objectives, implementation options, strategies, and the enabling institutional and governance mechanisms of Smart City initiatives.
Open Data Innovation in Smart Cities: Challenges and Trends (Edward Curry)
Open Data initiatives are increasingly considered as defining elements of emerging smart cities. However, few studies have attempted to provide a better understanding of the nature of this convergence and the impact on both domains. This talk examines the challenges and trends with open data initiatives using a socio-technical perspective of smart cities. The talk presents findings from a detailed study of 18 open data initiatives across five smart cities to identify emerging best practice. Three distinct waves of open data innovation for smart cities are discussed. The talk details the specific impacts of open data innovation on the different smart cities domains, governance of the cities, and the nature of datasets available in the open data ecosystem within smart cities.
Improving Policy Coherence and Accessibility through Semantic Web Technologie... (Edward Curry)
The complexity, volume, and diversity of government policies and regulations places a significant burden on both the complying parties and government itself. On the one hand, businesses, civil organizations, and other societal entities are required to simultaneously comply with and interpret different and possibly conflicting or inconsistent regulations. On the other hand, government as a whole must ensure policy and regulatory coherence across its various policy domains. While the recent wave of open government initiatives has led to significantly more public access to these documents, features that allow cross-referencing of related documents, or linking to less formal documents and comments on other media that are more understandable and accessible to the public, are not common, if available at all, today. As a solution to this challenge, we propose an Open Government-wide Policy and Regulation Information Space consisting of documents that are “semantically” annotated and cross-linked to other documents in the information space as well as to external resources such as interpretations, comments, and blogs on the social web.
Our approach is three-fold. First, we identify the requirements for the infrastructure. Second, we elaborate a reference architecture identifying the various elements needed within the infrastructure. Third, we show how such an infrastructure may be realised as a linked data portal where policies and regulations are published as linked open data. Finally, we present a case study involving environmental policy and regulations, discuss the potential impact of such an infrastructure on the coherency and accessibility of policies and regulations, and conclude with the challenges associated with provisioning a linked open policy and regulatory information infrastructure.
Citizen Actuation For Lightweight Energy Management (Edward Curry)
In this work, we aim to utilise the concept of citizen sensors but also introduce the theory of citizen actuation. Citizen sensors observe, report, and collect data; we propose that by supporting these citizen sensors with methods to affect their surroundings, we enable them to become citizen actuators. We outline a use case for citizen actuation in the energy management domain, propose an architecture (a Cyber-Physical Social System) built on previous work in energy management with Twitter integration and Complex Event Processing (CEP), and perform an experiment to test this theory. We motivate the need for citizen actuation in building management systems due to the high cost of actuation hardware. We define the concept of citizen actuation and outline an experiment that shows a reduction in average energy usage of 24%. The experiment supports the concept of citizen actuation to improve energy usage within the experimental environment, and we discuss future research directions in this area.
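As a rough sketch of the citizen-actuation loop described above (sensor event in, human-directed actuation request out), assuming a hypothetical `post_message` hook in place of the paper's actual Twitter integration:

```python
def post_message(channel, text):
    # Hypothetical stand-in for the paper's Twitter integration.
    print(f"[{channel}] {text}")

def on_energy_event(zone, power_kw, occupied, threshold_kw=5.0):
    """Simple event rule: when usage is high in an occupied zone, ask the
    occupants (citizen actuators) to intervene, instead of relying on
    costly automated actuation hardware."""
    if power_kw > threshold_kw and occupied:
        post_message("building-feed",
                     f"Zone {zone} is using {power_kw} kW - could someone "
                     "switch off unused equipment?")

on_energy_event(zone="3A", power_kw=7.2, occupied=True)
```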
Interactive Water Services: The Waternomics Approach (Edward Curry)
WATERNOMICS focuses on the development of ICT as an enabling technology to manage water as a resource, increase end-user conservation awareness, and effect behavioral changes. Unique aspects of WATERNOMICS include personalized feedback about end-user water consumption, the development of systematic and standards-based water resource management systems, new sensor hardware developments, and the introduction of forecasting and fault detection diagnosis to the analysis of water consumption data. These services will be bundled into the WATERNOMICS Water Information Services Platform. This paper presents the overall architectural approach to WATERNOMICS and details the potential interactive services possible based on this novel platform.
An Environmental Chargeback for Data Center and Cloud Computing Consumers (Edward Curry)
Government, business, and the general public increasingly agree that the polluter should pay. Carbon dioxide and environmental damage are considered viable chargeable commodities. The net effect of this for data center and cloud computing operators is that they should look to “chargeback” the environmental impacts of their services to the consuming end-users. An environmental chargeback model can have a positive effect on environmental impacts by linking consumers to the indirect impacts of their usage, facilitating clearer understanding of the impact of their actions. In this paper we motivate the need for environmental chargeback mechanisms. The environmental chargeback model is described including requirements, methodology for definition, and environmental impact allocation strategies. The paper details a proof-of-concept within an operational data center together with discussion on experiences gained and future research directions.
E. Curry, S. Hasan, M. White, and H. Melvin, “An Environmental Chargeback for Data Center and Cloud Computing Consumers,” in First International Workshop on Energy-Efficient Data Centers, J. Huusko, H. de Meer, S. Klingert, and A. Somov, Eds. Madrid, Spain: Springer Berlin / Heidelberg, 2012.
Within the operational phase, buildings are now producing more data than ever before: energy usage, utility information, occupancy patterns, weather data, and more. In order to manage a building holistically, it is important to use knowledge from across these information sources. However, many barriers exist to their interoperability, and there is little interaction between these islands of information. As building data moves to the cloud, there is a critical need to reflect on how cloud-based data services are designed from an interoperability perspective. If new cloud data services are designed in the same manner as traditional building management systems, they will suffer from the same data interoperability problems. Linked data technology leverages the existing open protocols and W3C standards of the Web architecture for sharing structured data on the web. In this paper we propose the use of linked data as an enabling technology for cloud-based building data services. The objective of linking building data in the cloud is to create an integrated, well-connected graph of relevant information for managing a building. This paper describes the fundamentals of the approach and demonstrates the concept within a small and medium-sized enterprise (SME) with an owner-occupied office building.
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ... (Edward Curry)
Cyber-Physical Energy Systems (CPES) exploit the potential of information technology to boost energy efficiency while minimising environmental impacts. CPES can help manage energy more efficiently by providing a functional view of the entire energy system so that energy activities can be understood, changed, and reinvented to better support sustainable practices. CPES can be applied at different scales, from Smart Grids and Smart Cities to Smart Enterprises and Smart Buildings. Significant technical challenges exist in terms of information management, leveraging real-time sensor data, and coordinating the various stakeholders to optimize energy usage.
In this talk I describe an approach to overcoming these challenges by reusing Web standards to quickly connect the required systems within a CPES. The resulting lightweight architecture leverages Web technologies including Linked Data, the Web of Things, and social media. The talk describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Management scenario in a smart building.
Sustainable IT for Energy Management: Approaches, Challenges, and Trends (Edward Curry)
An invited talk to the Galway-Mayo Institute of Technology on the current state of the art in Sustainable IT for energy management, the challenges, and the emerging trends.
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing (Edward Curry)
Recent advances in web technologies allow people to help solve complex problems by performing online tasks in return for money, learning, or fun. At present, human contribution is limited to the tasks defined on individual crowdsourcing platforms. Furthermore, there is a lack of tools and technologies that support matching of tasks with appropriate users across multiple systems. A more explicit capture of the semantics of crowdsourcing tasks could enable the design and development of matchmaking services between users and tasks. The paper presents the SLUA ontology, which aims to model users and tasks in crowdsourcing systems in terms of the relevant actions, capabilities, and rewards. This model describes different types of human tasks that help in solving complex problems using crowds. The paper provides examples of describing users and tasks in some real-world systems with the SLUA ontology.
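A minimal sketch of the kind of user/task description the paper proposes, assuming the rdflib Python library; the SLUA namespace URI and property names below are illustrative guesses at the ontology's vocabulary, not its published terms.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

# Illustrative namespace; the real SLUA ontology URI may differ.
SLUA = Namespace("http://example.org/slua#")

g = Graph()
user = URIRef("http://example.org/users/alice")
task = URIRef("http://example.org/tasks/label-images")

# Describe a user's capability and a task's required capability and reward,
# mirroring the user/task/capability/reward concepts named in the abstract.
g.add((user, RDF.type, SLUA.User))
g.add((user, SLUA.hasCapability, Literal("image-classification")))
g.add((task, RDF.type, SLUA.Task))
g.add((task, SLUA.requiresCapability, Literal("image-classification")))
g.add((task, SLUA.offersReward, Literal("money")))

print(g.serialize(format="turtle"))
```

A matchmaking service could then pair users with tasks by querying for tasks whose required capability appears among a user's capabilities.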
Towards Unified and Native Enrichment in Event Processing Systems (Edward Curry)
Events are encapsulated pieces of information that flow from one event agent to another. In order to process an event, additional information that is external to the event is often needed. This is achieved using a process called event enrichment. Current approaches to event enrichment are external to event processing engines and are handled by specialized agents. Within large-scale environments with high heterogeneity among events, the enrichment process may become difficult to maintain. This paper examines event enrichment in terms of information completeness and presents a unified model for event enrichment that takes place natively within the event processing engine. The paper describes the requirements of event enrichment and highlights its challenges such as finding enrichment sources, retrieval of information items, finding complementary information and its fusion with events. It then details an instantiation of the model using Semantic Web and Linked Data technologies. Enrichment is realised by dynamically guiding a spreading activation algorithm in a Linked Data graph. Multiple spreading activation strategies have been evaluated on a set of Wikipedia events and experimentation shows the viability of the approach.
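The spreading-activation idea mentioned above is easy to sketch: starting from the entities mentioned in an event, activation flows along graph edges with decay, and highly activated neighbours become candidate enrichment facts. The graph and decay constant below are toy assumptions, not the paper's evaluated configuration.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Spreading activation over an adjacency-list graph.

    graph: dict mapping node -> list of neighbour nodes
    seeds: nodes mentioned in the incoming event (activation 1.0)
    """
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbour in graph.get(node, []):
                gained = energy * decay
                if gained > activation.get(neighbour, 0.0):
                    next_frontier[neighbour] = gained
                    activation[neighbour] = gained
        frontier = next_frontier
    return activation

# Tiny Linked Data-like neighbourhood around an event mentioning "Dublin".
graph = {"Dublin": ["Ireland", "Liffey"], "Ireland": ["Euro"]}
print(spread_activation(graph, seeds=["Dublin"]))
# {'Dublin': 1.0, 'Ireland': 0.5, 'Liffey': 0.5, 'Euro': 0.25}
```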
Approximate Semantic Matching of Heterogeneous Events (Edward Curry)
Event-based systems have loose coupling within space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, event-based systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain event-based systems. In order to address semantic coupling within event-based systems, we propose vocabulary-free subscriptions together with the use of approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and the consequences it implies for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauri-based and distributional-semantics-based similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precision-recall F1 score of 75.89% on average across all experiments, with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach.
S. Hasan, S. O’Riain, and E. Curry, “Approximate Semantic Matching of Heterogeneous Events,” in 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012), 2012.
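A rough sketch of the thesaurus-based side of such approximate matching, assuming NLTK with the WordNet corpus downloaded (nltk.download("wordnet")); the real system combines thesaurus and distributional measures, which this single-measure toy does not reproduce.

```python
from nltk.corpus import wordnet as wn  # requires the WordNet corpus

def similarity(word_a, word_b):
    """Best WordNet path similarity between any senses of two words."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(word_a)
              for s2 in wn.synsets(word_b)]
    return max(scores, default=0.0)

def matches(event_terms, subscription_terms, threshold=0.5):
    """Approximate match: every subscription term must have some
    sufficiently similar event term, even if vocabularies differ."""
    return all(
        max(similarity(sub, ev) for ev in event_terms) >= threshold
        for sub in subscription_terms
    )

# 'car'/'automobile' share a synset, so vocabularies need not be identical.
print(matches(["automobile", "crash"], ["car", "accident"]))  # expected True
```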
Crowdsourcing Approaches for Smart City Open Data Management (Edward Curry)
A wide-scale, bottom-up approach to the creation and management of open data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. This talk explores how to involve a wide community of users in collaborative open data management activities within a Smart City. The talk discusses how crowdsourcing techniques can be applied within a Smart City context using crowdsourcing and human computation platforms such as Amazon Mechanical Turk, Mobile Works, and CrowdFlower.
Data Analytics Ethics Issues and Questions
Presented at the University of Chicago Booth Big Data & Analytics Roundtable, April 2018
Presenter:
Arnie Aronoff, Ph.D.
Instructor, MScA in Data Analytics
Instructor, School of Social Services Administration
The University of Chicago
Group Concept OD
Organizational Development and Training
(312) 259-4544
aaronoff33@gmail.com
Data-driven decision-making (DDDM) is a process that helps data science professionals boost their businesses. Explore DDDM in detail and learn how you can master it in 2024.
Data Quality Strategy: A Step-by-Step Approach (FindWhitePapers)
Learn about the importance of having a data quality strategy and setting its overall goals. The six factors of data are also explained in detail, along with how to tie them together for implementation.
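A data quality strategy ultimately needs measurable checks. The sketch below computes two common data-quality indicators (completeness and validity) with pandas; the column names and validity rule are illustrative assumptions, not the white paper's factors.

```python
import pandas as pd

records = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@x.com", None, "c@x.com", "not-an-email"],
})

# Completeness: share of non-null values per column.
completeness = records.notna().mean()

# Validity: share of emails matching a simple pattern.
validity = records["email"].str.contains(r"^[^@\s]+@[^@\s]+$", na=False).mean()

print(completeness.to_dict())  # {'customer_id': 1.0, 'email': 0.75}
print(round(validity, 2))      # 0.5
```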
An examination of the ethical considerations involved in data analytics (Uncodemy)
Data analytics can be used for various purposes, including marketing, product development, and customer service. One of the primary benefits of data analytics is that it can help you identify patterns in your data that you might not have been able to see with other methods.
How to Create a Big Data Culture in Pharma (Chris Waller)
A talk presented at the Big Data and Analytics conference in Boston on January 28, 2014. Emphasis on data and information sharing cultures in companies.
MITS Advanced Research Techniques, Research Proposal, Student's Na... (EvonCanales257)
MITS Advanced Research Techniques
Research Proposal
Student’s Name
Higher Education Department
Victorian Institute of Technology
Proposed Title: Data Integrity Threats to Organizations
Abstract
Data integrity, an integral aspect of cyber security, is the consistency and accuracy of data that is assured across its life cycle, and is an imperative aspect of the implementation, design, and utilization of systems that process, store, and retrieve data (Graham, 2017). It is estimated that almost 90 percent of the world’s data was generated in the last two years, which shows the rate at which data is being produced. There are various threats associated with data integrity, for example security, human, and transfer errors, and cyber-attacks and malware, just to name a few. Data integrity is examined here in the context of organizations and business because of the impact it has on their operations and eventual success.
Data integrity is important to the productivity and operations of an organization, because management makes decisions based on the real-time data offered to them. If the data presented to management is inaccurate due to a lack of proper data integrity, then the decisions they make might have an adverse effect on the organization. For example, if data related to last year’s projections and profits in the finance department is altered in any way, then decisions about plans relating to the organization’s financial position might lead to further losses. Organizations ought to prioritize security measures through their various Information Systems departments, or by seeking third-party cyber security specialists, to protect and mitigate against the threats related to data integrity.
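One standard technical control against the transfer-error and tampering threats described above is a cryptographic checksum. The sketch below verifies that a record has not been altered since its hash was stored; the record layout is illustrative, not part of the proposal.

```python
import hashlib
import json

def fingerprint(record):
    """Deterministic SHA-256 digest of a record (sorted keys keep it stable)."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

record = {"year": 2023, "department": "finance", "profit": 1_200_000}
stored_digest = fingerprint(record)          # stored at write time

record["profit"] = 900_000                   # undetected alteration...
print(fingerprint(record) == stored_digest)  # ...caught at read time: False
```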
Outline of the Proposed Research
What are the threats associated with data integrity and the impact they have on organizational productivity and operations?
Background
Data plays an integral role in today’s business environment, especially when most organizations are harnessing the benefits of data to facilitate their decision-making processes. It is through understanding why and how data is important in business that one may also comprehend the importance of ensuring that the integrity of this same data is upheld. Many individuals think that data security and data integrity are one and the same, which is not true: security concerns the leaking of information such as intellectual property and healthcare documents, whereas data integrity concerns ensuring that data is trustworthy enough to facilitate the decision-making process.
Due to the lack of proper systems and structures to ensure that data integrity is at the top of an organization’s priorities, management has found it difficult to rely solely on data and analytics to facilitate their decision-making process. What this means is that a significant number of businesses are missing out on the advantages accorded through aspects such ...
My keynote speech at the ISACA IIA Belgium software watch day in October 2014 in Brussels on the value of big data and data analytics for auditors and other assurance professionals
DISCUSSION 15 4 All students must review one (1) Group PowerP.docx (cuddietheresa)
DISCUSSION 15 4
All students must review one (1) Group PowerPoint Presentation from another group and complete the following activities:
1. First, each student (individually) must summarize the content of the PowerPoint of another group in 200 words or more.
2. Additionally, each student must present a detailed discussion of what they learned from the presentation they summarized and discuss the ways in which they would use this information in their current or future profession.
PowerPoint is attached separately
Homework
Create a new product that will serve two business (organizational) markets.
Write a 750-1,000-word paper that describes your product, explains your strategy for entering the markets, and analyzes the potential barriers you may encounter. Explain how you plan to ensure your product will be successful, given your market strategy.
Include an introduction and conclusion that make relevant connections to course objectives.
Prepare this assignment according to the APA guidelines found in the APA Style Guide
Management Information Systems
Campbellsville University
Week 15: PowerPoint Presentation
Topic: Data
Group: E
GROUP MEMBERS FULL NAME
Data
Data can be defined as a specific piece of information or a basic building block of information.
Data is stored in files or in databases.
Data can be presented in tables, graphs, or charts, so that legitimate and analytical results can be derived from the gathered information.
Authentic data is very important for the smooth running of any business organization. It helps IT managers to make effective decisions. Data helps to interpret and enhance overall business processes (Cai & Zhu, 2015).
Uses of Data
The main purpose of data is to keep the records of several activities and situations.
Gathering data helps to better understand the interest of customers which can enhance the sales of organization (Haug & Liempd, 2011).
Relevant data assists in creating strong business strategies.
Use of big data helps to promote service support to the customers. It also helps organizations to find new markets and new business opportunities.
After all, data plays a great role in running the company more effectively and efficiently.
Data Management
Data management is the implementation of policies and procedures that put organizations in control of their business data regardless of where it resides. Data management is concerned with the end-to-end lifecycle of data, from creation to retirement, and the controlled progression of data to and from each stage within its lifecycle (Dunie, M. 2017).
Data Management
Information technology has evolved to deal with the most important data management computer science which helps the computer leads to the advantage of a navigable and transparent communication space.
Large volumes of data can be processed and managed with the help of management systems through the methods of algebra with applications in economic engineering especially in ...
The Role of Community-Driven Data Curation for Enterprises
1. The Role of Community-Driven Data Curation for Enterprises Edward Curry, Andre Freitas, Seán O'Riain ed.curry@deri.org http://www.deri.org/ http://www.EdwardCurry.org/
2. Speaker Profile Research Scientist at the Digital Enterprise Research Institute (DERI) Leading international web science research organization Researching how the web of data is changing the way businesses work and interact with information Projects include studies of enterprise linked data, community-based data curation, semantic data analytics, and semantic search Investigates utilization within the pharmaceutical, oil & gas, financial, advertising, media, manufacturing, health care, ICT, and automotive industries Invited speaker at the 2010 MIT Sloan CIO Symposium to an audience of more than 600 CIOs
4. Acknowledgements Collaborators Andre Freitas & Seán O'Riain Insight from Thought Leaders Evan Sandhaus (Semantic Technologist), Rob Larson (Vice President Product Development and Management), and Gregg Fenton (Director Emerging Platforms) from the New York Times Krista Thomas (Vice President, Marketing & Communications), Tom Tague (OpenCalais initiative Lead) from Thomson Reuters Antony Williams (VP of Strategic Development) from ChemSpider Helen Berman (Director), John Westbrook (Product Development) from the Protein Data Bank Nick Lynch (Architect with AstraZeneca) from the Pistoia Alliance. The work presented has been funded by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).
5. Further Information The Role of Community-Driven Data Curation for Enterprises Edward Curry, Andre Freitas, & Seán O'Riain In David Wood (ed.), Linking Enterprise Data Springer, 2010. Available Free at: http://3roundstones.com/led_book/led-curry-et-al.html
6. Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning
9. Confidence in that information Working with incomplete, inaccurate, or wrong information can have disastrous consequences
10. The Problems with Data Flawed Data Affects 25% of critical data in world’s top companies (Gartner) Data Quality Recent banking crisis (Economist Dec’09) Inaccurate figures made it difficult to manage operations (investment exposure and risk) “assets are defined differently in different programs” “numbers did not always add up” “departments do not trust each other’s figures” “figures … not worth the pixels they were made of”
11. What is Data Curation? Digital Curation Selection, preservation, maintenance, collection, and archiving of digital assets Data Curation Active management of data over its life-cycle Data Curators Ensure data is trustworthy, discoverable, accessible, reusable, and fit for use Museum cataloguers of the Internet age
12. What is Data Curation? Data Governance Convergence of data quality, data management, business process management, and risk management Data Curation is a complementary activity Part of overall data governance strategy for organization Data Curator = Data Steward ?? Overlapping terms between communities
13. Data Quality and Curation What is Data Quality? Desirable characteristics for information resource Described as a series of quality dimensions Discoverability, Accessibility, Timeliness, Completeness, Interpretation, Accuracy, Consistency, Provenance & Reputation Data curation can be used to improve these quality dimensions
14. Data Quality and Curation Discoverability & Accessibility Curate to streamline search by storing and classifying in an appropriate and consistent manner Accuracy Curate to ensure data correctly represents the “real-world” values it models Consistency Curate to ensure data is created and maintained using standardized definitions, calculations, terms, and identifiers
15. Data Quality and Curation Provenance & Reputation Curate to track source of data and determine reputation Curate to include the objectivity of the source/producer Is the information unbiased, unprejudiced, and impartial? Or does it come from a reputable but partisan source? Other dimensions discussed in chapter
16. How to Curate Data Data Curation is a large field with sophisticated techniques and processes Section provides a high-level overview on: Should you curate data? Types of Curation Setting up a curation process Additional detail and references available in book chapter
17. Should You Curate Data? Curation can have multiple motivations Improving accessibility, quality, consistency,… Will the data benefit from curation? Identify business case Determine if potential return supports the investment Not all enterprise data should be curated Suits knowledge-centric data rather than transactional operations data
18. Types of Data Curation Multiple approaches to curate data, no single correct way Who? Individual Curators Curation Departments Community-based Curation How? Manual Curation (Semi-)Automated Sheer Curation
19. Types of Data Curation – Who? Individual Data Curators Suitable for small quantities of infrequently changing data (<1,000 records) Minimal curation effort (minutes per record)
20. Types of Data Curation – Who? Curation Departments Curation experts working with subject matter experts to curate data within formal process Can deal with large curation effort (thousands of records) Limitations Scalability: Can struggle with large quantities of dynamic data (>1 million records) Availability: Post-hoc nature creates delay in curated data availability
21. Types of Data Curation - Who? Community-Based Data Curation Decentralized approach to data curation Crowd-sourcing the curation process Leverages community of users to curate data Wisdom of the community (crowd) Can scale to millions of records
22. Types of Data Curation – How? Manual Curation Curators directly manipulate data Can tie users up with low-value add activities (Semi-)Automated Curation Algorithms can (semi-)automate curation activities such as data cleansing, record de-duplication, and classification Can be supervised or approved by human curators
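To make the "(semi-)automated" idea concrete, the sketch below shows an algorithm that only proposes duplicate merges and leaves the final decision to a human curator. This is a minimal Python illustration; the field names and similarity threshold are assumptions, not details from the chapter.

    # Minimal sketch of semi-automated curation: the algorithm *proposes*
    # likely duplicate records; a human curator approves or rejects them.
    from difflib import SequenceMatcher

    def normalize(name: str) -> str:
        # Basic cleansing: lowercase and collapse whitespace.
        return " ".join(name.lower().split())

    def candidate_duplicates(records, threshold=0.9):
        # Yield pairs of records whose normalized names are near-identical.
        for i, a in enumerate(records):
            for b in records[i + 1:]:
                score = SequenceMatcher(
                    None, normalize(a["name"]), normalize(b["name"])).ratio()
                if score >= threshold:
                    yield a, b, score

    records = [
        {"id": 1, "name": "Thomson Reuters"},
        {"id": 2, "name": "Thomson  Reuters "},
        {"id": 3, "name": "Protein Data Bank"},
    ]
    for a, b, score in candidate_duplicates(records):
        # In a real workflow these pairs would be queued for curator review.
        print(f"Possible duplicate ({score:.2f}): record {a['id']} <-> record {b['id']}")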
23. Types of Data Curation – How? Sheer curation, or Curation at Source Curation activities integrated in normal workflow of those creating and managing data Can be as simple as vetting or “rating” the results of a curation algorithm Results can be available immediately Blended Approaches: Best of Both Sheer curation + post-hoc curation department Allows immediate access to curated data Ensures quality control with expert curation
24. Setting up a Curation Process 5 Steps to set up a curation process: 1 - Identify what data you need to curate 2 - Identify who will curate the data 3 - Define the curation workflow 4 - Identify appropriate data-in & data-out formats 5 - Identify the artifacts, tools, and processes needed to support the curation process
25. Setting up a Curation Process Step 1: Identify what data you need to curate Newly created data and/or legacy data? How is new data created? Do users create the data, or is it imported from an external source? How frequently is new data created/updated? What quantity of data is created? How much legacy data exists? Is it stored within a single source, or scattered across multiple sources?
26. Setting up a Curation Process Step 2: Identify who will curate the data Individuals, depts, groups, institutions, community Step 3: Define the curation workflow What curation activities are required? How will curation activities be carried out? Step 4: Identify suitable data-in & -out formats What is the best format for the data? Right format for receiving and publishing data is critical Support multiple formats to maximize participation
27. Setting up a Curation Process Step 5: Identify the artifacts, tools, and processes needed to support curation Workflow support/Community collaboration platforms Algorithms can (semi-)automate curation activities Major factors that influence approach: Quantity of data to be curated (new and legacy data) Amount of effort required to curate the data Frequency of data change / data dynamics Availability of experts
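The factors above can be reduced to a simple, mechanical decision rule. The sketch below is a loose illustration using the record-count guidance from the earlier "Types of Data Curation" slides; the exact thresholds and function name are assumptions, not prescriptions from the chapter.

    # Illustrative rule of thumb for choosing a curation approach.
    # Thresholds echo the earlier slides (<1,000 records: individual;
    # thousands: department; millions: community) but are assumptions.
    def suggest_curation_approach(record_count: int, experts_available: bool) -> str:
        if record_count < 1_000:
            return "individual curator"
        if record_count < 1_000_000 and experts_available:
            return "curation department"
        return "community-based curation (possibly blended with expert review)"

    print(suggest_curation_approach(500, True))         # individual curator
    print(suggest_curation_approach(50_000, True))      # curation department
    print(suggest_curation_approach(5_000_000, False))  # community-based curation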
28. Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning
29. Community–based Curation Two community approaches: Internal corporate communities External pre-competitive communities To determine the right model consider: What is the purpose of the community? Will the resulting curated dataset be publicly available? Or restricted?
30. Community–based Curation Internal Communities Taps potential of workforce to assist data curation Curate competitive enterprise data that will remain internal to the company May not always be the case e.g. product technical support and marketing data Can work in conjunction with curation dept. Community governance typically follows the organization’s internal governance model
32. What is Pre-Competitive Data? Two Types of Enterprise Data Proprietary data for competitive advantage Common data with no competitive advantage What is pre-competitive data? Has little potential for differentiation Can be shared without conferring commercial advantage to competitor Common non-competitive data Needs to be maintained and curated Companies duplicate effort in-house, incurring full cost
33. Pre-competitive Communities External pre-competitive communities Share costs, risks, and technical challenges Common curation tasks carried out once in the public domain rather than multiple times in each company Reduces cost required to provide and maintain data Can increase the quantity, quality, and access Focus turns to value-add competitive activity Move “competitive onus” from novel data to novel algorithms, shifting emphasis from “proprietary data” to a “proprietary understanding of data” e.g. Protein Data Bank and Pistoia Alliance in Pharma
34. External Pre-competitive Communities Two popular community models are Organization consortium Open community Organization consortium Operates like a private democratic club Usually closed community, members invited based on skill-set to contribute Output data - public or limited to members Consortiums follow a democratic process Member voting rights may reflect level of investment Larger players may be leaders of the consortium
35. External Pre-competitive Communities Open community Everyone can participate “Founder(s)” defines desired curation activity Seek public support to contribute to curation activities Wikipedia, Linux, and Apache are good examples of large open communities
36. Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning
38. Wikipedia Open-source encyclopedia Collaboratively built by large community Challenges existing models of content creation More than 19,000,000 articles 270+ languages, 3,200,000+ articles in English More than 157,000 active contributors Studies show accuracy and stylistic formality are equivalent to resources developed in expert-based closed communities i.e. Columbia and Britannica encyclopedias
39. Wikipedia MediaWiki Wiki platform behind Wikipedia Widespread and popular technology Wikis can also support data curation Lowers entry barriers for collaborative data curation Widely used inside organizations Intellipedia covering 16 U.S. Intelligence agencies WikiProteins, curated protein data for knowledge discovery and annotation
40. Wikipedia Decentralized environment supports creation of high quality information with: Social organization Artifacts, tools & processes for cooperative work coordination Wikipedia collaboration dynamics highlight good practices
41. Wikipedia – Social Organization Any user can edit its contents Without prior registration Does not lead to a chaotic scenario In practice highly scalable approach for high quality content creation on the Web Relies on simple but highly effective way to coordinate its curation process Curation is the activity of Wikipedia admins Responsibility for information quality standards
42. Wikipedia – Social Organization Four main types of accounts: Anonymous users Identified by their associated IP address Registered users Users with an account in the Wikipedia website Administrators/Editors Registered users with additional permissions in the system Access to curation tools Bots Programs that perform repetitive tasks
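Bots of this kind are commonly built with the open-source Pywikibot library. Below is a minimal read-only sketch; the page title is illustrative, and a production bot would run under an approved bot account in line with Wikipedia's bot policy.

    # Read-only sketch of a repetitive bot-style task using Pywikibot
    # (pip install pywikibot; first run may ask for a user-config.py).
    import pywikibot

    site = pywikibot.Site("en", "wikipedia")
    page = pywikibot.Page(site, "Data curation")  # illustrative page title
    tags = page.text.lower().count("{{citation needed")
    print(f"'{page.title()}' contains {tags} 'citation needed' tags.")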
44. Wikipedia – Social Organization Incentives Improvement of one’s reputation Sense of efficacy Contributing effectively to a meaningful project Over time focus of editors typically changes From curators of a few articles in specific topics To more global curation perspective Enforcing quality assessment of Wikipedia as a whole
45. Wikipedia – Artifacts, Tools & Processes Wiki Article Editor (Tool) WYSIWYG or markup text editor Talk Pages (Tool) Public arena for discussions around Wikipedia resources Watchlists (Tool) Helps curators to actively monitor the integrity and quality of resources they contribute Permission Mechanisms (Tool) Users with administrator status can perform critical actions such as remove pages and grant administrative permissions to new users
46. Wikipedia – Artifacts, Tools & Processes Automated Editing (Tool) Bots are automated or semi-automated tools that perform repetitive tasks over content Page History and Restore (Tool) Historical trail of changes to a Wikipedia Resource Guidelines, Policies & Templates (Artifact) Defines curation guidelines for editors to assess article quality Dispute Resolution (Process) Dispute mechanism between editors over the article contents Article Editing, Deletion, Merging, Redirection, Transwiking, Archival (Process) Describe the curation actions over Wikipedia resources
47. Wikipedia - DBpedia DBpedia Knowledge base Inherits massive volume of curated Wikipedia data Built using infobox properties Indirectly uses wiki as data curation platform DBpedia provides direct access to data 3.4 million entities and 1 billion RDF triples Comprehensive data infrastructure Concept URIs, definitions, and basic types
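DBpedia's data can be consumed programmatically through its public SPARQL endpoint. A minimal Python sketch using plain HTTP follows; the specific resource and property URIs are illustrative examples of DBpedia's vocabulary rather than the only way in.

    # Query DBpedia's public SPARQL endpoint over HTTP (pip install requests).
    import requests

    query = """
    SELECT ?abstract WHERE {
      <http://dbpedia.org/resource/Wikipedia>
          <http://dbpedia.org/ontology/abstract> ?abstract .
      FILTER (lang(?abstract) = "en")
    } LIMIT 1
    """
    resp = requests.get(
        "https://dbpedia.org/sparql",
        params={"query": query, "format": "application/sparql-results+json"},
        timeout=30,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["abstract"]["value"][:200], "...")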
54. The New York Times Index Department was created in 1913 Curation and cataloguing of NYT resources Since 1851 NYT had low quality index for internal use Developed a comprehensive catalog using a controlled vocabulary Covering subjects, personal names, organizations, geographic locations and titles of creative works (books, movies, etc), linked to articles and their summaries Current Index Dept. has ~15 people
55. The New York Times Challenges with consistently and accurately classifying news articles over time Keywords expressing subjects may show some variance due to cultural or legal constraints Identities of some entities, such as organizations and places, changed over time Controlled vocabulary grew to hundreds of thousands of categories Adding complexity to classification process
56. The New York Times Increased importance of Web drove need to improve categorization of online content Curation carried out by Index Department Library-time (days to weeks) Print edition can handle next-day index Not suitable for real-time online publishing nytimes.com needed a same-day index
57. The New York Times Introduced two stage curation process Editorial staff performed best-effort semi-automated sheer curation at point of online publication Several hundred journalists Index Department follow up with long-term accurate classification and archiving Benefits: Non-expert journalist curators provide instant accessibility to online users Index Department provides long-term high-quality curation in a “trust but verify” approach
67. The New York Times Early adopter of Linked Open Data (June ‘09)
68. The New York Times Linked Open Data @ data.nytimes.com Subset of 10,000 tags from index vocabulary Dataset of people, organizations & locations Complemented by search services to consume data about articles, movies, best sellers, Congress votes, real estate,… Benefits Improves traffic by third party data usage Lowers development cost of new applications for different verticals inside the website E.g. movies, travel, sports, books
70. Thomson Reuters Thomson Reuters is an information provider Created by acquisition of Reuters by Thomson Over 50,000 employees Commercial presence in 100+ countries Provides specialist curated information and information-based services Selects most relevant information for customers Classifying, enriching and distributing it in a way that can be readily consumed
71. Thomson Reuters Curation process Working over approximately 1000 data sources Automatic tools provide first level triage and classification Refined by intervention of human curators Curator is a domain specialist Employs thousands of curators
72. Thomson Reuters OneCalais platform Reduces workload for classification of content Natural Language Processing on unstructured text Automatically derives tags for analyzed content Enrichment with machine readable structured data Provides description of specific entities (places, people, events, facts) present in the text Open Calais (free version of OneCalais) 20,000+ users, >4 million transactions per day CNET, CBS Interactive, The Huffington Post, The Powerhouse Museum of Science and Design,…
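OneCalais is a commercial service, so as a stand-in the sketch below uses the open-source spaCy library to illustrate the same first-level triage idea: automatically deriving entity tags (people, places, organizations) from unstructured text for human curators to refine. This is explicitly not the OpenCalais API.

    # Stand-in for OneCalais-style automatic entity tagging using spaCy
    # (pip install spacy; python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = ("Thomson Reuters, created by the acquisition of Reuters by "
            "Thomson, has a commercial presence in more than 100 countries.")
    for ent in nlp(text).ents:
        # Machine-derived tags, to be refined by a human curator.
        print(f"{ent.text:20s} -> {ent.label_}")  # e.g. ORG, GPE, CARDINAL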
73. ChemSpider Structure centric chemical community Over 300 data sources with 25 million records Provided by chemical vendors, government databases, private laboratories and individuals Pharma realizing benefits of open data Heavily leveraged by pharmaceutical companies as pre-competitive resources for experimental and clinical trial investigation GlaxoSmithKline made its proprietary malaria dataset of 13,500 compounds available
74. Protein Data Bank Dedicated to improving understanding of the functions of biological systems through the 3-D structure of macromolecules Started in 1971 with 3 core members Originally offered 7 crystal structures Grown to 63,000 structures Over 300 million dataset downloads Expanded beyond curated data download service to include complex molecular visualization, search, and analysis capabilities
75. Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning
76. Best Practices from Case Study Learning Social Best Practices Participation Engagement Incentives Community Governance Models Technical Best Practices Data Representation Human and Automated Curation Track Provenance
77. Social Best Practices Participation Stakeholder involvement for data producers and consumers must occur early in project Provides insight into basic questions of what they want to do, for whom, and what it will provide White papers are effective means to present these ideas, and solicit opinion from community Can be used to establish informal ‘social contract’ for community
78. Social Best Practices Engagement Outreach activities essential for promotion and feedback Typical consumers-to-contributors ratios of less than 5% Social communication and networking forums are useful Majority of community may not communicate using these media Communication by email still remains important
79. Social Best Practices Incentives Sheer curation needs a line of sight from the data curation activity to tangible exploitation benefits Lack of awareness of value proposition will slow emergence of collaborative contributions Recognizing contributing curators through a formal feedback mechanism Reinforces contribution culture Directly increases output quality
80. Social Best Practices Community Governance Models Effective governance structure is vital to ensure success of community Internal communities and consortium perform well when they leverage traditional corporate and democratic governance models Open communities need to engage the community within the governance process Follow less orthodox approaches using meritocratic and autocratic principles
81. Technical Best Practices Data Representation Must be robust and standardized to encourage community usage and tools development Support for legacy data formats and ability to translate data forward to support new technology and standards Human & Automated Curation Balancing the two will improve data quality Automated curation should always defer to, and never override, human curation edits Automate validation of data deposition and entry Target community at focused curation tasks
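The "defer to human curation" rule can be enforced mechanically. A minimal sketch, assuming each field value records whether a human or a machine last set it (the flag names are assumptions):

    # Merge rule: an automated suggestion is applied only when the current
    # value was NOT set by a human curator; humans always win.
    def merge_field(current_value, current_source, suggested_value):
        if current_source == "human":
            return current_value, current_source  # never override a person
        return suggested_value, "machine"

    print(merge_field("Dublin", "human", "Dublin, Ireland"))    # ('Dublin', 'human')
    print(merge_field("dublin", "machine", "Dublin, Ireland"))  # ('Dublin, Ireland', 'machine')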
82. Technical Best Practices Track Provenance All curation activities should be recorded and maintained as part of the data provenance effort Especially where human curators are involved Users can have different perspectives of provenance A scientist may need to evaluate the fine-grained experiment description behind the data For a business analyst the ’brand’ of data provider can be sufficient for determining quality
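A minimal sketch of such provenance tracking, assuming a simple append-only JSON-lines log; real systems would more likely adopt a standard model such as W3C PROV:

    # Append-only provenance log: every curation action is recorded with
    # who/what/when, so later consumers can judge the data at whatever
    # granularity they need (fine-grained detail for scientists, the
    # provider "brand" for business analysts).
    import datetime
    import json

    def record_provenance(log_path, record_id, action, agent, detail=""):
        entry = {
            "record": record_id,
            "action": action,  # e.g. "edit", "merge", "classify"
            "agent": agent,    # human curator id or algorithm name
            "detail": detail,
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
        with open(log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")  # one JSON object per line

    record_provenance("provenance.log", "rec-42", "classify",
                      "curator:jsmith", "moved from 'Misc' to 'Finance'")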
83. Conclusions Data curation can ensure the quality of data and its fitness for use Pre-competitive data can be shared without conferring a commercial advantage Pre-competitive data communities Common curation tasks carried out once in public domain Reduces cost, increases quantity and quality