This document proposes using web data provenance for automated quality assessment. It defines provenance as information about the origin and processing of data. The goal is to develop methods to automatically assess quality criteria like timeliness. It outlines a general provenance-based assessment approach involving generating a provenance graph, annotating it with impact values representing how provenance elements influence quality, and calculating a quality score with an assessment function. As an example, it shows how the approach could be applied to assess the timeliness of sensor measurements based on their provenance.
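The assessment approach described above can be sketched concretely. In this minimal sketch the graph shape, the impact values, and the linear decay function are illustrative assumptions, not the paper's actual model: each provenance element carries a creation time and an impact weight, and the assessment function returns an impact-weighted average of per-element timeliness scores.

```python
from datetime import datetime

# Hypothetical provenance graph for one sensor measurement: each element
# records when it was created and an impact value in [0, 1] describing how
# strongly it influences the quality of the final data item.
provenance = [
    {"element": "sensor_reading",   "created": "2024-01-01T12:00:00", "impact": 0.6},
    {"element": "aggregation_step", "created": "2024-01-01T12:05:00", "impact": 0.3},
    {"element": "unit_conversion",  "created": "2024-01-01T12:06:00", "impact": 0.1},
]

def timeliness(created_iso, now, max_age_s=3600.0):
    """Linear decay from 1 (fresh) to 0 (older than max_age_s)."""
    age = (now - datetime.fromisoformat(created_iso)).total_seconds()
    return max(0.0, 1.0 - age / max_age_s)

def assess(provenance, now):
    """Assessment function: impact-weighted average of timeliness scores."""
    total = sum(e["impact"] for e in provenance)
    return sum(e["impact"] * timeliness(e["created"], now) for e in provenance) / total

now = datetime.fromisoformat("2024-01-01T12:30:00")
score = assess(provenance, now)  # quality score between 0 and 1
```

Other quality criteria would slot in by swapping the per-element scoring function while keeping the same annotated-graph structure.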
The goal of this paper is to explore executive perceptions and opinions about real-time data applications and operations. We interviewed more than forty Key Innovation Leaders cited as the most innovative thinkers in the world of analytics and documented distinctive case studies that have clearly optimized business intelligence.
Introduction to Data Governance
Seminar hosted by Embarcadero Technologies, where Christopher Bradley presented a session on Data Governance.
Drivers for Data Governance & Benefits
Data Governance Framework
Organization & Structures
Roles & responsibilities
Policies & Processes
Programme & Implementation
Reporting & Assurance
The enterprise marketer's playbook: Building an integrated data strategy.
An integrated data strategy can help any business see customer journeys more clearly ― and then give customers more relevant ads and experiences that get results. So why doesn't everyone have such a strategy? We look at what sets the marketing leaders apart.
Let marketing data be your guide
If you've ever felt too swamped by data to find the customer insights you need, you're not alone. But there's a new and better approach to gaining deeper audience insights: building an integrated data strategy.
Read this report to learn how:
86% of senior executives agree that eliminating organizational silos is critical to expanding the use of data and analytics in decision-making.
75% of marketers agree that lack of education and training on data and analytics is the biggest barrier to more business decisions being made based on data insights.
Leading marketers are 59% more likely to use digital analytics to optimize the user experience in real time.
Augmented Analytics and Automation in the Age of the Data Scientist (WhereScape)
At DAMA Day NYC, WhereScape's CTO Neil Barton spoke about the automation of data infrastructure as a necessary component to effectively enable the citizen data scientist and augmented analytics.
Neil also discussed how AI/ML can be used to recommend data ingestion pipelines and models in either supervised or unsupervised paradigms.
From Information to Insight: Data Storytelling for Organizations (Thinking Machines)
What kind of stories are best told with data? How do you take raw numbers and turn them into an engaging, meaningful story? Thinking Machines' content strategist Pia Faustino delivered this presentation on the data storytelling process at the "Humans + Machines: Using Artificial Intelligence to Power Your People" conference on February 19, 2016 in Bonifacio Global City, Taguig, Philippines.
User analysis is the process by which we track how users engage and interact with our digital product (software or mobile application) in an attempt to derive business insights for marketing, product & development teams.
These insights are then used by teams across the business to launch a new marketing campaign, decide on features to build for an app, track the success of the app by measuring user engagement, and improve the experience altogether while helping the business grow.
You are working with the product team of Instagram and the product manager has asked you to provide insights on the questions asked by the management team.
1. Find the 5 oldest users of Instagram from the database provided
2. Find the users who have never posted a single photo on Instagram
3. Identify the winner of the contest and provide their details to the team
4. Identify and suggest the top 5 most commonly used hashtags on the platform
5. What day of the week do most users register on? Provide insights on when to schedule an ad campaign
6. Provide the average number of posts per user on Instagram, i.e. the total number of photos divided by the total number of users
7. Provide data on users (bots) who have liked every single photo on the site (since no normal user would be able to do this)
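Questions like these map directly onto SQL. A minimal sketch against a toy SQLite schema follows; the table and column names (`users`, `photos`, `created_at`) are assumptions for illustration, not the actual exercise dataset:

```python
import sqlite3

# Build a tiny in-memory stand-in for the Instagram-style database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, username TEXT, created_at TEXT);
CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
INSERT INTO users VALUES
  (1, 'alice', '2016-01-01'), (2, 'bob', '2016-03-01'), (3, 'cara', '2017-05-01');
INSERT INTO photos VALUES (1, 1), (2, 1), (3, 3);
""")

# Q1: oldest users = earliest registration dates.
oldest = con.execute(
    "SELECT username FROM users ORDER BY created_at LIMIT 5").fetchall()

# Q2: users who have never posted a photo (anti-join via LEFT JOIN).
inactive = con.execute("""
    SELECT u.username FROM users u
    LEFT JOIN photos p ON p.user_id = u.id
    WHERE p.id IS NULL""").fetchall()

# Q6: average posts per user = total photos / total users.
avg_posts = con.execute(
    "SELECT CAST(COUNT(*) AS REAL) / (SELECT COUNT(*) FROM users) FROM photos"
).fetchone()[0]
```

The same LEFT JOIN / anti-join pattern generalizes to the bot-detection question by comparing each user's like count against the total photo count.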
Baking analytics into the culture of an organization is not always easy because it doesn't come intuitively to humans. This presentation, given at the Kumpul co-working space in Sanur, Bali, shares my team's experience in building a data-driven culture at TradeGecko.
Data Engineering and the Data Science Lifecycle (Adam Doyle)
Everyone wants to be a data scientist. Data modeling is the hottest thing since Tickle Me Elmo. But data scientists don’t work alone. They rely on data engineers to help with data acquisition and data shaping before their model can be developed. They rely on data engineers to deploy their model into production. Once the model is in production, the data engineer’s job isn’t done. The model must be monitored to make sure that it retains its predictive power. And when the model slips, the data engineer and the data scientist need to work together to correct it through retraining or remodeling.
How to Build & Sustain a Data Governance Operating Model (DATUM LLC)
Learn how to execute a data governance strategy through creation of a successful business case and operating model.
Originally presented to an audience of 400+ at the Master Data Management & Data Governance Summit.
Visit www.datumstrategy.com for more!
Slides from the impulse talk "Data Strategy & Governance" at BI or DIE LEVEL UP 2022.
Recording of the talk: https://www.youtube.com/watch?v=705DfyfF5-M
Accenture Strategy surveyed 1,252 business leaders from diverse industries across the world to better understand the degree to which companies are capturing ecosystem opportunities.
Big Data Management: What's New, What's Different, and What You Need To Know (SnapLogic)
This presentation is from a recorded webinar with 451 Research analyst and thought leader Matt Aslett for a discussion about the growing importance of the right data management best practices and techniques for delivering on the promise of big data in the enterprise. Matt reviews the big data landscape, how the data lake complements and competes with the data warehouse, and key takeaways as you move from big data test and development environments to production. You can watch the webinar here: http://bit.ly/25ShiQu
Master Data Management – Aligning Data, Process, and Governance (DATAVERSITY)
Master Data Management (MDM) provides organizations with an accurate and comprehensive view of their business-critical data such as customers, products, vendors, and more. While mastering these key data areas can be a complex task, the value of doing so can be tremendous – from real-time operational integration to data warehousing and analytic reporting. This webinar will provide practical strategies for gaining value from your MDM initiative, while at the same time assuring a solid architectural and governance foundation that will ensure long-term, enterprise-wide success.
Data modelling is considered a staple in the world of data management. The skill of the data modeler and their knowledge of the business plays a large role in successful Enterprise Information Management across many organizations. Data modeling requires formal accountability, attention to metadata and getting the business heavily involved in data requirement development. These are all traits of solid Data Governance programs.
Join Bob Seiner and a special guest modeler extraordinaire in this month’s installment of Real-World Data Governance to discuss data modeling as a form of data governance. Learn how to use the skillfulness of the data modeler to advance data-as-an-asset and governance agendas while conveying the importance and value of both disciplines.
In this webinar Bob and a special guest will talk about:
•Data Modeling as Art or Science
•Role of Data Modeler in a Governance Program
•Data Modeler Skills as Governance Skills
•Modeling and Governance Best Practices
•Leveraging the Model as a Governance Artifact
It’s been three years since the General Data Protection Regulation shook up how organizations manage data security and privacy, ushering in a new focus on Data Governance. But what is the state of Data Governance today?
How has it evolved? What’s its role now? Building on prior research, erwin by Quest and ESG have partnered on a new study about what’s driving the practice of Data Governance, program maturity and current challenges. It also examines the connections to data operations and data protection, which is interesting given the fact that improving data security is now the No. 1 driver of Data Governance, according to this year’s survey respondents.
So please join us for this webinar to learn about the:
Other primary drivers for enterprise Data Governance programs
Most common bottlenecks to program maturity and sustainability
Advantages of aligning Data Governance with the other data disciplines
In a post-COVID world, data has the power to be even more transformative, and 84% of business and technology professionals say it represents the best opportunity to develop a competitive advantage during the next 12 to 24 months. Let’s make sure your organization has the intelligence it needs about both data and data systems to empower stakeholders in the front and back office to do what they need to do.
Data Modelling 101 half day workshop presented by Chris Bradley at the Enterprise Data and Business Intelligence conference London on November 3rd 2014.
Chris Bradley is a leading independent information strategist.
Contact chris.bradley@dmadvisors.co.uk
Here's a starting template for anyone presenting a data science topic to elementary school students. It shows how fun the field is and how strong the job market for these skills is, and includes hyperlinks to various examples of interesting interactive visualizations.
Data governance with Unity Catalog Presentation (Knoldus Inc.)
Databricks Unity Catalog is the industry’s first unified governance solution for data and AI on the lakehouse. With Unity Catalog, organizations can seamlessly govern their structured and unstructured data, machine learning models, notebooks, dashboards and files on any cloud or platform. Data scientists, analysts and engineers can use Unity Catalog to securely discover, access and collaborate on trusted data and AI assets, leveraging AI to boost productivity and unlock the full potential of the lakehouse environment. This session will cover the potential of Unity Catalog to achieve a flexible and scalable governance implementation without sacrificing the ability to manage and share data effectively.
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi... (DATAVERSITY)
Developing a Data Strategy for your organization can seem like a daunting task. The opportunity in getting it right can be significant, however, as data drives many of the key initiatives in today’s marketplace from digital transformation, to marketing, to customer centricity, population health, and more. This webinar will help de-mystify data strategy and data architecture and will provide concrete, practical ways to get started.
Organizations have been collecting, storing, and accessing data from the beginning of computerization. Insights gained from analyzing the data enable them to identify new opportunities, improve core processes, enable continuous learning and differentiation, remain competitive, and thrive in an increasingly challenging business environment.
The well-established data architecture, consisting of a data warehouse, fed from multiple operational data stores, and fronted by BI tools, has served most organizations well. However, over the last two decades, with the explosion of internet-scale data, and the advent of new approaches to data and computational processing, this tried-and-true data architecture has come under strain, and has created both challenges and opportunities for organizations.
In this green paper, we will discuss modern approaches to data architecture that have evolved to address these challenges and provide a framework for companies to build a data architecture and better adapt to increasing demands of the modern business environment. This discussion of data architecture will be tied to the Data Maturity Journey introduced in EQengineered’s June 2021 green paper on Data Modernization.
Edelman's Social Intelligence Command Center (SICC), by Edelman Digital
Edelman's Social Intelligence Command Center, or "SICC", is the firm's proprietary system combining real-time analytics with insights, content development, and engagement strategies and tactics. SICCs are a combination of people (staffing), process, and a variety of tech platforms converged in collaborative physical spaces. This presentation outlines how the space operates in combination with Edelman's distinct SICC approach. For more information please reach out to david(dot)armano(at)edelman(dot)com.
Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T... (satyasanket)
The tutorial will be of interest to: (a) academic researchers who are incorporating provenance metadata in their research for data quality; (b) developers working on scalable platforms for emerging domain applications, such as IoT, LOD, and healthcare and life sciences. In addition to its meaningful breadth, the tutorial will present key technical topics that have seen significant research. These include provenance modeling, querying and indexing techniques for W3C RDF datasets for provenance querying, and building complex provenance-enabled healthcare informatics platforms. The tutorial will cover the W3C PROV specifications, which are being used to integrate provenance in information systems, including the PROV Data Model (PROV-DM), PROV Ontology (PROV-O), and the PROV constraints.
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Data Quality Interpretation (Beniamino Murgante)
Erik Borg, Bernd Fichtelmann - German Aerospace Center, German Remote Sensing Data Center
Hartmut Asche - Department of Geography, University of Potsdam
Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment (Umair ul Hassan)
Crowdsourcing has emerged as a powerful paradigm for quality assessment and improvement of Linked Data. A major challenge of employing crowdsourcing for quality assessment in Linked Data is the cold-start problem: how to estimate the reliability of crowd workers and assign the most reliable workers to tasks? We address this challenge by proposing a novel approach for generating test questions from DBpedia based on the topics associated with quality assessment tasks. These test questions are used to estimate the reliability of new workers. Subsequently, tasks are dynamically assigned to reliable workers to help improve the accuracy of collected responses. Our proposed approach, ACRyLIQ, is evaluated using workers hired from Amazon Mechanical Turk on two real-world Linked Data datasets. We validate the proposed approach in terms of accuracy and compare it against the baseline approach of reliability estimation using gold-standard tasks. The results demonstrate that our proposed approach achieves high accuracy without using gold-standard tasks.
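The cold-start idea can be illustrated with a deliberately simplified sketch. The scoring rule, the threshold, and all names below are illustrative assumptions, not ACRyLIQ's actual assignment algorithm: new workers are scored on generated test questions, and real tasks go only to workers whose reliability estimate clears a bar.

```python
def reliability(answers, gold):
    """Fraction of test questions the worker answered correctly."""
    correct = sum(1 for q, a in answers.items() if gold.get(q) == a)
    return correct / len(gold)

def assign(workers, threshold=0.8):
    """Keep only workers whose reliability estimate clears the threshold."""
    return [w for w, r in workers.items() if r >= threshold]

# Test questions generated from a knowledge base, with known answers.
gold = {"q1": "A", "q2": "B", "q3": "A"}
workers = {
    "w1": reliability({"q1": "A", "q2": "B", "q3": "A"}, gold),  # all correct
    "w2": reliability({"q1": "A", "q2": "C", "q3": "B"}, gold),  # one correct
}
trusted = assign(workers)
```

The paper's contribution is in generating those test questions automatically from DBpedia topics matched to the task, rather than hand-curating gold-standard tasks.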
"Methodology for Assessment of Linked Data Quality: A Framework" at Workshop on Linked Data Quality
Paper: https://dl.dropboxusercontent.com/u/2265375/LDQ/ldq2014_submission_3.pdf
Linked Data Quality assessment applied and integrated to the Linked Data generation and publication workflow. Presented at the Data Quality tutorial, satellite event at SEMANTICS2016.
Assessing and Refining Mappings to RDF to Improve Dataset Quality (andimou)
RDF dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor the assessment into the publishing workflow. Adjustments are manually (but rarely) applied. Nevertheless, the root of the violations, which often derives from the mappings that specify how the RDF dataset will be generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from (semi-)structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We incorporate (i) a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and (ii) semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as DBpedia, or newly generated ones, such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases.
METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATION: SOLUTIONS DEVELOPED IN THE IFEDH FRAMEWORK (HTAi Bilbao 2012)
G. Zauner
dwh Simulation Services
Vienna , Austria
A brief introduction to Data Quality rule development and implementation, covering:
- What are Data Quality Rules?
- Examples of Data Quality Rules
- The benefits of rules
- How can I create my own rules?
- What alternate approaches are there to building my own rules?
The presentation also includes a very brief overview of our Data Quality Rule services. For more information on this please contact us.
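As a concrete illustration of what such rules look like in practice, data quality rules are often expressed as named predicates over records. The rule names and record fields below are hypothetical examples, not part of any product or the presentation itself:

```python
import re

# Illustrative data quality rules as (name, predicate) pairs.
RULES = [
    ("email_format", lambda r: re.fullmatch(
        r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email", "")) is not None),
    ("age_in_range", lambda r: isinstance(r.get("age"), int)
        and 0 <= r["age"] <= 130),
    ("name_present", lambda r: bool(r.get("name", "").strip())),
]

def validate(record):
    """Return the names of the rules a record violates."""
    return [name for name, check in RULES if not check(record)]

good = {"name": "Ada", "email": "ada@example.com", "age": 36}
bad  = {"name": "",    "email": "not-an-email",    "age": 200}
violations = validate(bad)  # flags all three rules
```

Keeping rules as data (a list of named checks) rather than scattered `if` statements is what makes them easy to report on, version, and extend.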
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati... (Mark Wilkinson)
This slide deck accompanies the manuscript "Interoperability and FAIRness through a novel combination of Web technologies", submitted to PeerJ Computer Science: https://doi.org/10.7287/peerj.preprints.2522v1
It describes the output of the "Skunkworks" FAIR implementation group, who were tasked with building a prototype infrastructure that would fulfill the FAIR Principles for scholarly data publishing. We show how a novel combination of the Linked Data Platform, RDF Mapping Language (RML) and Triple Pattern Fragments (TPF) can be combined to create a scholarly publishing infrastructure that is markedly interoperable, at both the metadata and the data level.
This slide deck (or something close) will be presented at the Dutch Techcenter for Life Sciences Partners Workshop, November 4, 2016.
Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R
Prov-O-Viz is a visualisation service for provenance graphs expressed using the W3C PROV vocabulary. It uses the Sankey-style visualisation from D3.js.
See http://provoviz.org
ATAGTR2017 Bee-Hive approach for Big Data Testing [End to End Continuous Test...Agile Testing Alliance
The presentation on Bee-Hive approach for Big Data Testing [End to End Continuous Test Automation solution for Big Data] was done during #ATAGTR2017, one of the largest global testing conference. All copyright belongs to the author.
Author and presenter : Usharani Subramanian
Supply chain analytics solutions combine technology and human effort to compare and highlight opportunities in supply chain functions. They leverage enterprise applications, web technologies, and data warehouses to locate patterns among transactional, demographic and behavioral data
Pragmatics Driven Issues in Data and Process Integrity in EnterprisesAmit Sheth
Keynote/Invited Talk
IFIP TC-11 First Working Conference on
Keynote/Invited Talk at the IFIP TC-11 First Working Conference on
Integrity and Internal Control in Information Systems
Zurich, Switzerland, December 4-5, 1997
Open Data, by definition, provides the chance to re-shape and publish heterogeneous pieces and fragments of information which are open, namely anyone is free to use, reuse, and redistribute it. In order for users to fully benefit this idea, Open Data Systems of tomorrow must provide high quality data, relying on real time and ubiquitous services, along with a deep integration with mobile and smart devices and infrastructures.
In this session, we present a syntheses of Whitehall proposal addressed a this vision: is addressed at building Open Data in a fully-fledged Big Data infrastructure, realized using graph based and NoSQL technologies. This idea is shaped in a cultural heritage scenario, where data in envisaged at valorizing one of the main assets of Italy: cultural heritage.
"At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons."
Tim Berners-Lee, W3C Chair, Web Design Issues, September 1997
Provenance is focused on the description and understanding of where and how data is produced, the actors involved in the production of such data, and the processes by which the data was manipulated and transformed until it arrived to the collection from which it is being accessed. Provenance aims at providing the ability to trace the sources of data, enabling the exploration not just of the relationships between datasets, but also of their authors and affiliations, with the goal of preserving data ownership and establishing a notion of trust based on authenticity and reliability.
The Future Internet poses important challenges for provenance, derived from complex and rich scenarios characterized by the presence of large amounts of data stemming from heterogeneous sources like user communities, services, and things. Such challenges span across technical but also socioeconomic dimensions. The former includes aspects like vocabularies for representing provenance, interoperability and scalability issues, and means to produce, acquire, and reason with provenance in order to provide measures of trust and information quality. However, it is probably in the socieconomic dimension where more significant efforts need to be made as to addressing issues like the role of provenance in the overall picture of the Future Internet, entry barriers preventing the generation of provenance-aware internet content, means required to incentivate the production of such content, and ways to prevent provenance forgery.
In this talk, we provide and overview on provenance and the above mentioned challenges and introduce ongoing work in order to address trust issues from the provenance perspective in the Future Internet. We also link provenance to other relevant aspects for trust discussed in the session, like security, legal frameworks, and economics.
HCLT Brochure: E-Discovery and Document Review SolutionsHCL Technologies
http://www.hcltech.com/search/apachesolr_search/business-services~ More on Business Services
With the number of litigations expected to increase due to the economy, corporations and law firms are increasingly concerned with cost effective high-quality electronic d`iscovery (“e-discovery”) solutions. With 70% of the total cost of a litigation attributed to the document review fees, corporations and law firms must select innovative document review solutions to stay in budget. Simple Solutions’ e-Discovery and Document Review Services provides corporations and law firms with high quality, cost-effective document review services that gives them the cost certainty needed to stay in budget.
e-Discovery companies are leveraging cloud computing and deployment of Software as a Service (SaaS) platforms with focus on back office services to improve legal compliance service levels.
Download our e-Discovery and Document Review Solutions Brochure to understand how HCL focuses on creating efficient and cost effective document review solutions by marrying e-discovery.
Future of test automation tools & infrastructureAnand Bagmar
After being in the IT field for 15+ years of which 11+ years in the software test field, I am sharing my view of the trend in the industry in terms of UI advancements, and, I would like to present a new generation of test automation framework - UDD - UI Driven Development.
Evaluating Big Data Predictive Analytics PlatformsTeradata Aster
Mike Gualtieri, Principal Analyst, Forrester Research, presents at the Big Analytics Roadshow, 2012 in New York City on December 12, 2012
Presentation title: Evaluating Big Data Predictive Analytics Platforms
Abstract: Great. You have Big Data. Now what? You have to analyze it to find game-changing predictive models that you can use to make smart decisions, reduce risk, or deliver breakthrough customer experiences. Big Data Predictive Analytics solutions are software and/or hardware solutions that allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources. In this session, Forrester Principal Analyst Mike Gualtieri will discuss the key criteria you should use to evaluate Big Data Predictive Analytics platforms to meet your specific needs.
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Zaloni
When building your data stack, the architecture could be your biggest challenge. Yet it could also be the best predictor for success. With so many elements to consider and no proven playbook, where do you begin to assemble best practices for a scalable data architecture? Ben Sharma, thought leader and coauthor of Architecting Data Lakes, offers lessons learned from the field to get you started.
Using Web Data Provenance for Quality Assessment
1. Using Web Data Provenance for Quality Assessment
Olaf Hartig*  Jun Zhao˚
*Humboldt-Universität zu Berlin  ˚University of Oxford
2. Information Quality (IQ)
● Common definition: fitness for use of information
● Multidimensional concept
Category* Criteria / Dimensions
Intrinsic Accuracy, Believability, Objectivity, ...
Contextual Completeness, Relevance, Timeliness, ...
Representational Conciseness, Understandability, ...
Accessibility Availability, Security, ...
*Classification by Wang and Strong, 1996
● IQ criteria not independent of each other
● Relevancy of criteria determined by task and preferences
Olaf Hartig - Using Web Data Provenance for Quality Assessment 2
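The Wang and Strong classification above can be captured as a simple lookup, e.g. so that a task can select the criteria relevant to it (a minimal sketch; the lowercase names are illustrative, not a fixed vocabulary):

```python
# IQ categories and example criteria, following the Wang and Strong (1996)
# classification shown above.
IQ_CATEGORIES = {
    "intrinsic": ["accuracy", "believability", "objectivity"],
    "contextual": ["completeness", "relevance", "timeliness"],
    "representational": ["conciseness", "understandability"],
    "accessibility": ["availability", "security"],
}

def category_of(criterion):
    """Return the category a given IQ criterion belongs to."""
    for category, criteria in IQ_CATEGORIES.items():
        if criterion in criteria:
            return category
    raise KeyError(criterion)
```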
3. IQ Assessment
● Assigning numerical values (IQ scores) to IQ criteria
● It is difficult!
● Precision vs. Practicality
● Manual methods: Questionnaires
● Semi-automatic methods: Rating-based, Reputation-based
4. Automated IQ Assessment
● Literature only outlines ideas for automatic methods
● Content analysis
● Comparison (e.g. outlier detection)
● Application of information retrieval methods
● Analysis of results from data cleansing
● Sampling techniques
● Context analysis
● Analysis of metadata
● Utilization of domain knowledge
5. Our Goal: Methods to automatically assess IQ criteria of Web data
Primary means: Provenance of the assessed data
6. Outline
1. Web Data Provenance
2. General Assessment Approach
3. Development of Assessment Methods
7. Existing Provenance Research
● Main research areas: (scientific) workflows, DBMSs
● General focus: data creation
8. Provenance of Web Data
9. Provenance of Web Data
Web data provenance comprises two dimensions: Data Creation • Data Access
10. Model of Web Data Provenance
● Provenance graph describes provenance of a data item
● Nodes: provenance elements – pieces of provenance info
● Edges: relate provenance elements to each other
● Subgraphs for related data items possible
11. Model of Web Data Provenance
● Provenance model defines:
● Types of provenance elements: Actors, Executions, Artifacts
● Relationships between them
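The model can be sketched as a small typed graph: nodes are provenance elements of one of the three types, and edges are named relationships between them (a hypothetical minimal encoding for illustration, not the authors' implementation):

```python
from dataclasses import dataclass, field

# The three element types distinguished by the provenance model.
ACTOR, EXECUTION, ARTIFACT = "Actor", "Execution", "Artifact"

@dataclass
class ProvenanceGraph:
    """Nodes are provenance elements; edges are named relationships."""
    nodes: dict = field(default_factory=dict)   # element id -> element type
    edges: list = field(default_factory=list)   # (source, relation, target)

    def add(self, node_id, element_type):
        self.nodes[node_id] = element_type

    def relate(self, source, relation, target):
        self.edges.append((source, relation, target))

g = ProvenanceGraph()
g.add("msr", ARTIFACT)      # a data item
g.add("cExc", EXECUTION)    # the execution that created it
g.add("sensor1", ACTOR)     # the actor that performed the execution
g.relate("msr", "created by", "cExc")
g.relate("cExc", "performed by", "sensor1")
```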
12. Data Access Dimension
[Diagram: a Document contains the Data Item and is retrieved by a Data Access execution, which a (non-human) Data Accessor performs at a given Execution Time. The Data Access accesses a (non-human) Data Providing Service, which a Service Provider controls and a (human) Data Publisher uses; the publisher stands in relation to the provided information resource.]
13. Data Access Dimension cont.
[Diagram: an Integrity Verification of the accessed (verified) Artifact yields a Verification Result; a Signature Verification, modeled as {incomplete}, involves a Signer and a Signature Method and relates to the signed data.]
14. Data Creation Dimension
[Diagram: a Data Creation execution, performed at a given Execution Time and possibly guided by Creation Guidelines, transforms Source Data into the created Data Item, which can be part of an encompassing Data Item. A Data Creator (human or non-human) is responsible for the creation; its subtypes, modeled as {complete, disjoint}, are Data Creating Device (e.g. sensor), Data Creating Service (e.g. software agent), and Data Creating Entity (e.g. person, group, organization). Provenance information can be attached to these elements.]
15. Outline
1. Web Data Provenance
2. General Assessment Approach
3. Development of Assessment Methods
16. A General Approach
● Blueprint for actual assessment methods that
● address a specific scenario
● focus on a specific IQ criterion
● Provenance elements have an influence on IQ
● Impact values represent these influences
● Assessment is affected by knowing about the influences
● Calculation of the IQ score with an assessment function that combines all impact values
17. General Assessment Procedure
Step 1 – Generate a provenance graph for the data item
Step 2 – Annotate the provenance graph with impact values
Step 3 – Execute the assessment function
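The three steps above can be sketched as a pipeline (hypothetical function names; the graph generator, annotator, and assessment function are supplied per scenario and per IQ criterion):

```python
def assess(data_item, generate_graph, annotate, assessment_function):
    """Provenance-based IQ assessment in three steps."""
    graph = generate_graph(data_item)          # Step 1: provenance graph
    impact_values = annotate(graph)            # Step 2: impact values
    return assessment_function(impact_values)  # Step 3: IQ score

# Toy usage: the "graph" is the item itself, impact values are read off it,
# and the assessment function averages them.
score = assess(
    {"impacts": [0.5, 1.0]},
    generate_graph=lambda item: item,
    annotate=lambda graph: graph["impacts"],
    assessment_function=lambda values: sum(values) / len(values),
)
```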
18. Outline
1. Web Data Provenance
2. General Assessment Approach
3. Development of Assessment Methods
19. Designing Assessment Methods
● Developing the general approach into an actual method
● Fundamental design question:
For which IQ criterion do we want to apply the method?
20. Designing Assessment Methods
● Developing the general approach into an actual method
● Fundamental design question:
For which IQ criterion do we want to apply the method?
● Timeliness: degree to which the data item is up-to-date with respect to the task at hand
● Representation* as an absolute measure in [0,1]
● 1 – meeting the most strict timeliness standards
● 0 – unacceptable
*Following Ballou et al., 1998
21. 1 Generate the Provenance Graph
What types of provenance elements are necessary?
What level of detail (i.e. granularity) is necessary?
Where and how do we get provenance information?
● Two complementary options:
● Recording
● Analyzing metadata
22. 1 Generate the Provenance Graph
Example:
● Sensors (e.g. sensor1) take a measurement (e.g. msr) every hour
● All msr stored in a Web-accessible storage device (store)
● Our system (sys) accesses them for further processing
● sys assesses the timeliness of all msr
23. 1 Generate the Provenance Graph
Example:
● Sensors (e.g. sensor1) take a measurement (e.g. msr) every hour
● All msr stored in a Web-accessible storage device (store)
● Our system (sys) accesses them for further processing
● sys assesses the timeliness of all msr
[Graph: msr (type: Data Item) created by cExc (type: Data Creation, Execution Time: 10:00), which was performed by sensor1 (type: Data Creator); msr is contained by doc (type: Document), retrieved by aExc (type: Data Access, Execution Time: 10:13), which was performed by sys (type: Data Accessor) and accessed store (type: Data Providing Service).]
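The example graph can be written down as a set of typed, annotated nodes and labelled edges (a sketch; the identifiers follow the slide):

```python
from datetime import time

# Provenance elements of the sensor example, keyed by identifier.
nodes = {
    "msr":     {"type": "Data Item"},
    "cExc":    {"type": "Data Creation", "execution_time": time(10, 0)},
    "sensor1": {"type": "Data Creator"},
    "doc":     {"type": "Document"},
    "store":   {"type": "Data Providing Service"},
    "aExc":    {"type": "Data Access", "execution_time": time(10, 13)},
    "sys":     {"type": "Data Accessor"},
}

# Labelled edges relating the elements, as in the graph above.
edges = [
    ("msr",  "created by",   "cExc"),
    ("cExc", "performed by", "sensor1"),
    ("msr",  "contained by", "doc"),
    ("doc",  "retrieved by", "aExc"),
    ("aExc", "accessed",     "store"),
    ("aExc", "performed by", "sys"),
]
```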
24. 2 Annotation with Impact Values
How might each provenance element influence the IQ criterion?
● Systematically analyze each type of provenance elements
What kind of impact values are necessary?
How do we represent the influences by impact values?
● Impact values not necessarily numerical
● Depends on the assessment function in step 3
How do we determine impact values?
25. Determining Impact Values
● From the provenance information
● From user input
● Configuration options
● Rating-based, Reputation-based
● By content analysis
● Comparison (e.g. outlier detection)
● Adoption of information retrieval methods
● Adoption of data cleansing techniques
● By context analysis
● Further metadata
● Domain knowledge
26. 2 Annotation with Impact Values
How might each provenance element influence the IQ criterion?
Data Creation Dimension:
Prov. Element Type     Impact Values
Data Creation          creation time, weights
Creation Guidelines    –
(Source) Data Item     expiry time
Data Creator           –
27. 2 Annotation with Impact Values
[The example graph from slide 23 again, shown together with the impact-value table for the data creation dimension; no impact values attached yet.]
28. 2 Annotation with Impact Values
[The example graph, now annotated with the impact value creation time: 10:00 at the Data Creation element cExc; the impact-value table is repeated.]
29. 2 Annotation with Impact Values
[The example graph, now annotated with creation time: 10:00 and expiry time: 11:00; the impact-value table is repeated.]
30. 3 Assessment Function
How do we represent the IQ criterion by an IQ score?
What does the assessment function look like?
● Develop the function together with the impact values
● Take incompleteness into consideration
● Provenance graphs could be fragmentary
● Annotations could be missing
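One way to take incompleteness into consideration is to build defaults into the assessment function. This sketch assumes a hypothetical policy in which a missing expiry-time annotation falls back to a task-specific default lifetime (the one-hour default is purely illustrative):

```python
from datetime import datetime, timedelta

# Assumed fallback lifetime for items whose expiry annotation is missing;
# a real method would make this a task-specific configuration option.
DEFAULT_LIFETIME = timedelta(hours=1)

def timeliness(creation_time, assessment_time, expiry_time=None):
    """Timeliness score in [0, 1], tolerating a missing expiry annotation."""
    if expiry_time is None:                      # annotation may be missing
        expiry_time = creation_time + DEFAULT_LIFETIME
    age = (assessment_time - creation_time).total_seconds()
    lifetime = (expiry_time - creation_time).total_seconds()
    return max(0.0, min(1.0, 1.0 - age / lifetime))

# Expiry missing: fall back to the one-hour default lifetime.
t = timeliness(datetime(2011, 6, 1, 10, 0), datetime(2011, 6, 1, 10, 15))
```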
31. Step 3 – Assessment Function
32. Step 3 – Assessment Function
[The fully annotated example graph (creation time: 10:00, expiry time: 11:00, access Execution Time: 10:13), to which the assessment function is applied.]
34. Step 3 – Assessment Function
t(msr) = 1 – (10:15 – 10:00) / (11:00 – 10:00)
       = 1 – 0.25 h / 1 h
       = 0.75
[The annotated example graph again, with the timeliness score computed from its impact values.]
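The calculation above can be reproduced directly, using the times from the example: created at 10:00, expires at 11:00, assessed at 10:15 (the calendar date below is arbitrary):

```python
from datetime import datetime

def timeliness(creation, expiry, assessed):
    """t = 1 - age / lifetime, as in the calculation above."""
    age = (assessed - creation).total_seconds()
    lifetime = (expiry - creation).total_seconds()
    return 1.0 - age / lifetime

# Times from the example slide.
t_msr = timeliness(
    creation=datetime(2011, 6, 1, 10, 0),
    expiry=datetime(2011, 6, 1, 11, 0),
    assessed=datetime(2011, 6, 1, 10, 15),
)
```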
35. Conclusion
● Web Data Provenance (data creation + data access)
● General approach for provenance-based IQ assessment
● Impact values: influence of provenance elements on IQ
● Design decisions for actual assessment methods
● Application to timeliness (more in the paper)
● Future work:
● How do we deal with incompleteness?
● Application of the approach to other IQ criteria
36. These slides have been created by
Olaf Hartig
http://olafhartig.de
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)
Attribution:
● http://www.flickr.com/photos/rrrrred/3809362767/
● http://www.hasslefreeclipart.com