Data Quality Principles in the Semantic Web

Speaker notes
  • Hello, my name is Ahmad Assaf from SAP Research France, and I am about to present the work I did with my colleague Aline Senart on Data Quality Principles in the Semantic Web. I would like to thank the organizing committee for giving me the opportunity to present this paper remotely, as unforeseen circumstances prevented me from attending the conference in person.
  • I will start my presentation by discussing the problems around data quality in the Semantic Web, then I will list some of the relevant work in that area. Afterwards, I will discuss some context related to our problem and then present our proposal as a set of questions and examples that we gathered throughout our experience with data integration at SAP. In the end, I will conclude and discuss some of the future work we have planned.
  • With the rise of the Semantic Web in recent years, data became widely available on the web, and the development and application of ontologies have been gaining big momentum in a range of application domains. As the Linked Open Data cloud diagram shows, the Web of Linked Data is growing rapidly and contains data from a wide range of different domains, including life science data, geographic data, government data, library and media data, as well as cross-domain data sets such as DBpedia or Freebase. One of the core concepts of the Semantic Web is that anyone today has the ability to publish and retrieve information to be consumed or integrated into their applications. This resulted in new challenges, such as having heterogeneous sources with diverse data qualities: some values may be out of date, incomplete or incorrect, or some data sources may provide conflicting values for the same properties. On the other hand, this huge amount of information is of great value, especially for organizations and companies that can enrich their internal datasets with data from sources like social media feeds, sensor or governmental data, etc. Analyzing this new type of data within the context of existing enterprise data should bring new or more accurate business insights and allow better recognition of sales and market opportunities. But how easy is it actually to find, trust and integrate the useful information?
  • So, how do we properly integrate external data with internal data? A certain level of quality is clearly necessary. Data quality involves data management, modeling, analysis, storage and presentation, since it ensures that the data is fit to be combined and used to infer better business decisions. Quality is important in every data-driven application, but it is a subjective measure; as the saying goes, "beauty is in the eye of the beholder". Quality is mainly realized when the data is used: it can be defined as fitness for the purpose of use. But data can have multiple uses, and we want a generic framework to assess its quality. Several studies have found that most data quality problems are in fact "data misinterpretations" or problems in the data semantics. For example, if I am querying for a revenue-per-year value, different sources might have different interpretations of the term "year": it can be a fiscal or calendar year, or even the last 12 months. By now we have identified that there is a need for new data quality principles in terms of measures and classes.
  • Now we will talk a little bit about some of the related work. Several projects have proposed solutions to identify good data sources, greatly simplifying the task of finding and consuming high-quality data. For example, some have applied linkage ranking by analyzing incoming and outgoing links to sort out resources; however, these approaches ignore the fact that a real-world entity can evolve over time, and they can fall short for temporal data. For example, a person can change her phone number and address, so facts that describe the same real-world entity at different times can contain different values. Sieve is another example of a framework that tries to express quality assessment and fusion methods. A very interesting and closely related attempt was made in MediaWiki, where a set of quality criteria for Linked Data sources was proposed. Though this classification is good, some criteria on the quality of the used ontologies and the links between data and ontology concepts are missing.
  • Now we will talk a little bit about some related context. As you know, the main goal behind using Linked Data is to easily enable knowledge sharing and publishing. The basic assumption is that the usefulness of Linked Data will increase if it is more interlinked with other data. Tim Berners-Lee defined four key principles for publishing Linked Data [11]: make the data available on the web by assigning URIs to identify things; make the data machine readable by using HTTP URIs so that looking up these names is easy; use publishing standards, so that when the lookup is done useful information is provided using standards like RDF; and link your data by including links to other resources to enable users to discover more things.
  • Building on the previous principles and based on our experience with powerful data integration software to extract, transform, and load data from applications, databases and other data sources, we have derived five principles for data quality in the Semantic Web. (1) Quality of data sources: this principle is related to the availability of the data and the credibility of the data source. (2) Quality of raw data: this principle is mainly related to the absence of duplicates, entry mistakes, and noise in the data. (3) Quality of the semantic conversion: this principle is related to the transformation of raw data into rich data by using vocabularies. (4) Quality of the linking process: this principle is related to the quality of links between two datasets. (5) Global quality: this principle cross-cuts the other principles and covers the source, raw data, semantic conversion, reasoning and links quality.
  • In the rest of the presentation, I will describe each of these principles. Let's start with the quality of data sources, which spans five different attributes: Accessibility, Authority & Sustainability, License, Trustworthiness & Verifiability, and Performance.
  • So let's say that an organization wants to embed some external data; what are the challenges that it can face during the process with regard to data source quality? At first, will it be easy for the company to find relevant, useful information? Let's say that the company has found several resources that can serve its needs; can it simply trust all of them? What if the information needed is not available in any of the well-known repositories like Freebase or DBpedia? Another aspect that matters a lot to companies is how simple it is to directly use the data in terms of its license: is it friendly to use, or does a commercial license have to be purchased? This can actually be cumbersome, as legal checks have to be made to ensure good compliance with the company's policies. Now, let's assume that the company managed to find a good, relevant, trustworthy resource: will it be easy to access its data? For example, does the data source provide access points for the data, whether they are online SPARQL endpoints, a REST API or even downloadable dumps? Are the protocols and methods properly defined? For example, if I am accessing the source via a REST API, is it well documented? In addition to all of this, the performance and throughput of the data source is another aspect, especially when dealing with it directly and not through a downloadable dump. For example, how many requests can the API serve daily? Freebase, for instance, allows 100,000 queries per user a day, with a maximum of 10 queries a second. All of these points are vital to ensure that a data source can be qualified as a good-quality one. (A de-referenceable URI is a resource retrieval mechanism that uses any of the internet protocols, e.g. HTTP, to obtain a copy or representation of the resource it identifies.)
  • Now we come to the quality of raw data. This principle is mainly related to the absence of duplicates, entry mistakes, and noise in the data; it contains seven main attributes: Accuracy, Comprehensibility, Completeness, Typing, Provenance, Versatility and Traceability.
  • Raw data accuracy can be achieved by ensuring that nodes refer to factually and lexically correct information; this can be checked by verifying that the data is described using accurate, non-duplicated labels. In addition, raw data should be clean, up to date, and formatted in a consistent way (e.g., are the dates all formatted yyyy/mm/dd?). Tools such as Google Refine [12] or Data Wrangler [13] already provide a good answer to these issues by allowing the cleaning of complex data sets. Moreover, the data's consistency should be ensured by validating the underlying vocabulary and syntax of the document against other resources; for example, checks should be made to guarantee that no entities are defined as members of disjoint classes, to confirm valid usage of inverse-functional properties, to verify that existing properties are not redefined, to ensure usage of homogeneous data types, and to make sure no inconsistent values are stated for properties. For example, is the population of Europe the same as the sum of the populations of the European countries? A small sketch of one such check follows below.
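A minimal sketch of one such consistency check, assuming Python with rdflib and a hypothetical input file; the owl:disjointWith check below is only an illustration of the idea, not a method described in the paper:

```python
# Sketch: flag resources typed as members of two classes declared disjoint.
# "dataset.ttl" is a hypothetical input file, not one from the paper.
from rdflib import Graph, OWL, RDF

g = Graph()
g.parse("dataset.ttl", format="turtle")

violations = []
for class_a, _, class_b in g.triples((None, OWL.disjointWith, None)):
    members_a = set(g.subjects(RDF.type, class_a))
    members_b = set(g.subjects(RDF.type, class_b))
    # any resource typed with both classes contradicts the disjointness axiom
    violations.extend((r, class_a, class_b) for r in members_a & members_b)

for resource, class_a, class_b in violations:
    print(f"{resource} is a member of disjoint classes {class_a} and {class_b}")
```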
  • To continue with the other attributes: data is comprehensible when classes, properties and entities carry human-readable labels (rdfs:label) and human-readable descriptions (rdfs:comment), and when metadata about the dataset is given, along with one or more exemplary URIs, an exemplary SPARQL query, and an indication of some of the vocabularies used. In addition to comprehensibility, completeness is a measure of whether all the data needed to represent information about a real-world entity is present. Moreover, is the data related or linked to this set complete as well, e.g., all European countries, all French cities, all street addresses, all postal codes and so on? Having data properly typed as concepts from vocabularies allows users to step further into analyzing the data and reach more informed business decisions, so typing the data is an important quality factor of Linked Data. Moreover, provenance in the Semantic Web is considered one of the most important indicators of quality. Data sets can be used or rejected depending on the availability of sufficient and/or relevant attached metadata; there are lots of important questions that can be asked about the available data, for example: Where did the data originate from? Why was the data received? Who is the data describing? Who is the author of this data? And many other questions.
  • Ensuring data versatility is important for good-quality, widely usable data: for example, provision of the data in various formats, provision of the data in various languages, and human- and machine-readable indications of a SPARQL endpoint. To handle various representations, a data source can enable content negotiation, whereby a consumer can specify accepted formats and languages by adding a corresponding Accept header to an HTTP request. Moreover, raw data and its attached information, like queries and formulas, should also be traceable, especially if it is the result of a merge process.
  • Now we will talk about the quality of the semantic conversion. This principle is related to the transformation of raw data into rich data by using vocabularies and consists of three attributes: Correctness, Granularity and Consistency.
  • Semantic conversion is the process of transforming “normal” raw data into “rich” data: we have, for example, tabular data as our input, and we want an RDF representation of that data using some vocabulary. The use of high-quality vocabularies and the efficiency of the data discovery process are major factors in increasing the quality of data. Moreover, one of the most important aspects that affects the quality of the semantic conversion is the quality and suitability of its data model for the intended usage; a model that does not fit your usage could lead to computationally expensive workarounds. The data structure should be properly modeled and applied correctly to all the concepts, so generally any two URI references connected by owl:sameAs should be the same thing. In addition, the model should capture enough information to present all the expected data. Finally, the conversion should be done correctly and consistently; an example taken from DBpedia is that a resource can state different minimal temperatures of a planet, the respective property being defined as having Kelvin values. These values result from an erroneous conversion of the Kelvin and Celsius values extracted from Wikipedia.
  • Now we will talk about the quality of the linking process. This principle is related to the quality of links between two datasets and consists of three main attributes: Connectedness, Isomorphism and Directionality.
  • Connectedness is a measure to ensure that the merge or combination happened at the correct points. Frameworks like Silk [21] ease the linking process but don't tackle per se the quality of the links that are generated; the quality depends on the link generation configuration. The quality is however improved if your data is linked to some reference dataset. Many Linked Data applications have come to rely on owl:sameAs for linking datasets. However, the current semantics of owl:sameAs assert that identity entails isomorphism, or that if a=b, then all statements of a and b are shared by both. This becomes problematic when dealing with provenance, context, and imperfect representations, all of which are endemic issues in Linked Data. Ensuring that the combined sets are modeled in a compatible way and reconciled is an important factor. In addition, the graph representation of the resources after the linkage should remain consistent, in what we call the directionality property.
  • Now we will talk about some global quality measures that cross-cut all the previous classes (the data source, raw data, links, etc.). An important factor to ensure is timeliness: for example, ensuring that the data source contains the latest up-to-date raw data and that the inbound and outbound links to this data are also updated. In the source specification, are the update and validation frequencies stated so that updates can be made automatically? For example, in an astronomical data source, the discovery of an additional moon in the solar system doesn't necessarily influence the correctness of the present data, but failing to update changed data does (e.g. the redefinition of Pluto as a dwarf planet results in the nullification of the fact that Pluto is a planet). Another aspect is the history and change record, in order to keep track of who changed the data and when. Finally, the freshness of each class is another important aspect, for example the ability to replicate the remote repository into local triple stores (via a local dump) and maintain the timeliness of the replica in all of its aspects. Depending on the amount of changed data, it may be important to have data dumps with deltas to previous major releases, for example a monthly dump of the full dataset and daily deltas. For smaller change sets, the use of feeds to announce changes can be very fitting.
  • To conclude, we have identified that data quality in the Semantic Web is of increasing importance, and we have proposed five main classes of data quality principles to be applied in the semantic context. For each class, we listed the specific criteria that represent the quality of a data source on the Web. We believe that following these principles will lead to a higher-quality Semantic Web, which will result in better data usage and mash-ups, and thus more informed decisions. We have applied these principles in several scenarios and use cases in some of the systems at SAP; due to lack of space we did not present our preliminary results in this paper, but we plan to apply them in a broader set of use cases and scenarios in order to have a better evaluation. We would also like to provide tools that can formally prove that these principles are met.
  • Transcript of "Data quality principles in the semantic web"

    1. Data Quality Principles in the Semantic Web. Ahmad Assaf and Aline Senart. SAP Research, Real-Time Intelligence Program, SAP Research France SAS. 1st International Workshop on Data Quality Management and Semantic Technologies (DQMST), September 21, 2012.
    2. Agenda • Problem definition • Related work • Context • Our proposal • Conclusion and future work • References
    3. Problem Definition: The Web of Data. http://lod-cloud.net
    4. Problem Definition: Data quality • Data quality involves data management, modeling, analysis, storage and presentation [1] • It is an important issue for data-driven applications which should be deeply investigated and understood in order to ensure the data is fit to be combined and used to infer better business decisions • Data quality is subjective and cannot be assessed easily; the actual value of data is mainly realized when it is used [2] • Studies found out that most data quality problems are in fact “data misinterpretations” or problems with the data semantics [3] • With the rise of the Semantic Web, new data quality principles should be identified
    5. Related work • Some projects have proposed solutions to identify good data sources, simplifying greatly the task of finding and consuming high-quality data: • In [7][8] a resource is ranked by the quality of the incoming and outgoing links • “Sieve” [9] is a framework that tries to express quality assessment methods as well as fusion methods • An initial attempt to identify quality criteria for Linked Data sources can be found in MediaWiki [10]. Though this classification is good, some criteria on the quality of the used ontologies and the links between data and ontology concepts are missing
    6. Context: Linked Data principles • Make the data available on the web: assign URIs to identify things • Make the data machine readable: use HTTP URIs so that looking up these names is easy • Use publishing standards: when the lookup is done provide useful information using standards like RDF • Link your data: include links to other resources to enable users to discover more things → By following these guidelines, a certain level of uniformity is achieved, which increases the usability of data
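As an illustration of these four principles, here is a minimal sketch using Python and rdflib; the example URIs, vocabulary terms and links are assumptions for illustration, not taken from the slides:

```python
# Sketch of the four Linked Data publishing principles with rdflib.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/resource/")      # assumed namespace
SCHEMA = Namespace("http://schema.org/")
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.bind("ex", EX)

# 1 + 2: identify the thing with an HTTP URI so it can be looked up
org = EX["SAP_Research_France"]

# 3: when the URI is dereferenced, return useful information as RDF
g.add((org, RDF.type, SCHEMA["Organization"]))
g.add((org, RDFS.label, Literal("SAP Research France", lang="en")))

# 4: link to other resources so consumers can discover more things
g.add((org, RDFS.seeAlso, DBR["SAP_SE"]))

print(g.serialize(format="turtle"))
```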
    7. Our Proposal
       Data Quality Principle: Attributes
       Quality of data sources: Accessibility; Authority & Sustainability; License; Trustworthiness & Verifiability; Performance
       Quality of raw data: Accuracy (Referential correspondence, Cleanness, Consistency); Comprehensibility; Completeness; Typing; Provenance; Versatility; Traceability
       Quality of the semantic conversion: Correctness; Granularity; Consistency
       Quality of the linking process: Connectedness; Isomorphism; Directionality
    8. Our Proposal: Quality of data sources (the overview table from slide 7, with the Quality of data sources row highlighted)
    9. Our Proposal: Quality of data sources
       Authority & Sustainability: Is the data source provider a known, credible source, or is it sponsored by well-known associations and providers? Is there a credible basis for believing the data source will be maintained and available in the future?
       License: Is the data source license clearly defined?
       Trustworthiness & Verifiability: Can the data consumer examine the correctness and accuracy of the data source? The consumer should also be sure that the data he receives is the same data he has vouched for and comes from the same resource.
       Accessibility: Do access methods and protocols perform properly? Are all the URIs de-referenceable? Do the in-going and out-going links operate correctly?
       Performance: Is the data source capable of coping with increasing requests with low-latency response times and high throughput?
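A simple way to spot-check the accessibility and performance attributes is to probe whether a URI de-references as RDF and how quickly it responds; a sketch using the requests library and an example DBpedia URI (the timeout and accepted format are arbitrary choices, not prescribed by the paper):

```python
# Sketch of a basic accessibility/performance probe for a data source.
import time
import requests

def probe(uri, accept="application/rdf+xml", timeout=10):
    start = time.time()
    resp = requests.get(uri, headers={"Accept": accept}, timeout=timeout)
    latency = time.time() - start
    return {
        "uri": uri,
        "status": resp.status_code,              # 200 means the URI dereferences
        "content_type": resp.headers.get("Content-Type"),
        "latency_seconds": round(latency, 3),    # rough performance indicator
    }

print(probe("http://dbpedia.org/resource/Berlin"))
```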
    10. Our Proposal: Quality of raw data (the overview table from slide 7, with the Quality of raw data row highlighted)
    11. Our Proposal: Quality of raw data
       Accuracy: Are the nodes referring to factually and lexically correct information?
       Referential correspondence: Is the data described using accurate labels without duplications? The goal is to have one-to-one references between data and the real world.
       Cleanness: Is the data clean and not polluted with irrelevant or outdated data? Are there duplicates?
       Consistency: Does the data contradict itself? For example, is the population of Europe the same as the sum of the population of the European countries?
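The consistency question on this slide can be tested mechanically; a toy sketch in Python, with made-up placeholder figures rather than real statistics:

```python
# Toy consistency check: does a stated aggregate match the sum of its parts?
# All names and figures below are fictional placeholders.
def aggregate_is_consistent(stated_total, parts, tolerance=0.01):
    computed = sum(parts.values())
    # allow a small relative deviation (rounding, different census years)
    return abs(computed - stated_total) <= tolerance * stated_total

population_by_country = {"Atlantis": 5_000_000, "Oceania": 7_500_000}
stated_population_of_region = 12_400_000   # value asserted in the dataset

print(aggregate_is_consistent(stated_population_of_region, population_by_country))
```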
    12. Our Proposal: Quality of raw data
       Comprehensibility: Are the data concepts understandable to humans? Do they convey the logical meaning of the described entity and allow easy consumption and utilization of the data?
       Completeness: Do we have all the data needed to represent all the information related to a real-world entity?
       Typing: Is the data properly typed as a concept from a vocabulary or just as a string literal? Having the data properly typed allows users to go a step further in the business analysis and decision process.
       Provenance: Provenance in the Semantic Web is considered one of the most important indicators of quality. Data sets can be used or rejected depending on the availability of sufficient and/or relevant attached metadata.
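A small sketch of the comprehensibility and typing attributes using rdflib: attach rdfs:label and rdfs:comment, and store a value as a typed literal rather than a plain string. The namespace, property name and figures are illustrative assumptions:

```python
# Sketch: human-readable labels/comments and properly typed literals.
from rdflib import Graph, Literal, Namespace, RDFS, XSD

EX = Namespace("http://example.org/resource/")   # assumed namespace
g = Graph()

city = EX["ExampleCity"]
g.add((city, RDFS.label, Literal("Example City", lang="en")))                      # human-readable label
g.add((city, RDFS.comment, Literal("A fictional city used for illustration", lang="en")))

# a typed integer literal instead of the untyped string "120000"
g.add((city, EX["population"], Literal(120000, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```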
    13. Our Proposal: Quality of raw data
       Versatility: Can the data provided be presented using alternative representations? This can be achieved by conversion into various formats or if the data source enables content negotiation.
       Traceability: Are all the elements of my data traceable (including data itself but also queries, formulae)? Can I know from what data sources they come?
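Versatility via content negotiation can be exercised by requesting the same URI with different Accept headers; a sketch with the requests library and an example DBpedia URI:

```python
# Sketch: ask one URI for several representations via content negotiation.
import requests

uri = "http://dbpedia.org/resource/Berlin"   # example resource
for accept in ("text/turtle", "application/rdf+xml", "application/json"):
    resp = requests.get(uri, headers={"Accept": accept}, timeout=10)
    print(accept, "->", resp.status_code, resp.headers.get("Content-Type"))
```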
    14. Our Proposal: Quality of the semantic conversion (the overview table from slide 7, with the Quality of the semantic conversion row highlighted)
    15. Our Proposal: Quality of the semantic conversion
       Semantic conversion is the process of transforming “normal” raw data into “rich” data: input [tabular data] → output [RDF using vocabulary X].
       Correctness: Is the data structure properly modeled and presented for future conversion?
       Granularity: Does the model capture enough information to be useful? Are all the expected data present?
       Consistency: Has the conversion been done correctly? An example taken from DBpedia is that a resource can state different minimal temperatures of a planet, the respective property being defined as having Kelvin values. These values result from an erroneous conversion of the Kelvin and Celsius values extracted from Wikipedia.
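A minimal sketch of such a conversion with Python and rdflib, converting Celsius to Kelvin once and consistently so the DBpedia-style unit error cannot occur; the vocabulary, namespaces and sample row are assumptions, not part of the paper:

```python
# Sketch: tabular input -> RDF with a chosen vocabulary and consistent units.
import csv
import io
from rdflib import Graph, Literal, Namespace, XSD

EX = Namespace("http://example.org/planet/")       # assumed namespaces
ONT = Namespace("http://example.org/ontology/")

# stand-in for a tabular input file
tabular = io.StringIO("planet,min_temp_celsius\nExamplePlanet,-143\n")

g = Graph()
for row in csv.DictReader(tabular):
    planet = EX[row["planet"]]
    kelvin = float(row["min_temp_celsius"]) + 273.15   # convert once, consistently
    g.add((planet, ONT["minimumTemperatureKelvin"], Literal(kelvin, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```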
    16. Our Proposal: Quality of the linking process (the overview table from slide 7, with the Quality of the linking process row highlighted)
    17. Our Proposal: Quality of the linking process
       Connectedness: Is the combination of datasets done at the correct resources?
       Isomorphism: Are the combined datasets modeled in a compatible way? Are the combined models reconciled?
       Directionality: After the linkage, is the knowledge represented in the resulting graph of resources still consistent?
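Frameworks like Silk generate such links from a declarative configuration; the toy sketch below only illustrates the underlying idea, comparing labels and asserting owl:sameAs above a similarity threshold. The datasets, matching rule and threshold are all assumptions:

```python
# Toy sketch of link generation: emit owl:sameAs for sufficiently similar labels.
from difflib import SequenceMatcher
from rdflib import Graph, Namespace, OWL

A = Namespace("http://example.org/datasetA/")
B = Namespace("http://example.org/datasetB/")

# toy datasets: URI -> label
dataset_a = {A["acme_corp"]: "ACME Corporation", A["globex"]: "Globex"}
dataset_b = {B["acme"]: "ACME Corp.", B["initech"]: "Initech"}

links = Graph()
for uri_a, label_a in dataset_a.items():
    for uri_b, label_b in dataset_b.items():
        score = SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()
        if score >= 0.6:                     # connectedness: link only confident matches
            links.add((uri_a, OWL.sameAs, uri_b))

print(links.serialize(format="turtle"))
```

In practice, the connectedness and isomorphism questions above amount to reviewing exactly this kind of matching rule, and the compatibility of the two models, before accepting the generated links.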
    18. Our Proposal: Global quality
       These principles are applicable to all aspects of a Semantic System (data source, raw data, links, etc.)
       Timeliness: Is the data up to date? Does the data source contain the latest raw data presented with the last updated model? Are the links from and to the data source updated to the latest references? Does the source state the update and validation frequencies? Failing to update the source data increases the chance that the referenced URIs have changed.
       History: Can we keep track of who edited my data and when?
       Freshness: The ability to replicate the remote repository into local triple stores and maintain the timeliness of the replica.
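Freshness of a local replica can be maintained cheaply with conditional HTTP requests, so the dump is only re-downloaded when the source has changed; a sketch assuming a hypothetical dump URL and the requests library:

```python
# Sketch: refresh a local dump only when the remote source reports a change.
import os
import requests

DUMP_URL = "http://example.org/dumps/dataset.nt"   # hypothetical dump location
ETAG_FILE = "dataset.etag"

headers = {}
if os.path.exists(ETAG_FILE):
    with open(ETAG_FILE) as f:
        headers["If-None-Match"] = f.read().strip()

resp = requests.get(DUMP_URL, headers=headers, timeout=30)
if resp.status_code == 304:
    print("local replica is still fresh")
elif resp.ok:
    with open("dataset.nt", "wb") as f:
        f.write(resp.content)
    if "ETag" in resp.headers:
        with open(ETAG_FILE, "w") as f:
            f.write(resp.headers["ETag"])
    print("replica refreshed")
```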
    19. Conclusion and future work • Data quality in the Semantic Web is an important field to focus on • We presented five main classes of data quality principles for the Semantic Web. For each class, we listed the specific criteria that represent the quality of a data source on the Web. Future Work • We have applied these principles in several scenarios and use cases in SAP; we plan to better investigate and assess them in a broader set of use cases • Tools to formally prove that these principles are met
    20. Thank You! Contact information: Ahmad Assaf, SAP Research, France. Ahmad.assaf@sap.com, +33770198946
    21. References
       [1] Chapman, Arthur D. 2005. Principles of Data Quality. Copenhagen: Report for the Global Biodiversity Information Facility, 2005.
       [2] Juran, Joseph M. and Godfrey, A. Blanton. Juran's Quality Handbook. McGraw-Hill, 1998.
       [3] Improving Data Quality Through Effective Use of Data Semantics. Madnick, Stuart and Zhu, Hongwei. 2005. Cambridge, MA: Composite Information Systems Laboratory (CISL), 2005.
       [4] Towards an Ontology for e-Document Management in Public Administration – the Case of Schleswig-Holstein. Klischewski, R. 2012. Proceedings HICSS-36, IEEE, 2012.
       [5] Publishing Life Science Data as Linked Open Data: the Case Study of miRBase. T. Dalamagas, N. Bikakis, G. Papastefanatos, Y. Stavrakas and A. Hatzigeorgiou. 2012. 1st International Workshop on Open Data (WOD), 2012.
       [6] Publishing and linking transport data on the Web. J. Plu and F. Scharffe. 2012. 1st International Workshop on Open Data (WOD), 2012.
       [7] Hierarchical Link Analysis for Ranking Web Data. Renaud Delbru, Nickolai Toupikov, Michele Catasta, Giovanni Tummarello. 2010. Springer Berlin Heidelberg, 2010, Vol. 6089.
       [8] Sindice at SemSearch 2010. Renaud Delbru, Nur Aini Rakhmawati, Giovanni Tummarello. 2010. WWW2010, 2010.
       [9] Sieve: Linked Data Quality Assessment and Fusion. Pablo N. Mendes, Hannes Mühleisen, Christian Bizer. 2012. Berlin: LWDM 2012, 2012.
       [10] MediaWiki. Quality Criteria for Linked Data sources. SourceForge. [Online] [Cited: 6 19, 2012.] http://sourceforge.net/apps/mediawiki/trdf/index.php?title=Quality_Criteria_for_Linked_Data_sources
       [11] Berners-Lee, Tim. 2006. Linked Data. W3C. [Online] 2006. [Cited: 6 18, 2012.] http://www.w3.org/DesignIssues/LinkedData.html
       [12] Google Code. Google Refine. [Online]. http://code.google.com/p/google-refine/
       [13] Stanford Visualization Group. Data Wrangler. [Online]. http://vis.stanford.edu/wrangler/