RDFa: introduction, comparison with microdata and microformats and how to use it

Report for the course 'XML and Web Technologies' of the IT4BI Erasmus Mundus Master's Programme. Introduction, motivation, target domain, schema, attributes, comparing RDFa with RDF, comparing RDFa with Microformats, comparing RDFa with Microdata, how to use RDFa to improve websites, how to extract metadata defined with RDFa, GRDDL and a simple exercise.

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    RDFa: introduction, comparison with microdata and microformats and how to use it RDFa: introduction, comparison with microdata and microformats and how to use it Document Transcript

Navid Mahlouji, Jose Luis Lopez Pino, Edgar Isaac Hiroshi León Saiki
15/03/13

RDFa
Table of Contents
Introduction
1) Target domain
2) Schema
3) Attributes
    3.1 Property
    3.2 Vocab
    3.3 Resource
    3.4 Typeof
4) Comparisons of RDFa
    4.1 Comparing RDFa with RDF
    4.2 Comparing RDFa with Microformats
    4.3 Comparing RDFa with Microdata
    4.4 Conclusions
5) Using RDFa
    5.1 Using RDFa to improve websites
    5.2 Extracting the data embedded in RDFa
    5.3 Exercise
Introduction
In recent years, with the advances in web technologies, humans are no longer the only consumers of the data available on the World Wide Web. More and more machines search the Internet for data and knowledge than before. It is no longer enough to simply present your data on a website for people to visit. One reason is that the quantity of data on the Internet keeps growing. To make this data available for different sources and purposes, we need a way to make it not only readable for users but also understandable for machines and software. Consider search engines: they use web crawlers to crawl the Internet, gather data and classify it. As the volume of data keeps growing, search engines are becoming more precise and efficient at finding information in a more detailed format. Although there are many criticisms of the feasibility of the Semantic Web, it aims to give a structure to the massive amount of data available on the Internet. Structured data is more readable for machines and therefore more useful for humans. RDFa is a tool with which we can give the data on web pages a structure.
1 Target domain
HTML is a very good and efficient way of presenting data; however, when it comes to machines understanding that data, it is not efficient at all. In a typical web page, an author writes HTML for, say, a headline, a sub-headline, a block of italicized text, another text block with a different size, and several links. Web browsers render this HTML so that people can understand it, but computers cannot understand the structure of the data. For instance, the headline expresses a blog post title, the italicized text the publication date, and the links are categories. The following figure [17] illustrates the difference: on the left is what browsers see, and on the right what humans observe.
To address this need we can use XML technologies, which are very close to HTML and can provide structure and semantics for our data. RDFa provides meaningful data for machines, and this information can be placed in the XHTML elements of the web page itself. For example, when someone announces a dinner meeting on a web page, applications can extract that information and easily copy it to the user's calendar. Similarly, the contact information on the author's blog can be added to the user's address book automatically. Once the structure of the data is provided, computer programs can better understand the meaning of the data and use it efficiently [17].
2 Schema
According to the W3C standards, an XHTML+RDFa document should have the following document type declaration:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

Note that, to conform to XML syntax, this declaration should appear after the XML declaration:

    <?xml version="1.0" encoding="utf-8"?>

With these declarations we make sure that the document is validated against the XML and RDFa schemas. The document can also be validated using the W3C Markup Validation Service [28]. The RDFa schema itself does not seem to impose any constraints; however, since RDFa lets us use other vocabularies through full IRIs, those vocabularies might impose integrity constraints that we have to follow and against which our document will be validated.

3 Attributes
In the following we introduce RDFa documents and show some of their features and usages. The essence of RDFa is to provide a set of attributes that can carry metadata in an XML language (hence the 'a' in RDFa). These attributes are [12]:
- about – a URI or CURIE specifying the resource the metadata is about.
- rel and rev – specifying a relationship and a reverse relationship with another resource, respectively.
- src, href and resource – specifying the partner resource.
- property – specifying a property for the content of an element or the partner resource.
- content – optional attribute that overrides the content of the element when using the property attribute.
- datatype – optional attribute that specifies the datatype of text specified for use with the property attribute.
- typeof – optional attribute that specifies the RDF type(s) of the subject or the partner resource (the resource that the metadata is about).
- vocab – optional attribute that sets the default vocabulary URL used for the terms in a portion of a document [20].
These attributes extend HTML and XHTML so that rich metadata can be embedded within Web documents. Adding them has no effect on the presentation of the data, because browsers only react to a set of predefined tags and attributes, and the RDFa attributes are not among them. Therefore we can add metadata to existing web pages without altering their structure or content [12].

3.1 Property
The following very simple RDFa example comes from the W3C website [17]:

    <html>
      <head> ... </head>
      <body>
        ...
        <h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
        <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
        ...
      </body>
    </html>

This code presents a very simple HTML page with a title and a date in its body. From the visual presentation of the page, a human can understand that "The Trouble with Bob" is the title of this document, while the date is the date on which the document was created. The question is whether a machine such as a crawler can understand these semantics, or whether it just sees a string and a date. The answer is clear: machines need a structure through which they can understand the meaning of the content, and the RDFa attributes provide that structure.
Almost everything in RDFa is expressed using URLs, as in the example above. The reason is rooted in data portability, information sharing and consistency: identifying terms by URL prevents them from being ambiguous. In our example, without the URL the term "title" might mean "the title of a research paper" or "a job title", which is not what is intended. Referencing every vocabulary term by URL provides detailed information for both machines and humans. To avoid typing the full URL for every use, RDFa introduces the vocab attribute, which lets the author declare a URL once and use it multiple times.

3.2 Vocab
The following example shows how the vocab attribute simplifies the use of URLs [17]:

    <html>
      <head> ... </head>
      <body vocab="http://purl.org/dc/terms/">
        ...
        <h2 property="title">The Trouble with Bob</h2>
        <p>Date: <span property="created">2011-09-10</span></p>
        ...
        <p>All content on this site is licensed under
          <a property="http://creativecommons.org/ns#license"
             href="http://creativecommons.org/licenses/by/3.0/">a Creative Commons License</a>.
          ©2011 Alice Birpemswick.</p>
      </body>
    </html>

This example also shows that vocab does not restrict the document to a single vocabulary: we can still use full URLs for properties from other vocabularies, such as the Creative Commons license property.

3.3 Resource
Sometimes a single page has to describe several items of the same kind. In that case we use the resource attribute, which specifies the subject that the enclosed properties describe. The following example presents two different blog posts [17]; it also uses the vocab attribute to avoid retyping the URL:

    <body vocab="http://purl.org/dc/terms/">
      ...
      <div resource="/alice/posts/trouble_with_bob">
        <h2 property="title">The trouble with Bob</h2>
        <p>Date: <span property="created">2011-09-10</span></p>
        <h3 property="creator">Alice</h3>
        ...
      </div>
      ...
      <div resource="/alice/posts/jos_barbecue">
        <h2 property="title">Jo's Barbecue</h2>
        <p>Date: <span property="created">2011-09-14</span></p>
        <h3 property="creator">Eve</h3>
        ...
      </div>
      ...
    </body>
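To make the property, vocab and resource rules above concrete, here is a minimal Python sketch (standard library only) that walks the blog-post markup and prints the triples a processor would roughly produce. It is an illustrative subset, not a conforming RDFa 1.1 processor: the class name `MiniRDFaParser` and the base URL `http://example.com/` are hypothetical, and prefixes, typeof, rel/rev and datatype are deliberately ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag


class MiniRDFaParser(HTMLParser):
    """Hypothetical sketch handling only vocab, resource, property, href and
    content. Not a conforming RDFa processor."""

    def __init__(self, base=""):
        super().__init__()
        self.base = base
        self.triples = []
        # Stack entries: (tag, subject, vocab, literal); literal is None or
        # [subject, predicate, text_parts] while a property's text is collected.
        self.stack = []

    def _term(self, term, vocab):
        # Full IRIs (anything containing ':') are kept; bare terms get the vocab prefix.
        return term if ":" in term or vocab is None else vocab + term

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self.stack:
            _, subject, vocab, _ = self.stack[-1]
        else:
            subject, vocab = self.base, None
        vocab = a.get("vocab", vocab)          # vocab is inherited by descendants
        literal = None
        if a.get("property"):
            pred = self._term(a["property"], vocab)
            if "content" in a:                 # content overrides the element text
                self.triples.append((subject, pred, a["content"]))
            elif "resource" in a or "href" in a:
                target = a["resource"] if "resource" in a else a["href"]
                self.triples.append((subject, pred, urljoin(self.base, target)))
            else:                              # object is the element's text
                literal = [subject, pred, []]
        elif "resource" in a:
            # resource without property sets the subject for the descendants
            subject = urljoin(self.base, a["resource"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, subject, vocab, literal))

    def handle_data(self, data):
        # Text belongs to every enclosing element that is collecting a literal.
        for _, _, _, literal in self.stack:
            if literal is not None:
                literal[2].append(data)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, emitting any pending literals.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for _, _, _, literal in reversed(self.stack[i:]):
                    if literal is not None:
                        text = " ".join("".join(literal[2]).split())
                        self.triples.append((literal[0], literal[1], text))
                del self.stack[i:]
                return


example = """
<body vocab="http://purl.org/dc/terms/">
  <div resource="/alice/posts/trouble_with_bob">
    <h2 property="title">The trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    <h3 property="creator">Alice</h3>
  </div>
  <div resource="/alice/posts/jos_barbecue">
    <h2 property="title">Jo's Barbecue</h2>
    <p>Date: <span property="created">2011-09-14</span></p>
    <h3 property="creator">Eve</h3>
  </div>
</body>
"""

parser = MiniRDFaParser(base="http://example.com/")  # hypothetical page URL
parser.feed(example)
parser.close()
for triple in parser.triples:
    print(triple)
```

Each div's resource becomes the subject of the three properties inside it, which is exactly how the two blog posts are kept apart. Whitespace in literals is collapsed here for readability; a real processor preserves the text as written.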
This page includes two different blog entries, each of which has title, created and creator properties. To distinguish between the two blog entries, RDFa introduces the resource attribute.

3.4 Typeof
Instead of the resource attribute we can use the typeof attribute, which declares a new data item with a certain type. The following example represents a social network page in which the FOAF type Person is used for the owner of the page and for all her friends [17]:

    <div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
      <p>
        <span property="name">Alice Birpemswick</span>,
        Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>,
        Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
      </p>
      <ul>
        <li property="knows" typeof="Person">
          <a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a>
        </li>
        <li property="knows" typeof="Person">
          <a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a>
        </li>
        <li property="knows" typeof="Person">
          <a property="homepage" href="http://example.com/manu/"><span property="name">Manu</span></a>
        </li>
      </ul>
    </div>
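The effect of typeof can be sketched in a few lines of Python: every element carrying typeof starts a new, typed subject (a blank node here, since no resource is given), and when the same element also has property, the new node becomes the object of that property. This follows the RDFa 1.1 behaviour as described in the Primer [17], but the class `TypeofSketchParser`, the `_:bN` blank-node labels and the abridged page (phone line omitted) are illustrative assumptions, not a complete processor.

```python
from html.parser import HTMLParser
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TypeofSketchParser(HTMLParser):
    """Hypothetical sketch: only vocab, typeof, property and href are handled."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.stack = []          # (tag, subject, vocab, pending_literal)
        self._ids = count()      # blank-node labels _:b0, _:b1, ...

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self.stack:
            _, subject, vocab, _ = self.stack[-1]
        else:
            subject, vocab = "", None
        vocab = a.get("vocab", vocab)

        def term(t):
            return t if ":" in t or vocab is None else vocab + t

        prop, literal = a.get("property"), None
        if "typeof" in a:
            node = f"_:b{next(self._ids)}"
            for t in (a["typeof"] or "").split():   # several types are allowed
                self.triples.append((node, RDF_TYPE, term(t)))
            if prop:
                # property + typeof: the new typed node is the property's object
                self.triples.append((subject, term(prop), node))
            subject = node                          # descendants describe the new node
        elif prop and "href" in a:
            self.triples.append((subject, term(prop), a["href"]))
        elif prop:
            literal = [subject, term(prop), []]
        if tag not in VOID_TAGS:
            self.stack.append((tag, subject, vocab, literal))

    def handle_data(self, data):
        for _, _, _, literal in self.stack:
            if literal is not None:
                literal[2].append(data)

    def handle_endtag(self, tag):
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for _, _, _, literal in reversed(self.stack[i:]):
                    if literal is not None:
                        text = " ".join("".join(literal[2]).split())
                        self.triples.append((literal[0], literal[1], text))
                del self.stack[i:]
                return


page = """
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
  <p><span property="name">Alice Birpemswick</span>,
     Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a></p>
  <ul>
    <li property="knows" typeof="Person">
      <a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a></li>
    <li property="knows" typeof="Person">
      <a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a></li>
    <li property="knows" typeof="Person">
      <a property="homepage" href="http://example.com/manu/"><span property="name">Manu</span></a></li>
  </ul>
</div>
"""
parser = TypeofSketchParser()
parser.feed(page)
parser.close()

# Follow the knows links from the page owner (_:b0) to each friend's name.
names = {s: o for s, p, o in parser.triples if p == FOAF + "name"}
friends = [names[o] for s, p, o in parser.triples if s == "_:b0" and p == FOAF + "knows"]
print(friends)  # ['Bob', 'Eve', 'Manu']
```

Note that the homepage link does not change the subject: the name inside it still describes the friend's blank node, which is what gives the tree shape of the social network.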
Here we can see that the page has an owner of type Person, who knows three other persons. The tree representation of this social network is shown in the following figure [17].

4 Comparisons of RDFa
4.1 Comparing RDFa with RDF
RDFa is related to RDF, a standard for sharing data in a form that machines can understand. The Resource Description Framework (RDF) is an abstract data model that can be represented as a graph. Its idea is to describe, within a certain domain, the relationships between web resources using subject-predicate-object expressions, each of which is called a triple. In the graph, the subject is at the beginning of the arrow, the property (predicate) is the arrow itself, and the object is at the end of the arrow [17].
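The subject-predicate-object structure can be written down directly. The Python sketch below models RDF terms as three small classes and prints triples in the line-based N-Triples format (IRIs in angle brackets, blank nodes as `_:label`, literals in quotes with an optional `^^<datatype>`, each line ending in " ."). The class names are hypothetical, the post URL assumes a hypothetical base `http://example.com/`, and the `xsd:date` datatype on the creation date is added only to illustrate a typed literal.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None   # e.g. an XML Schema datatype IRI


Term = Union[IRI, BNode, Literal]


def to_nt(term: Term) -> str:
    """Render one RDF term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    if isinstance(term, BNode):
        return f"_:{term.label}"
    # Literals: escape backslashes, quotes and line breaks.
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    text = f'"{escaped}"'
    return f"{text}^^<{term.datatype}>" if term.datatype else text


def serialize(triples) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{to_nt(s)} {to_nt(p)} {to_nt(o)} ." for s, p, o in triples)


DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
post = IRI("http://example.com/alice/posts/trouble_with_bob")  # hypothetical base URL

graph = [
    (post, IRI(DC + "title"), Literal("The trouble with Bob")),
    (post, IRI(DC + "created"), Literal("2011-09-10", XSD_DATE)),
    (BNode("alice"), IRI(FOAF + "knows"), BNode("bob")),
]
print(serialize(graph))
```

Each printed line is one arrow of the graph: where it starts (subject), what it means (predicate) and where it ends (object).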
Figure: Illustration of the triple [15].
Triples represent relationships between related nodes. The objective of RDF is to provide a language for expressing data and the relationships between them [17]. RDF is an abstract data model whose function is to reuse vocabularies in order to find resources and the relations between them [16]. RDFa, on the other hand, expresses RDF data within XHTML, letting computers understand its meaning while reusing the existing human-readable content of the document [17].

4.2 Comparing RDFa with Microformats
Microformats are extensions of HTML whose main objective is in fact the same as that of RDFa: both encode information in XHTML documents so that the content displayed for humans is also readable by computers. Both add attributes to the XHTML code that are generally hidden from users and whose only purpose is to add meaning to the data [13]. Both technologies give structure to the data in web pages, helping to build the so-called Semantic Web [23]. To apply Microformats in XHTML code, we can use the following attributes [18]:
- class – contains data describing the properties and behaviour of an element.
- rel – describes the relationship between the current document and the linked resource.
- rev – describes the reverse relationship, from the referenced document to the current one.
The rel and rev attributes are also used in RDFa; the class attribute is not, since RDFa uses the property attribute to describe the resource instead. In the following example, the contact information of a person is described using the hCard Microformat [14]:
    <div class="vcard">
      <div class="fn">Toby Segaran</div>
      <div class="org">The Semantic Programmers</div>
      <div class="tel">919-555-1234</div>
      <a class="url" href="http://kiwitobes.com/">http://kiwitobes.com/</a>
    </div>

As the example shows, a set of classes is embedded in an element with the class vcard, meaning that they all belong to the hCard Microformat. Software that understands the hCard structure can therefore extract this information and add it to your address book. Web crawlers can use hCard to build a database of contacts with their names, telephone numbers, locations, etc., extracted from web sites that use Microformats. Similarly, the hCalendar Microformat can be used to create a timeline from all the historical data about past events [14]. These applications can also be built with RDFa.
Microformats are predefined, and each of them serves a different purpose: for example, hCard is used for contact information, hResume for CVs and hNews for news content. RDFa, in contrast, allows developers to define a namespace, so publishers are not restricted to official vocabularies and can define their own [14]. For example, in a specific domain such as chemical data, for which no Microformat exists, RDFa is needed [18]. Other differences between Microformats and RDFa are:
1- In RDFa a resource can be identified by an IRI (Internationalized Resource Identifier), which makes it easier to locate a specific resource; Microformats do not support this.
2- Microformats do not support typed literal properties, which means that it is not possible to specify things like units of measurement, such as kilograms or pounds, or whether a value such as "+323243453" is an integer or a phone number. RDFa does support typed literals.
3- RDFa allows specifying multiple IRI types per item, so web developers can indicate that a resource on a page is associated with more than one type; for instance, a business can be both an "AutoPartsStore" and a "RepairShop". Microformats do not support this feature [19].
In conclusion, RDFa is more complex but also more effective. However, at present more services use Microformats than RDFa, because of their simplicity: Microformats do not require specifying an XML Schema [14].
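The hCard example above can be consumed by a very small class-based extractor, which also shows the main contrast with RDFa: the property names are a fixed, predefined list of class values rather than IRIs. The following Python sketch (standard library only) is a hypothetical illustration that recognizes just four hCard classes (fn, org, tel, url); it is not a full microformats parser and does not handle nested cards or the many other hCard properties.

```python
from html.parser import HTMLParser

HCARD_PROPS = ("fn", "org", "tel", "url")   # small, assumed subset of hCard classes
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Hypothetical sketch: collects fn, org, tel and url from elements nested
    inside class="vcard". Nested cards are not supported."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self.stack = []   # (tag, opens_card, property_being_collected)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        opens = "vcard" in classes
        if opens:
            self.cards.append({})
        inside = opens or any(o for _, o, _ in self.stack)
        prop = next((c for c in classes if c in HCARD_PROPS), None) if inside else None
        if prop == "url" and "href" in a:
            self.cards[-1]["url"] = a["href"]   # for links, take the URL from href
            prop = None
        elif prop:
            self.cards[-1].setdefault(prop, "")
        if tag not in VOID_TAGS:
            self.stack.append((tag, opens, prop))

    def handle_data(self, data):
        # Append text to every enclosing element that names an hCard property.
        for _, _, prop in self.stack:
            if prop:
                self.cards[-1][prop] += data

    def handle_endtag(self, tag):
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return


snippet = """
<div class="vcard">
  <div class="fn">Toby Segaran</div>
  <div class="org">The Semantic Programmers</div>
  <div class="tel">919-555-1234</div>
  <a class="url" href="http://kiwitobes.com/">http://kiwitobes.com/</a>
</div>
"""
parser = HCardParser()
parser.feed(snippet)
parser.close()
cards = [{k: " ".join(v.split()) for k, v in c.items()} for c in parser.cards]
print(cards)
```

Because the vocabulary is the hard-coded `HCARD_PROPS` tuple, supporting a new domain means changing the parser itself, whereas an RDFa consumer simply sees new property IRIs.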
Since 2010, Microformats have been heavily used, especially to extract events, contact information, geographical coordinates and social relationships [18].

4.3 Comparing RDFa with Microdata
Microdata is another HTML specification whose main goal is to add semantics to web content. Microdata follows an approach similar to RDFa, in which web developers can define custom vocabularies [21]. Moreover, in Microdata it is necessary to follow a standards body when designing vocabularies, which leads to better-designed vocabularies than in RDFa. However, RDFa can be more complete since it does not need to follow a standards body [19]. Microdata was designed to be a subset of RDFa with the intention of making it simpler; for this reason most of its functions have equivalents in RDFa, and only the names of the attributes differ. It is worth mentioning that almost 99% of the markup expressed with Microdata can easily be converted to RDFa by using the equivalent attributes. There are some reasons why it is better to use RDFa rather than Microdata:
1- RDFa is supported by most of the common search engines, unlike Microdata.
2- RDFa supports some advanced features that are still not available in Microdata; for instance, Microdata does not support defining units of measurement.
3- RDFa features are improving constantly, unlike Microdata.
In summary, Microdata is an attempt to do the same as RDFa with less complexity; however, it is not yet a standard like RDFa, and for this reason it lacks compatibility with many applications [22].

4.4 Conclusions
In summary, all three approaches have their own advantages and disadvantages. While Microformats are more widely used in the market, they lack some important features. For instance, it is not possible to create our own vocabularies; one has to use the vocabularies developed specifically for Microformats. Microdata, on the other hand, allows custom vocabularies, but some properties are missing, for instance advanced RDFa features such as units of measurement. Even if RDFa is not the most widely used approach nowadays, it has the most features and covers the most domains. In other words, RDFa is the most powerful of these three approaches.
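The claim that Microdata markup can be moved to RDFa mostly by renaming attributes can be illustrated with a small rewriter. This Python sketch assumes the RDFa Lite 1.1 correspondences (itemscope with itemtype becomes vocab plus typeof, itemprop becomes property, itemid becomes resource); the function and class names are hypothetical, itemref and itemscope without itemtype are not handled, and comments or doctypes in the input are dropped.

```python
import html
from html.parser import HTMLParser


def split_type(iri):
    """Split 'http://schema.org/Person' into ('http://schema.org/', 'Person')."""
    cut = max(iri.rfind("/"), iri.rfind("#")) + 1
    return iri[:cut], iri[cut:]


def microdata_to_rdfa_lite(attrs):
    """Map one element's Microdata attributes to assumed RDFa Lite equivalents.
    attrs is a list of (name, value) pairs as produced by html.parser."""
    out = []
    for name, value in attrs:
        if name == "itemscope":
            continue                        # implied by typeof in RDFa Lite
        elif name == "itemtype" and value:
            types = value.split()
            out.append(("vocab", split_type(types[0])[0]))
            out.append(("typeof", " ".join(split_type(t)[1] for t in types)))
        elif name == "itemprop":
            out.append(("property", value))
        elif name == "itemid":
            out.append(("resource", value))
        else:
            out.append((name, value))       # other attributes are kept unchanged
    return out


class MicrodataRewriter(HTMLParser):
    """Re-emits markup with Microdata attributes renamed (sketch only)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def _tag(self, tag, attrs, close=""):
        rendered = "".join(
            f" {n}" if v is None else f' {n}="{html.escape(v)}"'
            for n, v in microdata_to_rdfa_lite(attrs))
        self.parts.append(f"<{tag}{rendered}{close}>")

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(html.escape(data, quote=False))


snippet = ('<div itemscope itemtype="http://schema.org/Person">'
           '<span itemprop="name">Alice Birpemswick</span></div>')
rewriter = MicrodataRewriter()
rewriter.feed(snippet)
rewriter.close()
result = "".join(rewriter.parts)
print(result)
# <div vocab="http://schema.org/" typeof="Person"><span property="name">Alice Birpemswick</span></div>
```

The one-to-one renaming is what makes the conversion mechanical; the features the report attributes only to RDFa (such as typed values) have no Microdata source to convert from.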
5 Using RDFa
5.1 Using RDFa to improve websites
RDFa is not supported by schema.org, a shared markup vocabulary defined jointly by Google, Microsoft and Yahoo!. Google [2] has defined specific vocabularies for reviews, people, products, businesses, organizations, recipes, events and videos. For instance, the following picture shows how Google uses the metadata stored in RDFa attributes to improve how reviews appear in its search results; Google calls these results "rich snippets". This is an example from the W3C blog [1] that uses RDFa 1.0 to add metadata to a review, helping Google to index it:
In 2009 the Central Office of Information faced a big problem: organising job vacancies without changing the websites of the different public agencies, which use diverse web technologies [3]. For this purpose they defined a vocabulary that could also be used by others. With this vocabulary they can describe the details of a job vacancy: the title, the type, the description, the requirements, the language, etc. [4] They then started to use this vocabulary by implementing RDFa in different websites.
Another successful use of RDFa is GoodRelations [5], a vocabulary for e-commerce that helps to standardise the metadata of different vendors. It supports vertical searches, for instance by users looking for products across different websites or by companies that need different suppliers. Multiple shop applications such as Magento have already included it in their software solutions, and it can be expressed using RDFa; for example, BestBuy uses RDFa to publish information about its stores, such as opening hours, location and telephone number [6].

5.2 Extracting the data embedded in RDFa
To add metadata in the RDFa attributes of an XHTML document we can use any text or source code editor. However, it is tricky to inspect a whole document to extract only the metadata from it. Several tools make this task easier; for instance, we can install a Chrome extension called RDFa Triples Lister, which extracts the metadata of the website we are visiting with the browser:
We can use RDF parsing tools that extract the RDFa embedded in a web page. For example, with the rdfQuery tool [7] we can read the RDFa information of BBC programmes and use it to create links to Spotify to stream the songs [8]. The following graph, created with RDFa Play [9], shows the RDF information extracted from a BBC programme [10]:
Finally, the W3C has defined a mechanism, GRDDL (Gleaning Resource Descriptions from Dialects of Languages), for extracting data compatible with the Resource Description Framework from XML documents, including RDFa. For this purpose we define transformations, which are instructions for extracting the embedded data properly [24]. For RDFa there is a style sheet that defines the transformations to apply to an XHTML+RDFa document to extract its RDF data [26]. The following image illustrates an interesting example: the web contains different calendars whose metadata is probably defined using different techniques (Microformats, RDFa, etc.). The GRDDL transformations specify how to extract the RDF data from each document. Once we have extracted the RDF triples, we can process them using, for example, SPARQL, the query language for RDF [25].
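Once triples have been extracted, a query engine matches patterns against them. The Python sketch below imitates the core of a SPARQL basic graph pattern: each pattern is a triple in which strings starting with "?" are variables, and patterns are joined left to right by extending variable bindings. It is a hypothetical illustration, not SPARQL itself; the triples are the blog-post data from section 3.3, and a literal that happens to start with "?" would be misread as a variable.

```python
def match(pattern, triples, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple.
    Strings starting with '?' are variables; anything else must be equal."""
    for triple in triples:
        candidate = dict(binding)
        for slot, value in zip(pattern, triple):
            if slot.startswith("?"):
                if candidate.setdefault(slot, value) != value:
                    break            # variable already bound to a different value
            elif slot != value:
                break                # constant does not match
        else:
            yield candidate


def select(patterns, triples):
    """Join the patterns left to right, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, triples, old)]
    return bindings


DC = "http://purl.org/dc/terms/"
triples = [
    ("/alice/posts/trouble_with_bob", DC + "title", "The trouble with Bob"),
    ("/alice/posts/trouble_with_bob", DC + "creator", "Alice"),
    ("/alice/posts/jos_barbecue", DC + "title", "Jo's Barbecue"),
    ("/alice/posts/jos_barbecue", DC + "creator", "Eve"),
]

# Roughly: SELECT ?title WHERE { ?post dc:creator "Alice" . ?post dc:title ?title }
rows = select([("?post", DC + "creator", "Alice"), ("?post", DC + "title", "?title")], triples)
print([row["?title"] for row in rows])  # ['The trouble with Bob']
```

The shared variable `?post` is what joins the two patterns, so the title is only returned for posts whose creator is Alice.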
5.3 Exercise
As we have already mentioned, Google defines a specific vocabulary for people. This vocabulary is very useful for making our social networking information accessible. The properties defined in the vocabulary are:
We need to use the vocabulary presented above to modify this web page, adding metadata to it with RDFa:
A possible solution could be:
REFERENCES
[1] W3C. RDFa 1.1 with a rich snippet example. Retrieved from W3C: http://www.w3.org/QA/2011/05/rdfa_11_with_a_rich_snippet_ex.html
[2] Google. Rich snippets (microdata, microformats, RDFa, and Data Highlighter). Retrieved from Google support: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=99170
[3] Birbeck, M. More RDFa goodness from UK government web-sites. Retrieved from Internet-Apps: http://internet-apps.blogspot.fr/2009/04/more-rdfa-goodness-from-uk-government.html
[4] Birbeck, M. (n.d.). Argot Vacancy. Retrieved from Google Code: https://code.google.com/p/argot-hub/wiki/ArgotVacancy
[5] GoodRelations Wiki. (n.d.). The Web Vocabulary for E-Commerce. Retrieved from GoodRelations Vocabulary: http://wiki.goodrelations-vocabulary.org/Quickstart
[6] Myers, J. Creating local visibility to open box products with front-end semantic web. Retrieved from Beweep: http://jay.beweep.com/2010/03/30/creating-local-visibility-to-open-box-products-with-front-end-semantic-web/
[7] Google. rdfquery. Retrieved from Google Code: https://code.google.com/p/rdfquery/
[8] Adding Spotify links to BBC Radio playlists, via RDFa, using Greasemonkey and rdfQuery. Retrieved from http://hublog.hubmed.org/archives/001913.html
[9] RDFa Group. RDFa Info. Retrieved from http://rdfa.info/play/
[10] Use of Semantic Web technologies on the BBC Web Sites. Retrieved from http://www.cmswire.com/cms/information-management/bbcs-adoption-of-semantic-web-technologies-an-interview-017981.php
[11] Rich snippets – People. Retrieved from http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146646
[12] Wikipedia. RDFa. Retrieved from http://en.wikipedia.org/wiki/RDFa
[13] Prodromou, E. (2008, 10 12). RDFa vs microformats. Retrieved from evan.prodromou.name: http://evan.prodromou.name/RDFa_vs_microformats
[14] Segaran, T., Evans, C., & Taylor, J. (2009). Programming the Semantic Web. O'Reilly.
[15] W3C. (2004, February 10). Resource Description Framework (RDF). Retrieved from W3.org: http://www.w3.org/TR/rdf-concepts/#section-triples
[16] W3C. (2004, February 10). RDF Vocabulary Description Language 1.0: RDF Schema. Retrieved from W3.org: http://www.w3.org/TR/2004/REC-rdf-schema-20040210/
[17] W3C Working Group. (2012, June 07). RDFa 1.1 Primer. Retrieved from W3C: http://www.w3.org/TR/xhtml-rdfa-primer/
[18] Wikipedia. Microformat. Retrieved from http://en.wikipedia.org/wiki/Microformats#cite_note-Wharton000-2
[19] Sporny, M. An Uber-comparison of RDFa, Microdata and Microformats. Retrieved from manu.sporny.org: http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/
[20] W3C Group. RDFa Syntax. Retrieved from http://www.w3.org/TR/rdfa-syntax/
[21] Wikipedia. Microdata. Retrieved from http://en.wikipedia.org/wiki/Microdata_(HTML)#cite_note-DIVE-4
[22] Sporny, M. (n.d.). Mythical Differences: RDFa Lite vs. Microdata. Retrieved from manu.sporny.org: http://manu.sporny.org/2012/mythical-differences/
[23] Wikipedia. Semantic Web. Retrieved from http://en.wikipedia.org/wiki/Semantic_Web
[24] Gleaning Resource Descriptions from Dialects of Languages (GRDDL). http://www.w3.org/TR/grddl/
[25] GRDDL Use Cases: Scenarios of extracting RDF data from XML documents. http://www.w3.org/TR/2007/NOTE-grddl-scenarios-20070406/
[26] RDFa2RDFXML style sheet. http://www.w3.org/TR/grddl-primer/RDFa2RDFXML.xsl
[28] W3C Markup Validation Service. http://validator.w3.org/