The Main Trends in the Use and Development of Semantic Markup


Presentation for KESW conference

Published in: Technology, Education
  • Hi, my name is Yuliya. I work at Yandex on the Semantic Web project. Today I would like to discuss the main trends in the use and development of semantic markup.
  • First I want to talk about why Yandex uses semantic markup. Then we'll talk a little about the basic terms. Finally, we'll discuss how semantic markup is developed, using schema.org as an example.
  • So, why do we need all this stuff?
  • There is a huge pile of raw data on the Internet. But raw data is not enough to answer our users' questions. To give them a good answer we need knowledge rather than raw data.
  • We can extract knowledge automatically (using machine learning, language technologies or specialized parsers). And we can get knowledge about the content of web pages directly from webmasters. Both methods have their advantages and disadvantages.
  • Mining the data ourselves means we do not depend on webmasters. Furthermore, this method is more technological. But sometimes we need a special parser for each website. An important disadvantage of this method is that webmasters have no opportunity to influence what we know about their site.
  • On the other hand, receiving data from webmasters also has advantages and disadvantages. It is good that we get information about the content of pages from the people who really know what is written on them. In addition, we need less effort to use this knowledge in search. But many people are not as honest as I would wish, and they may try to cheat the system. And, of course, not all webmasters want to make the effort to give us any information.
  • In view of the above, at the end of 2009 we started using additional information sent by webmasters in our services.
  • How can we collect information from webmasters? First of all, by using special tools. Second, by using XML files in special formats, and other files, even Excel. Another option involves nothing other than the HTML code of the pages: semantic markup is included directly in the page's source code.
  • Let's talk about semantic markup.
  • I want to say a few words about syntax and vocabulary, talk about how semantic markup is used, and show some statistics.
  • Semantic markup consists of syntax and vocabulary. The first is about how we put information into pages. The second is about what information we give.
  • There are four main syntaxes of semantic markup: RDFa, Microformats, Microdata and the newest, JSON-LD. And then there are some vocabularies that can be used with these syntaxes. The oldest one is Dublin Core. It was originally created in 1995. In Russia there is even a standard describing Dublin Core. It is very simple and contains only 15 elements. Do not be surprised that microformats are listed as a vocabulary. This is because form and meaning are mixed in them. GoodRelations is a specialized vocabulary that describes goods and services. The Open Graph Protocol is an initiative of Facebook. It is a simple way to convey the most important information about the content of a page. schema.org is the most promising vocabulary, supported by Google, Bing, Yahoo, and Yandex.
  • Some history. A long, long time ago, far, far away in the Galaxy... wait! That's another story. We began using semantic markup in late 2009. We started making rich snippets and services based on semantic markup. The next year W3C announced HTML5 and microdata, and we started using this method in our products. We even wrote a vocabulary for data about encyclopedias. Then Facebook announced the Open Graph Protocol. The following year schema.org was created, and the world changed. We came up with new ways to use this markup, as well as changes to schema.org itself. The first Yandex proposal in schema.org was PeopleAudience. Now it is accepted and published, but that took a lot of time. From the outside it seems that nothing is easier than adding a few new properties. But you have to predict what people think and what they might think. How will webmasters and consumers use this data? Isn't it too difficult? Do you want to specify the gender of the target audience? Be ready for the fact that it might offend people who belong to one sex but identify with the other. Do you want to specify the age of the target audience of the content? It might offend adults who love to read children's books. To date, we have Actions and the JSON-LD syntax, and we use them in Yandex.Islands.
  • According to our index, 24% of documents on the Internet contain some semantic markup. Is that a lot or a little? Of course, this is far from 100%, but over the past three years the number has more than doubled.
  • Here you can see our statistics on the distribution of semantic markup. The most popular vocabulary is the Open Graph Protocol. Next is schema.org. And that small bar is GoodRelations.
  • How can this data be used? The major consumers are search engines. They use this data to create rich snippets and to receive content from webmasters for some services. For example, Yandex creates rich snippets for recipes, dictionary articles, movies, chords, etc., and uses information extracted from microdata in Video, Auto, Images and other services. But search engines are not the only consumers of semantic markup. Other internet companies can use it too. For example, Pinterest uses OG and schema.org to create Rich Pins. Facebook, Google, Twitter and other social networks can create rich snippets for shared links.
  • schema.org does not stand still. There are two levels of change: 1) Public feedback and discussion. The most important points from the public discussion go to the working group. 2) The working group consists of delegates from four search engines (Yandex, Google, Bing and Yahoo). They decide whether to make changes or not.
  • If you have an idea, a problem or a question, you can send it to public-vocabs@w3.org. You can also read this mailing list, reply to questions and help solve other people's problems.
  • If the idea makes sense, it goes through the working group. First of all we explore the idea. What is the idea? Where should we place this change? How common is this use case? What challenges do we face? Then we discuss the idea internally. When everyone agrees, the formulated idea is sent to public-vocabs@w3.org. The next step is collecting feedback from the community. If there are significant comments, we need to repeat the cycle. It may seem that no idea will ever be accepted. But that is not true. And here are some new updates.
  • Actions: these are like verbs in the vocabulary.
  • GoodRelations: this is about integration between schema.org and GoodRelations.
  • LRMI: integration with the vocabulary for learning resource metadata.
  • Health and Medical vocabulary: this is about including the health and medical vocabulary.
  • JSON-LD: this is about using schema.org in a new syntax.
  • And there is some future work.
  • Potential actions: how to describe an action that will happen in the future.
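
The JSON-LD and potential-action notes above can be illustrated with a minimal Python sketch. It assumes the schema.org terms `Movie`, `WatchAction`, `potentialAction` and `target`; the helper names `build_movie_markup` and `to_script_tag`, the movie name and the example.com URL are hypothetical and only serve the illustration. This is not how Yandex generates or consumes markup, only one way a webmaster might produce it.

```python
import json


def build_movie_markup(name: str, watch_url: str) -> dict:
    """Describe a movie and an action that could be performed on it later."""
    return {
        "@context": "https://schema.org",  # vocabulary: what the terms mean
        "@type": "Movie",
        "name": name,
        # An action that has not happened yet but is available to the user.
        "potentialAction": {
            "@type": "WatchAction",
            "target": watch_url,
        },
    }


def to_script_tag(markup: dict) -> str:
    """Serialize the markup as JSON-LD (the syntax) for embedding in a page."""
    body = json.dumps(markup, indent=2, ensure_ascii=False)
    # Escape "</" so that a value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical example values.
    markup = build_movie_markup("Example Movie", "https://example.com/watch/1")
    print(to_script_tag(markup))
```

The dictionary keys come from the vocabulary, while the `<script type="application/ld+json">` wrapper and the `@context`/`@type` keywords belong to the syntax, which is the same split the notes describe.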

    1. Yuliya Tikhokhod, Project Manager, Yandex, Russia. The Main Trends in the Use and Development of Semantic Markup
    2. Agenda: • Why does Yandex need semantic markup? • Basic facts about semantic markup • Markup development (schema.org example)
    3. Why does Yandex need semantic markup? If you're so smart, why do you need someone to help?
    4. [slide without text]
    5. Yandex
    6. Data Mining
    7. Data from webmasters
    8. Since late 2009, we have been using structured data from webmasters
    9. Collecting data: Affiliate program, Forms, XML, Other file, Semantic markup
    10. Agenda: • Why does Yandex need semantic markup? • Basic facts about semantic markup • Markup development (schema.org example)
    11. Semantic Markup: • Syntax and Vocabulary • Usage • Statistics
    12. [slide without text]
    13. [slide without text]
    14. Timeline: Microformats (2005); RDFa (2008); 2009, 2010, 2011, 2013: Yandex began using semantic markup, Enhanced Snippets, Services, Microdata, Open Graph, Improved search algorithms, Actions, JSON-LD, Yandex Islands
    15. 24% of documents in the internet contain some semantic markup
    16. Statistics for September 2013
    17. Usage
    18. Agenda: • Why does Yandex need semantic markup? • Basic facts about semantic markup • Markup development (schema.org example)
    19. [slide without text]
    20. Explore (• What? • Where? • How often? • Problems?), Internal discuss, Public-vocabs@w3.org, External comments
    21. Latest updates
    22. Actions
    23. GoodRelations
    24. LRMI
    25. Health and Medical vocabulary
    26. New syntax for schema.org: JSON-LD
    27. Future work
    28. Potential actions
    29. Civic services
    30. Reservations
    31. Event schema update
    32. Accessibility
    33. Other schemas
    34. Useful links
    35. Yuliya Tikhokhod, Project Manager, @tihohodka. Thank you