- The document describes a semantic search system that analyzes social web content like blog postings.
- It converts conventional blog postings into "semantic blogs" by attaching semantic tags that identify each posting's topic and blogger. This allows emerging relationships between bloggers and topics to be discovered.
- The system relies on simple semantics drawn from a large number of blog postings and from the connections between bloggers and topics. In contrast to traditional ontologies, where knowledge must be recorded in advance, this "simple semantic" approach lets new knowledge emerge from those connections.
Learning Emergent Knowledge from Blog Postings
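2. This Session
• Social Web (and Social Web search) is a great thing, but…
– We can very easily search the experiences of countless people
• Movies, books, travel destinations, music, …
– Still, much more remains in the Social Web (left unsearched).
– Beyond individual experiences: when a large number of diverse “experiences” come together, we find
• Trends, hidden relationships, new knowledge, …
• Social Web + Semantic Web Technology
– A prototype “Experience Search” system
– New kinds of information needs
• Which MP3 players do female early adopters like?
• List the books that young people keep reading steadily, regardless of passing trends.
• Show me the books that people who like Paul Auster's books have been reading lately, along with their related postings.
• They say men read thrillers and women read romances. Is that really true?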
3. Overview
• A Semantic Search System on Social Web Content
• Social Web + Semantic Web
– Social Web Content
• Blog postings
• Experiences of Web users
– Semantic Web Technology
• Publishing portable data
• Accessing web-based open knowledge
4. Overview
• By the term “Semantic Search”…
– Not by “text matching”
– But by satisfying the “conditions” given in the query.
• By “Experience Search”…
– On the “topics” of bloggers
– Example queries
• “Which MP3 players are favored by people in their 20s?”
• “List the books that Paul Auster fans are reading these days.”
• “List the latest electronic devices that Apple fans are talking about these days.”
• “Men read thrillers, women read romances. Is that true in the blogosphere?”
5. Overview
• Challenges!
– 1) Blog postings are free-text.
• No semantics
• No explicit/machine-readable topics
– 2) Databases/ontologies do not have such information.
• For example, our book ontology does not know whether a book is favored by a particular group.
• How can we draw out this kind of “previously unknown” knowledge, which is “not recorded in the DB”?
6. The Idea
• Answer for Challenge 1 : Semantic Blogs
– A little semantics from blog postings.
– Topic: what is the topic of this posting?
• Semantic Blog with Semantic Tags
– Converting conventional blogs to semantic blogs
– Blogger: who is the blogger?
• Basic information about each blogger
– Age group, gender, job
– Published in FOAF (Friend-of-a-Friend)
– Manually published + predicted by machine learning
7. The Idea
• Answer for Challenge 2 : Emergent Knowledge
– Connections make new information
– Some blog postings are about specific topic-items.
• They draw a new connection between the author (blogger)
and the topic item (book, IT device, movie, etc.)
• New tendencies/relationships can be found from these connections,
if a large number of such connections are available.
8. Emerging Information from Connections
• Sci-fi Fan Example
[Figure: three linked data sources]
– Personal Info (FOAF): age, gender, address
• Example: blogger, 22, Female, Daegu
– Blog Postings (SemTag): Topic (→ URI of the topic item), Date/Time, Blogger (→ URI of the blogger)
• Example: a posting dated 2010.03.
– Book Ontology: Book Title, ISBN, Book Author, Genre, Publisher
• Example: The Vor Game, 9788989571506, Lois Bujold, Sci-fi, Baen Books
9. Emerging Information from Connections
• Sci-fi Fan Example
[Figure: the blogger's posting links the three sources]
– Sources: Personal Info (FOAF), Blog Postings (SemTag), Book Ontology
– Blogger → topic → book → genre → Sci-Fi
– New information attached to the blogger: “Sci-Fi fan”
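The chain in this example can be sketched as a join over plain (subject, predicate, object) triples. This is a minimal Python sketch under stated assumptions, not the system's implementation. The URIs, the short predicate names (`age`, `topic`, `blogger`, `genre`, …) and the in-memory list that stands in for the RDF store are all hypothetical. The data values come from the slide.

```python
from collections import defaultdict

# Hypothetical triples mirroring the Sci-fi fan example.
# URIs and predicate names are illustrative, not the system's vocabulary.
TRIPLES = [
    # Personal info (FOAF-like)
    ("blogger:1", "age", 22),
    ("blogger:1", "gender", "Female"),
    ("blogger:1", "city", "Daegu"),
    # A semantically tagged blog posting (an event instance)
    ("post:1", "topic", "book:9788989571506"),
    ("post:1", "date", "2010-03"),
    ("post:1", "blogger", "blogger:1"),
    # Book ontology entry
    ("book:9788989571506", "title", "The Vor Game"),
    ("book:9788989571506", "author", "Lois Bujold"),
    ("book:9788989571506", "genre", "Sci-fi"),
    ("book:9788989571506", "publisher", "Baen Books"),
]


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def genres_per_blogger(triples):
    """Follow posting -> blogger and posting -> topic -> genre links.

    None of the three sources records a blogger-genre relation on its own;
    the relation only appears once the connections are joined.
    """
    result = defaultdict(set)
    postings = {s for s, p, _ in triples if p == "topic"}
    for post in postings:
        for blogger in objects(triples, post, "blogger"):
            for item in objects(triples, post, "topic"):
                for genre in objects(triples, item, "genre"):
                    result[blogger].add(genre)
    return dict(result)


print(genres_per_blogger(TRIPLES))  # {'blogger:1': {'Sci-fi'}}
```

With only one posting this yields a single link. The "Sci-Fi fan" label becomes meaningful once many such links accumulate for the same blogger.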
10. Emerging Information from Connections
• Examples: Emerging information from connections
– Devices favored by people in their 20s
– Bloggers who have read the book “The Lord of the Rings”
– Top 50 books of this year
– Bloggers who have read many books by Paul Auster
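A query like “devices favored by people in their 20s” needs aggregation on top of the join. The slides do not say how “favored” is measured. This sketch assumes one plausible measure: the number of distinct bloggers in the age range who wrote about an item, so that one prolific blogger cannot dominate the ranking. The records and item names are hypothetical, and they are already joined for brevity.

```python
from collections import Counter

# Hypothetical, pre-joined records: (blogger_id, age, topic_item).
# In the described system this information is spread over FOAF,
# semantic-tag postings and domain ontologies.
POSTINGS = [
    ("b1", 24, "PlayerA"),
    ("b2", 27, "PlayerA"),
    ("b3", 21, "PlayerB"),
    ("b1", 24, "PlayerA"),  # repeat posting by the same blogger
    ("b4", 35, "PlayerB"),
    ("b5", 29, "PlayerC"),
]


def favored_by_age_group(postings, low, high, top_k=3):
    """Rank topic items by the number of distinct bloggers aged
    low..high (inclusive) who wrote about them. Ties are broken
    alphabetically so the result is deterministic."""
    pairs = {(b, item) for b, age, item in postings if low <= age <= high}
    counts = Counter(item for _, item in pairs)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


print(favored_by_age_group(POSTINGS, 20, 29))
# [('PlayerA', 2), ('PlayerB', 1), ('PlayerC', 1)]
```

Counting raw postings instead of distinct bloggers would be a one-line change. Which measure matches the original system is not stated in the slides.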
13. Implementation: Converting Conventional Postings to SemBlogs
• Problem
– To acquire “emergent knowledge”, we need a lot of postings with semantic tags.
– There aren’t many semantic blogs yet.
• Answer
– There are a large number of “topic-known” blog postings.
– Let’s convert such postings to semantic blog postings.
14. Implementation: Converting Conventional Postings to SemBlogs
• DB-links in conventional blogs
– DB-links: the ability to explicitly mark the topic of a posting by linking to a database item of a portal service.
• Naver (DB-attachment), Daum (DB-link): movies, books
• Yes24/Aladin blogs: books, IT devices
– In essence, they are “semantic tags” in a limited domain.
• Postings
– Collected nearly 100,000 blog postings with DB-links
– Converted them into semantic blog postings (event instances)
– Postings about “movies”, “books”, “IT devices”, and “travel locations”
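The conversion step reads the DB-link in a posting and emits a semantic tag, meaning a posting-to-topic triple plus the posting's date and author. Here is a minimal sketch of that idea. It uses made-up `*.portal.example` hosts and query parameters, because the actual Naver, Daum, Yes24, and Aladin link formats are not given in the slides and are not reproduced here. The `DB_LINK_RULES` table and `to_semantic_posting` are hypothetical names.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping: DB-link host -> (topic domain, query parameter
# that carries the item identifier). Real portal URL formats are NOT shown.
DB_LINK_RULES = {
    "books.portal.example": ("book", "isbn"),
    "movies.portal.example": ("movie", "id"),
}


def to_semantic_posting(post_id, blogger_uri, date, db_link):
    """Turn a conventional posting that carries a DB-link into a
    semantic-tag event instance, expressed as a list of triples.
    Returns None if the link does not point to a known database item."""
    url = urlparse(db_link)
    rule = DB_LINK_RULES.get(url.hostname)
    if rule is None:
        return None
    domain, key = rule
    values = parse_qs(url.query).get(key)
    if not values:
        return None
    topic_uri = f"{domain}:{values[0]}"
    return [
        (post_id, "topic", topic_uri),
        (post_id, "date", date),
        (post_id, "blogger", blogger_uri),
    ]


print(to_semantic_posting("post:1", "blogger:1", "2010-03",
                          "http://books.portal.example/detail?isbn=9788989571506"))
```

Postings whose links cannot be resolved are skipped (the function returns `None`) rather than guessed. This keeps the semantic tags precise at the cost of some coverage.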
15. Implementation: Converting Conventional Postings to SemBlogs
• Blogger information
– Among the collected postings, 2,000 bloggers were selected:
• those who posted more than 20 topic-known postings.
– FOAF info was manually tagged for these 2,000 bloggers
• Age, gender, home location (city level), occupation
– Their blog texts then became the training data
– Classification methods were applied to the other bloggers
– In total, 5,000+ bloggers were collected as search data
• The data
– 5,000+ bloggers, 100,000+ postings, spanning over 3 years
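The slides say only that “classification methods” were trained on the manually tagged bloggers and applied to the rest. As one illustrative stand-in, not necessarily what the authors used, here is a multinomial Naive Bayes classifier with add-one smoothing that predicts an age-group label from blog text. The training texts and the labels "20s"/"40s" are hypothetical. Whitespace tokenization is also a simplification, since real Korean text would normally need morphological analysis.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Simplified whitespace tokenizer (an assumption, see lead-in)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            words = self.word_counts[label]
            denom = sum(words.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for w in tokenize(text):
                score += math.log((words[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data from manually tagged bloggers
model = NaiveBayes().fit(
    ["exam campus club concert", "campus dorm exam game",
     "kids school mortgage office", "office kids commute mortgage"],
    ["20s", "20s", "40s", "40s"],
)
print(model.predict("concert after exam"))  # 20s
```

Naive Bayes is chosen here only because it is short and easy to verify. Any text classifier fits the same train-on-tagged, predict-on-untagged workflow.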
16. Implementation: Selecting Domain Ontologies
• Domain ontologies are needed
– DBpedia could provide a good topic vocabulary…
• However, DBpedia does not contain enough Korean books and locations.
– Domain ontologies were separately prepared for the search system
– Travel locations
• GeoNames ontology (geonames.org)
– Books
• Book ontology (Bizer et al.)
– IT devices
• IT ontology (KAIST CoreOnto ontology)
18. Implementation: The Main Idea (Again), and Semantic Labels
• “Simple and large (in instances)” is better than “rich and few”
• Simple semantics from texts/blog postings
– Relatively easy to obtain in large numbers
• From a large number of instances
– A large number of “connections” can be found
– Knowledge that is not described in the ontology can be found from the connections
• How can ordinary users explicitly use/find such connections?
– Name the patterns: Semantic Labels
19. Implementation: Semantic Labels
• Semantic Labels
– Connect human concepts to graph patterns
– Graph patterns are described in SPARQL
• SPARQL is a query language, which can also be used as a rule language
– With additions such as aggregation functions
– Name the “findings”
• In the implementation, new findings are attached to instances as labels
• These labels can be used in semantic search.
• Rule-based discovery of meaningful patterns
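A semantic label pairs a human concept with a graph pattern plus aggregation, and its findings are attached to instances and reused at search time. The sketch below mimics that in plain Python rather than SPARQL. The rule has the shape of a GROUP BY / COUNT / HAVING pattern: a blogger with at least `MIN_POSTINGS` postings about books of genre G is labeled “G fan”. The threshold value, the facts, and the function names are hypothetical, and the slides do not define how “fan” is decided.

```python
from collections import Counter

# Hypothetical pre-joined facts: (blogger, genre of the book a posting is about).
# In the described system these would be matched by a SPARQL graph pattern.
BLOGGER_GENRE = [
    ("blogger:1", "Sci-fi"), ("blogger:1", "Sci-fi"), ("blogger:1", "Sci-fi"),
    ("blogger:1", "Romance"),
    ("blogger:2", "Thriller"), ("blogger:2", "Sci-fi"),
]

MIN_POSTINGS = 3  # hypothetical threshold for calling someone a "fan"


def apply_fan_rule(facts, min_postings=MIN_POSTINGS):
    """Rule: group facts by (blogger, genre), count them, and attach the
    label '<genre> fan' where the count reaches min_postings.
    Returns {blogger: set_of_labels}."""
    counts = Counter(facts)
    labels = {}
    for (blogger, genre), n in counts.items():
        if n >= min_postings:
            labels.setdefault(blogger, set()).add(f"{genre} fan")
    return labels


def search_by_label(labels, label):
    """Search step: return the instances that carry a given label."""
    return sorted(b for b, ls in labels.items() if label in ls)


LABELS = apply_fan_rule(BLOGGER_GENRE)
print(search_by_label(LABELS, "Sci-fi fan"))  # ['blogger:1']
```

Running the rule once and storing its output as labels means later searches only look the label up, instead of recomputing the aggregation for every query.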
21. Search System Architecture
[Figure: system architecture]
• Users send keyword queries and analysis requests through the User Interface Process Module
• Search Module: keyword search; inference and modification of SPARQL queries; returns search results in XML
• Analysis Module: handles analysis requests; returns analysis results in XML
• Rule Process Module: processes the Semantic Label Definitions, which advanced users create through rule authoring
• RDF Store: People (FOAF instances), Events (Event Ontology instances), Domain Ontologies
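The query path in this diagram runs from a keyword query through the Search Module to the RDF store and back as an XML result. Here is a rough sketch of that path. It assumes that keywords naming a semantic label act as conditions, and that the instances matching all of them are returned with their postings. The dictionaries standing in for the label definitions and the RDF store are hypothetical, as are the XML element names `results`, `blogger` and `posting`. The slides do not give the result schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins: labels already attached to instances by the
# Rule Process Module, and posting links held in the RDF store.
LABELED_INSTANCES = {
    "Sci-fi fan": ["blogger:1"],
    "20s": ["blogger:1", "blogger:3"],
}
POSTINGS_BY_BLOGGER = {
    "blogger:1": ["post:1", "post:7"],
    "blogger:3": ["post:4"],
}


def search(keywords):
    """Treat each keyword that names a semantic label as a condition,
    intersect the matching instances, and return the bloggers and their
    postings as an XML string. Keywords that are not labels are ignored
    in this sketch; a fuller system would fall back to keyword search."""
    conditions = [set(LABELED_INSTANCES[k]) for k in keywords if k in LABELED_INSTANCES]
    bloggers = set.intersection(*conditions) if conditions else set()
    root = ET.Element("results")
    for blogger in sorted(bloggers):
        node = ET.SubElement(root, "blogger", uri=blogger)
        for post in POSTINGS_BY_BLOGGER.get(blogger, []):
            ET.SubElement(node, "posting", uri=post)
    return ET.tostring(root, encoding="unicode")


print(search(["Sci-fi fan", "20s"]))
```

Intersecting label sets corresponds to combining several conditions in one query, for example "Sci-fi fans in their 20s".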
26. Conclusion
• Emergent knowledge found in the blogosphere
– Using blog postings as “connections”
– Discovering new knowledge becomes possible
• “Simple semantics goes a long way”
– Simple semantics (data), many diverse instances
• Social Web + Semantic Web
27. Additional Information about the System
• Detailed information about the system and its evaluation can be found in the paper, doi:10.1016/j.websem.2010.05.001
TG Noh et al., Learning the emergent knowledge from annotated blog postings,
Web Semantics: Science, Services and Agents on the World Wide Web, 2010
• The paper, data, prototype demo, and demo video are available at
– http://nweb.knu.ac.kr/