When ECM Meets the Semantic Web


                             20 Oct 2011 - Olivier Grisel & Stefane Fermigier



           Open Source ECM


Thursday, October 20, 2011
Business Motivations


[Two image slides. Source: Wikipedia]

The DIKW hierarchy (Data, Information, Knowledge, Wisdom)




But every coin has another side


Infobesity!




A few figures
                     • 50% more data / content / information
                             produced every year
                     • 1.8 zettabytes of data produced in 2011
                             (1 zettabyte = 1 billion terabytes)
                     • Employees are drowning in a sea of email,
                             status messages, etc., and spend on average
                             more than 6 hours per week searching
                             unsuccessfully for lost documents or recreating them


A Solution: the Semantic Web


A Brief History of the Web
            • Web 1.0 (1990-now): web of sites and pages,
              aka the World Wide Web
            • Web 2.0 (2000-now): web of people and of
              participation, aka the Social Web (Blogs, RSS,
              tags, Facebook, Wikipedia, etc.)
            • Web 3.0 (2010-now): web of data, of meaning
              and connected knowledge, aka the Semantic
              Web




“To a computer, then, the web is a flat, boring world devoid of meaning”

          Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/




“This is a pity, as in fact documents on the web describe real objects and
imaginary concepts, and give particular relationships between them”

          Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/




“Adding semantics to the web involves two things: allowing documents which
have information in machine-readable forms, and allowing links to be
created with relationship values.”

          Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/




“The Semantic Web is not a separate Web but an extension of the current one,
in which information is given well-defined meaning, better enabling
computers and people to work in cooperation.”

          Tim Berners-Lee, James Hendler and Ora Lassila, "The Semantic Web",
          Scientific American, May 2001




Means and Tools


4 stages


            • Extract meaning from raw data / content
            • Connect information to form knowledge
            • Reason about this knowledge
            • Present this knowledge in actionable form

Extracting

            • Leverage metadata embedded in or associated with
              documents (when it exists)
            • Or use machine learning, NLP (Natural Language
              Processing) and image processing algorithms to
              extract meaning from text / images
            • Examples include: named entity extraction,
              automatic categorization / tagging, sentiment
              analysis, etc.
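
As a rough illustration of the named entity extraction mentioned above, here is a minimal Python sketch. It uses a tiny hand-written gazetteer (a dictionary lookup) instead of a trained statistical model; the sample entities, their types and the `extract_entities` function are assumptions made for this example, not part of any Nuxeo or Stanbol API.

```python
import re

# Hypothetical gazetteer: surface form -> entity type (illustrative data only).
GAZETTEER = {
    "Tim Berners-Lee": "Person",
    "Paris": "Location",
    "Nuxeo": "Organization",
}


def extract_entities(text):
    """Return (surface form, type, start offset) for each gazetteer match."""
    found = []
    for name, etype in GAZETTEER.items():
        # Word boundaries avoid matching a name inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((name, etype, match.start()))
    # Sort by position so results follow reading order.
    return sorted(found, key=lambda item: item[2])


mentions = extract_entities("Nuxeo is based in Paris.")
```

A real extractor would replace the dictionary lookup with a statistical model, which can also recognize names it has never seen before.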




Interlude: Linked Open Data


[Image slides: Linked Open Data snapshots labeled 2007, 2008, 2009, 2010 and 2011]




Linking

            • Many Linked Open Data repositories have been
              made available over the last 10 years
            • RDF and graph database systems are now available
              to manage this huge mass of information (billions of
              triples)
            • Match information extracted from content with
              these public (or internal) data/knowledge bases
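
The matching step in the last bullet can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the knowledge base is a small in-memory list of (subject, predicate, object) triples, matching is exact on lowercased labels, and the URIs merely imitate the DBpedia naming pattern as sample data. The helper names are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Toy knowledge base as (subject, predicate, object) triples (sample data).
TRIPLES = [
    ("http://dbpedia.org/resource/Paris", RDFS_LABEL, "Paris"),
    ("http://dbpedia.org/resource/Paris_Hilton", RDFS_LABEL, "Paris Hilton"),
    ("http://dbpedia.org/resource/Nuxeo", RDFS_LABEL, "Nuxeo"),
]


def build_label_index(triples):
    """Map each lowercased label to the list of subjects carrying it."""
    index = {}
    for subject, predicate, obj in triples:
        if predicate == RDFS_LABEL:
            index.setdefault(obj.lower(), []).append(subject)
    return index


def link_mention(mention, index):
    """Return candidate entity URIs for a mention extracted from content."""
    return index.get(mention.lower(), [])


index = build_label_index(TRIPLES)
candidates = link_mention("paris", index)
```

With billions of triples the same idea needs a dedicated store or search index, and ambiguous labels call for a disambiguation step that this sketch leaves out.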




Reasoning

            • When you are working with reliable metadata (e.g.
              RDFa embedded in web pages), you can use rule /
              inference engines to infer actionable knowledge
              from your content (e.g. a shopping recommendation
              engine)
            • Rules can also be used to clean up / flag errors
              when working with unreliable (e.g. automatically
              extracted) information
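
Both uses of rules can be illustrated with a small Python sketch. It is not a real inference engine: each function applies one hard-coded rule in a single pass over a list of triples, and the predicate names (`bought`, `hasAccessory`, `type`) and sample facts are invented for the example.

```python
def infer_recommendations(triples):
    """Rule: (?u bought ?x) and (?x hasAccessory ?y) => (?u recommended ?y)."""
    facts = set(triples)
    inferred = set()
    for user, pred1, item in facts:
        if pred1 != "bought":
            continue
        for subject, pred2, accessory in facts:
            if pred2 != "hasAccessory" or subject != item:
                continue
            # Do not recommend something the user already owns.
            if (user, "bought", accessory) not in facts:
                inferred.add((user, "recommended", accessory))
    return inferred


def flag_conflicts(triples, disjoint=(("Person", "Location"),)):
    """Cleanup rule: flag subjects typed with two classes declared disjoint."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "type":
            types.setdefault(subject, set()).add(obj)
    return sorted(
        subject
        for subject, found in types.items()
        if any(a in found and b in found for a, b in disjoint)
    )


facts = [
    ("alice", "bought", "camera"),
    ("camera", "hasAccessory", "tripod"),
    ("Paris", "type", "Location"),
    ("Paris", "type", "Person"),  # e.g. a wrong automatic extraction
]
recommendations = infer_recommendations(facts)
suspicious = flag_conflicts(facts)
```

The first function derives new knowledge from trusted facts; the second only flags suspicious data for review, which fits the unreliable, automatically extracted case.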




Presenting

            • Allow the users of your system to interact with the
              knowledge thus extracted or produced, in a way
              that allows them to do their jobs better
            • A smart presentation system solves the information
              overload issue by contextualizing the information,
              i.e. presenting only information relevant to what the
              user is currently doing

R&D Projects Involving Nuxeo


IKS project

            • European R&D project under the FP7, with 13
                    partners (6 SMEs) and an 8.5M EUR budget

            • Goal: create a semantic software “stack” that
                    will be used by CMS vendors to add semantic
                    features to their products

            • Started in Jan. 2009, will last until Dec. 2012
            • First tangible result: Apache Stanbol
                    (more about this later)




SAMAR project
            • French collaborative R&D project with 10
              partners, and a 4.5M EUR budget
            • Goal: create a platform for managing
              multimedia content in Arabic, for news agencies
              such as AFP
            • Will include: automated translation, named
              entity extraction, content classification
            • First results: integration between Nuxeo and
              Temis (more later)




State of the Art
        Semantic ECM at Nuxeo


The Semantic Engine

           • From unstructured content to knowledge

           • Language guessing

           • Topic classification (Business, Sports, Media, ...)

           • Named entity extraction and linking

           • Relationships and properties extraction
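
To give a feel for the language guessing step listed above, here is a deliberately simple Python sketch based on counting common stopwords. This is a swapped-in heuristic chosen for brevity, not the method used by the Semantic Engine; the stopword lists and the `guess_language` function are assumptions made for the example.

```python
# Hypothetical, very small stopword lists (illustrative only).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in"},
    "fr": {"le", "la", "et", "les", "des", "est"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Without any stopword hit there is no evidence for any language.
    return best if scores[best] > 0 else None


english = guess_language("The web is a flat world")
french = guess_language("Le chat est sur la table")
```

Production language identifiers usually rely on character n-gram statistics, which cope much better with short texts than a stopword count.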

Demo time!



RESTful is Beautiful




[Logo] =
    Semantic Engines (Apache OpenNLP)
    + Fast Linked Data local index (Apache Solr)
    + Semantic Rule Engine (Apache Jena)

Apache Stanbol

[Diagram. Labels: Nuxeo DM addon; Engine 1, Engine 2, Engine 3; DBpedia,
Freebase, Geonames, LDAP; Local IT infrastructure (LAN); numbered steps 1 to 3]
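
One way a client such as the Nuxeo DM addon might submit text to a Stanbol server over HTTP is sketched below in Python. The host, port and `/enhancer` path are assumptions (Stanbol versions have exposed the enhancer under different paths), and `build_enhance_request` is a hypothetical helper. The sketch only builds the request, so it runs without a server.

```python
import urllib.request

# Assumption: a Stanbol instance on localhost exposing its enhancer here.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhance_request(text, url=STANBOL_ENHANCER_URL):
    """Build a POST request sending plain text and asking for RDF back."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


request = build_enhance_request("Olivier Grisel works at Nuxeo in Paris.")

# To actually call the service (requires a running server):
# with urllib.request.urlopen(request) as response:
#     rdf = response.read().decode("utf-8")
```

The response would then be parsed as RDF to recover the detected entities and their links to sources such as DBpedia.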




How to build engines?



Training statistical models for NER (Named Entity Recognition) with
        Wikipedia and DBpedia

           •       Extract sentences with link positions in Wikipedia articles

           •       DBpedia to find the type of the target entity (Person,
                   Location, Organization)

           •       Apache Pig scripts to compute the join + format the result as
                   training files for OpenNLP

           •       Apache OpenNLP to build and evaluate the models

           •       Apache Hadoop for distributed processing

           •       Apache Whirr for deployment and management on an Amazon
                   EC2 cluster
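
The "format the result as training files for OpenNLP" step can be illustrated in plain Python. The OpenNLP name finder expects one tokenized sentence per line, with entities wrapped in `<START:type>` and `<END>` markers. The sketch below assumes the sentence is already tokenized and the entity spans are already known (in the pipeline above they would come from Wikipedia links typed through DBpedia); `to_opennlp_line` is a hypothetical helper, not the Pig script used by the authors.

```python
def to_opennlp_line(tokens, entities):
    """Render one sentence in OpenNLP name finder training format.

    entities is a list of (start_token, end_token_exclusive, type) tuples
    describing non-overlapping spans.
    """
    starts = {start: etype for start, _end, etype in entities}
    ends = {end for _start, end, _etype in entities}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close the span that stops before this token
        if i in starts:
            out.append("<START:%s>" % starts[i])
        out.append(token)
    if len(tokens) in ends:
        out.append("<END>")  # span running to the end of the sentence
    return " ".join(out)


line = to_opennlp_line(
    ["Tim", "Berners-Lee", "was", "born", "in", "London", "."],
    [(0, 2, "person"), (5, 6, "location")],
)
```

Each such line becomes one training sentence; the model quality then depends mostly on how many sentences are extracted and how clean the link-to-type join is.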




Training statistical models for topic
        classification from Wikipedia and DBpedia


           •       Filter category tree from DBpedia SKOS entries (~500k)

           •       Pig scripts to compute the joins with article abstracts for all
                   the articles categorized in Wikipedia

           •       Export as a 2.8GB TSV file to be indexed in Apache Solr

           •       Use Solr MoreLikeThisHandler to find the top 3 most related
                   Wikipedia categories for any kind of text

           •       Apache Whirr & Hadoop for deployment and management on
                   an Amazon EC2 cluster
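
The idea behind the "top 3 most related categories" lookup can be approximated without Solr. The Python sketch below is a stand-in, not the MoreLikeThisHandler itself: it scores each category by a simple TF-IDF weighted term overlap between the input text and the text attached to the category. The category texts are toy data and `top_categories` is a hypothetical helper.

```python
import math
from collections import Counter

# Hypothetical category -> concatenated abstracts (toy data).
CATEGORY_TEXTS = {
    "Sports": "football match team player goal league",
    "Business": "company market shares profit investor",
    "Media": "newspaper television broadcast journalist",
    "Science": "research experiment physics theory",
}


def tokenize(text):
    return text.lower().split()


def top_categories(text, category_texts=CATEGORY_TEXTS, k=3):
    """Return up to k category names ranked by weighted term overlap."""
    docs = {name: Counter(tokenize(body)) for name, body in category_texts.items()}
    n_docs = len(docs)
    # Document frequency: in how many categories each term appears.
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())
    query = Counter(tokenize(text))
    scores = {}
    for name, counts in docs.items():
        score = sum(
            query[term] * counts[term] * math.log(1 + n_docs / doc_freq[term])
            for term in query
            if term in counts
        )
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


ranked = top_categories("The team scored a goal in the league while the company watched")
```

Indexing the category abstracts in a search engine gives the same kind of ranking at scale, since the engine already maintains the term statistics this sketch recomputes on every call.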




Wrap Up on Recent Work

            • Full offline mode: Stanbol EntityHub
            • Multi-lingual Indexes
            • New UI for reviewing occurrences
            • Temis Luxid Annotation Factory integration

What’s next?

           • Stanbol and Temis connection in Admin Center

           • Embedded Stanbol mode for easy deployment

           • More OpenNLP models for more languages

           • Finalize topic classification - handle hierarchy

           • Tight integration with Nuxeo DM search features

Thank you for your attention!




                                        48




Thursday, October 20, 2011

More Related Content

What's hot

LIBER and its EU projects
LIBER and its EU projectsLIBER and its EU projects
LIBER and its EU projects
LIBER Europe
 
Computing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approachComputing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approach
Steve Pepper
 
20101015 linked openeuropeanafi
20101015 linked openeuropeanafi20101015 linked openeuropeanafi
20101015 linked openeuropeanafi
Stefan Gradmann
 
Connecting Smart Things through Web services Orchestrations
Connecting Smart Things through Web services OrchestrationsConnecting Smart Things through Web services Orchestrations
Connecting Smart Things through Web services Orchestrations
Antonio Pintus
 
Nuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo World Session: Semantic Technologies - Update on Recent ResearchNuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo
 
Kick-off meeting Linkflows project
Kick-off meeting Linkflows projectKick-off meeting Linkflows project
The Future of Business Intelligence
The Future of Business IntelligenceThe Future of Business Intelligence
The Future of Business Intelligence
Tim O'Reilly
 
Social web & linked data
Social web & linked dataSocial web & linked data
Social web & linked data
Serge Garlatti
 
When the Wikipedians talk: network and tree structure of Wikipedia discussion...
When the Wikipedians talk: network and tree structure of Wikipedia discussion...When the Wikipedians talk: network and tree structure of Wikipedia discussion...
When the Wikipedians talk: network and tree structure of Wikipedia discussion...
David Laniado
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & Museums
Jon Voss
 
Open Source in the Cloud Computing Era
Open Source in the Cloud Computing EraOpen Source in the Cloud Computing Era
Open Source in the Cloud Computing Era
Tim O'Reilly
 
Accessibility: introduction
Accessibility: introduction  Accessibility: introduction
Accessibility: introduction
Andres Baravalle
 
NISO BISG Forum: Bibliographic Roadmap
NISO BISG Forum: Bibliographic RoadmapNISO BISG Forum: Bibliographic Roadmap
NISO BISG Forum: Bibliographic Roadmap
National Information Standards Organization (NISO)
 
Enhancing user access to european digital heritage
Enhancing user access to european digital heritageEnhancing user access to european digital heritage
Enhancing user access to european digital heritage
EuropeanaConnect
 
Twenty Years of Metadata: Lessons from the First Two Decades of the Web
Twenty Years of Metadata: Lessons from the First Two Decades of the WebTwenty Years of Metadata: Lessons from the First Two Decades of the Web
Twenty Years of Metadata: Lessons from the First Two Decades of the Web
Stuart Weibel
 
Drupalcon keynote: Open Source and Open Data in the age of the cloud
Drupalcon keynote: Open Source and Open Data in the age of the cloudDrupalcon keynote: Open Source and Open Data in the age of the cloud
Drupalcon keynote: Open Source and Open Data in the age of the cloud
Tim O'Reilly
 
Everything is a Subject: The vision of subject-centric computing
Everything is a Subject: The vision of subject-centric computingEverything is a Subject: The vision of subject-centric computing
Everything is a Subject: The vision of subject-centric computing
Steve Pepper
 
Brink uksg 2013 3
Brink uksg 2013 3Brink uksg 2013 3
Localbysocial sunderland
Localbysocial sunderlandLocalbysocial sunderland
Localbysocial sunderland
localgovuk
 

What's hot (19)

LIBER and its EU projects
LIBER and its EU projectsLIBER and its EU projects
LIBER and its EU projects
 
Computing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approachComputing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approach
 
20101015 linked openeuropeanafi
20101015 linked openeuropeanafi20101015 linked openeuropeanafi
20101015 linked openeuropeanafi
 
Connecting Smart Things through Web services Orchestrations
Connecting Smart Things through Web services OrchestrationsConnecting Smart Things through Web services Orchestrations
Connecting Smart Things through Web services Orchestrations
 
Nuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo World Session: Semantic Technologies - Update on Recent ResearchNuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo World Session: Semantic Technologies - Update on Recent Research
 
Kick-off meeting Linkflows project
Kick-off meeting Linkflows projectKick-off meeting Linkflows project
Kick-off meeting Linkflows project
 
The Future of Business Intelligence
The Future of Business IntelligenceThe Future of Business Intelligence
The Future of Business Intelligence
 
Social web & linked data
Social web & linked dataSocial web & linked data
Social web & linked data
 
When the Wikipedians talk: network and tree structure of Wikipedia discussion...
When the Wikipedians talk: network and tree structure of Wikipedia discussion...When the Wikipedians talk: network and tree structure of Wikipedia discussion...
When the Wikipedians talk: network and tree structure of Wikipedia discussion...
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & Museums
 
Open Source in the Cloud Computing Era
Open Source in the Cloud Computing EraOpen Source in the Cloud Computing Era
Open Source in the Cloud Computing Era
 
Accessibility: introduction
Accessibility: introduction  Accessibility: introduction
Accessibility: introduction
 
NISO BISG Forum: Bibliographic Roadmap
NISO BISG Forum: Bibliographic RoadmapNISO BISG Forum: Bibliographic Roadmap
NISO BISG Forum: Bibliographic Roadmap
 
Enhancing user access to european digital heritage
Enhancing user access to european digital heritageEnhancing user access to european digital heritage
Enhancing user access to european digital heritage
 
Twenty Years of Metadata: Lessons from the First Two Decades of the Web
Twenty Years of Metadata: Lessons from the First Two Decades of the WebTwenty Years of Metadata: Lessons from the First Two Decades of the Web
Twenty Years of Metadata: Lessons from the First Two Decades of the Web
 
Drupalcon keynote: Open Source and Open Data in the age of the cloud
Drupalcon keynote: Open Source and Open Data in the age of the cloudDrupalcon keynote: Open Source and Open Data in the age of the cloud
Drupalcon keynote: Open Source and Open Data in the age of the cloud
 
Everything is a Subject: The vision of subject-centric computing
Everything is a Subject: The vision of subject-centric computingEverything is a Subject: The vision of subject-centric computing
Everything is a Subject: The vision of subject-centric computing
 
Brink uksg 2013 3
Brink uksg 2013 3Brink uksg 2013 3
Brink uksg 2013 3
 
Localbysocial sunderland
Localbysocial sunderlandLocalbysocial sunderland
Localbysocial sunderland
 

Viewers also liked

The Nuxeo Way: leveraging open source to build a world-class ECM platform
The Nuxeo Way: leveraging open source to build a world-class ECM platformThe Nuxeo Way: leveraging open source to build a world-class ECM platform
The Nuxeo Way: leveraging open source to build a world-class ECM platform
Nuxeo
 
Challenges du recrutement pour un editeur de logiciel libre
Challenges du recrutement pour un editeur de logiciel libreChallenges du recrutement pour un editeur de logiciel libre
Challenges du recrutement pour un editeur de logiciel libre
Stefane Fermigier
 
Nuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo World Session: Mobile ECM Apps with Nuxeo EPNuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo
 
Lessons learned Building Nuxeo EP - Component-based, open source ECM platform
Lessons learned Building Nuxeo EP - Component-based, open source ECM platformLessons learned Building Nuxeo EP - Component-based, open source ECM platform
Lessons learned Building Nuxeo EP - Component-based, open source ECM platform
Nuxeo
 
Eclipse Apogee and Nuxeo RCP
Eclipse Apogee and Nuxeo RCPEclipse Apogee and Nuxeo RCP
Eclipse Apogee and Nuxeo RCP
Stefane Fermigier
 
Nuxeo at 10
Nuxeo at 10Nuxeo at 10
Nuxeo at 10
Stefane Fermigier
 
Le Marché du Logiciel Libre en France en 2010
Le Marché du Logiciel Libre en France en 2010Le Marché du Logiciel Libre en France en 2010
Le Marché du Logiciel Libre en France en 2010
Stefane Fermigier
 
What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009
Stefane Fermigier
 
GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011
Stefane Fermigier
 
A Quick Tour of JVM Languages
A Quick Tour of JVM LanguagesA Quick Tour of JVM Languages
A Quick Tour of JVM Languages
Stefane Fermigier
 
Nuxeo on the Cloud - Nuxeo World 2011
Nuxeo on the Cloud - Nuxeo World 2011Nuxeo on the Cloud - Nuxeo World 2011
Nuxeo on the Cloud - Nuxeo World 2011
Stefane Fermigier
 

Viewers also liked (13)

The Nuxeo Way: leveraging open source to build a world-class ECM platform
The Nuxeo Way: leveraging open source to build a world-class ECM platformThe Nuxeo Way: leveraging open source to build a world-class ECM platform
The Nuxeo Way: leveraging open source to build a world-class ECM platform
 
Challenges du recrutement pour un editeur de logiciel libre
Challenges du recrutement pour un editeur de logiciel libreChallenges du recrutement pour un editeur de logiciel libre
Challenges du recrutement pour un editeur de logiciel libre
 
Open Cloud Computing @ GTLL
Open Cloud Computing @ GTLLOpen Cloud Computing @ GTLL
Open Cloud Computing @ GTLL
 
Nuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo World Session: Mobile ECM Apps with Nuxeo EPNuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo World Session: Mobile ECM Apps with Nuxeo EP
 
Lessons learned Building Nuxeo EP - Component-based, open source ECM platform
Lessons learned Building Nuxeo EP - Component-based, open source ECM platformLessons learned Building Nuxeo EP - Component-based, open source ECM platform
Lessons learned Building Nuxeo EP - Component-based, open source ECM platform
 
Eclipse Apogee and Nuxeo RCP
Eclipse Apogee and Nuxeo RCPEclipse Apogee and Nuxeo RCP
Eclipse Apogee and Nuxeo RCP
 
Nuxeo at 10
Nuxeo at 10Nuxeo at 10
Nuxeo at 10
 
Le Marché du Logiciel Libre en France en 2010
Le Marché du Logiciel Libre en France en 2010Le Marché du Logiciel Libre en France en 2010
Le Marché du Logiciel Libre en France en 2010
 
What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009
 
GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011
 
A Quick Tour of JVM Languages
A Quick Tour of JVM LanguagesA Quick Tour of JVM Languages
A Quick Tour of JVM Languages
 
Nuxeo on the Cloud - Nuxeo World 2011
Nuxeo on the Cloud - Nuxeo World 2011Nuxeo on the Cloud - Nuxeo World 2011
Nuxeo on the Cloud - Nuxeo World 2011
 
Cours ECM à l'EPITA
Cours ECM à l'EPITACours ECM à l'EPITA
Cours ECM à l'EPITA
 

Similar to ECM Meets the Semantic Web - Nuxeo World 2011

6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
bisg
 
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE
 
Semantic Technologies for Cultural Heritage
Semantic Technologies for Cultural HeritageSemantic Technologies for Cultural Heritage
Semantic Technologies for Cultural Heritage
Vladimir Alexiev, PhD, PMP
 
APIs and URLs for Social TV
APIs and URLs for Social TVAPIs and URLs for Social TV
APIs and URLs for Social TV
Dan Brickley
 
DCI - Data, Context and Interaction @ Jug Genova April 2011
DCI - Data, Context and Interaction @ Jug Genova April 2011DCI - Data, Context and Interaction @ Jug Genova April 2011
DCI - Data, Context and Interaction @ Jug Genova April 2011
Fabrizio Giudici
 
06 making information pay 2011 -- solomon, madi (pearson)
06   making information pay 2011 -- solomon, madi (pearson)06   making information pay 2011 -- solomon, madi (pearson)
06 making information pay 2011 -- solomon, madi (pearson)
bisg
 
Antonio Pintus- TouchTheWeb 2010
Antonio Pintus- TouchTheWeb 2010Antonio Pintus- TouchTheWeb 2010
Antonio Pintus- TouchTheWeb 2010
CRS4 Research Center in Sardinia
 
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian ParliamentTrove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
Rose Holley
 
Web heresies
Web heresiesWeb heresies
Web heresies
James Aylett
 
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria
 
Intro to Linked Data: Context
Intro to Linked Data: ContextIntro to Linked Data: Context
Intro to Linked Data: Context
David Wood
 
Going Global - Workshop Version - Fall 2011
Going Global - Workshop Version - Fall 2011Going Global - Workshop Version - Fall 2011
Going Global - Workshop Version - Fall 2011
Lucy Gray
 
Sharing knowledge 2011
Sharing knowledge 2011Sharing knowledge 2011
Sharing knowledge 2011
Rudolf Mumenthaler
 
Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...
Rose Holley
 
Linked Open Data
Linked Open DataLinked Open Data
Collaborative Culture
Collaborative   CultureCollaborative   Culture
Collaborative Culture
roger Pitiot
 
Introduction to digital libraries - definitions, examples, concepts and trend...
Introduction to digital libraries - definitions, examples, concepts and trend...Introduction to digital libraries - definitions, examples, concepts and trend...
Introduction to digital libraries - definitions, examples, concepts and trend...
Olaf Janssen
 
Authorities as Linked Data Hubs
Authorities  as Linked Data HubsAuthorities  as Linked Data Hubs
Authorities as Linked Data Hubs
Richard Wallis
 
Migration from FAST ESP to Solr
Migration from FAST ESP to SolrMigration from FAST ESP to Solr
Migration from FAST ESP to Solr
TNR Global
 
In the land of the blind the squinter rules
In the land of the blind the squinter rulesIn the land of the blind the squinter rules
In the land of the blind the squinter rules
wremes
 

Similar to ECM Meets the Semantic Web - Nuxeo World 2011 (20)

6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
 
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
 
Semantic Technologies for Cultural Heritage
Semantic Technologies for Cultural HeritageSemantic Technologies for Cultural Heritage
Semantic Technologies for Cultural Heritage
 
APIs and URLs for Social TV
APIs and URLs for Social TVAPIs and URLs for Social TV
APIs and URLs for Social TV
 
DCI - Data, Context and Interaction @ Jug Genova April 2011
DCI - Data, Context and Interaction @ Jug Genova April 2011DCI - Data, Context and Interaction @ Jug Genova April 2011
DCI - Data, Context and Interaction @ Jug Genova April 2011
 
06 making information pay 2011 -- solomon, madi (pearson)
06   making information pay 2011 -- solomon, madi (pearson)06   making information pay 2011 -- solomon, madi (pearson)
06 making information pay 2011 -- solomon, madi (pearson)
 
Antonio Pintus- TouchTheWeb 2010
Antonio Pintus- TouchTheWeb 2010Antonio Pintus- TouchTheWeb 2010
Antonio Pintus- TouchTheWeb 2010
 
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian ParliamentTrove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
 
Web heresies
Web heresiesWeb heresies
Web heresies
 
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)
 
Intro to Linked Data: Context
Intro to Linked Data: ContextIntro to Linked Data: Context
Intro to Linked Data: Context
 
Going Global - Workshop Version - Fall 2011
Going Global - Workshop Version - Fall 2011Going Global - Workshop Version - Fall 2011
Going Global - Workshop Version - Fall 2011
 
Sharing knowledge 2011
Sharing knowledge 2011Sharing knowledge 2011
Sharing knowledge 2011
 
Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Collaborative Culture
Collaborative   CultureCollaborative   Culture
Collaborative Culture
 
Introduction to digital libraries - definitions, examples, concepts and trend...
Introduction to digital libraries - definitions, examples, concepts and trend...Introduction to digital libraries - definitions, examples, concepts and trend...
Introduction to digital libraries - definitions, examples, concepts and trend...
 
Authorities as Linked Data Hubs
Authorities  as Linked Data HubsAuthorities  as Linked Data Hubs
Authorities as Linked Data Hubs
 
Migration from FAST ESP to Solr
Migration from FAST ESP to SolrMigration from FAST ESP to Solr
Migration from FAST ESP to Solr
 
In the land of the blind the squinter rules
In the land of the blind the squinter rulesIn the land of the blind the squinter rules
In the land of the blind the squinter rules
 

ECM Meets the Semantic Web - Nuxeo World 2011

  • 1. When ECM Meets the Semantic Web 20 Oct 2011 - Olivier Grisel & Stefane Fermigier Open Source ECM Thursday, October 20, 2011
  • 2. Business Motivations
  • 5. The DIKW hierarchy
  • 6. But every coin has another side
  • 8. A few figures
      • 50% more data / content / information produced every year
      • 1.8 zettabytes of data produced in 2011 (1 zettabyte = 1 billion terabytes)
      • Employees are drowning in a sea of email, status messages, etc., and spend on average more than 6 hours per week unsuccessfully searching for or recreating lost documents
  • 9. A Solution: the Semantic Web
  • 10. A Brief History of the Web
      • Web 1.0 (1990-now): web of sites and pages, aka the World Wide Web
      • Web 2.0 (2000-now): web of people and of participation, aka the Social Web (Blogs, RSS, tags, Facebook, Wikipedia, etc.)
      • Web 3.0 (2010-now): web of data, of meaning and connected knowledge, aka the Semantic Web
  • 12. “To a computer, then, the web is a flat, boring world devoid of meaning” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 13. “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 14. “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 15. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 16. Means and Tools
  • 17. 4 stages
      • Extract meaning from raw data / content
      • Connect information to form knowledge
      • Reason about this knowledge
      • Present this knowledge in actionable form
  • 18. Extracting
      • Leverage metadata embedded in or associated with documents (when it exists)
      • Or use machine learning, NLP (Natural Language Processing) and image processing algorithms to extract meaning from text / images
      • Examples include: named entity extraction, automatic categorization / tagging, sentiment analysis, etc.
  • 19. Interlude: Linked Open Data
  • 20. [Figure panels: 2007, 2008, 2009, 2010]
  • 21. [Figure panel: 2011!]
  • 22. Linking
      • Many Linked Open Data repositories have been made available over the last 10 years
      • RDF and graph database systems are now available to manage this huge mass of information (billions of triples)
      • Match information extracted from content with these public (or internal) data / knowledge bases
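The matching step on the Linking slide can be illustrated with a minimal sketch. It assumes a tiny in-memory set of triples and exact, case-insensitive label matching; the `KB` contents, the `link_mentions` helper and the prefixed names are made up for illustration and are not taken from the talk. A real deployment would query an RDF store or a Linked Data index instead.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical miniature knowledge base (illustrative data only).
KB = [
    Triple("dbpedia:Paris", "rdfs:label", "Paris"),
    Triple("dbpedia:Paris", "rdf:type", "dbo:Place"),
    Triple("dbpedia:Tim_Berners-Lee", "rdfs:label", "Tim Berners-Lee"),
    Triple("dbpedia:Tim_Berners-Lee", "rdf:type", "dbo:Person"),
]


def link_mentions(mentions, triples):
    """Map each extracted mention to the subjects whose label matches it."""
    by_label = {}
    for t in triples:
        if t.predicate == "rdfs:label":
            by_label.setdefault(t.obj.lower(), []).append(t.subject)
    # Mentions with no matching label get an empty candidate list.
    return {m: by_label.get(m.lower(), []) for m in mentions}


links = link_mentions(["paris", "Nuxeo"], KB)
```

Unmatched mentions are kept with an empty list so that a later review step can decide whether they are new entities or extraction errors.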
  • 23. Reasoning
      • When you are working on reliable metadata (e.g. RDFa embedded in web pages), you can use rule / inference engines to infer actionable knowledge from your content (e.g. a shopping recommendation engine)
      • Rules can also be used to clean up / flag errors when working with unreliable (e.g. automatically extracted) information
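Both uses of rules mentioned on the Reasoning slide can be sketched in a few lines. The example below is a toy forward-chaining loop over (subject, predicate, object) tuples, not the rule engine used in the talk; the `ex:` names, the single subclass rule and the disjointness check are assumptions chosen for illustration.

```python
def infer_types(triples):
    """Apply one rule until nothing changes:
    (x rdf:type C) and (C rdfs:subClassOf D) => (x rdf:type D)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for (s, p, o) in facts if p == "rdfs:subClassOf"]
        typed = [(s, o) for (s, p, o) in facts if p == "rdf:type"]
        for entity, cls in typed:
            for child, parent in subclass:
                new_fact = (entity, "rdf:type", parent)
                if cls == child and new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


def flag_conflicts(facts, disjoint_pairs):
    """Return entities typed with two classes declared disjoint (likely extraction errors)."""
    types = {}
    for s, p, o in facts:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        s for s, ts in types.items()
        if any(a in ts and b in ts for a, b in disjoint_pairs)
    )


# Hypothetical input: one clean entity and one with contradictory types.
facts = infer_types({
    ("ex:olivier", "rdf:type", "ex:Engineer"),
    ("ex:Engineer", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:paris", "rdf:type", "ex:Place"),
    ("ex:paris", "rdf:type", "ex:Person"),
})
suspicious = flag_conflicts(facts, [("ex:Person", "ex:Place")])
```

The first function derives new knowledge from trusted facts; the second only flags entities for review and never deletes anything, which suits automatically extracted data.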
  • 24. Presenting
      • Allow the users of your system to interact with the knowledge thus extracted or produced, in a way that allows them to do their jobs better
      • A smart presentation system solves the information overload issue by contextualizing the information, i.e. presenting only information relevant to what the user is currently doing
  • 25. R&D Projects Involving Nuxeo
  • 26. IKS project
      • European R&D project under the FP7, with 13 partners (6 SMEs) and an 8.5M EUR budget
      • Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products
      • Started in Jan. 2009, will last until Dec. 2012
      • First tangible result: Apache Stanbol (more about this later)
  • 27. SAMAR project
      • French collaborative R&D project with 10 partners and a 4.5M EUR budget
      • Goal: create a platform for managing multimedia content in Arabic, for news agencies such as AFP
      • Will include: automated translation, named entity extraction, content classification
      • First results: integration between Nuxeo and Temis (more later)
  • 28. State of the Art Semantic ECM at Nuxeo
  • 29. The Semantic Engine
      • From unstructured content to Knowledge
      • Language guessing
      • Topic classification (Business, Sports, Media, ...)
      • Named Entities extraction and linking
      • Relationships and properties extraction
  • 30. Demo time!
  • 34. RESTful is Beautiful
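The transcript does not preserve the request shown on this slide, so the following is only a sketch of what a REST call to a text-enhancement service such as Apache Stanbol might look like. The endpoint URL, the `Accept` media type and the `build_enhancer_request` helper are assumptions; the request is built but deliberately not sent.

```python
from urllib.request import Request

# Assumed endpoint of a locally running enhancement service (not from the slides).
STANBOL_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=STANBOL_URL, accept="text/turtle"):
    """Build a POST request carrying plain text and asking for RDF in return."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


request = build_enhancer_request("Olivier Grisel works at Nuxeo in Paris.")
# To actually send it: urllib.request.urlopen(request).read()
```

Plain text goes in and an RDF description of the detected entities is expected back, which keeps the client side down to a single HTTP call.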
  • 37. = Semantic Engines (Apache OpenNLP) + Fast Linked Data local index (Apache Solr) + Semantic Rule Engine (Apache Jena)
  • 38. [Diagram; labels: Apache Stanbol (Engine 1, Engine 2, Engine 3), Nuxeo DM addon, DBpedia, Freebase, Geonames, LDAP, Local IT infrastructure (LAN)]
  • 39. How to build engines?
  • 40. Training statistical models for NER with Wikipedia and DBpedia
      • Extract sentences with link positions from Wikipedia articles
      • Use DBpedia to find the type of the target entity (Person, Location, Organization)
      • Apache Pig scripts to compute the join and format the result as training files for OpenNLP
      • Apache OpenNLP to build and evaluate the models
      • Apache Hadoop for distributed processing
      • Apache Whirr for deployment and management on an Amazon EC2 cluster
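The join and formatting step for NER training can be sketched on a single sentence. The talk uses Apache Pig for this; the pure-Python version below is a stand-in for illustration only. The `ENTITY_TYPES` table plays the role of the DBpedia type lookup and its entries are invented, as is the `to_opennlp` helper. The `<START:type> ... <END>` markup is the annotation style read by the OpenNLP name finder trainer.

```python
# Hypothetical stand-in for the DBpedia type lookup (link target -> entity type).
ENTITY_TYPES = {
    "Olivier_Grisel": "person",
    "Nuxeo": "organization",
    "Paris": "location",
}


def to_opennlp(sentence, links, entity_types=ENTITY_TYPES):
    """Annotate a tokenized sentence with OpenNLP name finder tags.

    links: non-overlapping (start, end, target) character spans of wiki links.
    Links whose target has no known type are left as plain text.
    """
    parts = []
    cursor = 0
    for start, end, target in sorted(links):
        etype = entity_types.get(target)
        if etype is None:
            continue
        parts.append(sentence[cursor:start])
        parts.append(f" <START:{etype}> {sentence[start:end]} <END> ")
        cursor = end
    parts.append(sentence[cursor:])
    # Normalize whitespace so tokens and tags are separated by single spaces.
    return " ".join("".join(parts).split())


sentence = "Olivier Grisel works at Nuxeo in Paris ."
links = [(0, 14, "Olivier_Grisel"), (24, 29, "Nuxeo"), (33, 38, "Paris")]
training_line = to_opennlp(sentence, links)
```

Each output line is one training sentence. Skipping links of unknown type keeps the sentence usable while avoiding guessed labels.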
  • 45. Training statistical models for topic classification from Wikipedia and DBpedia
      • Filter the category tree from DBpedia SKOS entries (~500k)
      • Pig scripts to compute the joins with article abstracts for all the articles categorized in Wikipedia
      • Export as a 2.8GB TSV file to be indexed in Apache Solr
      • Use the Solr MoreLikeThisHandler to find the top 3 most related Wikipedia categories for any kind of text
      • Apache Whirr & Hadoop for deployment and management on an Amazon EC2 cluster
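The idea behind the similarity lookup can be illustrated without Solr. The sketch below ranks categories by a simple TF-IDF overlap between the input text and each category's concatenated abstracts. It is a simplified stand-in, not Solr's MoreLikeThis scoring, and the sample categories, abstracts and helper names are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization, keeping alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def top_categories(text, category_abstracts, k=3):
    """Return up to k categories whose abstracts best match the text."""
    docs = {cat: Counter(tokenize(a)) for cat, a in category_abstracts.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())
    query = Counter(tokenize(text))
    scores = {}
    for cat, counts in docs.items():
        score = 0.0
        for term, query_tf in query.items():
            if term in counts:
                idf = math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
                score += query_tf * counts[term] * idf
        if score > 0:
            scores[cat] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


# Hypothetical category index (in the talk: ~500k categories indexed in Solr).
CATEGORIES = {
    "Sports": "football tennis match player team",
    "Business": "market company stock profit",
    "Media": "newspaper television broadcast journalist",
    "Science": "physics chemistry experiment",
}
result = top_categories("the company bought a football team", CATEGORIES)
```

Categories with no term in common with the text are dropped rather than returned with a zero score, so fewer than three results may come back.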
  • 46. Wrap Up on Recent Work
      • Full offline mode: Stanbol EntityHub
      • Multi-lingual Indexes
      • New UI for occurrences reviews
      • Temis Luxid Annotation Factory integration
  • 47. What’s next?
      • Stanbol and Temis connection in Admin Center
      • Embedded Stanbol mode for easy deployment
      • More OpenNLP models for more languages
      • Finalize topic classification - handle hierarchy
      • Tight integration with Nuxeo DM search features
  • 48. Thank you for your attention!