The Need for and fundamentals of an Open Web Index

THE NEED FOR AND FUNDAMENTALS OF
AN OPEN WEB INDEX
Prof. Dr. Dirk Lewandowski
Hamburg University of Applied Sciences, Hamburg, Germany
dirk.lewandowski@haw-hamburg.de
First International Symposium on Open Search Technology
Garching, 23 October, 2019
Proposal for an Open Web Index (OWI)
Prof. Dr. Dirk Lewandowski
ABOUT ME
• Professor of Information Research and
Information Retrieval at Hamburg
University of Applied Sciences
• Author of 100+ scholarly articles on
search engines
• German-language book “Suchmaschinen
verstehen” (Springer, 2nd edition, 2018)
• Editor, Aslib Journal of Information
Management (Emerald Publishing)
• Served as expert for the High Court of
Justice (UK) and Deutscher Bundestag
(German parliament)
https://searchstudies.org/dirk
WHY WE NEED AN OPEN WEB INDEX
GOOGLE SERVES MORE THAN
2,000,000,000,000 QUERIES PER YEAR.
PROBLEM STATEMENT
• As there is no central directory of the Web, private search engine companies
have built large indexes of its contents
• Companies operating Web-scale indexes do not give other interested parties
sufficient access to their data
• The difficulties in building a Web index lie in technical issues, operating costs,
Web size, and freshness
• Due to these difficulties, there is no Web index built by a European company
(or other entity)
IDEA
VISION
To build a public library of the Web
TECHNICAL IDEA
Separate the index from the services that are built on the index
PUBLIC VS. PRIVATE
While the index should be public, the services can be proprietary

The Need for and fundamentals of an Open Web Index

  • 1. THE NEED FOR AND FUNDAMENTALS OF AN OPEN WEB INDEX Prof. Dr. Dirk Lewandowski Hamburg University of Applied Sciences, Hamburg, Germany dirk.lewandowski@haw-hamburg.de First International Symposium on Open Search Technology Garching, 23 October, 2019
  • 2. Proposal for an Open Web Index (OWI) Prof. Dr. Dirk Lewandowski ABOUT ME • Professor of Information Research and Information Retrieval at Hamburg University of Applied Sciences • Author of 100+ scholarly articles on search engines • German-language book “Suchmaschinen verstehen” (Springer, 2nd edition, 2018) • Editor, Aslib Journal of Information Management (Emerald Publishing) • Served as expert for the High Court of Justice (UK) and Deutscher Bundestag (German parliament) 1 https://searchstudies.org/dirk
  • 3. WHY WE NEED AN OPEN WEB INDEX
  • 4. GOOGLE SERVES MORE THAN 2,000,000,000,000 QUERIES PER YEAR.
  • 5. PROBLEM STATEMENT • As there is no central directory of the Web, private search engine companies have built large indexes of its contents • Companies operating Web-scale indexes do not give other interested parties sufficient access to their data • The difficulties in building a Web index lie in technical issues, operating costs, Web size, and freshness • Due to these difficulties, there is no Web index built by a European company (or other entity)
  • 6. IDEA • VISION: To build a public library of the Web • TECHNICAL IDEA: Separate the index from the services that are built on the index • PUBLIC VS. PRIVATE: While the index should be public, the services can be proprietary
  • 7. STRUCTURE (diagram labels) • OWI Crawler • OWI Basic Indexer • OWI Advanced Indexer • OWI Web Index • OWI Usage Data Index • OWI Interface / API • Service 1, Service 2, Service 3, Service 4, each with its own users
  • 8. POSSIBLE APPLICATIONS N.B.: This list of ideas is far from complete and serves only illustrative purposes. SEARCH • Web Search • Vertical Search, e.g., video or scholarly content SCIENCE / RESEARCH • Trend analysis, e.g., political trends • Language use on the Web • Research evaluation, e.g., Altmetrics DATA ANALYSIS • Data aggregation, e.g., company or person dossiers • Opinion mining (“Who says what about whom?”) • Market research ARTIFICIAL INTELLIGENCE OWI could build the foundation for large-scale AI applications, e.g., • Machine translation • Question answering COMBINING OWI DATA WITH PROPRIETARY DATA • Company profiles + OWI data = enriched company dossiers • Product data + OWI data = enriched product descriptions • Geospatial data + OWI data = enriched map applications
  • 9. WHY DON’T WE JUST START BUILDING IT?
  • 10. WHAT SIZE SHOULD A WEB INDEX HAVE? • 1.71 billion websites • How many pages/URLs does this mean? → There is no such thing as a complete index. → However, without representing a major part of the Web, an index is more or less useless.
  • 11. WHY ARE INITIATIVES LIKE COMMON CRAWL NOT ENOUGH? • They are not comprehensive: Common Crawl: 2.6 billion pages (not websites!) • They are static: crawling once a month is very different from keeping an index current at all times • They do not provide search functionality: no (basic) indexing as needed to build applications on top of the index; no spam control as needed to build applications; no human raters to control for the quality of the index → The use of initiatives like Common Crawl is more or less restricted to analysing Web content. Due to the sampling procedure applied, it may not even be too useful for that.
  • 12. CRAWLING IS NOT THE PROBLEM, ANYWAY Crawling is just the beginning of a long process. Indexing is required for making the index searchable. The real problems are: 1) Avoiding spam (= excluding it from the index); spam makes up A LOT of the Web’s content 2) Keeping the index fresh 3) Providing indexing (basic and advanced) 4) Making the index searchable
  • 13. BIAS ON THE WEB Source: Baeza-Yates, R. (2018). Bias on the web. Communications of the ACM, 61(6), 54–61. https://doi.org/10.1145/3209581
  • 14. WHO CONTROLS THE RESULT RANKINGS? (diagram labels) • Search Engine Providers • Search Engine Result Page • Content Providers • Users • Search Engine Optimizers
  • 16. HOW TO PROCEED - A comprehensive and fresh Web index is a societal/political project, not a mere technical problem. - Therefore, we need to approach policymakers. They should decide on building the index (and financing it). - To make the index independent of governments, a European foundation should be established to govern it. - The technical implementation of the index should lie in the hands of those (companies/institutions) best capable of building it.
  • 17. THANK YOU Dirk Lewandowski Hamburg University of Applied Sciences, Hamburg, Germany dirk.lewandowski@haw-hamburg.de www.searchstudies.org/dirk
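
The technical idea on slides 6 and 7 (one shared index, with independent services built on top of it through an interface/API) can be illustrated with a small sketch. This is a toy model under stated assumptions, not the OWI design: the class `OpenWebIndex`, its methods `add` and `lookup`, and the two example services are hypothetical names introduced here, and "basic indexing" is reduced to lowercased whitespace tokenisation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class OpenWebIndex:
    """Toy stand-in for the shared index (hypothetical, for illustration only)."""
    postings: dict = field(default_factory=lambda: defaultdict(set))
    documents: dict = field(default_factory=dict)

    def add(self, url: str, text: str) -> None:
        # "Basic indexing" here means lowercasing and splitting on whitespace.
        # Re-adding a URL does not remove its old terms; a real index would.
        self.documents[url] = text
        for term in text.lower().split():
            self.postings[term].add(url)

    def lookup(self, term: str) -> set:
        # The single public access path that every service uses.
        # Returns a copy so services cannot mutate the index.
        return set(self.postings.get(term.lower(), set()))


def web_search(index: OpenWebIndex, query: str) -> list:
    """Example service 1: URLs containing all query terms (boolean AND)."""
    terms = query.split()
    if not terms:
        return []
    hits = index.lookup(terms[0])
    for term in terms[1:]:
        hits &= index.lookup(term)
    return sorted(hits)


def term_trend(index: OpenWebIndex, term: str) -> int:
    """Example service 2: number of indexed documents mentioning a term."""
    return len(index.lookup(term))


# Both services read the same index but share no code with each other.
index = OpenWebIndex()
index.add("https://example.org/a", "open web index proposal")
index.add("https://example.org/b", "web search engines")
print(web_search(index, "web index"))
print(term_trend(index, "web"))
```

Because the services only call `lookup`, either of them could be proprietary while the index itself stays public, which is the public vs. private split described on slide 6. Spam control, freshness, and advanced indexing, the hard problems named on slide 12, are deliberately left out of this sketch.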