Why we need an independent index of the Web
Dirk Lewandowski
dirk.lewandowski@haw-hamburg.de
http://www.bui.haw-hamburg.de/lewandowski.html
@Dirk_Lew
Society of the Query Conference, Amsterdam, 7 November 2013
The “local copy” of the Web
•  Web Indexing
–  New, changed, and deleted documents
–  “Holy grail” of keeping the index complete and current
Risvik, K. M., & Michelsen, R. (2002). Search engines and web dynamics. Computer Networks, 39(3), 289–302.
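The three cases above (new, changed, deleted) can be sketched as a comparison between what the index last stored and what a fresh crawl returns. This is a minimal illustration, not how any particular search engine works: the names `fingerprint` and `diff_crawl` are hypothetical, content hashing is just one possible change signal, and treating a URL missing from the crawl as deleted is a simplifying assumption.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a stable hash of a document's content (assumed change signal)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_crawl(index: dict, crawl: dict) -> dict:
    """Classify URLs as new, changed, or deleted.

    index: url -> fingerprint stored at the last indexing run
    crawl: url -> content fetched in the current crawl
    """
    crawled = {url: fingerprint(body) for url, body in crawl.items()}
    shared = crawled.keys() & index.keys()
    return {
        "new": set(crawled) - set(index),
        # Assumption: a URL absent from the crawl counts as deleted.
        "deleted": set(index) - set(crawled),
        "changed": {url for url in shared if crawled[url] != index[url]},
    }
```

Keeping the index "complete and current" then means running such a comparison often enough that all three sets stay small, which is what makes it costly at web scale.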
Representation of documents in a search engine
Referring documents → Document → Metadata (examples)
Referring documents
- Anchor text
Document
- Heading 1
- Heading 2
From the source code
- Title
- Description
- Keywords
- Author
From the document
(document info)
- Length
- Date
- Decay
- Name of the author
From the Web
- PageRank
- Number of citations
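The representation listed above can be sketched as a simple record. This is an illustrative assumption about how the fields might be grouped, not the schema of any real search engine; all field names are hypothetical, and "Decay" is left out because the slide does not define it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DocumentRepresentation:
    url: str
    # From the source code
    title: str = ""
    description: str = ""
    keywords: list = field(default_factory=list)
    author: str = ""
    # From the document itself
    headings: list = field(default_factory=list)
    length: int = 0
    last_modified: Optional[date] = None
    # From referring documents
    anchor_texts: list = field(default_factory=list)
    # From the Web
    pagerank: float = 0.0
    citation_count: int = 0

    def searchable_text(self) -> str:
        """Join the text fields a query could match, including anchor text."""
        parts = [self.title, *self.headings, *self.anchor_texts]
        return " ".join(part for part in parts if part)
```

The point of the sketch is that part of the representation (anchor text, link-based scores) does not come from the document at all, so it cannot be rebuilt by fetching the page alone.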
The User’s Perspective
•  Everyone uses search engines (Purcell, Brenner & Rainie, 2012; van Eimeren &
Frees, 2012)
•  Market is dominated by Google (ComScore data)
•  Users rely on
–  Google’s method of ordering results
–  Google’s method of collecting data
à If Google hasn’t seen it — and indexed it — or kept it up to date, it
can’t be found with a search query.
Freshness of Web search engines
(see Lewandowski, Wahlig & Meyer-Bautor, 2006; Lewandowski, 2008)
Original (as of yesterday) vs. Google’s copy (as of yesterday)
What about the alternatives to Google?
•  Many services only seem to be search engines
–  Accessing the data of another search engine
–  Representing nothing more than an alternative user interface to one of the more
well-known engines
–  In many cases, that turns out to be Google
–  E.g., in Germany, we can see that the major internet portals T-Online, GMX,
AOL, and web.de all display results obtained from Google
Why is one search engine not enough?
•  We need more than one search engine to ensure that a broad range of
opinions is represented in the search market.
•  Users should have the choice between different worldviews which originate
as a product of algorithm-based search result generation
•  Ideology-free search algorithms are simply not possible
Alternative Search Engine Indexes
•  There are only a handful of search engines that operate their own indexes,
due to costs and technical complexity
•  Search engine start-ups
–  Use an existing external index
–  Focus on a specialised topic (which requires only a small index)
–  Aggregate data from different search engines (meta search engine)
•  Actual search engine start-ups like Blekko and DuckDuckGo are more the
exception than the rule
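The meta search approach mentioned above can be sketched as merging ranked result lists from several engines. Round-robin interleaving with duplicate removal is used here only as one simple strategy; it is an assumption for illustration, not the method of any named engine, and `merge_results` is a hypothetical name.

```python
from itertools import zip_longest


def merge_results(*ranked_lists):
    """Interleave ranked URL lists round-robin, keeping each URL once."""
    merged = []
    seen = set()
    # zip_longest pads shorter lists with None so every rank is visited.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

A meta search engine built this way only ever sees what the source engines chose to return, so it inherits their coverage and their ranking decisions.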
Partner model
•  “Real” search engine providers such as Google and Bing operate their own
search engines but also provide their search results to partners
•  All the major web portals have now embraced this model.
•  Income through ads; revenue-sharing
•  Attractiveness of the model
–  The search engine provider encounters only minimal costs
–  The operator of the portal no longer needs to go to the great expense of running
its own search engine.
–  The partner index model has served to thin out the competition in the search
industry.
Access to Search Engine Indexes
•  Application programming interfaces (APIs)
–  No direct access to the search engine index
–  Limited number of top results which have already been ranked by the search
engine provider
–  Access via APIs is similar to what meta-search engines do
–  The representation of the document in the source search engine is also not
included
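The difference between index access and API access can be sketched with a toy inverted index. Everything here is a hypothetical illustration: `ToyIndex` and `api_search` are invented names, the "ranking" is simply sorted document IDs as a placeholder, and the limit of 2 is an arbitrary assumption.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny inverted index: term -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def lookup(self, term):
        """Direct index access: every matching document, sorted by ID."""
        return sorted(self.postings.get(term.lower(), set()))


def api_search(index, term, limit=2):
    """API-style access: a truncated list already ordered by the provider."""
    return index.lookup(term)[:limit]
```

With direct access, a developer could apply their own ranking to the full match set; through the API-style function, only the provider's top results are visible and the underlying document representation is not.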
Alternative Search Engines
•  What constitutes an “alternative search engine”?
–  All search engines that are not Google? (“Google Killers”, e.g., Cuil)
–  Some alternatives are not perceived as such because they are considered to be
simply the same as Google (e.g., Bing)
–  Search engines which explicitly position themselves as an alternative to Google
through a regional approach (e.g., Seekport)
–  New approaches to search / “Real alternatives”: Alternative approaches to
gathering and representing web content
Public Support for Search Engine Technology?
•  Quaero/Theseus: Funding a “Google Killer”?
–  Quaero: Technologies for multimedia searching.
–  Theseus: Semantic technologies for business-to-business applications (without
focusing exclusively on search).
•  The proposal to provide government funding for search engine technology
has been subject to intense criticism in the past
•  Establish a single alternative?
•  A number of factors could cause it to fail
–  Poor marketing
–  Graphic design of the user interface
–  ...
•  Regardless of the reason, a failure of the new search engine would result in
the entire publicly funded initiative failing.
Economic perspective
•  Only the largest internet companies are able to afford large indexes.
•  Microsoft is the only company besides Google to possess a comprehensive
search engine index.
•  Yahoo gave up on its own index several years ago
•  It appears that operating a dedicated index is attractive to practically
no one, and there are hardly any candidates with the necessary financial
resources in any case
The Solution
•  Create the conditions that will make establishing alternative search engines
possible
•  We can expect that the possibilities such an index presents would benefit
a number of different companies, individuals, and institutions.
•  The result will be fair competition to develop the best concepts for using the
data provided by the index.
Vision
•  “An index of the web that can be accessed under fair conditions by
everyone”
–  “Everyone” means that anyone who is interested can access the index.
–  “Fair conditions” does not mean that access to the index must be free of
charge for everyone. A certain number of document requests per day
should be available at no cost in order to promote non-profit projects.
–  “Access” to the index can be defined as the ability to automatically
query the index with ease.
–  The concept “index of the web” is intended to cover as much of the web
as possible
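The "fair conditions" idea, with a number of free document requests per day, can be sketched as a per-client daily counter. This is only an illustration under stated assumptions: `DailyQuota` is a hypothetical name, and the size of the free allowance is not given in the talk, so it is a parameter here.

```python
from collections import defaultdict
from datetime import date


class DailyQuota:
    """Track requests per client per day against a free allowance."""

    def __init__(self, free_requests_per_day):
        self.free = free_requests_per_day  # assumed value, not from the talk
        self.used = defaultdict(int)  # (client_id, day) -> request count

    def is_free(self, client_id, today):
        """Record one request; True if it falls within the free allowance."""
        key = (client_id, today)
        self.used[key] += 1
        return self.used[key] <= self.free
```

Requests beyond the allowance would be billable rather than refused, which matches the idea that access is open to everyone but not necessarily free of charge for everyone.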
Funding and operation
•  Funding
–  This type of project cannot be supported by any one country alone. The only
feasible option is a pan-European initiative.
•  Who would operate the index?
–  Existing research institution or newly-founded institution
–  The operator of the index should not obtain the exclusive right to determine the
way in which the documents are used or made available (→ Board of trustees)
Conclusion: Advantages of an independent index of the web
•  Motivate companies, institutions, and developers pursuing personal projects
to create their own search applications.
•  The data available on the web is so boundless that it lends itself to
countless applications in a broad range of fields.
•  Enable applications we are not yet capable of even imagining.
•  An open structure, transparency with respect to access, and the assurance
of permanent availability thanks to state sponsorship would lay the
groundwork for innovation.
Thank you
Prof. Dr. Dirk Lewandowski
Hochschule für Angewandte Wissenschaften
Hamburg
dirk.lewandowski@haw-hamburg.de
Twitter: Dirk_Lew
http://www.bui.haw-hamburg.de/lewandowski.html
http://www.searchstudies.org

Why we need an independent index of the Web

  • 1. Why we need an independent index of the Web Dirk Lewandowski dirk.lewandowski@haw-hamburg.de http://www.bui.haw-hamburg.de/lewandowski.html @Dirk_Lew Society of the Query Conference, Amsterdam, 7/11/2013
  • 2. The “local copy” of the Web •  Web Indexing –  New, changed, deleted document –  “Holy grail” of keeping the index complete and current Risvik, K. M., & Michelsen, R. (2002). Search engines and web dynamics. Computer Networks, 39(3), 289–302.
  • 3. Representation of documents in a search engine Referring documents à Document à Metadata (examplex) heading1 heading 2 Anchor text Anchor text Anchor text From the source code - Title - Description - Keywords - Author From the document (document info) - Length - Date - Decay - Name of the author From the Web - PageRank - Number of citations
  • 4. The User’s Perspective •  Everyone uses search engines (Purcell, Brenner & Raine, 2012; van Eimeren & Frees, 2012) •  Market is dominated by Google (ComScore data) •  Users rely on –  Google’s method of ordering results –  Google’s method of collecting data à If Google hasn’t seen it — and indexed it — or kept it up to date, it can’t be found with a search query.
  • 5. Freshness of Web search engines (see Lewandowski, Wahlig & Meyer-Bautor, 2006; Lewandowski, 2008) Original (as of yesterday) Google‘s copy (as of yesterday)
  • 6. What about the alternatives to Google? •  Many “seems to be” search engines –  Accessing the data of another search engine –  Representing nothing more than an alternative user interface to one of the more well-known engines –  In many cases, that turns out to be Google –  E.g., in Germany, we can see that the major internet portals T-Online, GMX, AOL, and web.de all display results obtained from Google
  • 7. Why is one search engine not enough? •  We need more than one search engine to ensure that a broad range of opinions are represented in the search market. •  Users should have the choice between different worldviews which originate as a product of algorithm-based search result generation •  Ideology-free search algorithms are simply not possible
  • 8. Alternative Search Engine Indexes •  There are only a handful of search engines that operate their own indexes, due to costs and technical complexity •  Search engines start-ups –  Use an existing external index –  Focus on a specialised topic (which requires only a small index) –  Aggregate data from different search engines (meta search engine) •  Actual search engine startups like Blekko and Duck Duck Go are more the exception than the rule
  • 9. Partner model •  “Real” search engine providers such as Google and Bing operate their own search engines but also provide their search results to partners •  All the major web portals have now embraced this model. •  Income through ads; revenue-sharing •  Attractiveness of the model –  The search engine provider encounters only minimal costs –  The operator of the portal no longer needs to go to the great expense of running its own search engine. –  The partner index model has served to thin out the competition in the search industry.
  • 10. Access to Search Engine Indexes •  Application programming interfaces (APIs) –  No direct access to the search engine index –  Limited number of top results which have already been ranked by the search engine provider –  Access via APIs is similar to the way meta-search engines obtain their results –  The representation of the document in the source search engine is also not included
  • 11. Alternative Search Engines •  What constitutes an “alternative search engine”? –  All search engines that are not Google? (“Google Killers”, e.g., Cuil) –  Some alternatives are not perceived as such because they are considered to be simply the same as Google (e.g., Bing) –  Search engines which explicitly position themselves as an alternative to Google through a regional approach (e.g., Seekport) –  New approaches to search / “Real alternatives”: Alternative approaches to gathering and representing web content
  • 12. Public Support for Search Engine Technology? •  Quaero/Theseus: Funding a “Google Killer”? –  Quaero: Technologies for multimedia searching. –  Theseus: Semantic technologies for business-to-business applications (without focusing exclusively on search). •  The proposal to provide government funding for search engine technology has been subject to intense criticism in the past •  Establish a single alternative? •  A number of factors could cause it to fail –  Poor marketing –  Graphic design of the user interface –  ... •  Regardless of the reason, a failure of the new search engine would result in the entire publicly funded initiative failing.
  • 13. Economic perspective •  Only the largest internet companies are able to afford large indexes. •  Microsoft is the only company besides Google to possess a comprehensive search engine index. •  Yahoo gave up on its own index several years ago •  It appears as though operating a dedicated index is attractive to practically no one — and there are hardly any candidates with the necessary financial resources in any case
  • 14. The Solution •  Create the conditions that will make establishing alternative search engines possible •  We can expect that the possibilities an independent index presents would benefit a number of different companies, individuals, and institutions. •  The result will be fair competition to develop the best concepts for using the data provided by the index.
  • 15. Vision •  “An index of the web that can be accessed at fair conditions for everyone” –  “Everyone” means that anyone who is interested can access the index. –  “Fair conditions” does not mean that access to the index must be free of charge for everyone. A certain number of document requests per day should be available at no cost in order to promote non-profit projects. –  “Access” to the index can be defined as the ability to automatically query the index with ease. –  The concept “index of the web” is intended to cover as much of the web as possible
  • 16. Funding and operation •  Funding –  This type of project cannot be supported by any one country alone. The only feasible option is a pan-European initiative. •  Who would operate the index? –  Existing research institution or newly-founded institution –  The operator of the index should not obtain the exclusive right to determine the way in which the documents are used or made available (→ Board of trustees)
  • 17. Conclusion: Advantages of an independent index of the web •  Motivate companies, institutions, and developers pursuing personal projects to create their own search applications. •  The data available on the web is so boundless that it lends itself to countless applications in a broad range of fields. •  Enable applications we are not yet capable of even imagining. •  An open structure, transparency with respect to access, and the assurance of permanent availability thanks to state sponsorship would lay the groundwork for innovation.
  • 18. Thank you Prof. Dr. Dirk Lewandowski Hochschule für Angewandte Wissenschaften Hamburg dirk.lewandowski@haw-hamburg.de Twitter: Dirk_Lew http://www.bui.haw-hamburg.de/lewandowski.html http://www.searchstudies.org