Metasearch Engines

Published in: Education, Technology, Design

  1. Metasearch Engines. Fabricio Echeverría, pechever@espol.edu.ec. Joseph Brodsky
  2. Agenda • Word indexes • Web Search Engine • Information Retrieval Systems • Metasearch engines • Questions
  3. In search of extended dynamic memory
  4. Word index: onomastics of Catalan names
  5. Web Search Engine • Programming language: Python • Handling large amounts of RAM • Shared storage • Parallel processing
  6. Web Search Engine (http://nlp.stanford.edu/IR-book/pdf/19web.pdf, p. 434)
  7. Python code: Web Search Engine (http://www.udacity.com/cs101)

      # Simulated web: maps each URL to its page source so the crawler runs offline.
      cache = {
          'http://www.udacity.com/cs101x/final/multi.html': """<html>
      <body>
      <a href="http://www.udacity.com/cs101x/final/a.html">A</a><br>
      <a href="http://www.udacity.com/cs101x/final/b.html">B</a><br>
      </body>
      """,
          'http://www.udacity.com/cs101x/final/b.html': """<html>
      <body>
      Monty likes the Python programming language
      Thomas Jefferson founded the University of Virginia
      When Mandela was in London, he visited Nelsons Column.
      </body>
      </html>
      """,
          'http://www.udacity.com/cs101x/final/a.html': """<html>
      <body>
      Monty Python is not about a programming language
      Udacity was not founded by Thomas Jefferson
      Nelson Mandela said "Education is the most powerful weapon which you can
      use to change the world."
      </body>
      </html>
      """,
      }

      def get_page(url):
          # Return the cached page source, or None if the URL is unknown.
          if url in cache:
              return cache[url]
          else:
              print("Page not in cache: " + url)
              return None

      def get_next_target(page):
          # Find the next <a href="..."> and return (url, position of closing quote).
          start_link = page.find('<a href=')
          if start_link == -1:
              return None, 0
          start_quote = page.find('"', start_link)
          end_quote = page.find('"', start_quote + 1)
          url = page[start_quote + 1:end_quote]
          return url, end_quote

      def get_all_links(page):
          # Collect every link target on the page, in order of appearance.
          links = []
          while True:
              url, endpos = get_next_target(page)
              if url:
                  links.append(url)
                  page = page[endpos:]
              else:
                  break
          return links

      def union(a, b):
          # Append to a, in place, every element of b that a does not already contain.
          for e in b:
              if e not in a:
                  a.append(e)

      def add_to_index(index, keyword, url, pos):
          # The index maps keyword -> list of [url, position] entries.
          if keyword in index:
              index[keyword].append([url, pos])
          else:
              index[keyword] = [[url, pos]]

      def add_page_to_index(index, url, content):
          words = content.split()
          pos = 0
          for word in words:
              pos = content.find(word, pos)
              add_to_index(index, word, url, pos)
              pos += len(word)  # move past this word so a repeated word gets its own position

      def lookup(index, keyword):
          if keyword in index:
              return index[keyword]
          else:
              return None

      def crawl_web(seed):  # returns index, graph of outlinks
          tocrawl = [seed]
          crawled = []
          graph = {}  # <url>, [list of pages it links to]
          index = {}
          while tocrawl:
              page = tocrawl.pop()
              if page not in crawled:
                  crawled.append(page)
                  content = get_page(page)
                  if content is None:  # page missing from the cache: nothing to index
                      continue
                  add_page_to_index(index, page, content)
                  outlinks = get_all_links(content)
                  graph[page] = outlinks
                  union(tocrawl, outlinks)
          return index, graph
  8. Information Retrieval Systems
  9. Metasearch engines • A metasearch engine is the union of searches (queries) across several search engines (search indexes)
  10. http://dg3rtljvitrle.cloudfront.net/slides/chap10.pdf
  11. http://dg3rtljvitrle.cloudfront.net/slides/chap10.pdf
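
Slide 9 describes a metasearch engine as the union of one query sent to several search engines. The following is a minimal sketch of that idea in Python 3, not the presenter's implementation. The names `build_index`, `metasearch`, `engine_a`, `engine_b` and the `example.com` pages are hypothetical, and each "engine" is assumed to be a simple in-memory word index rather than a real remote service.

```python
def build_index(pages):
    """Build a word -> list of URLs index from a {url: text} mapping (hypothetical helper)."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            urls = index.setdefault(word, [])
            if url not in urls:
                urls.append(url)
    return index


def lookup(index, keyword):
    """Return the URLs that contain the keyword, or an empty list."""
    return index.get(keyword.lower(), [])


def metasearch(engines, keyword):
    """Send the query to every engine and return the union of their results.

    Duplicates are dropped and the first-seen order is kept.
    """
    merged = []
    for index in engines.values():
        for url in lookup(index, keyword):
            if url not in merged:
                merged.append(url)
    return merged


# Two hypothetical engines that have indexed overlapping sets of pages.
engine_a = build_index({
    "http://example.com/a": "Monty Python is not about a programming language",
    "http://example.com/b": "Monty likes the Python programming language",
})
engine_b = build_index({
    "http://example.com/b": "Monty likes the Python programming language",
    "http://example.com/c": "Python is a snake",
})
engines = {"engine_a": engine_a, "engine_b": engine_b}

print(metasearch(engines, "python"))
```

Page `b` is known to both engines but appears only once in the merged list. A real metasearch engine would also have to rank the merged results, which this sketch leaves out.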
