Metasearch Engines
Published in: Education, Technology, Design
Transcript

  • 1. Metasearch Engines. Fabricio Echeverría, pechever@espol.edu.ec. Joseph Brodsky
  • 2. Agenda • Word indexes • Web Search Engine • Information Retrieval Systems • Metasearch engines • Questions
  • 3. In search of extended dynamic memory
  • 4. Word Index: Onomastics of names in Catalan
  • 5. Web Search Engine • Programming language: Python • High-RAM handling • Shared storage • Parallel processing
  • 6. Web Search Engine. http://nlp.stanford.edu/IR-book/pdf/19web.pdf, p. 434
  • 7. Python Code – Web Search Engine (source: http://www.udacity.com/cs101)

      # Simulated web: maps each URL to its page content.
      cache = {
          'http://www.udacity.com/cs101x/final/multi.html': """<html>
      <body>
      <a href="http://www.udacity.com/cs101x/final/a.html">A</a><br>
      <a href="http://www.udacity.com/cs101x/final/b.html">B</a><br>
      </body>
      """,
          'http://www.udacity.com/cs101x/final/b.html': """<html>
      <body>
      Monty likes the Python programming language
      Thomas Jefferson founded the University of Virginia
      When Mandela was in London, he visited Nelson's Column.
      </body>
      </html>
      """,
          'http://www.udacity.com/cs101x/final/a.html': """<html>
      <body>
      Monty Python is not about a programming language
      Udacity was not founded by Thomas Jefferson
      Nelson Mandela said "Education is the most powerful weapon which you can
      use to change the world."
      </body>
      </html>
      """,
      }

      def get_page(url):
          if url in cache:
              return cache[url]
          print("Page not in cache: " + url)
          return ""  # empty page, so the indexer can still call split() on it

      def union(a, b):
          # Append to a every element of b that a does not already contain.
          for e in b:
              if e not in a:
                  a.append(e)

      def get_next_target(page):
          start_link = page.find('<a href=')
          if start_link == -1:
              return None, 0
          start_quote = page.find('"', start_link)
          end_quote = page.find('"', start_quote + 1)
          url = page[start_quote + 1:end_quote]
          return url, end_quote

      def get_all_links(page):
          links = []
          while True:
              url, endpos = get_next_target(page)
              if url:
                  links.append(url)
                  page = page[endpos:]
              else:
                  break
          return links

      def add_to_index(index, keyword, url, pos):
          if keyword in index:
              index[keyword].append([url, pos])
          else:
              index[keyword] = [[url, pos]]

      def add_page_to_index(index, url, content):
          words = content.split()
          offset = 0
          for word in words:
              # Search from the end of the previous word so that a repeated
              # word gets its own position instead of the earlier one.
              pos = content.find(word, offset)
              add_to_index(index, word, url, pos)
              offset = pos + len(word)

      def lookup(index, keyword):
          if keyword in index:
              return index[keyword]
          return None

      def crawl_web(seed):  # returns index, graph of outlinks
          tocrawl = [seed]
          crawled = []
          graph = {}  # <url>, [list of pages it links to]
          index = {}
          while tocrawl:
              page = tocrawl.pop()
              if page not in crawled:
                  content = get_page(page)
                  add_page_to_index(index, page, content)
                  outlinks = get_all_links(content)
                  graph[page] = outlinks
                  union(tocrawl, outlinks)
                  crawled.append(page)
          return index, graph
  • 8. Information Retrieval Systems
  • 9. Metasearch Engines • The union of searches (queries) across several search engines – Search indexes
  • 10. http://dg3rtljvitrle.cloudfront.net/slides/chap10.pdf
  • 11. http://dg3rtljvitrle.cloudfront.net/slides/chap10.pdf
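
Slide 9 describes a metasearch engine as the union of one query's results across several search engines. A minimal sketch of that idea follows, under stated assumptions: `engine_a` and `engine_b` are hypothetical stand-ins that return fixed ranked URL lists (a real system would call actual search services), and the merge order used here (number of engines that returned a URL, then its best rank) is only one possible choice, not something the slides specify.

```python
from typing import Callable, Dict, List

# Hypothetical engine type: maps a query string to a ranked list of URLs.
SearchEngine = Callable[[str], List[str]]


def engine_a(query: str) -> List[str]:
    # Placeholder results; a real engine would look the query up in its index.
    return ["http://example.com/1", "http://example.com/2"]


def engine_b(query: str) -> List[str]:
    # Placeholder results from a second, independent engine.
    return ["http://example.com/2", "http://example.com/3"]


def metasearch(query: str, engines: List[SearchEngine]) -> List[str]:
    """Return the de-duplicated union of all engines' results.

    URLs returned by more engines come first; ties are broken by the
    best (lowest) rank the URL reached in any engine.
    """
    votes: Dict[str, int] = {}
    best_rank: Dict[str, int] = {}
    for engine in engines:
        seen = set()  # count each URL at most once per engine
        for rank, url in enumerate(engine(query)):
            if url in seen:
                continue
            seen.add(url)
            votes[url] = votes.get(url, 0) + 1
            best_rank[url] = min(rank, best_rank.get(url, rank))
    return sorted(votes, key=lambda u: (-votes[u], best_rank[u]))


results = metasearch("python", [engine_a, engine_b])
print(results)
```

The URL returned by both placeholder engines is listed first, followed by the URLs that only one engine returned.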
