Web-Mining
Transcript

  • 1. Web Mining. Lecturer: Dr. Edel Garcia. A First Course in Web Mining, Search Engines, and Business Intelligence. Dr. Garcia holds a PhD from Arizona State University, Tempe, and a B.S. from Interamerican University of Puerto Rico. He currently conducts research in Web Mining (search engines), Information Retrieval (vector space models, LSI, and co-occurrence theory), and Fractal Geometry (fractals in information sciences). Dr. Garcia is a reviewer for JASIST (Journal of the American Society for Information Science and Technology) and IBM Computer and Graphics, an international journal on fractals, chaos, and graph theory. In addition, he serves on the Program Committee of W3C’s AIRWeb (Adversarial Information Retrieval on the Web Workshops) and has participated as a speaker and co-chair at local and international conferences on Web Mining and search engine technologies. At Polytechnic University, he advises graduate student research projects and theses in Data Mining and Web Mining. He is the founder of http://www.miislita.com, an online resource on information retrieval and search engine technologies.
  • 2. Web Mining. Lecturer: Dr. Edel Garcia
    Description: This is a hands-on, full-semester course on Web Mining, search engines, and business intelligence. Students will learn by doing: (a) how search engines index and rank web documents, (b) how to conduct business intelligence from online resources, and (c) how to apply Web Mining strategies and algorithms in their workplace or research careers.
    Target: Students in Business, Engineering, and Computer Sciences, as well as from other disciplines, are encouraged to register for this special course.
    Requirements: Permission from advisor or department.
    Grading: Take-home work and a final exam.
    Topics: The following topics will be covered, not necessarily in this order:
      • Document Indexing: Indexing of web sites and text operations used by search engines, including document linearization, tokenization, stop word filtration, stemming, and parsing.
      • Search Engine Optimization: Mining search engine relevance algorithms to rank Web pages highly.
      • Intelligence Searching: Covers undocumented (smart) searches in Google and other search engines; includes hacking and penetration through customized searches.
      • Keyword Research and Clustering: Discovery of word patterns and keywords for branding and marketing through Association, Scalar, and Metric Clusters.
      • Term Matching Algorithms: Vector Space Models used by search engines to rank documents. Scoring of local, global, and entropy term weights.
      • Concept Matching Algorithms: Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) models for clustering and ranking.
      • Link Analysis Models: Google’s PageRank, Hubs & Authorities, and other link-based models.
      • Spam Intelligence: Tools and techniques used to spam search engines and web sites. Includes techniques based on scripts, cloaking, keyword spamming, link-bombs, email marketing, viral marketing, Web 2.0, and Web 3.0.
      • Introduction to Business Dashboards (BDs): Overview of dashboard technology, including open source and customized add-on components.
      • Special Topics: On-Topic Analysis, Co-Occurrence Theory, and Latent Graphs. This is my own area of research.
    Textbook: Web Mining evolves on a daily basis; thus, there is no official textbook. However, the following reference books are recommended for literature research purposes. Additional references will be provided in class.
      1. Modern Information Retrieval (Baeza-Yates and Ribeiro-Neto; Addison Wesley).
      2. Information Retrieval – Algorithms and Heuristics (Grossman and Frieder; Springer).
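As a rough illustration of the Document Indexing and Term Matching topics listed above, the following is a minimal Python sketch, not material from the course itself. It tokenizes text, filters stop words, and ranks documents against a query with a vector space model. The names `STOP_WORDS`, `tokenize`, `tfidf_vectors`, `cosine`, and `rank` are hypothetical, the stop list is a toy one, and the weighting shown (local term frequency times a global log(N/df) weight) is only one common choice; stemming and entropy weights are left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy stop list; real systems use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return ({term: weight} per document, idf table).

    Local weight = raw term frequency; global weight = log(N / df).
    This is an assumed weighting scheme chosen for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, highest score first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get a weight of zero.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    sample_docs = [
        "Search engines index web documents",
        "Link analysis ranks web pages",
        "Fractal geometry in information sciences",
    ]
    for score, index in rank("search engines", sample_docs):
        print(f"{score:.3f}  {sample_docs[index]}")
```

Storing each document as a sparse dictionary keeps the sketch short; a production index would typically use an inverted index so that only documents sharing a term with the query are scored.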