CSE509 Lecture 1


Lecture 1 of CSE509:Web Science and Technology Summer Course

Published in: Technology, Education
  • Nigel Shadbolt – Professor at the University of Southampton who initiated the Web Science program in collaboration with MIT
  • Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance.
  • MapReduce + Cloud Computing debate
  • CSE509 Lecture 1

    1. CSE509: Introduction to Web Science and Technology
       Lecture 1: Introduction
       Muhammad Atif Qureshi
       Web Science Research Group
       Institute of Business Administration (IBA)
    2. Outline
       • What is Web Science?
       • Why Do We Need Web Science?
       • Implications of Web Science
       • CSE509 Administrivia
       • Course Contents
       July 09, 2011
    3. Science of the Web
       Section: Introduction
       Why do we need Web Science as a research field? Because we need a systems-level understanding of the Web.
       – Prof. Nigel Shadbolt, one of the pioneers of the Web Science program, University of Southampton
    4. Web Science
       • Social and engineering dimensions (New York Times, at the launch of the Web Science program at the Univ. of Southampton and MIT in 2006)
       • Extends well beyond traditional Computer Science
       The Web isn’t about what you can do with computers. It’s people and, yes, they are connected by computers. But computer science, as the study of what happens in a computer, doesn’t tell you about what happens on the Web.
       – Tim Berners-Lee, one of the founders of the WWW
    5. What is the Web?
       • A distributed document delivery system implemented through application-level protocols on the Internet
       • A tool for collaborative writing and community building
       • A framework of protocols that support e-commerce
       • A network of co-operating computers
       • A large, cyclic, directed graph made up of Web pages and links
    6. Science (in a nutshell)
       • Existence
         • Does X exist?
       • Description and Classification
         • What is X like?
         • What are its properties?
         • How can it be categorized?
         • How can we measure it?
         • What are its components?
       • Descriptive Process
         • How does X work?
         • What is the process by which X happens?
         • What are the steps as X evolves?
         • How does X achieve its purpose?
       • Descriptive-Comparative
         • How does X differ from Y?
       • Relationship
         • Are X and Y related?
         • Do occurrences of X correlate with occurrences of Y?
       • Causality
         • Does X cause Y?
         • Does X prevent Y?
         • What causes X?
         • What effect does X have on Y?
       • Design
         • What is an effective way to achieve X?
         • How can we improve X?
    7. Perspectives of “Science”
       • Physical/biological science perspectives
         • Analytic disciplines that aim to find laws/processes that generate or explain observed phenomena
       • Social science perspective
         • Scholarly or scientific disciplines that deal with the study of human society and of individual relationships in and to society
       • Computer science perspective
         • Synthetic discipline that creates mechanisms (e.g., formalisms, algorithms, etc.) in order to support particular desired behavior
    24. Which Science Explains the Web?
        Given:
        • Neither the Web nor the world is static
        • The Web evolves in response to various pressures from
          • Science
          • Commerce
          • The public
          • Politics
          • Etc.
    25. Web Science
        • The Web is a new technical and social phenomenon and a growing organism
        • The Web needs to be studied and understood as an entity in its own right
        • Web Science is a new field of science that involves a multi-disciplinary study and inquiry for the understanding of the Web and its relationships to us
    26. Why Web Science?
        Section: Importance
        • Dynamics and evolution
        • The “deep (or dark) Web”
        • Sampling, lack of complete enumeration
        • Scale (e.g., What is the percentage of Web pages updated daily?)
        • Search (e.g., What percentage of Web pages are indexed by search engines?)
        • Web topology
        • Artifacts of social interactions (blogs, etc.), Web sociology
    27. Web Science vs. Computer Science
        • Metrics
          • Computer Science: Moore’s Law, O(n) algorithm analysis, Gigabytes
          • Web Science: Page views, Unique visitors/month, No. of songs/videos
        • Topics
          • Computer Science: Computer networks, Programming languages, Database systems, Operating systems, Compilers, Graphics
          • Web Science: Social networks, Relationships (users, web pages, etc.), Web 2.0 applications, E-*, Creating/sharing multimedia
        • Focus
          • Computer Science: Technology, Computers, HPC, Proficient programmers
          • Web Science: Applications, Users, Mobile interactivity, Universal accessibility
    28. What Could Scientific Theories for the Web Look Like?
        • Every page on the Web can be reached by following fewer than 10 links
        • The average number of words per search query is greater than 3
        • A Wikipedia page on average contains 0.03 false facts
        • The Web is a “scale-free” graph
    29. Intersection of Disciplines
    30. The proper discipline of interest is not only Web Science but “Web Science and Technology”
    31. Web’s Relation with Entrepreneurship
        Section: Implication
        Web Science represents a pretty big next step in the evolution of information. This kind of research is likely to have a lot of influence on the next generation of researchers, scientists and most importantly, the next generation of entrepreneurs who will build new companies from this.
        – Eric Schmidt, former CEO, Google Inc.
    32. Web Science and Technology for Pakistan
        • The job market is dominated by Web technologies and solutions
        • Overseas companies such as Google, Yahoo, and Microsoft are investing heavily in the Web
        • A substantial share of business now comes through the Web
        • Social media reaches far more people than traditional media
    33. Course Objectives
        • Gain insight into the future direction of the Web
        • Understand how technological changes affect the Web as a system
        • Learn design principles for complex Web applications and systems
        • Prepare for the new era of Web science and technology
    34. Course Information
        • Instructors
          • Muhammad Atif Qureshi
          • Arjumand Younus
        • Class Hours: Saturdays 6:00 pm to 8:15 pm
        • Office Hours: Mondays 1:00 pm to 3:00 pm
        • Evaluation
          • Assignments (50%)
          • Mid-Term Exam (30%)
          • Research Project (20%)
    35. Course Organization
        • Session One: Information Retrieval
        • Session Two: Large-Scale Web Mining
        • Session Three: Social Web Mining
    36. Information Retrieval
        • Principles and Theories behind Web Search Engines
        • Basic IR models, data structures and algorithms
        • Topic-based models
        • Link-based ranking
        • Search engine architecture
    37. Large-Scale Web Mining
        • MapReduce Design Patterns
        • Big data
        • Larger amounts of data enable more useful applications
        • Algorithms using MapReduce
        • Distributed File Systems (GFS)
        There is substantial promise in this new paradigm of computing, but unwarranted hype by the media and popular sources threatens its credibility in the long run. In some ways, cloud computing is simply brilliant marketing.
        – Jimmy Lin, Twitter scientist and University of Maryland professor
    38. Social Web Mining
        • Social Web Crawling
        • Mining for Information in Social Networks
        • Trend analysis
        • Dynamics and evolution patterns
        • Temporal analysis
        • Community detection and analysis
        • Social Search
    39. EXAMPLE OF WEB SCIENCE PROJECT: Diff-IE (courtesy Jaime Teevan, Microsoft Research)
    40. DISCUSSION
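
The slide notes describe MapReduce as a programming model in which the programmer supplies the computation and the execution framework handles scheduling, synchronization, and fault tolerance. The following is a minimal single-process sketch of that idea, assuming a word-count task chosen only for illustration (it does not appear in the slides). The names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and do not belong to any real MapReduce framework; a real framework would run the map and reduce calls on many machines rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, the step a framework performs
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially in a single process."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Toy input, invented for this sketch.
    docs = {"d1": "the web is a graph", "d2": "the web is people"}
    print(word_count(docs))
```

Because each mapper call touches one document and each reducer call touches one key, both phases could in principle be split across machines without changing the functions themselves.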
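
The slide on what scientific theories for the Web could look like offers claims such as "every page can be reached by following fewer than 10 links" and "the Web is a scale-free graph". One way such claims might be checked on a crawled sample is sketched below, assuming the sample is stored as a dictionary that maps each page to the list of pages it links to. The function names and the toy graph are hypothetical, and the sketch ignores pairs of pages with no connecting path, which a real study would need to report separately.

```python
from collections import deque


def link_distances(graph, start):
    """Breadth-first search: fewest links needed to reach each page from start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in distances:
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


def max_link_distance(graph):
    """Largest link distance over all ordered pairs that are connected."""
    return max(
        (dist for page in graph for dist in link_distances(graph, page).values()),
        default=0,
    )


def in_degree_histogram(graph):
    """Map each in-degree to the number of pages with that in-degree.
    A heavy-tailed histogram is what a scale-free hypothesis predicts."""
    in_degree = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1
    histogram = {}
    for degree in in_degree.values():
        histogram[degree] = histogram.get(degree, 0) + 1
    return histogram


if __name__ == "__main__":
    # Toy four-page Web, invented for this sketch.
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
    print(max_link_distance(toy_web))
    print(in_degree_histogram(toy_web))
```

On a real crawl the same two measurements would be run on a sample, which ties back to the sampling and enumeration problems listed under "Why Web Science?".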