This presentation was the introduction to Flattening Classrooms, Expanding Minds, Growing Business given at the Center for Quality Teaching and Learning and covers an overview of why tools should be allowed in the classroom.
Presentation by Brett Baker, Web Manager at The Children's Aid Society given at Drupal Camp Atlanta 2010 on October 2, 2010. The talk discussed how a single person or small team can leverage the Drupal CMS to tackle difficult deliverables.
Slides for talk on "DataWeb: Three worlds Collide" given at IWMW 1998.
See http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-sep1998/materials/
The development of better library information systems will always remain the core business of any serious library organization, but a shift took place towards (freely) available web-based tools for creating and managing the information workflow.
End-users are not only using these heavily, but are also creating their own preferred tools. Today's students are incorporating Web 2.0 skills in daily life, in their social and learning environments. Tomorrow's academic staff will expect to be able to use their preferred tools and resources within their work environment. Today's and tomorrow's libraries should support students and staff in the learning and research process by integrating their services and resources into their patrons' environments.
This practical workshop will demonstrate the use of Web 2.0 technology to empower users and librarians. During a hands-on session, participants will work with these tools. They will develop tailor-made services via personal start page software like Netvibes, making use of RSS-feeds, Widgets and Browser extensions.
We will explore the use of Netvibes and Web 2.0 tools in library staff and/or library user education/instruction. We will focus on library services which can be created almost on-the-fly with low costs and high impact. The growing use of social networks justifies the development of a library presence within these networks to reach out to our users.
Paper, slides and recommended reading: http://www.tilburguniversity.nl/services/lis/ticer/08carte/recommendedreading.html#brekel
“Web 2.0: Beyond the Hype.” Usability Professionals Association, Minneapolis M... Samantha Bailey
Presentation deconstructing the "web 2.0" meme that was feverishly taking over the web following the widespread adoption of AJAX programming techniques.
"Python web development combines the simplicity of the language with powerful...softwaretrainer2elys
Title: Exploring Web Development with Python: A Comprehensive Guide
Introduction:
Web development has become an integral part of the modern technological landscape, and Python has emerged as a versatile and powerful language for building web applications. In this comprehensive guide, we will delve into the various aspects of web development using Python, exploring frameworks, libraries, and best practices to create dynamic and scalable web applications.
I. Understanding the Basics of Web Development:
1.1 HTML, CSS, and JavaScript:
Before delving into Python-specific frameworks, it's essential to grasp the fundamentals of web development. HTML provides the structure, CSS adds styling, and JavaScript adds interactivity to web pages. These technologies form the backbone of web development regardless of the programming language used.
1.2 Introduction to Python for Web Development:
Python's readability, simplicity, and extensive libraries make it an excellent choice for web development. Familiarizing yourself with basic Python syntax, data structures, and control flow is crucial before diving into web-specific frameworks.
II. Python Web Frameworks:
2.1 Flask:
Flask is a lightweight and easy-to-use web framework that follows the WSGI (Web Server Gateway Interface) standard. It's ideal for small to medium-sized projects and encourages simplicity and flexibility. We'll explore how to set up a basic Flask application, define routes, and render dynamic templates.
2.2 Django:
Django, a high-level web framework, follows the "batteries-included" philosophy, providing a robust set of features out of the box. From database migrations to user authentication, Django simplifies complex tasks and promotes best practices. We'll cover creating a Django project, defining models, and building views and templates.
III. Frontend Development with Python:
3.1 JavaScript Integration:
While Python handles server-side logic, JavaScript is crucial for client-side interactivity. We'll explore methods to integrate JavaScript frameworks like React or Vue.js into Python-based web applications, allowing for a seamless user experience.
3.2 Template Engines:
Python web frameworks often use template engines to dynamically generate HTML. We'll delve into popular template engines like Jinja2, understanding how to create dynamic and reusable templates for rendering data.
IV. Database Integration:
4.1 Relational Databases (SQLAlchemy):
Python frameworks offer seamless integration with relational databases through libraries like SQLAlchemy. We'll cover database modeling, querying, and migrations, ensuring efficient data storage and retrieval.
4.2 NoSQL Databases (MongoDB with Flask):
For projects requiring flexibility in data storage, we'll explore integrating Flask with MongoDB, a popular NoSQL database. This section covers basic CRUD operations and demonstrates the advantages of using a document-oriented database.
V. RESTful APIs and Web Services:
5.1 Building RESTful API
React Native and the future of web technology (Mark Wilcox) - GreeceJS #15GreeceJS
What's all the hype about React Native? What is it? How does it work? Why does it matter and what clues does it give us about the future of web development? Did you know there's a React Native for the Web? What's that all about? It can't be all good, what's wrong with it? Where should you go to find out more?
Session at Mozilla Camp Europe 2011 in Berlin, Germany by Jay Patel & Jean-Yves Perrier about our work on the Mozilla Developer Network (MDN). Jay covers the evolution of MDN as a platform for developer engagement and Jean-Yves discusses our Web documentation efforts.
Trick or Treat?: Bitcoin for Non-Believers, Cryptocurrencies for CypherpunksDavid Evans
David Evans
DC Area Crypto Day
Johns Hopkins University
30 October 2015
This (non-research) talk will start with a tutorial introduction to cryptocurrencies and how bitcoin works (and doesn’t work) today. We’ll touch on some of the legal, policy, and business aspects of bitcoin and discuss some potential research opportunities in cryptocurrencies.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Large Language Model (LLM) and its Geospatial Applications
Class 39: ...and the World Wide Web
1. Lecture 39: …and the World Wide Web
cs1120 Fall 2011
David Evans
http://www.cs.virginia.edu/evans
2. Announcements
Exam 2 due 61 seconds ago!
Friday: we will return graded Exam 2, along with
guidance about the Final
Must be present (or email me in advance) to win!
If you want to present your PS8 in class Monday, remember to email me!
3. Plan
The World Wide Web
Building Web Applications
How Google Works
(or, going back to pre-PS5 to make things
really fast again!)
cs1120 recap in one (heavily animated) slide!
7. Overview:
Many of the discussions of the future at CERN and the LHC era end with the question – “Yes, but how will we ever keep track of such a large project?” This proposal provides an answer to such questions. Firstly, it discusses the problem of information access at CERN. Then, it introduces the idea of linked information systems, and compares them with less flexible ways of finding information.
http://www.w3.org/History/1989/proposal-msw.html
10. WorldWideWeb
Established a common language for sharing
information on computers
Lots of previous attempts (Gopher, WAIS,
Archie, Xanadu, etc.) failed
11. Why the World Wide Web?
World Wide Web succeeded because it was simple!
Didn’t attempt to maintain links, just a common
way to name things
Uniform Resource Locators (URL)
http://www.cs.virginia.edu/cs1120/index.html
Service: http (HyperText Transfer Protocol)
Hostname: www.cs.virginia.edu
File Path: /cs1120/index.html
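The same breakdown can be sketched with Python's standard `urllib.parse` module. The helper name `url_parts` and the dictionary keys are illustrative assumptions that simply mirror the labels on the slide.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into the three parts labeled on the slide."""
    parts = urlsplit(url)
    return {
        "service": parts.scheme,     # "http" stands for HyperText Transfer Protocol
        "hostname": parts.hostname,  # which server to contact
        "path": parts.path,          # which file to ask that server for
    }

print(url_parts("http://www.cs.virginia.edu/cs1120/index.html"))
```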
12. HyperText Transfer Protocol
Client (Browser) → Server: GET /cs1120/index.html HTTP/1.0
Server → Client (Browser): contents of file (HTML): <html> <head> …
HTML: HyperText Markup Language
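As a rough sketch of the exchange, the request the browser sends is just text. The helper names `build_get_request` and `split_response` are hypothetical, and the sample response is made up for illustration rather than captured from a real server.

```python
def build_get_request(path):
    """Return the text a client sends to ask for `path` using HTTP/1.0."""
    # The request line is followed by a blank line that ends the request.
    return "GET " + path + " HTTP/1.0\r\n\r\n"

def split_response(raw):
    """Split a raw HTTP response into (status line, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line = head.split("\r\n")[0]
    return status_line, body

request = build_get_request("/cs1120/index.html")

# A made-up response standing in for what a server might send back.
sample_response = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><head>...</head></html>"
)
status, body = split_response(sample_response)
print(status)
```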
13. HTML: HyperText Markup Language
Language for controlling display of web pages
Uses formatting tags: between < and >
Document ::= <html> Header Body </html>
Header ::= <head> HeadElements </head>
HeadElements ::= ε | HeadElement HeadElements
HeadElement ::= <title> Element </title>
Body ::= <body> Elements </body>
Elements ::= ε | Element Elements
Element ::= <p> Element </p>
Element ::= <center> Element </center>
…
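One way to read these rules is as a recipe for building a page. The sketch below follows the Document, Header, Body and Element productions; the function names are hypothetical, and treating plain text as the innermost Element is an assumption, since the slide elides the remaining rules.

```python
def element(content, tag="p"):
    """Element ::= <p> Element </p>  or  <center> Element </center>.
    Assumption: plain text is accepted as the innermost Element."""
    return "<" + tag + "> " + content + " </" + tag + ">"

def document(title, elements):
    """Document ::= <html> Header Body </html>"""
    header = "<head> <title> " + title + " </title> </head>"
    body = "<body> " + " ".join(elements) + " </body>"
    return "<html> " + header + " " + body + " </html>"

page = document("cs1120", [
    element("Hello"),
    element(element("World"), tag="center"),  # Elements can nest
])
print(page)
```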
14. Popular Web Site: Strategy 1
Static, Authored Web Site
Drawbacks:
• Have to do all the work yourself
• The world may already have enough Twinkie-experiment websites
Content Producer
http://www.twinkiesproject.com/
15. Popular Web Site: Strategy 2
Dynamic Web Applications
Attracts users
Seed content and
function
Web Programmer
Produce more
content
eBay in 1997
http://web.archive.org/web/19970614001443/http://www.ebay.com/
16. Popular Web Site: Strategy 2
Dynamic Web Applications
Attracts users
Seed content and
function
Advantages:
• Users do most of the work
• If you’re lucky, they might even pay you
for the privilege!
Disadvantages:
• Lose control over the content (you might get sued for things your users do)
• Have to know how to program a web application
Produce more content
reddit.com in 2005
reddit.com today
17. Dynamic Web Sites
Programs that run on the web server
Can be written in any language (often in Python or Java), just
need a way to connect the web server to the program
Program generates HTML (often JavaScript also now)
Every useful web site does this
Programs that run on the client’s machine
Java, JavaScript (aka, “Scheme for the Web”), Flash, etc.:
language must be supported by the client’s browser
Responsive interface: limited round-trips to server
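For the server-side case, one common way to connect a web server to a Python program is WSGI, which is in the standard library as `wsgiref`. The sketch below is an assumption about how such a program could look, not the course's own code; `app` and `call_app` are hypothetical names, and `call_app` invokes the program directly so no network is needed.

```python
from html import escape
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """A server-side program: builds an HTML page for each request."""
    path = environ.get("PATH_INFO", "/")
    # escape() keeps text from the request from being interpreted as HTML tags.
    page = "<html><body><p>You asked for " + escape(path) + "</p></body></html>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [page.encode("utf-8")]

def call_app(path):
    """Invoke the app once, the way a web server would, without a network."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal WSGI environment
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body.decode("utf-8")

print(call_app("/cs1120/index.html"))
```

To serve real requests, the same `app` could be handed to `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; the port number is arbitrary.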
20. Building a Web Search Engine
Database of web pages
Crawling the web collecting pages and links
Indexing them efficiently
Responding to Searches
Spell checking – edit distance
How to find documents that match a query
How to rank the “best” documents
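For the spell-checking step, a minimal sketch of edit distance (the Levenshtein variant: single-character inserts, deletes and substitutions) is shown here. Which variant the lecture had in mind is not stated, and the `suggest` helper and its word list are illustrative assumptions.

```python
def edit_distance(a, b):
    """Fewest single-character inserts, deletes and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]

def suggest(word, dictionary):
    """Return the dictionary word closest to `word`."""
    return min(dictionary, key=lambda known: edit_distance(word, known))

print(suggest("virgina", ["virginia", "vagrant", "google"]))
```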
21. Crawling Crawler
activeURLs = ["www.yahoo.com"]
while len(activeURLs) > 0:
    newURLs = []
    for URL in activeURLs:
        page = downloadPage(URL)
        newURLs += extractLinks(page)
    activeURLs = newURLs
Problems:
Will keep revisiting the same pages
Will take very long to get a good view of the web
Will annoy web server admins
downloadPage and extractLinks must be very robust
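A sketch of how the first and last problems might be handled: remember which URLs were already attempted, and keep going when a download fails. The names `crawl`, `extract_links` and `FAKE_WEB` are hypothetical, and the in-memory dictionary stands in for real downloads so the example runs offline. Politeness toward web servers (rate limits, robots.txt) is not addressed here.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, tolerating sloppy HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(page):
    parser = LinkExtractor()
    parser.feed(page)
    return parser.links

def crawl(start_urls, download_page, max_pages=1000):
    """Breadth-first crawl that never attempts the same URL twice."""
    attempted = set()
    active = list(start_urls)
    while active and len(attempted) < max_pages:
        new_urls = []
        for url in active:
            if url in attempted or len(attempted) >= max_pages:
                continue
            attempted.add(url)
            try:
                page = download_page(url)
            except Exception:
                continue  # skip pages that cannot be downloaded
            new_urls += extract_links(page)
        active = new_urls
    return attempted

# A tiny made-up "web": page name -> HTML text.
FAKE_WEB = {
    "a": '<a href="b">b</a> <a href="c">c</a>',
    "b": '<a href="a">back to a</a>',
    "c": '<a href="missing">dead link</a>',
}
found = crawl(["a"], FAKE_WEB.__getitem__)
print(sorted(found))
```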
22. Building a Web Search Engine
Database of web pages
Crawling the web collecting pages and links
Indexing them efficiently
Responding to Searches
How to find documents that match a query
How to rank the “best” documents
23. Building an Index
What if we just stored all the pages?
Answering a query would be Θ(size of the database)
(need to look at all characters in database)
Google: about 40 Billion pages (1 Trillion URLs, but number
actually indexed is a closely kept corporate secret)
* 60 KB (average web page size)
= ~2.4 Quadrillion bytes to search!
Linear is not nearly good enough when n is Quadrillions
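The arithmetic can be checked directly. The page count and page size are the slide's figures; the scan rate of 1 GB per second is an assumption added only to give a sense of scale.

```python
pages = 40 * 10**9           # about 40 billion pages (figure from the slide)
bytes_per_page = 60 * 10**3  # 60 KB average page size (figure from the slide)
total_bytes = pages * bytes_per_page
print(total_bytes)           # 2.4 quadrillion bytes

# Assumed scan rate, purely for illustration: 1 GB per second on one machine.
scan_rate = 10**9
seconds = total_bytes / scan_rate
days = seconds / (60 * 60 * 24)
print(round(days))           # days needed to read everything once at that rate
```

At that assumed rate a single linear pass takes roughly 28 days, for every query.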
24. Hash Table
Index Key-Value Pairs
0 { <“Colleen”, ? >, <“virginia”, ? >, … }
1 { <“Bob”, ? >, … }
2
3
…
[about a million bins?]
def lookup(key, table) : searchEntries(table[H(key, len(table))])
Finding a good H is difficult
You can download google’s from
http://code.google.com/p/google-sparsehash/
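A minimal sketch of such a table, with each bin holding a list of key-value pairs. The hash function `H` here (sum of character codes) is a deliberately simple assumption and not a good one, which is the slide's point; `make_table` and `insert` are hypothetical helpers.

```python
def H(key, nbins):
    """Toy hash function: sum of character codes, modulo the number of bins."""
    return sum(ord(c) for c in key) % nbins

def make_table(nbins):
    return [[] for _ in range(nbins)]

def insert(key, value, table):
    entries = table[H(key, len(table))]
    for i, (existing_key, _) in enumerate(entries):
        if existing_key == key:
            entries[i] = (key, value)  # replace the old value
            return
    entries.append((key, value))

def lookup(key, table):
    """Search only the one bin that H selects, not the whole table."""
    for existing_key, value in table[H(key, len(table))]:
        if existing_key == key:
            return value
    return None

table = make_table(8)
insert("Colleen", 1, table)
insert("virginia", 2, table)
insert("Bob", 3, table)
print(lookup("virginia", table))
```

With this `H`, any two words made of the same letters land in the same bin, so some bins can grow much longer than others.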
25. Google’s Lexicon
1998: 14 million words (billions today?)
Lookup word in H(word, nbins): maps to WordID
Key         Words
0           [<“aardvark”, 1024235>, ...]
1           [<“aaa”, 224155>, ..., <“zzz”, 29543>]
...         ...
nbins – 1   [<“abba”, 25583>, ..., <“zeit”, 50395>]
26. Google’s Reverse Index
(Based on 1998 paper…definitely changed some since then, but now they are secretive!)
WordId      ndocs   pointer (into the “Inverted Barrels”)
00000000    3
00000001    15
...
16777215    105
Inverted Barrels: 41 GB (1998). Today: many TB?
Lexicon: 293 MB (1998). Today: many GB?
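The lexicon and reverse index can be sketched together at toy scale: a dictionary stands in for the lexicon (word to WordID), and a second mapping goes from WordID to the documents and positions where the word occurs. The names build_index and search, and the sample documents, are assumptions; this does not reflect the on-disk layout in the 1998 paper.

```python
from collections import defaultdict

def build_index(docs):
    """docs maps doc_id -> text. Returns (lexicon, postings)."""
    lexicon = {}                   # word -> WordID
    postings = defaultdict(list)   # WordID -> [(doc_id, position), ...]
    for doc_id in sorted(docs):
        for position, word in enumerate(docs[doc_id].lower().split()):
            word_id = lexicon.setdefault(word, len(lexicon))
            postings[word_id].append((doc_id, position))
    return lexicon, postings

def search(word, lexicon, postings):
    """Documents containing the word, without scanning any page text."""
    word_id = lexicon.get(word.lower())
    if word_id is None:
        return []
    return sorted({doc_id for doc_id, _ in postings[word_id]})

docs = {1: "the web is big", 2: "crawl the web", 3: "index pages"}
lexicon, postings = build_index(docs)
print(search("web", lexicon, postings))
```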
27. Inverted Barrels
Barrel entry: docid (27 bits), nhits (5 bits), hits (16 bits each)
Example entry: docid 7630486927, nhits 23, ...
Plain hit:
    capitalized: 1 bit
    font size: 3 bits
    position: 12 bits (first 4095 chars, everything else)
Extra info for anchors, titles (fewer position bits)
Suggested experiment for winter break:
is the position field still only 12 bits?
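A sketch of packing a plain hit into 16 bits using the field widths above. The bit order (capitalized in the top bit, then font size, then position) and the choice to clamp large positions to 4095 are assumptions made for illustration.

```python
def pack_hit(capitalized, font_size, position):
    """Pack a plain hit into 16 bits: 1 + 3 + 12."""
    if not 0 <= font_size < 8:
        raise ValueError("font size must fit in 3 bits")
    position = min(position, 4095)  # ASSUMPTION: later positions share the max value
    return (int(capitalized) << 15) | (font_size << 12) | position

def unpack_hit(hit):
    """Return (capitalized, font_size, position)."""
    return bool(hit >> 15), (hit >> 12) & 0b111, hit & 0xFFF

hit = pack_hit(True, 3, 100)
print(hit, unpack_hit(hit))
```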
28. Building a Web Search Engine
Database of web pages
Crawling the web collecting pages and links
Indexing them efficiently
Responding to Searches
Spell checking – edit distance
How to find documents that match a query
How to rank the “best” documents
29. Finding the “Best” Documents
Humans rate them
“Jerry and David’s Guide to the World Wide Web”
(became Yahoo!)
Machines rate them
Count number of occurrences of keyword
Easy for sites to rig this
Machine language understanding not good enough
Business Model
Whoever pays you the most is listed first
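A small illustration of why counting keyword occurrences is easy to rig. The page names and text are made up for the example.

```python
def keyword_score(text, keyword):
    """Number of times the keyword appears as a word in the text."""
    return text.lower().split().count(keyword.lower())

def rank_by_keyword(pages, keyword):
    """URLs ordered by keyword count, highest first."""
    return sorted(pages, key=lambda url: keyword_score(pages[url], keyword),
                  reverse=True)

pages = {
    "honest.example": "a guide to learning python programming",
    "spam.example": "python python python python buy pills",
}
ranking = rank_by_keyword(pages, "python")
print(ranking)  # the page that stuffs the keyword comes first
```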
30. PageRank
If a site is important and interesting, other sites
will link to it.
Don’t ever take <a href=http://www.cs.virginia.edu/cs1120>cs1120</a>!
But…not all links are equal:
if a lot of highly-ranked sites link to this site,
this site should be highly-ranked.
31. PageRank
def pageRank(u):
    rank = 0
    for b in linksToPage(u):
        rank = rank + pageRank(b) / Links(b)
    return rank
Would this work?
32. Converging PageRank
Ranks of all pages depend on ranks of all other
pages
Keep recalculating ranks until they converge
def CalculatePageRanks (urls):
    initially, every rank is 1
    for as many times as necessary
        calculate a new rank for each page (using old ranks)
        replace the old ranks with the new ranks
How do initial ranks affect results?
How many iterations are necessary?
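The pseudocode above can be sketched as runnable Python. This follows the slide's formula (each page passes rank / number-of-links to every page it links to) with no damping factor. It assumes every page has at least one outgoing link and every link target is itself a key in the input. The function name, the tolerance, and the three-page graph are illustrative.

```python
def calculate_page_ranks(links, tolerance=1e-10, max_iterations=1000):
    """links maps each URL to the list of URLs it links to."""
    ranks = {url: 1.0 for url in links}  # initially, every rank is 1
    for _ in range(max_iterations):
        new_ranks = {url: 0.0 for url in links}
        for b, targets in links.items():
            for u in targets:
                # page b gives each page it links to an equal share of its old rank
                new_ranks[u] += ranks[b] / len(targets)
        change = max(abs(new_ranks[u] - ranks[u]) for u in links)
        ranks = new_ranks  # replace the old ranks with the new ranks
        if change < tolerance:
            break
    return ranks

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = calculate_page_ranks(links)
print(ranks)
```

On this graph the ranks settle with "a" and "c" equal and "b" at half their value, since "b" receives only half of what "a" gives out.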
33. PageRank: 1998
Crawlable web (1998):
150 million pages, 1.7 Billion links
Database of 322 million links
Converges in about 50 iterations
Initialization matters
All pages = 1: very democratic, models browser
equally likely to start on random page
www.yahoo.com = 1, ..., all others = 0
More like what Google probably uses
34. Do we have a search engine?
Theoretician: Sure!
Ali G: No way! It’ll blow up.
Google’s First Server
35. How do we make our service fast enough to index the whole web and serve billions of requests?
36. Counting Word Occurrences
“When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, …”
    → [<“When”, 1>, <“in”, 1>, <“the”, 2>, …]
“We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the …”
    → [<“We”, 1>, <“in”, 1>, <“the”, 2>, …]
map(doc, countWords)
If we have enough machines, can we do this fast for the whole web?
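A sketch of the map step on two short excerpts. The name count_words and the simple word pattern are assumptions. Because each document is counted independently, the built-in map could in principle be swapped for a parallel one such as multiprocessing.Pool.map without changing count_words.

```python
import re
from collections import Counter

def count_words(doc):
    """Count word occurrences in one document (case-sensitive)."""
    return Counter(re.findall(r"[A-Za-z']+", doc))

docs = [
    "When in the Course of human events, it becomes necessary for one people "
    "to dissolve the political bands which have connected them with another",
    "We the People of the United States, in Order to form a more perfect Union",
]

# Each call touches only its own document, so the calls can run in any order
counts = list(map(count_words, docs))
print(counts[0]["the"], counts[1]["the"])
```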
39. Key to Massive Parallel Execution
Get rid of state and mutation!
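A sketch of why dropping mutation helps: if per-document counts are combined by a function that returns a new value and never modifies its inputs, partial results can be merged in any order, on any machine. The name merge_counts and the sample counts are illustrative assumptions.

```python
from collections import Counter
from functools import reduce

def merge_counts(a, b):
    """Return a new Counter; neither input is modified."""
    return a + b

partial = [
    Counter({"the": 2, "in": 1}),
    Counter({"the": 2, "We": 1}),
    Counter({"in": 1}),
]

total = reduce(merge_counts, partial, Counter())
# Merging in the opposite order gives the same result
total_reversed = reduce(merge_counts, reversed(partial), Counter())
print(total)
```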
40. [Diagram: tower of course topics, top to bottom]
Functional Programming (PS 1-4):
    (define (count-matches p b)
      (list-sum (map (lambda (v) (if (eq? v b) 1 0)) p)))
Interpreters:
    def meval(expr, env): … return evalApplication(expr, env)
Any Mechanical Computation: Turing Machine (tape: # 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 #)
Any Discrete Function: truth table (columns A B C R1 R0);
    (or a b) = (not (and (not a) (not b)))
Mechanical Logic: AND, NOT
“Magic” Transistors
42. [Diagram: tower of course topics, top to bottom]
Objects: SimObject, PhysicalObject, Place, MobileObject
State and Mutation: m1: 1 2 3
Functional Programming (PS 1-4):
    (define (count-matches p b)
      (list-sum (map (lambda (v) (if (eq? v b) 1 0)) p)))
Interpreters:
    def meval(expr, env): … return evalApplication(expr, env)
Any Mechanical Computation: Turing Machine (tape: # 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 #)
Truth table (columns A B C R1 R0); (or a b)
44. Charge
Now, you know almost everything you need to build the next reddit or google!
[Diagram: tower of course topics, top to bottom]
Objects
State and Mutation
Functional Programming (PS 1-4)
Interpreters
Any Mechanical Computation
Any Discrete Function
Mechanical Logic
“Magic” Transistors
Side labels: Recursive Definitions, Universality, Abstraction