Too few projects demand good API design as a critical goal. A clean and
extensible API will pay for itself many times over in fostering a community of
plugins. We certainly cannot anticipate the ways in which our users will bend
our modules, but designing an extensible system alleviates the pain. There are
many lessons to be learned from Moose, HTTP::Engine and IM::Engine,
Dist::Zilla, KiokuDB, Fey, and TAEB.
The most important lesson is to decouple the core functionality from the
"fluff" such as sugar and middleware. This forces you to have a solid API that
ordinary users can extend. This also lets users write their own sugar and
middleware. In a tightly-coupled system, there is little hope for
extensibility.
In this talk, you will learn how to make very productive use of Moose's roles
to form the foundation of a pluggable system. Roles provide excellent means of
code reuse and safe composition. I will also demonstrate how to use
Sub::Exporter to construct a more useful and flexible sugar layer.
The document outlines the six stages of incident response: 1) Preparation, 2) Identification, 3) Containment, 4) Eradication, 5) Recovery, and 6) Lessons Learned. It describes the key activities and goals at each stage, including establishing an incident response team and plan, identifying and containing incidents, removing malicious content, restoring systems, and documenting lessons to improve future response. The goal is to effectively manage security incidents by following best practices at each phase of the incident response lifecycle.
Wrapper induction construct wrappers automatically to extract information f...George Ang
Wrapper induction is a technique to automatically generate wrappers to extract information from web sources. It involves learning extraction rules from labeled examples to construct a wrapper as a finite state machine or set of delimiters. Two main wrapper induction systems are WIEN, which defines wrapper classes including LR, and STALKER, which uses a more expressive model with extraction rules and landmarks to handle structure hierarchically. Remaining challenges include selecting informative examples, generating label pages automatically, and developing more expressive models.
This document summarizes a tutorial given by Bing Liu on opinion mining and summarization. The tutorial covered several key topics in opinion mining including sentiment classification at the document and sentence level, feature-based opinion mining and summarization, comparative sentence extraction, and opinion spam detection. The tutorial provided an overview of the field of opinion mining and abstraction as well as summaries of various approaches to tasks such as sentiment classification using machine learning methods and feature scoring.
The document provides an overview of Huffman coding, a lossless data compression algorithm. It begins with a simple example to illustrate the basic idea of assigning shorter codes to more frequent symbols. It then defines key terms like entropy and describes the Huffman coding algorithm, which constructs an optimal prefix code from the frequency of symbols in the data. The document discusses how the algorithm works, its advantages in achieving compression close to the source entropy, and some limitations. It also covers applications of Huffman coding like image compression.
Too few projects demand good API design as a critical goal. A clean and
extensible API will pay for itself many times over in fostering a community of
plugins. We certainly cannot anticipate the ways in which our users will bend
our modules, but designing an extensible system alleviates the pain. There are
many lessons to be learned from Moose, HTTP::Engine and IM::Engine,
Dist::Zilla, KiokuDB, Fey, and TAEB.
The most important lesson is to decouple the core functionality from the
"fluff" such as sugar and middleware. This forces you to have a solid API that
ordinary users can extend. This also lets users write their own sugar and
middleware. In a tightly-coupled system, there is little hope for
extensibility.
In this talk, you will learn how to make very productive use of Moose's roles
to form the foundation of a pluggable system. Roles provide excellent means of
code reuse and safe composition. I will also demonstrate how to use
Sub::Exporter to construct a more useful and flexible sugar layer.
The document outlines the six stages of incident response: 1) Preparation, 2) Identification, 3) Containment, 4) Eradication, 5) Recovery, and 6) Lessons Learned. It describes the key activities and goals at each stage, including establishing an incident response team and plan, identifying and containing incidents, removing malicious content, restoring systems, and documenting lessons to improve future response. The goal is to effectively manage security incidents by following best practices at each phase of the incident response lifecycle.
Wrapper induction construct wrappers automatically to extract information f...George Ang
Wrapper induction is a technique to automatically generate wrappers to extract information from web sources. It involves learning extraction rules from labeled examples to construct a wrapper as a finite state machine or set of delimiters. Two main wrapper induction systems are WIEN, which defines wrapper classes including LR, and STALKER, which uses a more expressive model with extraction rules and landmarks to handle structure hierarchically. Remaining challenges include selecting informative examples, generating label pages automatically, and developing more expressive models.
This document summarizes a tutorial given by Bing Liu on opinion mining and summarization. The tutorial covered several key topics in opinion mining including sentiment classification at the document and sentence level, feature-based opinion mining and summarization, comparative sentence extraction, and opinion spam detection. The tutorial provided an overview of the field of opinion mining and abstraction as well as summaries of various approaches to tasks such as sentiment classification using machine learning methods and feature scoring.
The document provides an overview of Huffman coding, a lossless data compression algorithm. It begins with a simple example to illustrate the basic idea of assigning shorter codes to more frequent symbols. It then defines key terms like entropy and describes the Huffman coding algorithm, which constructs an optimal prefix code from the frequency of symbols in the data. The document discusses how the algorithm works, its advantages in achieving compression close to the source entropy, and some limitations. It also covers applications of Huffman coding like image compression.
Do not crawl in the dust different ur ls similar textGeorge Ang
The document describes the DustBuster algorithm for discovering DUST rules - rules that transform one URL into another URL that contains similar content. The algorithm takes as input a list of URLs from a website and finds valid DUST rules without requiring any page fetches. It detects likely DUST rules based on a large support principle and small buckets principle. It then eliminates redundant rules and validates the remaining rules using a sample of URLs to identify rules that transform URLs with similar content. Experimental results on logs from two websites show that DustBuster is able to discover DUST rules that can help improve crawling efficiency.
The document discusses techniques for optimizing front-end web performance. It provides examples of how much time is spent loading different parts of top websites, both with empty caches and full caches. The "performance golden rule" is that 80-90% of end-user response time is spent on the front-end. The document also outlines Yahoo's 14 rules for performance optimization, which include making fewer HTTP requests, using content delivery networks, adding Expires headers, gzipping components, script and CSS placement, and more.
Do not crawl in the dust different ur ls similar textGeorge Ang
The document describes the DustBuster algorithm for discovering DUST rules - rules that transform one URL into another URL that contains similar content. The algorithm takes as input a list of URLs from a website and finds valid DUST rules without requiring any page fetches. It detects likely DUST rules based on a large support principle and small buckets principle. It then eliminates redundant rules and validates the remaining rules using a sample of URLs to identify rules that transform URLs with similar content. Experimental results on logs from two websites show that DustBuster is able to discover DUST rules that can help improve crawling efficiency.
The document discusses techniques for optimizing front-end web performance. It provides examples of how much time is spent loading different parts of top websites, both with empty caches and full caches. The "performance golden rule" is that 80-90% of end-user response time is spent on the front-end. The document also outlines Yahoo's 14 rules for performance optimization, which include making fewer HTTP requests, using content delivery networks, adding Expires headers, gzipping components, script and CSS placement, and more.