This document proposes a framework called ALFRED for learning web wrappers from crowdsourced inputs. ALFRED uses a Bayesian model to evaluate candidate extraction rules based on worker responses to membership queries. It employs techniques like active learning to select the most informative queries. Additionally, it dynamically expands the expressiveness of the extraction language if needed, inspired by structural risk minimization, in order to find accurate rules while minimizing queries. The framework is evaluated on real datasets and is shown to significantly reduce the number of queries needed compared to static approaches, while maintaining high precision and recall.
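The idea of scoring candidate extraction rules from noisy answers to membership queries can be illustrated with a plain Bayesian update. This is only a minimal sketch under stated assumptions, not ALFRED's actual model: the class name, the `update` method, and the single fixed `errorRate` shared by all workers are hypothetical choices made for illustration.

```java
class NoisyAnswerPosteriorSketch {
    /**
     * Hypothetical helper, not taken from the ALFRED paper.
     * prior[i]      current probability assigned to candidate rule i
     * extracts[i]   whether rule i extracts the queried value
     * workerSaysYes the worker's answer to the membership query
     * errorRate     assumed probability (strictly between 0 and 1) that a worker answers wrongly
     */
    static double[] update(double[] prior, boolean[] extracts,
                           boolean workerSaysYes, double errorRate) {
        double[] posterior = new double[prior.length];
        double total = 0.0;
        for (int i = 0; i < prior.length; i++) {
            // A rule that agrees with the answer is supported with weight (1 - errorRate).
            boolean agrees = extracts[i] == workerSaysYes;
            double likelihood = agrees ? 1.0 - errorRate : errorRate;
            posterior[i] = prior[i] * likelihood;
            total += posterior[i];
        }
        // Normalise so that the probabilities sum to 1 again.
        for (int i = 0; i < posterior.length; i++) {
            posterior[i] /= total;
        }
        return posterior;
    }

    public static void main(String[] args) {
        double[] prior = {0.5, 0.5};
        boolean[] extracts = {true, false};
        double[] posterior = update(prior, extracts, true, 0.1);
        System.out.println(posterior[0] + " " + posterior[1]);
    }
}
```

Because a disagreeing rule keeps a non-zero weight, a single wrong answer lowers its probability without eliminating it, which is the property that makes such a scheme tolerant of inaccurate responses.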
Start programming in a more functional style in Java. This is the second in a two part series on lambdas and streams in Java 8 presented at the JoziJug.
The document outlines Java 8's Stream API. It discusses stream building blocks like default methods, functional interfaces, lambda expressions, and method references. It describes characteristics of streams like laziness and parallelization. It covers creating streams from collections, common functional interfaces, and the anatomy of a stream pipeline including intermediate and terminal operations. It provides examples of common stream API methods like forEach, map, filter, findFirst, toArray, collect, and reduce.
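As a rough illustration of the pipeline anatomy described above, the following sketch chains intermediate operations (`filter`, `map`) with terminal operations (`collect`, `reduce`, `findFirst`). The class and method names are hypothetical and are not taken from the slides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class StreamPipelineSketch {
    // filter and map are lazy intermediate operations; collect triggers the traversal.
    static List<String> upperCaseLongerThan(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() > minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // reduce folds the mapped lengths into a single sum, starting from 0.
    static int totalLength(List<String> words) {
        return words.stream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    // findFirst returns an Optional, which is empty if nothing matches.
    static Optional<String> firstStartingWith(List<String> words, String prefix) {
        return words.stream()
                .filter(w -> w.startsWith(prefix))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("lambda", "map", "stream", "filter");
        upperCaseLongerThan(words, 3).forEach(System.out::println);
        System.out.println(totalLength(words));
        System.out.println(firstStartingWith(words, "s").orElse("none"));
    }
}
```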
This document provides an overview and tutorial on querying DBpedia using the Jena framework. It introduces Jena and its capabilities for working with RDF data, describes how to set up a development environment in Netbeans or Eclipse, and provides examples of querying DBpedia's SPARQL endpoint to retrieve information about people and locations. Other APIs for working with RDF in languages like PHP, Python, and C are also briefly mentioned.
This document contains the notes from a presentation on best practices for Java 8. It discusses 10 topics: 1) general adoption of Java 8, 2) lambdas and method references, 3) functional interfaces, 4) Optional, 5) streams and collections, 6) streams outside of collections, 7) functional programming with strings, 8) parallel streams, 9) simplifying design patterns with functional programming, and 10) other Java 8 features. For each topic, it provides high-level best practices such as preferring method references to lambdas, avoiding null returns, and testing that parallel streams provide real performance benefits.
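Two of the practices listed above, avoiding null returns and preferring method references to lambdas where one fits, might look like the following sketch. The `OptionalSketch` class and its methods are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class OptionalSketch {
    private final Map<String, String> emailsByUser = new HashMap<>();

    void register(String user, String email) {
        emailsByUser.put(user, email);
    }

    // Returns an empty Optional instead of null when the user is unknown.
    Optional<String> findEmail(String user) {
        return Optional.ofNullable(emailsByUser.get(user));
    }

    // The method reference String::toLowerCase replaces the lambda e -> e.toLowerCase().
    String emailDomainOrDefault(String user, String fallback) {
        return findEmail(user)
                .map(String::toLowerCase)
                .map(e -> e.substring(e.indexOf('@') + 1))
                .orElse(fallback);
    }

    public static void main(String[] args) {
        OptionalSketch directory = new OptionalSketch();
        directory.register("ada", "Ada@Example.org");
        System.out.println(directory.emailDomainOrDefault("ada", "unknown"));
        System.out.println(directory.emailDomainOrDefault("bob", "unknown"));
    }
}
```

The caller never has to write a null check: the absent case is handled once, by `orElse`.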
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations Bertram Ludäscher
Presentation slides of paper by Shawn Bowers, Timothy McPhillips, and Bertram Ludäscher, given by Shawn at Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, King's College London, UK, July 9-10, 2018.
The paper won the IPAW best paper award: https://twitter.com/kbelhajj/status/1017082775856467968
ABSTRACT. An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data consumed and produced by each workflow step. The information captured by a trace is used to infer "lineage" relationships among data items, which can help answer provenance queries to find workflow inputs that were involved in producing specific workflow outputs. Determining lineage relationships, however, requires an understanding of the dependency patterns that exist between each workflow step's inputs and outputs, and this information is often under-specified or generally assumed by workflow systems. For instance, most approaches assume all outputs depend on all inputs, which can lead to lineage "false positives". In prior work, we defined annotations for specifying detailed dependency relationships between inputs and outputs of computation steps. These annotations are used to define corresponding rules for inferring fine-grained data dependencies from a trace. In this paper, we extend our previous work by considering the impact of dependency annotations on workflow specifications. In particular, we provide a reasoning framework to ensure the set of dependency annotations on a workflow specification is consistent. The framework can also infer a complete set of annotations given a partially annotated workflow. Finally, we describe an implementation of the reasoning framework using answer-set programming.
Java 8 came out early last year, and Java 7 is now at the end of its life, making Java 8 the only Oracle-supported option. However, since developers value stability over trendiness, many of us are still working with Java 7, or even 6. Let’s look at some features of Java 8 and provide some arguments to persuade your code to upgrade with best practices.
The document describes the Jena framework, which is a Java API for building semantic web and linked data applications. It allows for parsing, creating, querying and inferencing over RDF data. The key classes and interfaces in Jena include the Model interface for representing RDF graphs, classes for creating resources, properties and literals, interfaces for representing statements and querying models. Jena supports reading/writing RDF files, working with ontologies and rules, and includes a SPARQL query engine.
"SPARQL Cheat Sheet" is a short collection of slides intended to act as a guide to SPARQL developers. It includes the syntax and structure of SPARQL queries, common SPARQL prefixes and functions, and help with RDF datasets.
The "SPARQL Cheat Sheet" is intended to accompany the SPARQL By Example slides available at http://www.cambridgesemantics.com/2008/09/sparql-by-example/ .
The slides of my university talk at Devoxx BE 2015. Presentation of the Java 8 Stream API and RxJava, pattern and performance comparisons. Presentation of the upcoming reactive API in Java 9: the Flow API.
The spliterators patterns can be found here: https://github.com/JosePaumard/jdk8-spliterators.
This document provides an overview of functional programming patterns and abstractions in Java 8. It introduces common functional concepts like functors, applicative functors, monads, and type classes. It then describes implementations of these concepts for option, either, stream, future, and completable future in Java. The document explains how functional patterns allow for simpler composition and avoidance of null checks compared to imperative programming with conditionals.
This document discusses processing SPARQL queries using Java with ARQ. It demonstrates how to execute a SPARQL query on an ontology model, print the results, and analyze various aspects of the query such as retrieving result variables, analyzing query elements like triple patterns, and examining the prefix mappings and expressions. The document provides an overview of executing SPARQL queries programmatically using the ARQ processor for Jena.
Lambdas and streams are key new features in Java 8. Lambdas allow blocks of code to be passed around as if they were objects. Streams provide an abstraction for processing collections of objects in a declarative way using lambdas. Optional is a new class that represents null-safe references and helps avoid null pointer exceptions. Checked exceptions can cause issues with lambdas, so helper methods are recommended to convert checked exceptions to unchecked exceptions.
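The recommendation about checked exceptions can be sketched as a small adapter that turns a throwing function into a standard `Function`. `ThrowingFunction`, `unchecked`, and `parsePositive` are hypothetical names introduced for this illustration; they are not part of the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class UncheckedSketch {
    // Hypothetical functional interface whose method may throw a checked exception.
    @FunctionalInterface
    interface ThrowingFunction<T, R> {
        R apply(T input) throws Exception;
    }

    // Wraps a throwing function so that it can be passed to Stream.map.
    static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> f) {
        return input -> {
            try {
                return f.apply(input);
            } catch (RuntimeException e) {
                throw e; // already unchecked, rethrow unchanged
            } catch (Exception e) {
                throw new RuntimeException(e); // convert checked to unchecked
            }
        };
    }

    // Hypothetical parser that declares a checked exception.
    static int parsePositive(String text) throws Exception {
        int value = Integer.parseInt(text.trim());
        if (value <= 0) {
            throw new Exception("not positive: " + text);
        }
        return value;
    }

    static List<Integer> parseAll(List<String> inputs) {
        return inputs.stream()
                .map(unchecked(UncheckedSketch::parsePositive))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("1", " 2", "30")));
    }
}
```

Without the wrapper, `map(UncheckedSketch::parsePositive)` would not compile, because `Function.apply` declares no checked exceptions.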
The document introduces Jena, an open source Java framework for building semantic web and linked data applications, describing how it can be used to process RDF and OWL models, perform inference using reasoners, and query data using SPARQL. It also provides instructions on installing and getting started with Jena, and includes examples of creating and querying RDF models, performing inference, and connecting to a MySQL database for persistence.
This document contains the slides for a presentation on Java 8 Lambdas and Streams. The presentation will cover lambdas, including their concept, syntax, functional interfaces, variable capture, method references, and default methods. It will also cover streams. The slides provide some incomplete definitions that will be completed during the presentation. Questions from attendees are welcome. A quick survey asks about past experience with lambdas and streams.
This document summarizes the new features in JDK 8, including lambda expressions and method references that allow for functional programming in Java, stream API enhancements for aggregate operations on collections and arrays, annotations on Java types for additional type checking and metadata, preserving method parameter names in bytecode, improvements to BigInteger, StringJoiner and Base64 classes, and additional concurrency, security, and JavaScript engine enhancements.
SPARQL 1.1 introduced several new features including:
- Updated versions of the SPARQL Query and Protocol specifications
- A SPARQL Update language for modifying RDF graphs
- A protocol for managing RDF graphs over HTTP
- Service descriptions for describing SPARQL endpoints
- Basic federated query capabilities
- Other minor features and extensions
The presentation provides an introduction to and a detailed explanation of Java 8 lambdas and streams. The lambda part covers method references and default methods; the streams part covers stream operations, types of streams, and collectors. It also covers parallel streams, with a benchmark comparison of sequential and parallel streams.
Additional slides cover Optional, Spliterators, and certain projects based on lambdas and streams.
This document summarizes upcoming language features in Java, including local variable type inference, raw string literals, expression switch, pattern matching, records, and value types. It discusses the motivation and design of each feature, providing examples. The document indicates that Java releases will now occur every six months and language changes will be more frequent, with new features targeting each release.
Lambda expressions, default methods in interfaces, and the new date/time API are among the major new features in Java 8. Lambda expressions allow for functional-style programming by treating functionality as a method argument or anonymous implementation. Default methods add new capabilities to interfaces while maintaining backwards compatibility. The date/time API improves on the old Calendar and Date APIs by providing immutable and easier to use classes like LocalDate.
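A minimal sketch of two of these features together, assuming a hypothetical `Expiring` interface: a default method adds behaviour without forcing implementers to change, and `LocalDate` values are immutable, so date arithmetic returns new objects.

```java
import java.time.LocalDate;

class DefaultMethodSketch {
    // Hypothetical interface, used only for illustration.
    interface Expiring {
        LocalDate expiryDate();

        // Default method: implementers inherit this without writing any code.
        default boolean isExpiredOn(LocalDate day) {
            return day.isAfter(expiryDate());
        }
    }

    // Expiring has a single abstract method, so a lambda can implement it.
    static Expiring until(LocalDate date) {
        return () -> date;
    }

    public static void main(String[] args) {
        LocalDate endOfJanuary = LocalDate.of(2020, 1, 31);
        Expiring voucher = until(endOfJanuary);

        // plusDays returns a new LocalDate; endOfJanuary itself is unchanged.
        LocalDate nextDay = endOfJanuary.plusDays(1);

        System.out.println(voucher.isExpiredOn(endOfJanuary));
        System.out.println(voucher.isExpiredOn(nextDay));
    }
}
```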
These slides are a brief update on the status of the work of the current SPARQL Working Group. "SPARQL 1.1" collectively refers to the upcoming versions of the SPARQL query language, SPARQL update language, and other deliverables of the 2nd (current) SPARQL Working Group.
This document provides an introduction and overview of the Python programming language. It discusses what Python is, its features, applications, and how to install Python on Windows and Linux systems. It also covers Python basics like variables, data types, operators, comments, conditional statements like if/else, and loops like for, while, and nested loops. Examples are provided for key concepts. The document is intended as a beginner tutorial for learning Python.
The slides of my university talk, Devoxx Belgium 2016.
The goal of this talk is to compare the two most popular implementations of List: LinkedList and ArrayList, and provide hints on which one to use in what case.
Productive Programming in Java 8 - with Lambdas and Streams Ganesh Samarthyam
The document provides an overview of lambda expressions and functional interfaces in Java 8. It discusses key concepts like lambda functions, built-in functional interfaces like Predicate and Consumer, and how they can be used with streams. Examples are provided to demonstrate using lambdas with built-in interfaces like Predicate to filter a stream and Consumer to forEach over a stream. The document aims to help readers get hands-on experience coding with lambdas and streams in Java 8.
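The use of `Predicate` for filtering and `Consumer` for `forEach` can be sketched as follows. The class, the `shortNonEmpty` predicate, and the `collectMatching` helper are hypothetical and only illustrate how the built-in interfaces compose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class PredicateConsumerSketch {
    // Predicates compose with and(), or(), and negate().
    static Predicate<String> shortNonEmpty() {
        Predicate<String> nonEmpty = s -> !s.isEmpty();
        Predicate<String> shortWord = s -> s.length() <= 4;
        return nonEmpty.and(shortWord);
    }

    // The Predicate decides what passes the filter; the Consumer receives each survivor.
    static List<String> collectMatching(List<String> items, Predicate<String> test) {
        List<String> out = new ArrayList<>();
        Consumer<String> sink = out::add;
        items.stream().filter(test).forEach(sink);
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("lambda", "map", "", "stream", "java");
        collectMatching(words, shortNonEmpty()).forEach(System.out::println);
    }
}
```

Collecting through a side-effecting `Consumer` is done here only to show the interface; in a stream pipeline, `collect(Collectors.toList())` is usually the better fit.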
The document introduces Jena, a Java framework for building Semantic Web applications. It discusses key Semantic Web technologies like RDF, RDFS, OWL and SPARQL. It also provides an overview of Jena's features for manipulating RDF graphs and querying them using SPARQL. Examples are given of how to use Jena's RDF and SPARQL APIs.
Dependent types (and other ideas for guaranteeing correctness with types) radexp
A few strategies for protecting yourself from your future self's mistakes. Write more robust code by expressing constraints in the type signature and avoiding partial functions. Give your compiler the tools it needs to assist you. Let it help you.
Functional Thinking - Programming with Lambdas in Java 8 Ganesh Samarthyam
Functional programming is on the rise. Almost all major mainstream languages support functional programming features, including C++, Java, Swift, Python, and Visual Basic. With Java 8’s lambda functions, Java now supports functional programming. Moving to functional programming can result in significantly better code and productivity gains. However, it requires a paradigm shift: you need to move away from imperative and object-oriented thinking to start thinking functionally. That’s what this workshop will help you achieve: it will help you make your shift towards functional programming. The workshop will introduce lambda functions in Java with examples from the Java library itself. Presented at the OSI Days 2015 workshop - http://osidays.com/osidays/shifting-to-functional-programming-lambdas-for-java-developers/
Wrapper Generation Supervised by a Noisy Crowd Disheng Qiu
We present solutions based on crowdsourcing platforms to support large-scale production of accurate wrappers around data-intensive websites.
Our approach is based on supervised wrapper induction algorithms that delegate the burden of generating the training data to the workers of a crowdsourcing platform. Workers are paid for answering simple membership queries chosen by the system. We present two algorithms: a single-worker algorithm (ALF) and a multiple-worker algorithm (ALFRED). Both algorithms deal with the inherent uncertainty of the responses and use an active learning approach to select the most informative queries.
ALFRED estimates the workers’ error rate to decide at runtime how many workers are needed. The experiments that we conducted on real and synthetic data are encouraging: our approach is able to produce accurate wrappers at a low cost, even in the presence of workers with a significant error rate.
This document is the preface and introduction to a book titled "7 Secrets Of Permanent Fat Loss And Fitness" by an author who struggled with weight for many years and spent over a decade and $23,000 testing various diets and exercise programs to find what works best. He discovered a group of techniques and secrets that helped him lose 42 pounds and 10 inches from his waist while building muscle. The book aims to share these secrets that allow permanent fat loss with minimal time commitment. It also discusses qualities successful people possess that help them achieve their goals. The introduction sets up that mainstream diets and exercise programs often fail because they are too restrictive or require too much time, and the secrets shared in the book avoid these
Restaurant.com helps businesses increase profits without discounting by driving new customers and rewarding existing ones. As the largest restaurant deals website for 13 years, Restaurant.com fills tables through $50-100 gift cards businesses provide customers for as little as $6-10 each, gaining customers without losing value over time like traditional gift cards. Businesses choose how gift cards promote their brand and close sales while customers receive the flexible benefit of choosing from thousands of restaurant options.
The document discusses ALFRED, a system for extracting data from large datasets using crowdsourcing. It addresses challenges that arise from scaling up the number of non-expert workers, such as high worker error rates. The key contributions of ALFRED include using simple yes/no queries to reduce errors, active learning to select the most informative queries, and Bayesian modeling to evaluate wrapper quality while tolerating inaccurate responses.
The document lists several titles of works of art related to the Virgin Mary, including Annunciations, Adorations of the Shepherds, and Madonnas. It also includes the full text of the Magnificat in Portuguese, the Virgin Mary's hymn of praise after the visit of the angel Gabriel.
Restaurant.com's Double Deals program allows businesses to attract new customers at no cost. Through the program, customers receive $50 worth of value when purchasing a $25 business voucher. Both the customer and business benefit, as customers receive a deal and businesses gain new customers. There is no upfront cost to businesses to participate, and they are paid for each voucher redeemed at their establishment. The program has been successfully running since 1999 and uses various marketing channels to promote participating businesses.
Restaurant.com is a website that has been filling empty restaurant tables for 13 years. It guarantees to drive new customers to restaurants at no cost to the restaurant. Restaurants can cancel their participation at any time with 30 days notice. Restaurant.com uses various marketing methods like websites, search engines, social media, emails and mobile apps to promote restaurants and drive customers. It provides restaurants with customer data and reports to help them make more money.
Tim La Mar presents tips for becoming your authentic "Brand You". The document recommends taking time for introspection, acknowledging past influences, and reflecting on what shapes your behaviors. It advises practicing behaviors that align with your values rather than others' expectations. While changing habits takes time and patience, recognizing patterns is key to correcting them. The document stresses staying open-minded during self-discovery, forgiving past mistakes, and expressing your authentic self externally through empathy, compassion, and helping others through sharing what you have learned.
This document summarizes a high school sports team's 2010-2011 season. It discusses the fans, coaches, different levels of the team from freshmen to varsity, their determination in becoming section champions and competing at the state level. It also thanks the senior players for their memories and contributions to the team.
The European Union has agreed on an oil embargo against Russia in response to the invasion of Ukraine. The embargo will ban seaborne imports of Russian oil into the EU and will end deliveries through pipelines within six months. This measure is part of a sixth package of EU sanctions intended to increase economic pressure on Moscow and deprive the Kremlin of funds to finance its war.
The document provides an overview of Britain, including its history, government, geography, culture, and the countries that make up Britain - England, Wales, Scotland, and Northern Ireland. Some key points covered include Britain's ancient Celtic origins, conquest by the Romans and later Anglo-Saxon tribes, unification of England and Scotland in 1707, and the current political system with the Queen as head of state.
We are committed to improving our clients' profitability through the use of leading-edge Internet technologies mastered by competent, service-oriented consultants. At a time when businesses are increasingly aware of the importance of the Internet to their success and growth, we are ready to meet their needs with affordable Internet marketing and innovative websites.
A CLIENT-ORIENTED STRATEGY: WSI listens to its clients to better understand their business, drawing on feedback from clients and consultants as well as on specialized studies, and thus helps them determine how to increase their online revenue. Experience shows that our clients look for a trusted consultant who can simplify Internet marketing and help them make the best choices to grow their business. WSI continuously monitors the needs of its clients around the world, but its consultants are located where your business is, to better grasp the local context and its opportunities.
IN LINE WITH THE LEADERS: As the leading provider of Internet marketing solutions for businesses worldwide, WSI has aligned itself with industry leaders such as Google, Lyris, SEMPO, MarketingSherpa, ReachLocal, and Webex. In other words, with HGWeb Consulting, a WSI agency, you have the assurance of a leading-edge company that develops and integrates the best practices of its industry.
A look back at the presentation from the workshop "Ergonomie d'un site web" (website ergonomics) led by Fred Colantonio.
Revisit the common misconceptions, the basic principles, and more.
This document provides an overview of Internet technology and applications, including the history and requirements of the World Wide Web. It discusses server-side and client-side programming languages, and covers topics like PHP programming, arrays, functions, and form handling in PHP.
Introduction to R for Learning Analytics Researchers (Vitomir Kovanovic)
The slides from my 2hr tutorial organised at 2018 Learning Analytics Summer Institute (LASI) at Teachers College, Columbia University on June 11, 2018.
The document discusses Java 8 Streams, which provide a way to process data in a functional style. Streams allow operations like filter, map, and reduce to be performed lazily on collections, arrays, or I/O sources. The key aspects of streams are that they are lazy, support both sequential and parallel processing, and represent a sequence of values rather than storing them. The document provides examples of using intermediate operations like filter and map and terminal operations like forEach and collect. It also discusses spliterators, which drive streams and allow parallelization, and functional interfaces which are used with lambda expressions in streams.
This document provides a summary of MapReduce algorithms. It begins with background on the author's experience blogging about MapReduce algorithms in academic papers. It then provides an overview of MapReduce concepts including the mapper and reducer functions. Several examples of recently published MapReduce algorithms are described for tasks like machine learning, finance, and software engineering. One algorithm is examined in depth for building a low-latency key-value store. Finally, recommendations are provided for designing MapReduce algorithms including patterns, performance, and cost/maintainability considerations. An appendix lists additional MapReduce algorithms from academic papers in areas such as AI, biology, machine learning, and mathematics.
The document discusses creating an optimized algorithm in R. It covers:
1) Background on R and some popular R packages and interfaces.
2) Optimizing code performance by using parallel computing techniques like multiple cores and high performance computing clusters.
3) Steps for writing functions in R, creating R packages, and optimizing code performance.
This document discusses various code quality tools such as FindBugs, PMD, and Checkstyle. It provides information on what each tool is used for, how to install plugins for them in Eclipse, and how to configure them for use with Ant builds. FindBugs looks for potential bugs in Java bytecode. PMD scans source code for coding mistakes, dead code, complicated expressions, and duplicate code. Checkstyle checks that code complies with coding style rules. The document explains how to download and configure each tool so it can be run from Eclipse or as part of an Ant build.
Introduction to Apache Flink - Fast and reliable big data processing (Till Rohrmann)
This presentation introduces Apache Flink, a massively parallel data processing engine which is currently undergoing the incubation process at the Apache Software Foundation. Flink's programming primitives are presented and it is shown how easily a distributed PageRank algorithm can be implemented with Flink. Intriguing features such as dedicated memory management, Hadoop compatibility, streaming and automatic optimisation make it a unique system in the world of Big Data processing.
The Swift Compiler and Standard Library (Santosh Rajan)
The document discusses the Swift compiler, standard library, and key language features. It provides an overview of the Swift compiler's location and how to compile and run Swift programs from the command line. It also summarizes the main types, protocols, operators and global functions that are included in Swift's standard library.
This document provides an overview of basic Java programming concepts including:
- Java programs require a main method inside a class and use print statements for output.
- Java has primitive data types like int and double as well as objects. Variables are declared with a type.
- Control structures like if/else and for loops work similarly to other languages. Methods can call themselves recursively.
- Basic input is done through dialog boxes and output through print statements. Formatting is available.
- Arrays are objects that store multiple values of a single type and know their own length. Strings are immutable character arrays.
This document provides an overview and summary of new features in Java 8. It begins with the schedule and release dates for Java 8 from 2012 to 2014. The major changes covered include lambda expressions, which allow passing code as data and are enabled by default functional interfaces. The new date/time API provides a modern replacement for the legacy Date/Calendar APIs. Type annotations allow adding metadata to types. Compact profiles define modular class libraries. Overall, Java 8 aims to better support parallel programming through new language features and library APIs.
Capture the Flag (CTF) events are information security challenges. They are fun, but they also provide an opportunity to practise for real-world security challenges.
In this talk we present the concept of CTF. We focus on some tools used by our team, which can also be used to solve real-world problems.
Tips And Tricks For Bioinformatics Software Engineering (jtdudley)
This document provides tips and tricks for software engineering in bioinformatics. It discusses using object-oriented software design principles like encapsulation and inheritance. It also covers best practices like automating documentation, performance optimization, working with data using databases and file formats, parallel and distributed computing, hardware acceleration, and web services.
The document discusses compilers and web applications. It begins by stating that writing web applications involves many technologies like HTML, CSS, JavaScript frameworks, databases, and more. In contrast, writing a compiler mainly involves processing source code through phases like lexing, parsing, type checking, and code generation. The document focuses on the code generation and semantic analysis phases of compilers. It provides examples of Crystal code for a compiler and explains key concepts like the Program object that holds compiled program data.
The document discusses the phases of a compiler and analyzing source code semantically. It explains that semantic analysis involves processing the abstract syntax tree (AST) to perform type checking and declaration of types, methods, etc. The key phases are the top-level phase which declares classes, modules, and other top-level items, and the semantic visitor which analyzes nodes in the AST while tracking the current type and looking up declarations.
The document discusses compilers and web applications. It begins by stating that writing web applications involves many technologies like HTML, CSS, JavaScript frameworks, databases, and more. In contrast, writing a compiler mainly involves processing source code through phases like lexing, parsing, type checking, and code generation. The document focuses on the semantic analysis phase, noting it involves processing the abstract syntax tree and is easier than writing a web application since it only deals with one language rather than multiple technologies. It provides details on the Crystal compiler's implementation and organization.
The document discusses key concepts in Java programming including:
1. Java is an object-oriented programming language that is platform independent and allows developers to create applications, applets, and web applications.
2. The Java code is first compiled to bytecode, which can then be executed on any Java Virtual Machine (JVM) regardless of the underlying hardware or operating system.
3. Core Java concepts covered include classes, objects, encapsulation, inheritance, polymorphism, and abstraction. Operators, flow control statements, arrays, strings and object-oriented programming principles are also summarized.
This document provides an overview of the MySQL query optimizer. It discusses the main phases of the optimizer including logical transformations, cost-based optimizations, analyzing access methods, join ordering, and plan refinements. Logical transformations prepare the query for cost-based optimization by simplifying conditions. Cost-based optimizations select the optimal join order and access methods to minimize resources used. Access methods analyzed include table scans, index scans, and ref access. The join optimizer searches for the best join order. Plan refinements include sort avoidance and index condition pushdown.
These slides describe the basic concepts of industrial-strength compiler design. This includes the basic concept of static single-assignment form (SSA) and various optimizations such as dead code elimination, global value numbering, constant propagation, etc. They are intended for a 150-minute undergraduate compiler class.
Logical Expressions in C/C++. Mistakes Made by Professionals (PVS-Studio)
In programming, a logical expression is a language construct that is evaluated as true or false. Many books that teach programming "from scratch" discuss possible operations on logical expressions familiar to every beginner. In this article, I won't be talking about the AND operator having higher precedence than OR. Instead, I will talk about common mistakes that programmers make in simple conditional expressions consisting of no more than three operators, and show how you can check your code using truth tables. Mistakes described here are the ones made by the developers of such well-known projects as FreeBSD, Microsoft ChakraCore, Mozilla Thunderbird, LibreOffice, and many others.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP (RAHUL)
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet. Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...PECB
Denis is a dynamic and results-driven Chief Information Officer (CIO) with a distinguished career spanning information systems analysis and technical project management. With a proven track record of spearheading the design and delivery of cutting-edge Information Management solutions, he has consistently elevated business operations, streamlined reporting functions, and maximized process efficiency.
Certified as an ISO/IEC 27001: Information Security Management Systems (ISMS) Lead Implementer, Data Protection Officer, and Cyber Risks Analyst, Denis brings a heightened focus on data security, privacy, and cyber resilience to every endeavor.
His expertise extends across a diverse spectrum of reporting, database, and web development applications, underpinned by an exceptional grasp of data storage and virtualization technologies. His proficiency in application testing, database administration, and data cleansing ensures seamless execution of complex projects.
What sets Denis apart is his comprehensive understanding of Business and Systems Analysis technologies, honed through involvement in all phases of the Software Development Lifecycle (SDLC). From meticulous requirements gathering to precise analysis, innovative design, rigorous development, thorough testing, and successful implementation, he has consistently delivered exceptional results.
Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment.
Date: May 29, 2024
Tags: Information Security, ISO/IEC 27001, ISO/IEC 42001, Artificial Intelligence, GDPR
-------------------------------------------------------------------------------
Find out more about ISO training and certification services
Training: ISO/IEC 27001 Information Security Management System - EN | PECB
ISO/IEC 42001 Artificial Intelligence Management System - EN | PECB
General Data Protection Regulation (GDPR) - Training Courses - EN | PECB
Webinars: https://pecb.com/webinars
Article: https://pecb.com/article
-------------------------------------------------------------------------------
For more information about PECB:
Website: https://pecb.com/
LinkedIn: https://www.linkedin.com/company/pecb/
Facebook: https://www.facebook.com/PECBInternational/
Slideshare: http://www.slideshare.net/PECBCERTIFICATION
Walmart Business+ and Spark Good for Nonprofits.pdf (TechSoup)
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money.
The webinar may also give some examples on how nonprofits can best leverage Walmart Business+.
The event will cover the following:
Walmart Business+ (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a "Spend Analytics" feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum (MJDuyan)
(𝐓𝐋𝐄 𝟏𝟎𝟎) (𝐋𝐞𝐬𝐬𝐨𝐧 𝟏)-𝐏𝐫𝐞𝐥𝐢𝐦𝐬
𝐃𝐢𝐬𝐜𝐮𝐬𝐬 𝐭𝐡𝐞 𝐄𝐏𝐏 𝐂𝐮𝐫𝐫𝐢𝐜𝐮𝐥𝐮𝐦 𝐢𝐧 𝐭𝐡𝐞 𝐏𝐡𝐢𝐥𝐢𝐩𝐩𝐢𝐧𝐞𝐬:
- Understand the goals and objectives of the Edukasyong Pantahanan at Pangkabuhayan (EPP) curriculum, recognizing its importance in fostering practical life skills and values among students. Students will also be able to identify the key components and subjects covered, such as agriculture, home economics, industrial arts, and information and communication technology.
𝐄𝐱𝐩𝐥𝐚𝐢𝐧 𝐭𝐡𝐞 𝐍𝐚𝐭𝐮𝐫𝐞 𝐚𝐧𝐝 𝐒𝐜𝐨𝐩𝐞 𝐨𝐟 𝐚𝐧 𝐄𝐧𝐭𝐫𝐞𝐩𝐫𝐞𝐧𝐞𝐮𝐫:
-Define entrepreneurship, distinguishing it from general business activities by emphasizing its focus on innovation, risk-taking, and value creation. Students will describe the characteristics and traits of successful entrepreneurs, including their roles and responsibilities, and discuss the broader economic and social impacts of entrepreneurial activities on both local and global scales.
It describes the bony anatomy, including the femoral head, acetabulum, and labrum. It also discusses the capsule and ligaments. Muscles that act on the hip joint and the range of motion are outlined. Factors affecting hip joint stability and weight transmission through the joint are summarized.
Communicating effectively and consistently with students can help them feel at ease during their learning experience and provide the instructor with a communication trail to track the course's progress. This workshop will take you through constructing an engaging course container to facilitate effective communication.
Bangladesh Economic Review 2024 [Bangladesh Economic Review 2024 Bangla.pdf]: a complete Bangla e-book (PDF) with versions for computer, tablet, and smartphone, including a table of contents, a bookmark menu, and a hyperlink menu.
This is a very important book for all of us. It is a key subject for BCS, bank, and university admission exams and for any competitive examination. In addition, any recent data or information about Bangladesh can be found in this book.
So, as a citizen, you need to know this information.
It will be very useful for the BCS and bank written examinations, and also for secondary and higher secondary students.
Chapter wise All Notes of First year Basic Civil Engineering.pptx (Denish Jangid)
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to the objective, scope and outcome of the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Unit of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary air pollutants, Harmful effects of Air Pollution, Control of Air Pollution. Noise Pollution: Harmful effects of noise pollution, control of noise pollution, Global warming & Climate Change, Ozone depletion, Greenhouse effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organised by the Excellence Foundation for South Sudan on 8 and 9 June 2024 from 1 PM to 3 PM on each day.
1. A Framework for Learning Web Wrappers from the Crowd
Valter Crescenzi, Paolo Merialdo, Disheng Qiu
Dipartimento di Ingegneria
Università degli Studi Roma Tre
Via della Vasca Navale, 79, Rome
disheng@dia.uniroma3.it
12. Scaling Wrapper Inference
Scaling the number of workers with crowdsourcing platforms opens new challenges:
Issue: Non-expert workers
Contributions:
• Simple interactions to reduce the worker error rate
• Membership Query (yes/no answer)
Issue: Costs
Contributions:
• Active Learning to carefully select queries
• Dynamic Expressiveness of the inference language
Issue: Quality
Contributions:
• Bayesian Model to evaluate the expected wrapper quality
• Sampling algorithms
16. ALFRED
ALFRED is a wrapper inference system supervised by workers from a crowdsourcing platform.
Input annotated page (page0), processed by the inference algorithm, yields the candidate rules:
r1 = /html/table/tr[1]/td/text()
r2 = //*[contains(.,"Ratings:")]/../../tr[1]/td/text()
r3 = //*[contains(.,"Director:")]/../../tr[1]/td/text()
....
Values extracted by each rule:
      page0           page1         page2
r1    Spirited Away   City of God   Howl's Moving Castle
r2    Spirited Away   -             9.3
r3    Spirited Away   City of God   null
18. ALFRED
ALFRED is a wrapper inference system supervised by workers from a crowdsourcing platform.
Input annotated page (page0), processed by the inference algorithm, yields the candidate rules:
r1 = /html/table/tr[1]/td/text()
r2 = //*[contains(.,"Ratings:")]/../../tr[1]/td/text()
r3 = //*[contains(.,"Director:")]/../../tr[1]/td/text()
....
Query posed to the worker: Is this title the correct one?
Output (DB, Wrapper):
r1 = /html/table/tr[1]/td/text()
21. Membership Query
      page0           page1         page2
r1    Spirited Away   City of God   Howl's Moving Castle
r2    Spirited Away   -             9.3
r3    Spirited Away   City of God   null
Worker's answer: Yes!
For each new answer:
• Rules compatible with the answer become more likely to be correct (Bayesian Model)
• If no rule is good enough, a new query is selected (Active Learning)
25. Bayesian Model
Training sequence: L_k = {"Spirited Away", "-", "9.3"}, with worker answers Yes, No, No.
Probability that:
• a rule r is correct: P(r | L_k)
• none of the candidate rules is correct: P(R | L_k)
Bayesian update: (formula not preserved in the slide text)
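The update formula itself did not survive in the slide text, so the following is only a minimal sketch of how a posterior such as P(r | L_k) could be computed, not ALFRED's actual model. It assumes a hypothetical constant worker error rate (`error_rate`), a hypothetical prior mass for the "no candidate rule is correct" hypothesis (`prior_none`), and that under that hypothesis each answer is equally likely to be yes or no. The assignment of the three queried values to page0, page1, and page2 is also an assumption read off the extraction table.

```python
from typing import Dict, List, Optional, Tuple

# Values each candidate rule extracts on each page (taken from the slides' table).
EXTRACTIONS: Dict[str, Dict[str, Optional[str]]] = {
    "r1": {"page0": "Spirited Away", "page1": "City of God", "page2": "Howl's Moving Castle"},
    "r2": {"page0": "Spirited Away", "page1": "-", "page2": "9.3"},
    "r3": {"page0": "Spirited Away", "page1": "City of God", "page2": None},
}

# Training sequence L_k as (page, queried value, worker said yes?).
# The page of each value is an assumption inferred from the table.
TRAINING: List[Tuple[str, str, bool]] = [
    ("page0", "Spirited Away", True),
    ("page1", "-", False),
    ("page2", "9.3", False),
]

NONE_CORRECT = "none"  # hypothesis: none of the candidate rules is correct


def posterior(answers, extractions=EXTRACTIONS, error_rate=0.1, prior_none=0.1):
    """Posterior over candidate rules (plus NONE_CORRECT) via Bayes' rule.

    Assumed noise model: a worker gives the wrong yes/no answer with
    probability error_rate, independently for each query.
    """
    rules = list(extractions)
    scores = {r: (1.0 - prior_none) / len(rules) for r in rules}  # uniform prior
    scores[NONE_CORRECT] = prior_none
    for page, value, said_yes in answers:
        for r in rules:
            rule_says_yes = extractions[r][page] == value
            agrees = rule_says_yes == said_yes
            scores[r] *= (1.0 - error_rate) if agrees else error_rate
        scores[NONE_CORRECT] *= 0.5  # assumption: answers are uninformative here
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


if __name__ == "__main__":
    for hypothesis, prob in posterior(TRAINING).items():
        print(hypothesis, round(prob, 4))
```

Under these assumptions r1 and r3 stay tied after the three answers, because they extract the same values on every queried page, while r2 is penalized for the two "No" answers on values it extracts.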
27. Active Learning
ALFRED actively selects the queries; a good policy saves money.
Query selection strategies:
• Random (baseline): values are randomly selected
• Entropy: values are selected by maximizing the entropy (most uncertain value)
• Greedy: values are selected by minimizing the queries needed to confirm the most likely rule
• Lucky: hybrid approach; it starts with the Entropy algorithm and then switches to Greedy to confirm the best rule
      page0           page1         page2
r1    Spirited Away   City of God   Howl's Moving Castle
r2    Spirited Away   -             9.3
r3    Spirited Away   City of God   null
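The Entropy strategy can be illustrated with a short sketch. This is one plausible reading of "most uncertain value", not ALFRED's implementation: for each (page, value) pair extracted by some candidate rule, it estimates the probability of a "yes" answer as the total probability of the rules that extract that value on that page, and it picks the pair whose yes/no outcome has the highest binary entropy. The function names and the `rule_probs` argument are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

# Values each candidate rule extracts on each page (taken from the slides' table).
EXTRACTIONS: Dict[str, Dict[str, Optional[str]]] = {
    "r1": {"page0": "Spirited Away", "page1": "City of God", "page2": "Howl's Moving Castle"},
    "r2": {"page0": "Spirited Away", "page1": "-", "page2": "9.3"},
    "r3": {"page0": "Spirited Away", "page1": "City of God", "page2": None},
}


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a yes/no outcome with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def yes_probability(page: str, value: str, rule_probs: Dict[str, float],
                    extractions=EXTRACTIONS) -> float:
    """Probability mass of the rules that extract `value` on `page`."""
    return sum(prob for rule, prob in rule_probs.items()
               if extractions[rule][page] == value)


def select_query(rule_probs: Dict[str, float],
                 extractions=EXTRACTIONS) -> Tuple[str, str]:
    """Return the (page, value) pair whose answer is most uncertain."""
    best, best_entropy = None, -1.0
    for rule in extractions:
        for page, value in extractions[rule].items():
            if value is None:  # a rule that extracts nothing offers no value to ask about
                continue
            h = binary_entropy(yes_probability(page, value, rule_probs, extractions))
            if h > best_entropy:
                best, best_entropy = (page, value), h
    return best


if __name__ == "__main__":
    uniform = {"r1": 1 / 3, "r2": 1 / 3, "r3": 1 / 3}
    print(select_query(uniform))
```

With equal rule probabilities, every rule agrees on page0, so asking about "Spirited Away" there carries essentially no information; the selected query falls on a page where the rules disagree.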
32. Expressiveness
The candidate rules are generated observing the first annotated page
Should we use all the XPath expressiveness or just a fragment?
Pool of candidate rules organized in fragments:
• Absolute Rules (complete path from root)
/html/table/tr[1]/td/text()
• Relative Rules (path from a textual node)
//*[contains(.,"Spirited Away")]/text()
//*[contains(.,"Ratings:")]/../../tr[1]/td/text()
//*[contains(.,"Director:")]/../../tr[1]/td/text()
• .... other XPaths
[Plot labels: Expressiveness of the fragment, Number of candidate rules]
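As an illustration of how an absolute rule could be generated from an annotated page, here is a minimal sketch using only Python's standard library. It is not ALFRED's generator: the page markup in `PAGE` and the helper `absolute_rule` are hypothetical, and the sketch assumes well-formed XHTML and adds a positional index only when a node has same-tag siblings.

```python
import xml.etree.ElementTree as ET

# Toy page, assumed for illustration (well-formed XHTML).
PAGE = "<html><table><tr><td>Spirited Away</td></tr><tr><td>9.3</td></tr></table></html>"


def absolute_rule(root, value):
    """Build the absolute XPath (complete path from root) of the node whose text is `value`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent = {child: node for node in root.iter() for child in node}
    target = next((n for n in root.iter() if (n.text or "").strip() == value), None)
    if target is None:
        return None
    steps = []
    node = target
    while node is not None:
        up = parent.get(node)
        step = node.tag
        if up is not None:
            same_tag = [c for c in up if c.tag == node.tag]
            if len(same_tag) > 1:  # index only when the step would be ambiguous
                step += f"[{same_tag.index(node) + 1}]"
        steps.append(step)
        node = up
    return "/" + "/".join(reversed(steps)) + "/text()"


root = ET.fromstring(PAGE)
print(absolute_rule(root, "Spirited Away"))
```

On this toy page the annotated value "Spirited Away" yields the same absolute rule shown on the slide.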
36. Expressiveness
Correct (absolute) rule:
/html/table/tr[1]/td/text()
• The fragment is just expressive enough:
the correct rule can be generated.
• Few queries are needed to find it
Candidate rules:
/html/table/tr[1]/td/text()
• The fragment is too expressive:
the correct rule can be generated
• But many membership queries (MQ) are needed to find it
Candidate rules:
/html/table/tr[1]/td/text()
//*[contains(.,"Spirited Away")]/text()
//*[contains(.,"Ratings:")]/../../tr[1]/td/text()
//*[contains(.,"Director:")]/../../tr[1]/td/text()
State-of-the-art approaches fall in the first case!
They statically define the expressiveness of the XPath fragment
39. Rules are organized in a Hierarchy of Fragments with increasing expressiveness
R0 : Absolute Rules
R1 : R0 + Relative Rules
.....
Inspired by Structural Risk Minimization (SRM)*:
a Machine Learning technique to address overfitting
We defined simple XPath fragments.
Empirically observed: overly expressive fragments are not actually needed.
[Figure labels, along the Expressiveness axis: 5%, 70%, 25%]
*Details: Shawe-Taylor et al. - IEEE Transactions on Information Theory, 44(5):1926–1940, 1998
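The hierarchy of fragments can be sketched as a loop that starts from R0 and moves to the next fragment only when needed. The slides do not state the expansion criterion, so this sketch assumes the pool is expanded when no rule in the current fragment agrees with the answers collected so far. The rule representation (a page-to-value mapping) and the names `consistent`, `select_rule` and `FRAGMENTS` are hypothetical.

```python
def consistent(rule, answers):
    """A rule agrees with an answer when it extracts the value iff the worker said 'correct'."""
    return all((rule.get(page) == value) == is_correct
               for page, value, is_correct in answers)


def select_rule(fragments, answers):
    """Return (fragment level, surviving rule names), expanding R0 -> R1 -> ... only if needed."""
    pool = []
    for level, new_rules in enumerate(fragments):
        pool.extend(new_rules)  # R_k = R_(k-1) + the rules added at level k
        survivors = [name for name, rule in pool if consistent(rule, answers)]
        if survivors:
            return level, survivors
    return None, []


# Toy pool (assumed): the absolute rule breaks on page1, the relative rule does not.
FRAGMENTS = [
    [("absolute", {"page0": "Spirited Away", "page1": "-"})],           # R0
    [("relative", {"page0": "Spirited Away", "page1": "City of God"})],  # added by R1
]

answers = [("page0", "Spirited Away", True)]
print(select_rule(FRAGMENTS, answers))   # R0 is enough so far

answers += [("page1", "-", False), ("page1", "City of God", True)]
print(select_rule(FRAGMENTS, answers))   # R0 is contradicted, so R1 is used
```

Keeping the pool small until the answers contradict it is what limits the number of candidates, and therefore the number of queries, in this sketch.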
50. Results: Dynamic Expressiveness
Strategy   #MQ (SRM off)   #MQ (SRM on)   % MQ saved   P (SRM on)   R (SRM on)
RANDOM     379             190            50%          0.998        0.977
GREEDY     398             169            58%          0.998        0.983
LUCKY      196             132            33%          0.996        0.995
ENTROPY    205             116            44%          0.998        0.99
Dynamic Expressiveness saves a lot of queries
Small quality loss:
The expressiveness is not expanded when it is needed
53. Results: Dynamic Expressiveness
[Plots: Static Expressiveness and Dynamic Expressiveness; axis label: # candidate rules]
“Simple” attributes: complex algorithms are not needed
“Complex” attributes: Entropy, Lucky and Dynamic Expressiveness save
a lot of queries
54. Future development
Noisy Crowds: workers' mistakes vs task redundancy*
How to evaluate the accuracy of the worker?
Another query or another worker?
Same learning framework, different problems: NLP, Crawling
*Demo
Title: ALFRED: Crowd Assisted Data Extraction
When: Tomorrow 17h
Where: Imperial Room
64. Sampling & Quality
2M pages from IMDB: we have to work with a sample set, but selecting the right sample set is crucial
[Diagram labels: Inference algorithm, Wrapper, DB]
... Not all pages look like the pages about famous movies
68. Sampling & Quality
      page0
r1    Spirited Away
r2    Spirited Away
r3    Spirited Away
r1 = r2 = r3

      page0           page1
r1    Spirited Away   City of God
r2    Spirited Away   -
r3    Spirited Away   City of God
r1 = r3 != r2

      page0           page1         page2
r1    Spirited Away   City of God   Howl’s Moving Castle
r2    Spirited Away   -             9.3
r3    Spirited Away   City of God   null
r1 != r3 != r2

Pages make apparent the differences among the rules
Find a small set that makes apparent the same differences observed in the
whole set of pages*
69. Sampling & Quality
The problem.
Find the smallest set that makes apparent the differences among the rules
(e.g., 100 pages that make apparent the same differences that we would observe in 2M pages).
It is an NP-hard problem. Reduction to the Set Cover problem:
find the smallest set of pages that covers all the groups of rules (group = equivalent rules).
The smallest set is not needed:
a greedy algorithm, O(|Pages|) in time and O(1) in space, works very well in practice.
70. Sampling & Quality
XPath rules
For every page p:
    if (p makes apparent new differences)
        representative pages += p
An offline algorithm that can be easily parallelized
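The single pass over the pages can be sketched in Python. This is an interpretation, not the authors' code: it assumes that "makes apparent new differences" means the page separates at least two rules that returned identical values on every page kept so far. The names `representative_pages`, `PAGES` and `RULES` are hypothetical, and each toy page simply stores what every rule would extract from it.

```python
def representative_pages(pages, rules):
    """Single pass: keep a page only if it separates rules that looked equivalent so far."""
    groups = [list(range(len(rules)))]  # initially all rules are indistinguishable
    kept = []
    for page in pages:
        refined = []
        for group in groups:
            by_value = {}
            for i in group:  # split the group by the value each rule extracts
                by_value.setdefault(rules[i](page), []).append(i)
            refined.extend(by_value.values())
        if len(refined) > len(groups):  # the page revealed a new difference
            kept.append(page)
            groups = refined
    return kept, groups


# Toy data mirroring the slide; the fourth page is assumed and adds nothing new.
PAGES = [
    {"r1": "Spirited Away", "r2": "Spirited Away", "r3": "Spirited Away"},
    {"r1": "City of God", "r2": "-", "r3": "City of God"},
    {"r1": "Howl's Moving Castle", "r2": "9.3", "r3": None},
    {"r1": "Amelie", "r2": "8.3", "r3": None},
]
RULES = [lambda p: p["r1"], lambda p: p["r2"], lambda p: p["r3"]]

kept, groups = representative_pages(PAGES, RULES)
print(len(kept), groups)
```

In this sketch the extra memory grows with the number of rules rather than with the number of pages, since only the current grouping of rules is kept between pages.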
71. Results: Sampling
Three sample sets:
• Biased
Pages collected by crawling the website
• Random
Pages randomly picked from the whole set of pages
• Representative
Pages collected by our sampling algorithm
72. Results: Sampling
Entity   Sampling         |Pages|   P      R
Movies   Biased           250       0.98   0.71
Movies   Random           250       0.99   0.99
Movies   Representative   42        1.00   1.00
Actors   Biased           250       1.00   1.00
Actors   Random           250       1.00   0.96
Actors   Representative   30        1.00   1.00
Stocks   Biased           86        1.00   0.98
Stocks   Random           86        1.00   0.99
Stocks   Representative   15        1.00   1.00
Albums   Biased           258       1.00   0.99
Albums   Random           258       1.00   1.00
Albums   Representative   59        1.00   1.00
Bands    Biased           289       1.00   0.68
Bands    Random           289       1.00   1.00
Bands    Representative   36        1.00   1.00
• Representative: perfect
• Biased: recall loss
• Random: better than biased
78. State of the Art
Supervised Wrapper Induction
• 2006 - Interactive wrapper generation with minimal user effort.
U. Irmak et al. WWW
• 2006 - Active learning with multiple views.
I. Muslea et al. JAIR
79. State of the Art
Unsupervised Wrapper Induction
• 2008 - Wrapper inference for ambiguous web pages.
V. Crescenzi and P. Merialdo. JAAI
• 2005 - Web Data Extraction Based on Partial Tree Alignment.
Y. Zhai and B. Liu. WWW
80. State of the Art
Automatic Annotators
• 2012 - D.I.A.D.E.M.
T. Furche and G. Gottlob. WWW
• 2011 - Automatic wrappers for large scale web extraction.
N. Dalvi et al. VLDB