This document describes several web scraping functions:
1. A web scraper function that extracts data from an HTML document as a character object based on CSS selectors.
2. A function to check if an object is a valid HTML element.
3. A tag counter function that counts the number of tags in a string.
4. Closing tag locator functions that find the closing tags for given opening tags.
5. A content extractor function that extracts HTML elements based on tag positions and can remove tags or construct a data frame.
6. A table constructor function that creates a data frame from an HTML table.
7. A content remover function that removes unwanted content from extractor
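The tag counter and closing-tag locator in the list above can be sketched in Python. This is a minimal illustration, not the summarized implementation: the names `count_tags` and `find_closing_tag` and the regular expression are assumptions, and the sketch does not handle void elements written without a slash (such as a bare `<br>`).

```python
import re

# Hypothetical pattern: group 1 is "/" for a closing tag, group 2 is the tag
# name, group 3 is "/" for a self-closing tag such as <br/>.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>")


def count_tags(html: str) -> int:
    """Count every opening, closing, and self-closing tag in a string."""
    return sum(1 for _ in TAG_PATTERN.finditer(html))


def find_closing_tag(html: str, open_pos: int) -> int:
    """Return the index of the closing tag matching the opening tag at open_pos.

    Returns -1 if the tag is never closed.
    """
    first = TAG_PATTERN.match(html, open_pos)
    if first is None or first.group(1) or first.group(3):
        raise ValueError("open_pos must point at an opening tag")
    name = first.group(2).lower()
    depth = 0
    for m in TAG_PATTERN.finditer(html, open_pos):
        if m.group(2).lower() != name or m.group(3):
            continue  # other tag names and self-closing tags do not change depth
        depth += -1 if m.group(1) else 1
        if depth == 0:
            return m.start()
    return -1


html = "<div><p>Hi</p><div>x</div></div>"
total = count_tags(html)
outer_close = find_closing_tag(html, 0)
inner_close = find_closing_tag(html, 14)
```

Tracking nesting depth is what lets the locator skip the inner `</div>` and return the position of the outer one.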
The document provides an overview of the Groovy programming language. It discusses topics covered in the tutorial such as language basics, closures, builders, data access, testing with Groovy, and integration with Grails. The document also shows examples of string handling and processing in Groovy.
The document provides information about data type conversion and multi-dimensional arrays in JavaScript. It explains that strings returned by the prompt() function need to be converted to numbers using parseInt() or parseFloat() before performing mathematical operations. This is demonstrated through an example that incorrectly adds two numbers due to their string data type. The document then introduces multi-dimensional arrays as a way to store related data in groups or sub-arrays, like employee records with name, age, address fields. It provides examples of declaring and accessing elements in 1D, 2D, 3D and higher dimensional arrays.
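The same pitfall can be shown outside JavaScript. The sketch below uses Python as a stand-in (an assumption of this example, not the language of the summarized document): text input is a string, so `+` concatenates until the values are converted, and nested lists play the role of multi-dimensional arrays. The function names and the employee data are hypothetical.

```python
def add_as_text(a: str, b: str) -> str:
    """Adding two strings concatenates them, which is the bug described above."""
    return a + b


def add_as_numbers(a: str, b: str) -> float:
    """Convert first (the analogue of parseFloat), then add."""
    return float(a) + float(b)


wrong = add_as_text("2", "3")
right = add_as_numbers("2", "3")

# A 2D structure: one sub-list per employee (name, age, address).
employees = [
    ["Ann", 34, "12 Oak St"],
    ["Raj", 41, "7 Elm Rd"],
]
second_name = employees[1][0]

# A 3D structure with shape 2 x 3 x 4, filled with zeros.
grid = [[[0 for _ in range(4)] for _ in range(3)] for _ in range(2)]
grid[1][2][3] = 9
```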
The Ring programming language version 1.6 book - Part 38 of 189 | Mahmoud Samir Fayed
This document summarizes the key classes in the Ring programming language including the String, List, Stack, Queue, HashTable, Tree, and Math classes. For each class, it provides an overview of the parent class and core methods, followed by examples of using each method on instances of the classes. It demonstrates initializing class instances, accessing and modifying values, and performing common operations like adding/removing items, sorting, and mathematical functions.
This code merges employee data from two files into matched and unmatched tables based on social security number. It loads configuration settings from an XML file to get the file paths. It then reads the CSV and fixed-width files, compares the records to find matches, and writes the results to two output files: one for matched records and one for unmatched records. Logging statements are collected in a list and later written to a log file.
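A rough sketch of the matching step follows. Everything specific here is an assumption for illustration: the column names, the fixed-width layout, and the function names are hypothetical, and the XML configuration and file output are left out so the example runs on in-memory text.

```python
import csv
import io

# Hypothetical fixed-width layout: SSN in columns 0-10, name in columns 11-30.
FIXED_LAYOUT = {"ssn": (0, 11), "name": (11, 31)}


def read_csv_records(text):
    """Parse CSV text with a header row into a dict keyed by SSN."""
    return {row["ssn"]: row for row in csv.DictReader(io.StringIO(text))}


def read_fixed_records(text):
    """Slice each fixed-width line into fields and key the records by SSN."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = {name: line[start:end].strip()
               for name, (start, end) in FIXED_LAYOUT.items()}
        records[rec["ssn"]] = rec
    return records


def merge_by_ssn(csv_records, fixed_records, log):
    """Split records into matched (SSN in both inputs) and unmatched."""
    matched, unmatched = [], []
    for ssn in sorted(csv_records.keys() | fixed_records.keys()):
        if ssn in csv_records and ssn in fixed_records:
            matched.append({**fixed_records[ssn], **csv_records[ssn]})
            log.append(f"matched {ssn}")
        else:
            unmatched.append(csv_records.get(ssn) or fixed_records[ssn])
            log.append(f"unmatched {ssn}")
    return matched, unmatched


csv_text = "ssn,dept\n111-11-1111,Sales\n222-22-2222,IT\n"
fixed_text = "111-11-1111Ann Lee\n333-33-3333Bob Ray\n"
log = []
matched, unmatched = merge_by_ssn(
    read_csv_records(csv_text), read_fixed_records(fixed_text), log
)
```

Keying both inputs by SSN turns the comparison into dictionary lookups instead of a nested loop over both files.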
A presentation from the SPb Python Interest Group community meetup.
The presentation covers dictionaries in Python. It reviews the dictionary implementation in CPython 2.x and CPython 3.x, as well as the recent changes in CPython 3.6. Beyond CPython, it also reviews dictionaries in alternative Python implementations such as PyPy, IronPython, and Jython.
This document provides a summary of key functions and commands in the R programming language for getting help, inputting and outputting data, creating and manipulating data, selecting and extracting data, performing mathematical operations, working with dates and times, plotting graphs, and more. It includes brief explanations and examples of commonly used functions like read.table(), plot(), hist(), summary(), str(), and others.
Working with Databases and Groovy
This document discusses how to work with databases using Groovy. It covers:
- Connecting to databases and using groovy.sql.Sql to execute queries and statements
- Performing reads and writes like inserting, updating, deleting rows
- Calling stored procedures
- Advanced techniques like transactions, batching, pagination
- Using groovy.sql.DataSet to treat tables as collections
It also briefly introduces using MongoDB and Neo4j with Groovy.
The Ring programming language version 1.5.1 book - Part 75 of 180 | Mahmoud Samir Fayed
The document describes the Trace library in Ring, which provides functions for tracing code execution. It defines trace events and trace data, along with functions such as TraceLib_AllEvents, which prints trace information, and TraceLib_Debugger, which breaks on errors for debugging. The library allows setting breakpoints and provides an interactive debugger command line for inspecting variables and executing code while debugging.
This document discusses concurrency features in Groovy and GPars. It highlights useful Groovy features for concurrency like closures, immutable collections, and annotation support. It also discusses common concurrency libraries and tools that can be used with Groovy like GPars, Google Collections, and Actors. The document provides examples of how Groovy improves on Java for concurrency tasks through features like closures, immutable types, and domain specific languages.
This document discusses arrays in Java. It explains that arrays are objects that hold a collection of variables of the same type. It covers how to declare and initialize arrays, including one-dimensional, multi-dimensional, and jagged arrays. The document also discusses various array operations like length, for-each loops, searching, and more. Examples are provided to demonstrate array concepts.
Martin Fowler's Refactoring Techniques Quick Reference | Seung-Bum Lee
The document discusses various refactoring techniques for restructuring code to improve design, readability, and extensibility without changing external behavior. It provides examples of techniques like extracting methods, replacing temporary variables, simplifying conditional expressions, and dealing with generalization through inheritance/polymorphism. The techniques are organized into categories like composing methods, organizing data, simplifying method calls, and dealing with generalization.
Scala is a Java-related, statically typed programming language that is rapidly gaining ground. The language combines aspects of object-oriented and functional languages and focuses on scalability and efficiency, both in the code and at runtime. The syntax is elegant and concise. The language also provides strong constructs for supporting parallel applications that take advantage of future hardware architectures.
The document discusses regular expressions (regex) in Python. It provides examples of using regex to search for patterns in strings, extract matches, and find and group substrings. Key concepts covered include regex syntax like anchors, character classes, repetition, capturing groups, greedy/non-greedy matching, and the re module's functions like search, findall, finditer, and sub. Real-world applications mentioned include validating formats like IP addresses and parsing structured data.
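The concepts listed above can be tried with a few lines using the standard `re` module. This is an illustrative sketch; the sample strings, the `is_ipv4` helper, and the patterns are made up for the example and are not taken from the summarized document.

```python
import re

# Anchors (^ and $), repetition ({1,3}), and capturing groups.
IPV4 = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")


def is_ipv4(text):
    """Check the dotted-quad shape with a regex, then the 0-255 range in code."""
    m = IPV4.search(text)
    return bool(m) and all(int(part) <= 255 for part in m.groups())


# findall returns the captured group for every match.
log_line = "GET /a 200; GET /b 404; POST /c 500"
codes = re.findall(r"\b(\d{3})\b", log_line)

# Greedy matching takes as much as possible; adding ? makes it non-greedy.
greedy = re.search(r"<.+>", "<b>x</b>").group()
lazy = re.search(r"<.+?>", "<b>x</b>").group()

# finditer yields match objects; named groups make the extraction readable.
pairs = [(m.group("key"), m.group("value"))
         for m in re.finditer(r"(?P<key>\w+)=(?P<value>\w+)", "x=1, y=22")]

# sub replaces every match; here runs of whitespace collapse to one space.
collapsed = re.sub(r"\s+", " ", "a   b \n c")
```

The range check is done in Python rather than in the pattern because a regex that encodes 0-255 directly is considerably harder to read.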
This document provides an overview of Groovy's collection API. It discusses how Groovy treats many objects like collections, including strings, numbers, and regular expressions. It demonstrates various collection notation and operations, including lists, maps, ranges, and spread operators. It also summarizes common collection methods like each, find, collect, reducers, and useful utility methods like groupBy, countBy, and set operations.
MCE^3 - Hannes Verlinde - Let The Symbols Do The Work | PROIDEA
Syntactic symbol manipulation may be the universal way of deriving new knowledge in science and engineering, but the technique is still rarely used in the act of writing software. We will explore this alternate way of reasoning about code, while demonstrating the power of formal refactoring and its potential for automation.
Kotlin Advanced - Apalon Kotlin Sprint Part 3 | Kirill Rozov
The document discusses several Kotlin advanced topics including delegated properties, lazy properties, objects, higher-order functions, lambdas, inline functions, and standard library functions like apply, also, let. It explains concepts like lazy initialization with lazy properties, property delegation, object expressions and declarations, functional programming with higher-order functions and lambdas, and inline functions for performance. It also covers utility functions in the standard library for working with objects and collections.
The Ring programming language version 1.5.4 book - Part 36 of 185 | Mahmoud Samir Fayed
This document provides documentation on Ring programming language classes and methods, including List, Stack, Queue, HashTable, Tree, Math, and DateTime classes. It describes the purpose and usage of each class and its methods, and provides examples of how to use the classes and methods.
The Ring programming language version 1.5.2 book - Part 35 of 181 | Mahmoud Samir Fayed
This document summarizes the key classes and methods in the Ring programming language documentation. It describes classes for strings, lists, stacks, queues, hash tables, trees and math functions. For each class it lists parent classes and example methods with brief descriptions of functionality. An example usage section demonstrates the methods on various classes.
Java as a language has not moved much in recent years. We still do not have closures or functional features such as those C# has had since version 3. Is Scala the answer to every Java developer's prayers, or is the language only interesting to scatterbrains like me who are growing fonder and fonder of functional programming? Is the large helping of syntactic sugar that Scala brings to the table just empty calories?
The document provides examples of refactoring techniques including Extract Method, Introduce Explaining Variable, Replace Temp with Query, Substitute Algorithm, and Extract Class. Extract Method breaks a long or complex method into smaller methods focused on specific tasks. Introduce Explaining Variable uses variables to make complex expressions more readable. Replace Temp with Query replaces temporary variables with query methods to avoid long methods. Substitute Algorithm replaces a complex algorithm with a simpler alternative. Extract Class moves part of a class into its own class to separate responsibilities.
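Two of these techniques, Extract Method and Replace Temp with Query, can be illustrated with a small before-and-after sketch. The `Order` class, its fields, and the discount thresholds are hypothetical, chosen only to make the example concrete.

```python
def price_before(quantity, item_price):
    """Before: one function holding a temp variable and inline branching."""
    base_price = quantity * item_price
    if base_price > 1000:
        discount_factor = 0.95
    else:
        discount_factor = 0.98
    return base_price * discount_factor


class Order:
    """After: the temp becomes a query method and the branching is extracted."""

    def __init__(self, quantity, item_price):
        self.quantity = quantity
        self.item_price = item_price

    def base_price(self):
        # Replace Temp with Query: callers ask for the value instead of caching it.
        return self.quantity * self.item_price

    def discount_factor(self):
        # Extract Method: the discount rule gets its own name.
        return 0.95 if self.base_price() > 1000 else 0.98

    def price(self):
        return self.base_price() * self.discount_factor()


large_before = price_before(10, 200)
large_after = Order(10, 200).price()
small_before = price_before(2, 100)
small_after = Order(2, 100).price()
```

Both versions compute the same result, which is the point of a refactoring: the structure changes while the external behavior does not.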
Indexing and Query Optimizer (Aaron Staple) | MongoSF
This document discusses MongoDB indexing and query optimization. It defines what indexes are, how they are stored and used to improve query performance. It provides examples of different types of queries and whether they can utilize indexes, including compound, geospatial and regular expression indexes. It also covers index creation, maintenance and limitations.
Grails GORM - You Know SQL. You Know Queries. Here's GORM. | Ted Vinke
This presentation shows practical basics of how Grails Object Relational Mapping (GORM) can help you query data, test it, and think in domain terms along the way when SQL at the moment is all you know.
The Ring programming language version 1.6 book - Part 27 of 189 | Mahmoud Samir Fayed
The document describes various file handling functions in the Ring programming language. It explains functions to read and write files, get directory listings, rename and delete files, open and close files, seek to positions in files, generate temporary files and names, check for end of file and errors, and more. Examples are provided to demonstrate the usage of each file function.
This document discusses using Kotlin for test suites and provides examples of writing unit tests in Kotlin compared to Java. It shows how Kotlin allows for cleaner, more readable tests through features like lambdas, extension functions, and DSL capabilities. Specifically, it provides an example of refactoring a test for a task presenter to initialize views into a clearer specification using the given-when-then syntax supported by testing frameworks like Spek for Kotlin. The refactored test checks that when a task with no response is assigned to the current user, the response button is enabled upon initializing the presenter's views.
The document discusses various primitive data types including numeric, boolean, character, and string types. It describes integers, floating point numbers, complex numbers, decimals, booleans, characters, and strings. It also covers array types like static, dynamic, and associative arrays. Other topics include records, slices, and unions.
The document discusses various primitive data types including numeric, boolean, character, and string types. It describes integer types like byte, short, int that can store negative numbers using two's complement. Floating point types are represented as fractions and exponents. Boolean types are either true or false. Character types are stored as numeric codes. String types can have static, limited dynamic, or fully dynamic lengths. User-defined types like enumerations and subranges are also covered. The document also discusses array types including their initialization, operations, and implementation using row-major and column-major ordering. Associative arrays are described as unordered collections indexed by keys. Record and union types are summarized.
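Two of the representation details mentioned above, two's complement and array element ordering, reduce to short formulas. The sketch below is an illustration with hypothetical function names; it assumes zero-based indices and a default width of 8 bits.

```python
def twos_complement(value, bits=8):
    """Return the bit pattern of a signed integer in two's complement."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking maps negative values onto the upper half of the unsigned range.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def row_major_offset(i, j, n_cols):
    """Offset of element (i, j) when rows are stored one after another."""
    return i * n_cols + j


def column_major_offset(i, j, n_rows):
    """Offset of element (i, j) when columns are stored one after another."""
    return j * n_rows + i


minus_one = twos_complement(-1)
five = twos_complement(5)
# Element (1, 2) of a 3 x 4 array under both orderings.
row_offset = row_major_offset(1, 2, n_cols=4)
col_offset = column_major_offset(1, 2, n_rows=3)
```

The same element lands at a different offset under each ordering, which is why the traversal order of nested loops affects memory locality.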
The Ring programming language version 1.8 book - Part 30 of 202 | Mahmoud Samir Fayed
The document describes various functions for working with files in the Ring programming language. Key functions covered include Read() and Write() for reading from and writing to files, Dir() for getting directory listings, Rename() and Remove() for renaming and deleting files, Fopen() and Fclose() for opening and closing file handles, and Fseek(), Ftell(), and Rewind() for manipulating the file position indicator. Error-handling functions such as Feof(), Ferror(), and Clearerr(), as well as temporary file management, are also discussed.
This document discusses key concepts related to files in R including file names, formats, paths, encodings, and types. It describes text files as human-readable files organized in lines with different extensions for different programs. Binary files contain machine-readable 1s and 0s. Paths locate files in a directory hierarchy using components like parent directories denoted by "..". Common encodings include ASCII for English and UTF-8 for multiple languages. R supports text, binary, and delimited files like CSVs that separate values with commas.
This document discusses R data types and objects. It covers the basic data types in R: logical, integer, real/double, string/character, complex, and raw. The most common data structures are vectors, matrices, arrays, data frames, and lists. Vectors can be atomic, containing one data type, or generic lists, containing multiple data types. The document demonstrates how to create vectors using the c() function or colon operator, and how to name vectors by assigning them to an object. It also discusses the basic properties of vectors like their type, length, dimensions, and classes.
This document is a master's thesis that investigates the effect of sustainability certification on financial performance. It contains an introduction, literature review, hypotheses, methodology, data analysis and results, discussion, and references. The introduction provides background on sustainability and certification, identifies gaps in previous research, and establishes the problem statement and research questions. The literature review examines definitions of sustainability, forms of certification, and the relationship between certification and financial performance to develop hypotheses. The methodology describes the research design, sample, variables, and analysis strategy. The results chapter presents the data collection, statistical model, regression outputs, and descriptive analysis. Finally, the discussion provides conclusions, discusses limitations, and suggests areas for future research.
An infrared remote control is used to control the speed of an induction motor in 8 steps. A microcontroller reads coded data from the remote control and activates output pins to change the firing time of thyristors, which drives the fan motor. The microcontroller receives signals from IR sensors connected to the remote and controls the system. A regulated power supply provides power and a transformer steps down the voltage.
Summer Jones created a horror film and sought feedback from her target audience of teenagers and young adults. Niamh, age 16, said the lighting created a sense of mystery and the quick cuts at the beginning caught her attention. Luke, age 19, also said the lighting and titles identified the horror genre but felt the quick cuts lasted too long at the beginning before the storyline started. Based on this feedback, Summer Jones agrees the lighting effectively set the horror tone but thinks removing some quick cuts and adding longer narrative shots would improve audience engagement.
This document presents the steps for developing a civic project. It explains that a public problem in the community must first be identified and analyzed. A plan to address that problem is then designed, carried out, and evaluated, which includes gathering information, preparing a working dossier, and reflecting on the results. Finally, it details the key components the project must contain, such as the name, the objectives, the planned activities, and the schedule.
The conceptualization of "talent" and the ranking of staff "competencies" are not merely academic questions. Conceptual errors in these areas have a decisive negative impact on decisions about selecting, evaluating, retaining, and developing a company's key personnel and, consequently, on the company's performance and final results.
This publication clarifies these concepts and clearly defines the qualification priorities that work demands in the current climate of change and uncertainty.
The document discusses four key questions for deciding whether incentives should be individual or collective: 1) if objectives and results can be measured individually, incentives can also be individual; otherwise they must be collective; 2) collective incentives foster collaboration when results depend on it; 3) the perception of fairness is important to keep some people from feeling unfairly rewarded; 4) group size is also relevant: collective incentives work better
Gerencia y administracion de salud, catedra Jorge Amarante
This document deals with health management and administration. It explains that health administration refers to the planning, organization, direction, and control of public and private organizations dedicated to health care and health promotion. It also defines management as the process of planning, regulating, and executing a company's operations to achieve a given objective efficiently. Finally, it describes the hospital as a social enterprise that coordinates its activities to achieve objectives such as market share and responsibility
This document summarizes a student teacher's action research project that involved creating a classroom blog for their 8th grade science students. The student teacher wanted to see if a blog could reinforce curriculum comprehension and strengthen student voice. After implementing the blog, the student teacher observed students using it and adapted it based on their needs and feedback. Through surveys, the student teacher also learned how students used the internet and felt about online expression. While the blog showed potential, it was not fully realized due to lack of updates and customization over time to meet student needs.
This document presents a detailed description of several quasi-experimental designs, including one-group pre-post designs, nonequivalent control group pre-post designs, time series, balanced cases, and single-subject designs. It also defines terms such as quasi-experiment, intact groups, and ex post facto design, and highlights the advantages and disadvantages of quasi-experimental designs.
The Ring programming language version 1.8 book - Part 50 of 202 Mahmoud Samir Fayed
The Page class contains methods for generating HTML elements and adding content to web pages. It includes methods for common elements like headings, paragraphs, links, forms, tables, and more. Each method accepts a parameter that allows setting attributes of the element through a list. This allows generating HTML elements with customized attributes in a simple way.
The document contains code examples demonstrating various Scala programming concepts such as functions, pattern matching, traits, actors and more. It also includes links to online resources for learning Scala.
The pug templating engine is designed to be easily extensible, as well as having powerful core functionality. It was recently re-written to be more modular and structured. This talk discusses how a compiler is built out of a series of independent stages.
More Stored Procedures and MUMPS for DivConq eTimeline, LLC
This document discusses DivConq's MUMPS API and provides examples of querying, updating, and defining stored procedures in MUMPS using DivConq's framework. It shows how to:
- Query a MUMPS global to retrieve test data and return the results.
- Define the schema for stored procedures, including their inputs, outputs, and descriptions.
- Map data between Java and MUMPS for calling procedures.
- Add a new record to a MUMPS global by calling an "Update" procedure from Java that handles auditing.
- Retrieve complex nested data structures by calling a procedure that returns a list of records.
This document provides a cheat sheet overview of key concepts in the IRODS rule language, including numeric and string literals, arithmetic and comparison operators, functions for strings, lists, tuples, if/else statements, foreach loops, defining functions and rules, handling errors, and inductive data types. It describes syntax for defining data types using constructors, and using pattern matching to define functions over data types.
The document discusses the key concepts of metaprogramming in Ruby including dynamic method lookup, open classes, modules, callbacks, and dynamic method definition. Some examples provided include defining accessor methods using modules, extending classes with module methods, defining instance and class methods dynamically, and hooking into callbacks to add functionality. Metaprogramming allows code to generate and modify code at runtime enabling powerful abstractions.
Cypher inside out: Como a linguagem de pesquisas em grafo do Neo4j foi constr... adrianoalmeida7
The document discusses how Cypher, the query language of Neo4j, was built and how to use it. It explains that Cypher was constructed using parser combinators in Scala to parse queries into an AST. It then describes how the different clauses of a Cypher query (start, match, where, return, etc.) are parsed and provides examples. Finally, it discusses how queries are executed by passing through different processing pipes in the ExecutionEngine.
Mixing Functional and Object Oriented Approaches to Programming in C# Skills Matter
The document discusses mixing functional and object-oriented programming approaches in C#, including examples of filtering arrays using predicates and delegates. It covers the evolution of C# from version 1.0 to 3.0, introducing generics, lambda expressions, extension methods and LINQ. Functional programming concepts like higher-order functions, immutability and lazy evaluation are also briefly discussed.
Mixing functional and object oriented approaches to programming in C# Mark Needham
The document discusses mixing functional and object-oriented programming approaches in C#, covering topics like generics, LINQ, lambdas, anonymous methods, extension methods, and more. It provides examples of filtering arrays and enumerables using predicates in increasingly functional styles. It argues that functional programming can complement object-oriented code by abstracting over common operations.
This document provides an overview of arrays and linked lists as data structures. It discusses arrays, including declaration, initialization, updating elements, and multi-dimensional arrays. It also covers searching arrays, why arrays are needed, pros and cons of arrays, and character strings as arrays. The document then introduces linked lists as a data structure and discusses linked list operations like printing all elements, adding nodes, appending nodes, inserting nodes, and deleting nodes. Homework questions on arrays and linked lists are provided at the end.
JavaScript String:
The String object lets you work with a series of characters; it wraps JavaScript's string primitive data type with a number of helper methods.
As JavaScript automatically converts between string primitives and String objects, you can call any of the helper methods of the String object on a string primitive.
JavaScript Arrays:
The Array object lets you store multiple values in a single variable. It stores an ordered, resizable collection of elements, which may be of different types. An array is used to store a collection of data, and its elements are accessed by zero-based index.
String Function
1. charAt():
This method returns the character at the specified index. Characters in a string are indexed from left to right. The index of the first character is 0, and the index of the last character in a string called stringName is stringName.length - 1.
Syntax:
string.charAt(index);
Return Value:
Returns the character from the specified index.
Example:
<html>
<head>
<title>JavaScript String charAt() Method</title>
</head>
<body>
</body>
</html>
Output:
str.charAt(0) is:T
2. concat():
Description:
This method adds two or more strings and returns a new single string.
Syntax:
string.concat(string2, string3[, ..., stringN]);
Parameters:
string2...stringN : These are the strings to be concatenated.
Return Value:
Returns a single concatenated string.
Example:
<html>
<head>
<title>JavaScript String concat() Method</title>
</head>
<body>
</body>
</html>
Output:
Concatenated String :This is string oneThis is string two.
3. indexOf():
Description:
This method returns the index within the calling String object of the first occurrence of the specified value, starting the search at fromIndex or -1 if the value is not found.
Syntax:
string.indexOf(searchValue[, fromIndex])
Parameters:
searchValue : A string representing the value to search for.
fromIndex : The location within the calling string to start the search from. It can be any integer between 0 and the length of the string. The default value is 0.
Return Value:
Returns the index of the found occurrence otherwise -1 if not found.
Example:
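<html>
<head>
<title>JavaScript String indexOf() Method</title>
</head>
<body>
<script type="text/javascript">
// The string and the first search term are inferred from the output below
var str1 = new String( "This is string one" );
var index = str1.indexOf( "string" );
document.write("indexOf found String :" + index );
document.write("<br />");
var index = str1.indexOf( "one" );
document.write("indexOf found String :" + index );
</script>
</body>
</html>
Output:
indexOf found String :8
indexOf found String :15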
4. lastIndexOf():
Description:
This method returns the index within the calling String object of the last occurrence of the specified value, starting the search at fromIndex or -1 if the value is not found.
Syntax:
string.lastIndexOf(searchValue[, fromIndex])
Parameters:
searchValue : A string representing the value to search for.
fromIndex : The location within the calling string to start the search from; the search proceeds backwards from this index. It can be any integer between 0 and the length of the string. The default value is +Infinity, so the whole string is searched.
Return Value:
Returns the index of the last found occurrence otherwise -1 if not found.
Example:
<html>
<head>
<title>JavaScri
The document provides an overview of key concepts in the C# programming language including variables and data types, arrays, conditional logic, loops, methods, parameters, and delegates. It discusses basic syntax, operators, and how to perform common tasks like type conversions and working with dates and strings. The goal is to give readers enough information to get started with C# as well as refer back to for language details while working through ASP.NET examples.
Mixing functional programming approaches in an object oriented language Mark Needham
The document discusses applying functional programming approaches in object-oriented languages like C#. It starts with examples of filtering arrays using predicates and shows how this can be refactored to be more functional. It introduces interfaces for predicates, delegates for functions, and anonymous methods. Lambdas, type inference, extension methods and LINQ are presented as ways to further improve the functional style. Concerns like performance and side effects are addressed. Overall it promotes embracing functional techniques like passing functions as values to gain abstraction and reduce errors while also discussing where object-oriented approaches are still useful.
Underscore.js is a JavaScript utility library that provides support for functional programming without extending built-in JavaScript objects. It includes over 60 functions for working with arrays, objects, functions and more. Some key functions include map, reduce, find, and bind for working with collections and functions. Underscore is open source and part of the DocumentCloud project.
This document provides an overview of Kotlin for backend development. It discusses Kotlin's advantages like Java interoperability and null safety. Coroutines are presented as an alternative to callback-based asynchronous programming. Examples are given of adopting Kotlin in different contexts like libraries, components and web applications. Strategies covered include preparing development tools and environments, evaluating current skills, and sharing experiences.
This presentation explains a few features of advanced Scala. The topics covered are implementations of extractors, implicit conversions, implicit parameters and implicit context, and the update function, with code snippets.
The document provides information about a JavaScript course including:
1. The course consists of 5 lectures and 5 labs and is evaluated based on projects, assignments, labs and quizzes.
2. The lecture outline covers introduction to JavaScript, syntax, built-in objects and functions.
3. JavaScript was invented by Brendan Eich at Netscape and first appeared in the Netscape Navigator browser in 1995.
Transpose and manipulate character strings Rupak Roy
This document discusses techniques for manipulating data frames in R, including transposing data between wide and long formats using the reshape() function, extracting and transforming character strings using functions like substr() and grep(), and replacing patterns within strings using sub() and gsub(). Wide format stores variables in columns while long format stores them in rows. The melt() and dcast() functions are used to reshape between these formats.
This document provides an introduction to data analysis and graphics in R. It covers vectors and assignment, data types including logical, integer, numeric, character, factor, complex and raw. It also discusses data structures such as atomic vectors, matrices, arrays and lists. Finally, it discusses importing data into R from files such as .RData files, text files using read.table(), CSV files and Excel files.
This document provides an outline for a course on quantitative data analysis and graphics in R. The course will cover planning a data analysis, basics of data analysis, testing for normality, choosing appropriate statistical tests, hypothesis testing, confidence intervals, and statistical significance tests both parametric and non-parametric. It will also cover the differences between hypothesis testing and confidence intervals.
This document provides an introduction and overview of graphics and plotting in R. It discusses high level and low level plotting functions, interacting with graphics, and modifying plots. It also covers plotting different variable types including dichotomous, categorical, ordinal, and continuous variables. Examples are provided for various plot types including histograms, bar plots, dot plots, boxplots, and more.
This document provides an introduction and overview of summarizing data in R. It discusses numerical summaries for different variable types including discrete, continuous, dichotomous, categorical, and ordinal variables. Measures of central tendency like mean, median and mode are covered as well as measures of dispersion. Skewness and kurtosis are also discussed. Examples of calculating these summaries for sample datasets are provided.
This document provides an overview of data entry, management, and manipulation in R. It discusses how to create datasets using various functions like c(), matrix(), data.frame(), and list(). It also covers understanding dataset properties, importing data, creating new variables, and subsetting datasets. Useful functions for working with datasets include mode(), length(), dim(), names(), and attributes(). The document shows examples of entering data using these different methods.
R is a free and open-source programming language and software environment for statistical analysis and graphics. RStudio is a popular integrated development environment (IDE) for R that provides a convenient graphical user interface. This document introduces R and RStudio, covering how to install them, their basic layout and features, and how to get help when working with R. Key functions and concepts discussed include loading and installing packages, working directories, and calling functions.
This document introduces level three of a data analysis tutorial series, which focuses on statistics fundamentals. The level aims to provide a good statistics foundation by covering descriptive statistics, exploratory data analysis, inferential statistics, categorical analysis, time series analysis, and survival analysis. Students will learn how to describe and make inferences from data, and the reasoning behind statistical calculations and assumptions to enable sound interpretation of results.
This document provides an introduction to version control systems using Git and GitHub. It begins with an overview of why version control is important and the evolution of version control systems from local to centralized to distributed. It then discusses installing and setting up Git, initializing and tracking files in a Git repository, committing changes, and ignoring files that should not be tracked via a .gitignore file. The goal is for students to understand the basics of Git and GitHub and be able to version control files and collaborate on projects.
A needs analysis involves comparing current conditions to desired goals to understand performance problems. It can be extensive, using large sample sizes for general understanding, or intensive, using smaller samples for in-depth cause-and-effect analysis. Performing a needs analysis involves gap analysis, identifying priorities, outlining a methodology, gathering and analyzing both quantitative and qualitative data, presenting findings, and making conclusions and recommendations. An example needs assessment addressed gender-based violence in schools in Africa through stakeholder interviews, performances, photo voices, drawings, and documentaries to develop an action plan.
This document discusses result-based monitoring and evaluation (M&E). It defines monitoring as the systematic collection of data on indicators, and evaluation as the objective assessment of a project or program's design, implementation and results. The purpose of M&E is to assess progress, determine relevance and fulfillment of objectives, and enhance transparency and accountability. Key aspects of result-based M&E covered include logical frameworks, methods/tools like rapid appraisal and impact evaluation, and essential actions to build an effective result-based M&E system like formulating goals and indicators to measure outcomes.
This document provides an introduction to regular expressions (regex) in R. It discusses literal regex which match text exactly, and metacharacters which have special meanings like ., *, ?, etc. It also covers character classes [ ], anchors ^ and $, quantifiers like ?, *, +, {}, alternations |, and capturing groups () in regex. The document uses examples of matching file names and dates to illustrate regex patterns and their uses in text matching and replacement.
This document provides instructions for a case study on web scraping Olympics data from Wikipedia pages using R. It discusses importing the raw HTML data from the URLs using readLines(), then exploring the structure and content of the imported data. The document emphasizes using base R functions instead of packages for this task. It then provides an overview of HTML to help understand how to locate and extract the desired data tables from the raw HTML.
This document provides guidance on solving coding problems in R. It discusses identifying and defining problems, and where and how to get help. For problems where the necessary functions are unknown, it recommends gaining foundational R skills through tutorials and practice. For known functions producing errors, warnings or unexpected output, it demonstrates defining the specific problem and potential causes. Sources of help discussed include R's internal documentation, manuals, FAQs, and external resources like web searches and mailing lists. The document also provides an example of solving a statistical mode problem in R.
Plot() is the main plotting function in base R. It is a generic function that dispatches different methods depending on the class of the first argument. When called, plot() follows 8 steps: 1) Open a new plotting window, 2) Set the plotting coordinates, 3) Evaluate pre-plot expressions, 4) Make the actual plot, 5) Evaluate post-plot expressions, 6) Add axes, 7) Add a frame, and 8) Add annotations. The class of the first argument determines which plotting method is used, and additional arguments can customize the plot output.
This document provides an overview of working with dates and times in R. It discusses recognizing date-time objects in R, getting the current date and time, and creating date-time objects using the POSIXct and POSIXlt classes. Methods for converting character and numeric data to date-time objects are presented, along with extracting parts of date-time objects and performing computations. The goal is to introduce the reader to key date and time functionality in base R.
This document discusses indexing and manipulating data objects in R. It covers indexing one-dimensional objects like vectors and lists, as well as two-dimensional objects like matrices and data frames. Binary operators for comparisons like equality and inequality are also described. The key topics are:
1) Indexing values in data objects using integers, characters, or logical values and discussing one-dimensional vs multi-dimensional objects.
2) Comparing values within and between vectors using binary operators like equality ("==") and inequality ("!="). These operate element-wise on vectors.
3) Checking conditions on data using functions like "all()", "any()", and "which()" to subset objects based on logical criteria.
This document discusses importing and exporting data in R. It covers importing data from local files, networks, and databases using both the graphical user interface (GUI) and command line. The key aspects to consider when importing data are the encoding, headers, row names, separators, decimals, quotations, comments, and missing values. Delimited files like CSVs can be imported using read.table() and its wrapper functions, while other file types require packages like foreign and haven. Data can also be exported from R using base functions or packages. The document provides examples of importing delimited text files from a local directory and webpage.
The document discusses making function calls in R. It explains that a function call requires a function name and arguments within parentheses. Arguments can be named or unnamed. When arguments are named, their position does not matter, but when unnamed, their values must be provided in the correct order to match the function's expected arguments. The document provides examples of making function calls using the mean() function to illustrate named versus unnamed arguments and the importance of argument position when unnamed. It also discusses other function components like the function name, default argument values, and anonymous functions.
R is a statistical computing and graphics program that was developed in 1993 at the University of Auckland. It has since grown significantly and is now widely used for data analysis. This document discusses downloading and installing R and RStudio, and provides an overview of their basic interfaces and functionality. It explains how to work interactively in the R console, write scripts, install and load packages, and find help documentation. The goal is to provide readers with the necessary tools and skills to begin working with R and RStudio.
Web Scraping Functions
1. Web scraper
Description
Extracts data from the internet as an HTML document and stores it as an R character object.
Usage
webScraper(url, css, ..., asis = TRUE, constructTable = FALSE, withOutTags = FALSE)
Arguments
url: Internet or file address from which to read a valid HTML document.
css: A character vector with CSS selectors to use for data extraction.
...: Other arguments to be passed to the readLines function.
asis: Logical, should the CSS pattern be matched as is (default) or permuted.
constructTable: Logical, should a data frame be created; defaults to FALSE.
withOutTags: Logical, should HTML tags be removed; defaults to FALSE.
webScraper <- function(url, css, ..., constructTable = FALSE, withOutTags = FALSE) {
  # Keep the document in the global environment as "dom"; replace it when the source differs
  if (is.null(get0("dom", envir = globalenv()))) {
    cat("Opening", url, "\n")
    assign("dom", readLines(url, ...), globalenv())
  } else {
    if (!identical(get0("dom", globalenv()), readLines(url, ...))) {
      cat("Opening", url, "\n")
      assign("dom", readLines(url, ...), globalenv())
    }
  }
  dom <- get0("dom", globalenv())
  # Pattern for simple selectors: type, universal, attribute, class, id and pseudo-class
  pattern <- paste0(
    "(^(\\*\\s)?\\w*$)|(^\\*$)|(^(\\*\\s)?\\w*(\\[[^]]+\\])+$)*|",
    "(^(\\*\\s)?\\w*([.](\\w*[[:punct:]]*)*)+$)+|",
    "(^(\\*\\s)?\\w*#(\\w*[[:punct:]]*)+$)+|",
    "(^(\\*\\s)?\\w*:(\\w*[[:punct:]]*)+$)*"
  )
  m <- grepl(pattern, x = css)
  if (all(m)) {
    simpleSelectors(css, dom, constructTable, withOutTags)
  }
}
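The caching idea can also be sketched without writing to the global environment. This is only an illustration of an alternative, not the method used above: `.cache` and `read_cached` are hypothetical names, and a temporary file stands in for a URL. Unlike webScraper, the sketch never re-reads a source it has already seen.

```r
# Hypothetical private cache; the function above stores "dom" in globalenv() instead
.cache <- new.env()

read_cached <- function(path, ...) {
  # Read the source only the first time this path is requested
  if (!exists(path, envir = .cache, inherits = FALSE)) {
    assign(path, readLines(path, ...), envir = .cache)
  }
  get(path, envir = .cache, inherits = FALSE)
}

# A temporary file stands in for a web page
tmp <- tempfile(fileext = ".html")
writeLines(c("<html>", "<body><p>Hi</p></body>", "</html>"), tmp)
dom <- read_cached(tmp)
```

A second call to `read_cached(tmp)` returns the stored lines without touching the file again.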
2. Check HTML element
Description
Check if an object is a valid HTML element.
Usage
is.htmlElement(x, doc = NULL)
Arguments
x: A character string with HTML tags, or a matrix with indices and positions used to extract content.
doc: R object with valid HTML elements used to extract content when "x" is a matrix; otherwise NULL if "x" is a character vector.
is.htmlElement <- function(x, doc = NULL) {
  if (is.matrix(x)) {
    if (is.null(doc)) stop("Please provide document to extract from")
    x <- contentExtractor(x, doc)
  }
  if (length(x) > 1) x <- paste0(x, collapse = "\n")
  if (grepl("<.+/>$", x)) {
    cat("Self-closing element\n\n")
    return(TRUE)
  }
  # gregexpr() returns a list, so count the matches in its first element
  openings <- sum(gregexpr("<(?!/)[^>]+(?<!/)>", x, perl = TRUE)[[1]] > 0)
  closings <- sum(gregexpr("</[^>]+>", x)[[1]] > 0)
  equalTags <- openings == closings
  # The element must start and end with the same tag name
  sameName <- grepl("^<(\\w+\\b)[^>]*>.*</\\1>$", x)
  if (equalTags && sameName) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}
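As a rough illustration of the tag-balance check, the sketch below counts opening and closing tags in a small fragment with base R. The helper `count_matches` and the sample `fragment` are assumptions made for this example only.

```r
# Count regex matches in one string; gregexpr() yields -1 when nothing matches
count_matches <- function(pattern, x, perl = FALSE) {
  m <- gregexpr(pattern, x, perl = perl)[[1]]
  sum(m > 0)
}

# Hypothetical fragment: two paired elements and one self-closing tag
fragment <- "<div><p>Hello</p><br/></div>"

# Opening tags: "<" not followed by "/", and ">" not preceded by "/"
openings <- count_matches("<(?!/)[^>]+(?<!/)>", fragment, perl = TRUE)
# Closing tags: "</" up to the next ">"
closings <- count_matches("</[^>]+>", fragment)
```

The self-closing `<br/>` is counted in neither group, so the two counts agree for a balanced fragment.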
3. Tag Counter
Description
Count number of tags in a string
Usage
tagCounter(tag, string, start = 1, count = FALSE)
Arguments
tag: A character vector with the tag name. Add "/" before the tag name if counting a closing tag.
string: A character string used for the search.
start: Integer giving the exact location where the "<" of the tag begins.
count: Logical; if TRUE, returns an integer with the number of matches. If FALSE (default), returns a matrix with all matches, their positions and lengths.
tagCounter <- function(tag, string, start = 1, count = FALSE) {
  # Search only from "start" onwards
  if (start != 1) {
    string <- substr(string, start, nchar(string))
  }
  pattern <- paste0("<", tag, "\\b[^>]*>")
  matches <- gregexpr(pattern = pattern, text = string)[[1]]
  # gregexpr() reports -1 when there is no match
  if (matches[1] < 0) {
    return(0)
  }
  # Shift positions back to offsets in the full string
  position <- as.vector(matches) + (start - 1)
  matchLength <- attr(matches, "match.length")
  tagMat <- matrix(c(position, matchLength), ncol = 2,
    dimnames = list(seq_along(position), c("Position", "Length")))
  if (count) {
    return(nrow(tagMat))
  } else {
    return(tagMat)
  }
}
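The position and length bookkeeping relies on what gregexpr() returns. A minimal sketch, using a made-up `html` string and the same pattern shape that tagCounter builds for the tag "li":

```r
# Hypothetical one-line fragment used only for illustration
html <- "<ul><li>one</li><li class=\"x\">two</li></ul>"

# Same pattern shape as built for tag = "li"
pattern <- paste0("<", "li", "\\b[^>]*>")
matches <- gregexpr(pattern, html)[[1]]

# Start offsets of each opening tag and the length of each match
positions <- as.vector(matches)
lengths <- attr(matches, "match.length")
tag_mat <- cbind(Position = positions, Length = lengths)
```

Here the two opening tags start at characters 5 and 17 and are 4 and 14 characters long; the closing `</li>` tags are not matched because the pattern requires the tag name directly after "<".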
4. Closing tag Locator Functions
Description
"clsTagLocator" locates closing tags given position and name of an opening tag.
"multiClsTagLocator" locates closing tags for multiple opening tags.
Usage
clsTagLocator(tagName, doc, index = 1, startPos = 1)
multiClsTagLocator(tagNames, doc, indices = 1, startPos = 1)
Arguments
tagName(s): A character string for clsTagLocator and a character vector for multiClsTagLocator.
doc: A valid HTML document object.
index/indices: An integer for clsTagLocator or an integer vector of length greater than one for multiClsTagLocator. These give the index/indices of opening tags when "doc" is a multi-string object.
startPos: An integer vector of length one or more giving the start position of the opening tag(s).
clsTagLocator <- function(tagName, doc, index = 1, startPos = 1) {
  lengthtagName <- nchar(tagName)
  if (length(doc) == 1) {
    multi <- FALSE
    if (index != 1) warning("index > 1 when length(doc) = 1 is not useful")
    tag <- substr(doc, startPos, startPos + lengthtagName)
    if (tag != paste0("<", tagName)) stop('There is no "<', tagName, '" starting at position ', startPos, '. Start position must be at angle "<" bracket and not the tag name.')
  } else if (length(doc) > 1) {
    if (grepl(paste0("<", tagName), doc[index])) {
      locations <- as.vector(gregexpr(paste0("<", tagName), doc[index])[[1]])
      if (!any(locations == startPos)) {
        warning('There is no "', tagName, '" at position ', startPos, ". 'startPos' has been set to ", locations[1])
        startPos <- locations[1]
      }
    } else stop('Closing tag error: There is no match for "<', tagName, '" at index ', index)
    openingPos <- startPos
    multi <- TRUE
    multiDoc <- doc
    doc <- paste0(doc, collapse = "\n")
    # Offset of each line within the collapsed string (0 for the first line)
    lineOffsets <- c(0, as.vector(gregexpr("\n", doc)[[1]]))
    startPos <- lineOffsets[index] + startPos
  }
  nCharDoc <- nchar(doc)
  docSub <- substr(doc, startPos, nCharDoc)
  # pattern1: self-closing; pattern2: no nested tags; pattern3: nested element of the same name
  pattern1 <- paste0("<", tagName, "\\b[^>]*/>")
  pattern2 <- paste0("<", tagName, "\\b[^>]*>[^<]*</", tagName, ">")
  pattern3 <- paste0("<", tagName, "\\b[^>]*>([^<]*<(?!/", tagName, ")[^>]+>)*?<", tagName, "[^>]*>")
  if (as.vector(regexpr(pattern1, docSub)) == 1) {
    cat("A self-closing element\n\n")
    # Both cases use the column names that contentExtractor expects
    colNames <- c("OpeningIndex", "ClosingIndex", "OpeningPos", "ClosingPos")
    if (multi) {
      data <- c(index, index, openingPos, 0)
    } else {
      data <- c(index, index, startPos, 0)
    }
    clsTagMat <- matrix(data, ncol = 4, byrow = TRUE, dimnames = list("Single", colNames))
    return(clsTagMat)
  } else if (as.vector(regexpr(pattern2, docSub)) == 1) {
    m <- regexpr(pattern2, docSub)
  } else if (as.vector(regexpr(pattern3, docSub, perl = TRUE)) == 1) {
    pattern <- paste0("<", tagName, "\\b[^>]*>([^<]*<[^>]+>)*?</", tagName, ">")
    m <- regexpr(pattern, docSub)
  } else {
    pattern <- paste0("<", tagName, "\\b[^>]*>([^<]*(<[^>]+>)*)*?</", tagName, ">")
    m <- regexpr(pattern, docSub)
  }
  elementLength <- attr(m, "match.length")
startPos <- rep(startPos, length.out = nIndices)
}
}
nIndices <- length(indices)
clsTagList <- lapply(1:nIndices, function(i) clsTagLocator(tagNames[i],
doc, indices[i], startPos[i]))
}
Reduce("rbind", clsTagList)
}
5. Content extractor
Description
Given the indices and locations of opening and closing tags, it extracts HTML elements and can either produce a data frame or remove HTML tags.
Usage
contentExtractor(x, doc, constructTable = FALSE, withOutTags = FALSE, encoding = "UTF-8")
Arguments
x: A matrix with the indices and positions of opening and closing tags.
doc: A valid HTML document object.
constructTable: Logical; whether a data frame should be created. Defaults to FALSE.
withOutTags: Logical; should HTML tags be removed. Defaults to FALSE.
contentExtractor <- function(x, doc, constructTable = FALSE, withOutTags = FALSE, encoding = "UTF-8") {
  if (!is.matrix(x)) stop('"x" must be a matrix')
  if (is.null(rownames(x))) stop("'rownames(x)' must be either 'Multi' or 'Single'")
  rows <- nrow(x)
  content <- lapply(1:rows, function(i) {
    if (rownames(x)[i] == "Multi") {
      # Element spans several lines: trim the first and last line to the tag positions
      multi <- doc[x[i, "OpeningIndex"]:x[i, "ClosingIndex"]]
      multi[[1]] <- substr(multi[[1]], x[i, "OpeningPos"], nchar(multi[[1]]))
      multi[[length(multi)]] <- substr(multi[[length(multi)]], 1, x[i, "ClosingPos"])
      multi
    } else {
      substr(doc[x[i, "OpeningIndex"]], x[i, "OpeningPos"], x[i, "ClosingPos"])
    }
  })
  for (i in 1:rows) {
    Encoding(content[[i]]) <- encoding
  }
  if (length(content) == 1) {
    content <- content[[1]]
  }
  if (constructTable) {
    if (rows == 1) {
      return(tableConstructor(content))
    } else {
      return(multiTableConstructor(content))
    }
  } else if (withOutTags) {
    # Strip tags, then keep only the pieces that still contain word characters
    content <- as.vector(sapply(content, gsub, pattern = "</?[^>]*>", replacement = ""))
    logi <- sapply(1:length(content), function(i) grepl(pattern = "\\w+", x = content[[i]]))
    return(lapply(1:length(logi), function(i) content[[i]][which(logi[[i]])]))
  } else {
    return(content)
  }
}
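The two core steps, cutting an element out by position and then stripping its tags, can be sketched on a single string. The sample `doc` and the variable names are assumptions for illustration; the positions are found with regexpr() here rather than supplied as a matrix.

```r
# Hypothetical single-line document
doc <- "<p>Price: <b>42</b> USD</p>"

# Locate the opening tag and the end of the closing tag
open_m <- regexpr("<b>", doc, fixed = TRUE)
close_m <- regexpr("</b>", doc, fixed = TRUE)
opening_pos <- as.integer(open_m)
closing_pos <- as.integer(close_m) + attr(close_m, "match.length") - 1L

# Extract the element by position, then remove the tags
element <- substr(doc, opening_pos, closing_pos)
text <- gsub("</?[^>]*>", "", element)
```

`element` holds the markup `<b>42</b>`, and `text` holds only its content.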
6. Table Constructor
Description
Creates a data frame from an HTML table element.
Usage
tableConstructor(x, encoding = "UTF-8")
Arguments
x: A table element.
encoding: Encoding to be set for all variables.
tableConstructor <- function(x, encoding = "UTF-8") {
  indices <- grep("<tr", x)
  nIndices <- length(indices)
  trOpCls <- multiClsTagLocator("tr", doc = x, indices = indices)
  rawTr <- contentExtractor(trOpCls, x)
  # A header row in the first <tr> is not counted as a data row
  nRows <- if (any(grepl("th", rawTr[[1]]))) {
    nIndices - 1
  } else {
    nIndices
  }
  nCols <- sapply(seq_along(rawTr), function(i) length(grep("<t(h|d)", rawTr[[i]])))
  uniqCol <- unique(nCols)
  nCols <- nCols[which.max(nCols)]
  if (length(uniqCol) > 1) {
    warning("There are cell data spanning more than one column")
  }
  df <- data.frame(matrix(nrow = nRows, ncol = nCols))
  if (any(grepl("th", rawTr[[1]]))) {
    # Use the header cells as column names when their number fits
    th <- grep("<[^>]*th", rawTr[[1]])
    colNams <- gsub("<\\s*/?[^>]*>", "", rawTr[[1]][th])
    if (length(colNams) != nCols) {
      names(df) <- paste0("Var", seq(nCols))
    } else {
      names(df) <- colNams
      rawTr <- rawTr[-1]
    }
  } else {
    names(df) <- paste0("Var", seq(nCols))
  }
  for (i in seq_along(rawTr)) {
    ind <- grep("<t(h|d)", rawTr[[i]])
    for (j in seq_along(ind)) {
      tag <- sub("<(t(h|d))[^>]*>.*", "\\1", rawTr[[i]][ind[j]])
      element <- clsTagLocator(tag, rawTr[[i]], ind[j])
      ij <- paste(contentExtractor(element, rawTr[[i]], withOutTags = TRUE), collapse = "; ")
      Encoding(ij) <- encoding
      # Assumed intent: turn non-breaking spaces (character or entity) into plain spaces
      df[i, j] <- gsub("&nbsp;|\u00a0", " ", ij)
    }
  }
  df
}
multiTableConstructor <- function(x, encoding = "UTF-8") {
  tables <- vector("list", length(x))
  for (i in seq_along(x)) {
    # Pass the caller's encoding on instead of a hard-coded "UTF-8"
    tables[[i]] <- tableConstructor(x[[i]], encoding = encoding)
  }
  tables
}
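The idea behind the table constructor (find the rows, read the cells, use a header row for names) can be illustrated with a much smaller stand-in. This is a simplified sketch in base R, not the package's implementation: it assumes one `<tr>` per line and cells without nested tags, and `html` and `cellText` are names invented for the example.

```r
# Hypothetical table, one row per line
html <- c(
  "<table>",
  "<tr><th>Name</th><th>Age</th></tr>",
  "<tr><td>Ann</td><td>31</td></tr>",
  "<tr><td>Bob</td><td>27</td></tr>",
  "</table>"
)

rows <- html[grepl("<tr", html)]

# Return the text of every <th> or <td> cell in one row
cellText <- function(row) {
  cells <- regmatches(row, gregexpr("<t[hd][^>]*>[^<]*", row))[[1]]
  sub("<t[hd][^>]*>", "", cells)
}

cells <- lapply(rows, cellText)
hasHeader <- grepl("<th", rows[1])
body <- if (hasHeader) cells[-1] else cells

# Stack the rows and name the columns from the header, or Var1, Var2, ...
df <- as.data.frame(do.call(rbind, body), stringsAsFactors = FALSE)
names(df) <- if (hasHeader) cells[[1]] else paste0("Var", seq_len(ncol(df)))
print(df)
```

All columns come back as character vectors, as they do in the function above; converting them to numbers is left to the caller.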
7. Content Remover
Description
Removes unwanted content from outputs of content extractor or from columns in created
data frames.
Usage
contentRemover(x, content, column = NULL)
Arguments
x a character string, a data frame or a list of data frames.
content character string with any regular expression, including literals and special characters, targeting the content to be removed.
column integer vector indicating one or more columns from which the specified content will be removed.
Value
If "x" is a data frame, a data frame is returned; if it is a list, a list of data frames is returned; otherwise a character vector is returned.
contentRemover <- function(x, content, column = NULL) {
  # Remove the pattern from the selected columns of one data frame
  removeContent <- function(x, content, column = NULL) {
    for (j in column) {
      x[, j] <- gsub(content, "", x[, j])
    }
    x
  }
  if (is.data.frame(x)) {
    # Arguments are named so that content and column cannot be swapped
    return(removeContent(x, content = content, column = column))
  } else if (is.list(x)) {
    elements <- vector("list", length(x))
    for (i in seq_along(x)) {
      if (is.data.frame(x[[i]])) {
        elements[[i]] <- removeContent(x[[i]], content = content, column = column)
      } else {
        elements[[i]] <- gsub(pattern = content, replacement = "", x = x[[i]])
      }
    }
    # For character elements, drop strings left empty after removal
    if (!is.data.frame(elements[[1]])) {
      elements <- sapply(seq_along(elements), function(i)
        elements[[i]][which(nchar(elements[[i]]) > 0)])
    }
    return(elements)
  } else {
    return(gsub(pattern = content, replacement = "", x = x))
  }
}
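The column-wise removal performed for data frames comes down to a `gsub()` call per selected column. The sketch below shows that step on its own; the `prices` data frame and the `footnote` pattern are invented for illustration.

```r
# Hypothetical data frame such as a table constructor might return
prices <- data.frame(item = c("tea [1]", "coffee [2]"),
                     price = c("3.50 USD", "4.00 USD"),
                     stringsAsFactors = FALSE)

# Regular expression for bracketed footnote markers such as " [1]"
footnote <- "\\s*\\[[0-9]+\\]"

# Remove the pattern from the first column only
column <- 1
for (j in column) {
  prices[, j] <- gsub(footnote, "", prices[, j])
}
print(prices)
```

Columns not listed in `column` are left untouched, so the same pattern can be applied selectively.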
8. Attribute Pattern Constructor
Description
Constructs a search pattern for attributes based on a given CSS selector.
Usage
attrPatternConstructor(css, asis = TRUE)
Arguments
css character string with a cascading style sheet (CSS) selector for which an attribute pattern will be constructed.
asis logical; if TRUE (default), a pattern is constructed using the given order of attributes. If FALSE, an alternating pattern is constructed out of all permutations of the listed attributes.
attrPatternConstructor <- function(css, asis = TRUE) {
  if (!grepl("[.[#]", css)) {
    if (grepl("\\w+", css)) cat("Detecting tag name only\n")
    stop("No attributes listed")
  }
  cssAttributes <- vector("list")
  counter <- 0
  pattern1 <- "(?<=[.])[^.[#]+"        # class names, introduced by "."
  pattern2 <- "(?<=#)[^.[#]+"          # id, introduced by "#"
  pattern3 <- "(?<=\\[)(.+?)(?=\\])"   # attribute selectors inside "[...]"
  if (grepl(pattern1, css, perl = TRUE)) {
    counter <- counter + 1
    classes <- regmatches(css, gregexpr(pattern1, css, perl = TRUE))[[1]]
    classes <- sub(pattern1, "\\1", classes, perl = TRUE)
    withClass <- TRUE
    cssAttributes[[counter]] <- paste0('class="', paste(classes, collapse = " "), '"')
  } else withClass <- FALSE
  if (grepl(pattern2, css, perl = TRUE)) {
    counter <- counter + 1
    if (length(gregexpr(pattern2, css, perl = TRUE)[[1]]) > 1)
      warning("Elements can only have one 'id' attribute, hence only the first is matched")
    id <- regmatches(css, regexpr(pattern2, css, perl = TRUE))
    withId <- TRUE
    cssAttributes[[counter]] <- paste0('id="', sub(pattern2, "\\1", id, perl = TRUE), '"')
  } else withId <- FALSE
  if (grepl(pattern3, css, perl = TRUE)) {
    pattn1 <- '\\[class(="([[:graph:]]+)")?\\]'
    pattn2 <- '\\[id(="[[:graph:]]+")?\\]'
    if (withClass && grepl(pattn1, css)) {
      # Merge classes given as [class="..."] with those given in dot notation
      additionalClasses <- regmatches(css, regexpr(pattn1, css))[[1]]
      pattn <- '\\[class="([^"]*)"\\]'
      if (grepl(pattn, additionalClasses)) {
        additionalClasses <- sub(pattn, "\\1", additionalClasses)
      }
      cssAttributes[[1]] <- paste0('class="', paste(classes, additionalClasses, collapse = " "), '"')
      css <- regmatches(css, regexpr(pattn1, css), invert = TRUE)[[1]]
      css <- paste(css, collapse = "")
    }
    if (withId && grepl(pattn2, css)) {
      warning("More than one version of element 'id' given, only the first with '#' is used")
      css <- regmatches(css, regexpr(pattn2, css), invert = TRUE)[[1]]
      css <- paste(css, collapse = "")
    }
  }
  if (grepl(pattern3, css, perl = TRUE)) {
    attrb <- regmatches(css, gregexpr(pattern3, css, perl = TRUE))[[1]]
    if (length(grep("=", attrb, invert = TRUE))) {
      # Attributes without a value, e.g. [href]: accept any non-empty value
      counter <- counter + 1
      ind <- grep("=", attrb, invert = TRUE)
      cssAttributes[[counter]] <- paste0(attrb[ind], '="[^"]+"')
    }
    # name, optional operator (~ | ^ $ *) and quoted value
    pattern4 <- '([^=~|^$*]+)([~|^$*]?)="(.+)"'
    if (any(grepl(pattern4, attrb))) {
      ind <- grep(pattern4, attrb)
      componentTwo <- sub(pattern4, "\\1", attrb[ind])
      extra <- sub(pattern4, "\\2", attrb[ind])
      nExtra <- length(extra)
      value <- sub(pattern4, "\\3", attrb[ind])
      val <- rep(NA, length(extra))
      if (any(extra == "")) {
        # [attr="value"]: exact value
        ind <- which(extra == "")
        val[ind] <- paste0('"', value[ind], '"')
      }
      if (any(extra == "~")) {
        # [attr~="value"]: one word in a whitespace-separated list
        ind <- which(extra == "~")
        val[ind] <- paste0('"([[:graph:]]*\\s)*?', value[ind], '(\\s[[:graph:]]*)*"')
      }
      if (any(extra == "|")) {
        # [attr|="value"]: value alone or followed by "-suffix"
        ind <- which(extra == "|")
        val[ind] <- paste0('"', value[ind], '(-[[:graph:]]+)?"')
      }
      if (any(extra == "^")) {
        # [attr^="value"]: begins with value
        ind <- which(extra == "^")
        val[ind] <- paste0('"', value[ind], '[[:graph:]]+"')
      }
      if (any(extra == "$")) {
        # [attr$="value"]: ends with value
        ind <- which(extra == "$")
        val[ind] <- paste0('"[[:graph:]]+', value[ind], '"')
      }
      if (any(extra == "*")) {
        # [attr*="value"]: contains value (stray opening parenthesis removed)
        ind <- which(extra == "*")
        val[ind] <- paste0('"[[:graph:]]*', value[ind], '[[:graph:]]*"')
      }
      counter <- counter + 1
      cssAttributes[[counter]] <- paste(componentTwo, val, sep = "=")
    }
  }
  cssAttributes <- unlist(cssAttributes)
  if (is.null(cssAttributes)) {
    return(cssAttributes)
  }
  n <- length(cssAttributes)
  f <- factorial(n)
  # Separator allowing other attributes in between; a POSIX class is used
  # because "\\s" is not reliable inside a bracket expression
  cl <- "\\s([^[:space:]]+\\s)*?"
  if (n == 1 || asis) {
    pattern <- cssAttributes
    if (length(pattern) > 1) {
      pattern <- paste(pattern, collapse = " ")
    }
  } else {
    # One alternative per ordering of the attributes
    indMatrix <- permutationTuples(n)
    patternList <- lapply(1:f, function(i)
      paste(cssAttributes[indMatrix[i, ]], collapse = cl))
    pattern <- unlist(patternList)
    pattern <- paste(pattern, collapse = "|")
  }
  pattern
}
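How an attribute selector becomes a regular expression is easier to see for a single selector. The sketch below is a simplified stand-in rather than the function above: `attrToRegex` is a hypothetical helper, it handles one bracketed selector with a quoted value, uses `[^"]` where the function above uses `[[:graph:]]`, and does not escape regex metacharacters in the value.

```r
# Turn one CSS attribute selector, e.g. [lang|="en"], into a regex fragment
attrToRegex <- function(selector) {
  pattern <- '^\\[([^=~|^$*]+)([~|^$*]?)="(.+)"\\]$'
  parts <- regmatches(selector, regexec(pattern, selector))[[1]]
  if (length(parts) == 0) stop("selector not understood: ", selector)
  name <- parts[2]
  op <- parts[3]
  value <- parts[4]
  body <- if (op == "~") {
    paste0('([^" ]+ )*', value, '( [^" ]+)*')  # one word in a space-separated list
  } else if (op == "|") {
    paste0(value, '(-[^"]+)?')                 # value or value-suffix
  } else if (op == "^") {
    paste0(value, '[^"]*')                     # begins with value
  } else if (op == "$") {
    paste0('[^"]*', value)                     # ends with value
  } else if (op == "*") {
    paste0('[^"]*', value, '[^"]*')            # contains value
  } else {
    value                                      # exact value
  }
  paste0(name, '="', body, '"')
}

tags <- c('<p lang="en-GB">', '<p lang="english">', '<a class="btn primary">')
print(grepl(attrToRegex('[lang|="en"]'), tags))
print(grepl(attrToRegex('[class~="primary"]'), tags))
```

The `|=` selector accepts `en-GB` but not `english`, which is the distinction the hyphen-suffix pattern is meant to capture.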
9. Permutation Tuples
Description
Generates a matrix with all permutation tuples given an integer.
Usage
permutationTuples(n)
Arguments
n integer vector of length one from which permutation tuples will be generated.
permutationTuples <- function(n) {
  if (!is.numeric(n) || length(n) != 1) stop('"n" must be a numeric vector of length one')
  if (n %% 1 != 0) {
    warning('"n" is a floating point number, it has been rounded up to ', ceiling(n))
    n <- ceiling(n)
  }
  permMat <- matrix(0, nrow = factorial(n), ncol = n)
  i <- 0
  # Draw random permutations and store each one that is not yet in the matrix
  repeat {
    perm <- sample(n)
    logi <- sapply(1:factorial(n), function(k) !all(permMat[k, ] == perm))
    if (all(logi)) {
      i <- i + 1
      permMat[i, ] <- perm
    }
    if (i == factorial(n)) break
  }
  permMat
}
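The function above finds the permutations by repeated random sampling, so its running time and row order vary between calls. A deterministic alternative is sketched below as an illustration only; `allPermutations` is a hypothetical name and is not part of the documented functions.

```r
# Recursively build all permutations of 1..n, one per row
allPermutations <- function(n) {
  if (n == 1) return(matrix(1, nrow = 1, ncol = 1))
  smaller <- allPermutations(n - 1)
  cols <- seq_len(n - 1)
  # Insert n at every possible position of each shorter permutation
  out <- lapply(seq_len(n), function(pos) {
    cbind(smaller[, cols < pos, drop = FALSE],
          n,
          smaller[, cols >= pos, drop = FALSE])
  })
  unname(do.call(rbind, out))
}

p <- allPermutations(3)
print(p)
```

The result has `factorial(n)` rows, like the matrix returned by the sampling version, but the rows always appear in the same order.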
10. Simple Selector
Description
Based on simple selectors (part of CSS selectors), it returns the targeted content as is or
without HTML tags or, if the target is a table element, can create a data frame.
Usage
simpleSelectors(css, doc, asis = TRUE, content = TRUE, constructTable = FALSE,
withOutTags = FALSE, encoding = "UTF-8")
Arguments
css character string with a simple selector, which can include type, universal, class, id and attribute selectors. Pseudo-classes are currently (pre-alpha version: 0.0.0) not supported (but under development).
doc a valid HTML document object from which data will be extracted.
asis logical; should css be used as is (default) or should its attributes be permuted.
content logical; should the content of the matched selector be returned (default) or the tag names and indices of their matches.
constructTable logical; should a table be constructed if it is a valid HTML table element. Defaults to FALSE.
withOutTags logical; should HTML tags be removed. Defaults to FALSE.
encoding character string giving the encoding to be applied to content.
simpleSelectors <- function(css, doc, asis = TRUE, content = TRUE,
                            constructTable = FALSE, withOutTags = FALSE,
                            encoding = "UTF-8") {
  # The universal selector returns the whole document
  if (grepl("^\\*$", css)) return(doc)
  # Split off a pseudo-class (detected, but not yet applied)
  pattern <- "([^:]+):(\\w+$|\\w+-\\w+(-\\w+-?\\w*(\\([^)]+\\))?)?)"
  if (grepl(pattern, css)) {
    withPseudo <- TRUE
    # Extract the pseudo-class before css is overwritten
    pseudoClass <- sub(pattern, "\\2", css)
    css <- sub(pattern, "\\1", css)
  } else withPseudo <- FALSE
  if (grepl("^(\\*\\s)?\\w+$", css)) {
    # Type selector only, e.g. "table"
    tagName <- sub("^(\\*\\s)?(\\w+)$", "\\2", css)
    if (!any(grepl(paste0("<", tagName), doc))) stop("No match found for <", tagName)
    indices <- grep(paste0("<", tagName), doc)
    if (!content) return(list(tagNames = tagName, indices = indices))
    stp <- sapply(indices, function(i) regexpr(paste0("<", tagName), text = doc[i]))
    clsTagMat <- multiClsTagLocator(tagNames = tagName, doc = doc, indices = indices,
                                    startPos = stp)
  } else {
    if (grepl("^\\w", css)) {
      # Type selector followed by class, id or attribute selectors
      pattn <- "^(\\w+)([.[#].*)"
      tagNames <- sub(pattn, "\\1", css)
      pattern <- attrPatternConstructor(sub(pattn, "\\2", css), asis)
      pattern <- paste0("<", tagNames, "\\b[^>]*?", pattern, "[^>]*>")
      if (!any(grepl(pattern, doc))) stop("No match for ", pattern)
      indices <- grep(pattern, doc)
      stp <- sapply(indices, function(i) regexpr(pattern, doc[i]))
    } else {
      # Class, id or attribute selectors without a tag name
      pattern <- attrPatternConstructor(css, asis)
      if (!any(grepl(pattern, doc))) stop("No match for ", pattern)
      pattn <- paste0("<(\\w+\\b)[^>]*?", pattern, "[^>]*>.+$")
      indices <- grep(pattn, doc)
      tagNames <- sub(pattn, "\\1", doc[indices])
      if (!all(sapply(tagNames, grepl, pattern = "\\b\\w+\\b"))) {
        stop("Pattern does not match tag names, instead matches ", tagNames)
      }
      stp <- sapply(indices, function(i) regexpr(pattn, doc[i]))
    }
    if (!content) return(list(tagNames = tagNames, indices = indices))
    clsTagMat <- multiClsTagLocator(tagNames = tagNames, doc = doc, indices = indices,
                                    startPos = stp)
  }
  contentExtractor(x = clsTagMat, doc = doc, constructTable = constructTable,
                   withOutTags = withOutTags, encoding = encoding)
}
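The first step for a plain type selector, finding the lines and start positions of the opening tags, can be sketched on its own. The example assumes a document held as a character vector with one line per element; `doc` is made up, and a word boundary (`\\b`) is added here so that `p` does not also match tags such as `<pre>`, which is a small departure from the pattern used above.

```r
# Hypothetical document, one line per element of the vector
doc <- c("<html>", "<body>", "<p class=\"lead\">First</p>",
         "<div><p>Second</p></div>", "</body>", "</html>")

tagName <- "p"
opening <- paste0("<", tagName, "\\b")

# Lines that contain an opening tag, and where the tag starts on each line
indices <- grep(opening, doc)
startPos <- vapply(indices, function(i) as.integer(regexpr(opening, doc[i])),
                   integer(1))

print(data.frame(line = indices, start = startPos))
```

These two vectors correspond to the `indices` and `startPos` values that are handed to the closing tag locator.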
11. nth interpreter
Description
Used to compute "an+b" algebra in pseudo-class selector.
Usage
nthInterpreter(nth, nDoc, fromLast = FALSE)
Arguments
nth character vector with the details of "an+b", essentially what is in brackets when a pseudo-class selector begins with "nth".
nDoc integer; total number of elements from which nth will be computed.
fromLast logical; should selection be done in reverse, as is the case with pseudo-class selectors with "from-last". Default is FALSE.
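The "an+b" arithmetic itself is short: for n = 0, 1, 2, ... the expression a * n + b is evaluated and only positions that exist are kept. The following minimal sketch assumes that a and b have already been parsed into numbers; `nthPositions` is a hypothetical helper, not the function documented here.

```r
# Positions matched by a * n + b for n = 0, 1, 2, ... within 1..total
nthPositions <- function(a, b, total) {
  n <- 0:total
  pos <- a * n + b
  sort(unique(pos[pos >= 1 & pos <= total]))
}

print(nthPositions(2, 1, 7))   # "2n+1", the odd positions
print(nthPositions(2, 0, 7))   # "2n", the even positions
print(nthPositions(-1, 3, 7))  # "-n+3", the first three positions
```

The keywords "odd" and "even" are shorthand for the first two calls.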
nthInterpreter <- function(nth, nDoc, fromLast = FALSE) {
  # A plain number selects exactly that position
  if (grepl("^\\d+$", nth)) {
    return(as.integer(nth))
  }
  # Groups: sign of a, a, sign of n, n, sign of b, b
  pattern <- "([+-]?)(\\d*)([+-]?)(n?)([+-]?)(\\d*)"
  if (!grepl(pattern, nth) || nth == "") stop("nth is not interpretable")
  # "n+b" is treated as "1n+b"
  if (grepl("^[+-]?n[+-]?\\d+", nth)) nth <- paste0("+1", nth)
  a <- sub(pattern, "\\2", nth)
  n <- sub(pattern, "\\4", nth)
  b <- sub(pattern, "\\6", nth)
  if (a != "" && n == "" && b != "") stop("nth not interpretable")
  if (a != "" && n != "" && b == "") b <- 0
  if (a == "" && n != "" && b == "") stop('"a" and "b" missing')
  if (a == b && n != "") b <- 0
  if (nth == "even") {
    a <- "2"
    n <- "n"
    b <- "0"
  }
  if (nth == "odd") {
    a <- "2"
    n <- "n"
    b <- "1"
  }
  if (!(a != "" && n != "" && b != "")) stop("nth not interpretable")
  aSign <- sub(pattern, "\\1", nth)
  nSign <- sub(pattern, "\\3", nth)
  bSign <- sub(pattern, "\\5", nth)
  # A missing sign is read as "+"
  if (all(c(aSign != "+", aSign != "-"))) aSign <- "+"
  if (all(c(nSign != "+", nSign != "-"))) nSign <- "+"
  if (all(c(bSign != "+", bSign != "-"))) bSign <- "+"
  a <- as.numeric(paste0(aSign, a)); b <- as.numeric(paste0(bSign, b))
  if (fromLast) {
    n <- ceiling(as.numeric(paste0(nSign, (nDoc/a - b):0)))
  } else {
    n <- ceiling(as.numeric(paste0(nSign, 0:(nDoc/a - b))))
  }
  nthIndices <- a * n + b
  nthIndices <- nthIndices[which(nthIndices > 0)]
  if (length(nthIndices) == 1 && nDoc != 1 && fromLast) {
    nthIndices <- (nDoc:1)[nthIndices]
  }