This document discusses using Python to access and process web data. It covers using the Requests library to make HTTP requests and get web content, parsing web content with Beautiful Soup and JSON, and accessing web services using REST. Example code is provided for making GET and POST requests, extracting data from HTML and JSON responses, and creating a simple Flask web service.
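Before the slides begin, a brief sketch of the two sides the deck goes on to cover: consuming JSON from a web service with Requests, and serving JSON with a small Flask app. This is a minimal illustration, assuming httpbin.org is reachable; the Flask route name and port are arbitrary choices for the example.

import requests
from flask import Flask, jsonify

# Consume JSON: httpbin.org echoes the request back as a JSON document
r = requests.get('http://httpbin.org/get', params={'q': 'python'})
if r.status_code == 200:
    data = r.json()        # parse the JSON body into a dict
    print(data['args'])    # {'q': 'python'}

# Serve JSON: a one-route Flask web service (route name is illustrative)
app = Flask(__name__)

@app.route('/api/hello')
def hello():
    return jsonify({'message': 'Hello, web data'})

if __name__ == '__main__':
    app.run(port=5000)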
4. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
▸ Web Parser
▸ Web Services
5. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Requests Library
import requests
requests.get('http://www.facebook.com').text
pip install requests #install library
6. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Make a Request
#GET Request
import requests
r = requests.get('http://www.facebook.com')
if r.status_code == 200:
    print("Success")
Success
7. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Make a Request
#POST Request
import requests
r = requests.post('http://httpbin.org/post', data = {'key':'value'})
if r.status_code == 200:
    print("Success")
Success
8. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Make a Request
#Other Types of Request
import requests
r = requests.put('http://httpbin.org/put', data = {'key':'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
9. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Passing Parameters In URLs
#GET Request with parameter
import requests
r = requests.get('https://www.google.co.th/?hl=th')
if r.status_code == 200:
    print("Success")
Success
10. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Passing Parameters In URLs
#GET Request with parameter
import requests
r = requests.get('https://www.google.co.th', params={"hl": "en"})
if r.status_code == 200:
    print("Success")
Success
11. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Passing Parameters In URLs
#POST Request with parameter
import requests
r = requests.post("https://m.facebook.com",data={"key":"value"})
if r.status_code == 200:
    print("Success")
Success
12. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Content
#Text Response
import requests
data = {"email": "…..", "pass": "……"}
r = requests.post("https://m.facebook.com", data=data)
if r.status_code == 200:
    print(r.text)
'<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML
Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd"><html xmlns="http://
www.w3.org/1999/xhtml"><head><title>Facebook</title><meta name="referrer"
content="default" id="meta_referrer" /><style type="text/css">/*<!………………..
13. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Content
#Response encoding
import requests
r = requests.get('https://www.google.co.th/')
r.encoding = 'tis-620'
if r.status_code == 200:
    print(r.text)
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"
lang="th"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta
content="/logos/doodles/2016/king-bhumibol-adulyadej-1927-2016-5148101410029568.2-
hp.png" itemprop="image"><meta content="ปวงข้าพระพุทธเจ้า ขอน้อมเกล้าน้อมกระหม่อมรำลึกใน...
14. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Content
#Binary Response
import requests
r = requests.get('https://www.google.co.th/logos/doodles/2016/king-bhumibol-adulyadej-1927-2016-5148101410029568.2-hp.png')
if r.status_code == 200:
    open("img.png", "wb").write(r.content)
15. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Status Codes
#200 Response (OK)
import requests
r = requests.get('https://api.github.com/events')
if r.status_code == requests.codes.ok:
    data = r.json()   # parse the JSON response body
    print(data[0]['actor'])
{'url': 'https://api.github.com/users/ShaolinSarg', 'display_login': 'ShaolinSarg', 'avatar_url': 'https://
avatars.githubusercontent.com/u/6948796?', 'id': 6948796, 'login': 'ShaolinSarg', 'gravatar_id': ''}
16. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Status Codes
#200 Response (OK)
import requests
r = requests.get('https://api.github.com/events')
print(r.status_code)
200
17. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Status Codes
#404
import requests
r = requests.get('https://api.github.com/events/404')
print(r.status_code)
404
18. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Response Headers
#Response headers
import requests
r = requests.get('http://www.sanook.com')
print(r.headers)
print(r.headers['Date'])
{'Content-Type': 'text/html; charset=UTF-8', 'Date': 'Tue, 08 Nov 2016 14:38:41 GMT', 'Cache-
Control': 'private, max-age=0', 'Age': '16', 'Content-Encoding': 'gzip', 'Content-Length': '38089',
'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Accept-Ranges': 'bytes'}
Tue, 08 Nov 2016 14:38:41 GMT
19. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Timeouts
#Timeout
import requests
r = requests.get('http://www.sanook.com', timeout=0.001)
ReadTimeout: HTTPConnectionPool(host='www.sanook.com', port=80): Read timed out. (read
timeout=0.001)
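When the timeout fires, Requests raises an exception rather than returning a response, so in practice the call is usually wrapped in a try/except. A minimal sketch (the URL and the 0.5-second timeout are just placeholder choices):
#Handling a timeout (sketch)
import requests
try:
    r = requests.get('http://www.sanook.com', timeout=0.5)
    print(r.status_code)
except requests.exceptions.Timeout:
    print('request timed out, try again later')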
20. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Authentication
#Basic Authentication
import requests
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
print(r.status_code)
200
21. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
read more : http://docs.python-requests.org/en/master/
22. USING PYTHON TO ACCESS WEB DATA
▸ Web Requests
Quiz#1 : Tag Monitoring
1. Get webpage : http://pantip.com/tags
2. Save to file every 5 minutes (time.sleep(300))
3. Use current date time as filename
(How do you get the current date and time in Python? Look it up, or see the sketch below.)
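One possible starting point for Quiz#1, a minimal sketch assuming the page is reachable and that a timestamped .html filename (e.g. 2016-11-08_21-30-00.html) is acceptable:
#Quiz#1 sketch : fetch the tag page every 5 minutes and save it
import time
import requests
from datetime import datetime
while True:
    r = requests.get('http://pantip.com/tags')
    if r.status_code == 200:
        filename = datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '.html'
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(r.text)   # save the raw page for later parsing
    time.sleep(300)   # wait 5 minutes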
23. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
HTML Parser : beautifulsoup
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("file.html"), "html.parser") #parse from file
soup = BeautifulSoup("<html>data</html>", "html.parser") #parse from text
pip install beautifulsoup4 #install library
24. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
from bs4 import BeautifulSoup
soup = BeautifulSoup("<html>data</html>", "html.parser")
print(soup)
<html>data</html>
25. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#Navigating using tag names
from bs4 import BeautifulSoup
html_doc = """<html><head><title>The Dormouse's story</title></
head><body><p class="title"><b>The Dormouse's story</b></p></
body>”””
soup = BeautifulSoup(html_doc,"html.parser")
soup.head
soup.title
soup.body.p
26. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>
<p class="title"><b>The Dormouse's story</b></p>
27. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#Access string
from bs4 import BeautifulSoup
html_doc = “""<h1>hello</h1>”””
soup = BeautifulSoup(html_doc,"html.parser")
print(soup.h1.string)
hello
28. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#Access attribute
from bs4 import BeautifulSoup
html_doc = '<a href="http://example.com/elsie">Elsie</a>'
soup = BeautifulSoup(html_doc,"html.parser")
print(soup.a['href'])
http://example.com/elsie
29. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#Get all text in the page
from bs4 import BeautifulSoup
html_doc = """<html><head><title>The Dormouse's story</title></
head><body><p class="title"><b>The Dormouse's story</b></p></
body>”””
soup = BeautifulSoup(html_doc,"html.parser")
print(soup.get_text)
<bound method Tag.get_text of <html><head><title>The Dormouse's story</title></
head><body><p class="title"><b>The Dormouse's story</b></p></body></html>>
30. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
# find_all()
from bs4 import BeautifulSoup
html_doc = """<a href="http://example.com/elsie" class="sister"
id="link1">Elsie</a>,<a href="http://example.com/lacie" class="sister"
id="link2">Lacie</a> and <a href="http://example.com/tillie"
class="sister" id="link3">Tillie</a>;”””
soup = BeautifulSoup(html_doc,"html.parser")
for a in soup.find_all(‘a’):
print(a)
31. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
<a class="sister" href="http://example.com/elsie"
id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie"
id="link2">Lacie</a>
32. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
#find_all()
soup.find_all(id='link2')
soup.find_all(href=re.compile("elsie"))
soup.find_all(id=True)
data_soup.find_all(attrs={"data-foo": "value"})
soup.find_all("a", class_="sister")
soup.find_all("a", recursive=False)
soup.p.find_all("a", recursive=False)
33. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
re.compile(…..)
<a href="http://192.x.x.x" class="c1">hello</a>
<a href="https://192.x.x.x" class="c1">hello</a>
<a href="https://www.com" class="c1">hello</a>
find_all(href=re.compile('(https|http)://[0-9.]'))
https://docs.python.org/2/howto/regex.html
34. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Parse a document
read more : https://www.crummy.com/software/BeautifulSoup/bs4/doc/
35. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Quiz#2 : Tag Extraction
1. Get webpage : http://pantip.com/tags
2. Extract the tag name, tag link, and number of topics from the first 10 pages
3. Save to a file in this format:
tag name, tag link, number of topics, current datetime
4. Run every 5 minutes (one possible structure is sketched below)
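A minimal sketch of one way to approach Quiz#2 for a single page. The 'tag-item' class and the 'data-topics' attribute are purely hypothetical and have to be replaced after inspecting the real pantip.com markup; the 10-page loop plus time.sleep(300) would follow the Quiz#1 pattern.
#Quiz#2 sketch : extract tags from one page and append rows to a CSV file
import csv
import requests
from bs4 import BeautifulSoup
from datetime import datetime

def scrape_tags(url):
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    rows = []
    for a in soup.find_all('a', class_='tag-item'):    # hypothetical selector
        rows.append([a.get_text(strip=True),           # tag name
                     a['href'],                        # tag link
                     a.get('data-topics', ''),         # hypothetical topic-count attribute
                     datetime.now().isoformat()])      # current datetime
    return rows

with open('tags.csv', 'a', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for row in scrape_tags('http://pantip.com/tags'):
        writer.writerow(row)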
36. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
import json
json_doc = json.loads('{"key": "value"}')
built-in function
37. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
#JSON string
json_doc = """{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}"""
38. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
#Parse string to object
import json
json_obj = json.loads(json_doc)
print(json_obj)
{'employees': [{'firstName': 'John', 'lastName': 'Doe'}, {'firstName': 'Anna', 'lastName': 'Smith'},
{'firstName': 'Peter', 'lastName': 'Jones'}]}
39. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
#Access json object
import json
json_obj = json.loads(json_doc)
print(json_obj['employees'][0]['firstName'])
print(json_obj['employees'][0]['lastName'])
John
Doe
40. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON Parser : json
#Create json doc
import json
json_obj = {"firstName": "name", "lastName": "last"} #Dictionary
print(json.dumps(json_obj,indent=1))
{
"firstName": "name",
"lastName": “last"
}
41. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Quiz#3 : Post Monitoring
1. Register as Facebook Developer on
developers.facebook.com
2. Get information about posts from the last 10 hours on the page
https://www.facebook.com/MorningNewsTV3
3. Save to a file in this format (a sketch follows the Graph API URL below):
post id, post datetime, number of likes, current datetime
42. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
Quiz#3 : Post Monitoring
URL
https://graph.facebook.com/v2.8/<PageID>?
fields=posts.limit(100)%7Blikes.limit(1).summary(true)
%2Ccreated_time%7D&access_token=
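A sketch of Quiz#3 using that URL. The <PageID> and access-token placeholders have to be filled in, and the exact JSON shape assumed here ('posts' -> 'data' -> per-post 'created_time' and 'likes.summary.total_count') is an assumption based on Graph API v2.8:
#Quiz#3 sketch : pull recent posts and keep only the last 10 hours
import requests
from datetime import datetime, timedelta, timezone

PAGE_ID = '<PageID>'        # placeholder
ACCESS_TOKEN = '<token>'    # placeholder
url = ('https://graph.facebook.com/v2.8/' + PAGE_ID +
       '?fields=posts.limit(100){likes.limit(1).summary(true),created_time}'
       '&access_token=' + ACCESS_TOKEN)
data = requests.get(url).json()

cutoff = datetime.now(timezone.utc) - timedelta(hours=10)
for post in data.get('posts', {}).get('data', []):          # assumed response shape
    created = datetime.strptime(post['created_time'], '%Y-%m-%dT%H:%M:%S%z')
    if created >= cutoff:
        likes = post.get('likes', {}).get('summary', {}).get('total_count', 0)
        print(post['id'], created.isoformat(), likes, datetime.now().isoformat())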
51. USING PYTHON TO ACCESS WEB DATA
▸ Web Parser
JSON
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
Here the outer { } is a dict, "employees" is a key, its value is a list, and each element of the list is itself a dict of key/value pairs.
read more : http://www.json.org/
52. USING PYTHON TO ACCESS WEB DATA
▸ Web Service
Create Simple Web Service
from flask_api import FlaskAPI   #the old "flask.ext.api" import path was removed in newer Flask
app = FlaskAPI(__name__)
@app.route('/example/')
def example():
    return {'hello': 'world'}
app.run(debug=False,port=5555)
pip install Flask-API
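Once the service is running locally it can be exercised with the Requests library from earlier; port 5555 simply matches the app.run call above:
#Call the web service (sketch)
import requests
r = requests.get('http://127.0.0.1:5555/example/')
print(r.json())   # {'hello': 'world'}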
53. USING PYTHON TO ACCESS WEB DATA
▸ Web Service
Create Simple Web Service
#receive input
from flask_api import FlaskAPI
app = FlaskAPI(__name__)
@app.route('/hello/<name>/<lastName>')
def example(name,lastName):
    return {'hello':name}
app.run(debug=False,port=5555)
55. USING PYTHON TO ACCESS WEB DATA
▸ Web Service
Quiz#4 : Top Tag Service
1. Build a getTopTagInfo web service.
2. Input : number of top topics
3. Output : tag name and number of topics for each top tag, in JSON format (a skeleton is sketched below).
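A self-contained skeleton for Quiz#4; the hard-coded TAGS list stands in for the scraped pantip.com data (see the Quiz#2 sketch), and the route name and response shape are just one possible design:
#Quiz#4 sketch : expose the top tags as a JSON web service
from flask_api import FlaskAPI

app = FlaskAPI(__name__)

# In the real quiz this list would come from scraping http://pantip.com/tags
TAGS = [('python', 120), ('travel', 90), ('food', 300)]

@app.route('/topTagInfo/<int:n>')
def top_tag_info(n):
    top = sorted(TAGS, key=lambda t: t[1], reverse=True)[:n]
    return {'top_tags': [{'tag': name, 'topics': count} for name, count in top]}

app.run(debug=False, port=5555)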
58. USING DATABASES WITH PYTHON
Zero configuration
– SQLite does not need to be installed, as there is no setup procedure to use it.
Serverless
– SQLite is not implemented as a separate server process. The process that wants to access the
database reads and writes directly from the database file on disk; there is no intermediary server process.
Stable Cross-Platform Database File
– The SQLite file format is cross-platform. A database file written on one machine can be copied to and used
on a different machine with a different architecture.
Single Database File
– An SQLite database is a single ordinary disk file that can be located anywhere in the directory hierarchy.
Compact
– When optimized for size, the whole SQLite library with everything enabled is less than 400 KB in size.
59. USING DATABASES WITH PYTHON
SQLite
import sqlite3
conn = sqlite3.connect('my.db')
built-in library : sqlite3
60. USING DATABASES WITH PYTHON
SQLite
1. Connect to db
2. Get cursor
3. Execute command
4. Commit (insert / update/delete) / Fetch result (select)
5. Close database
Workflow
61. USING DATABASES WITH PYTHON
SQLite
import sqlite3
conn = sqlite3.connect('example.db') # connect db
c = conn.cursor() # get cursor
# execute1
c.execute('''CREATE TABLE stocks
(date text, trans text, symbol text, qty real, price real)''')
# execute2
c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
conn.commit() # commit
conn.close() # close
Workflow Example
63. USING DATABASES WITH PYTHON
Database Storage
import sqlite3
conn = sqlite3.connect('example.db') #store on disk
conn = sqlite3.connect(':memory:') #store in memory
64. USING DATABASES WITH PYTHON
Execute
#execute
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
65. USING DATABASES WITH PYTHON
Execute
#executemany
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
('2006-04-06', 'SELL', 'IBM', 500, 53.00),]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
66. USING DATABASES WITH PYTHON
fetch
#fetchone
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('SELECT * FROM stocks')
c.fetchone()
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
67. USING DATABASES WITH PYTHON
fetch
#fetchall
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('SELECT * FROM stocks')
for d in c.fetchall():
    print(d)
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0)
('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0)
('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)
68. USING DATABASES WITH PYTHON
Context manager
import sqlite3
con = sqlite3.connect(":memory:")
con.execute("create table person (id integer primary key, firstname
varchar unique)")
#con.commit() is called automatically afterwards
with con:
con.execute("insert into person(firstname) values (?)", ("Joe"))
69. USING DATABASES WITH PYTHON
Read more :
https://docs.python.org/2/library/sqlite3.html
https://www.tutorialspoint.com/python/python_database_access.htm
70. USING DATABASES WITH PYTHON
Quiz#5 : Post DB
1. Register as Facebook Developer on
developers.facebook.com
2. Get information about posts from the last 10 hours on the page
https://www.facebook.com/MorningNewsTV3
(post id, post datetime, number of likes, current datetime)
3. Design and create a table to store the posts (one possible schema is sketched below)
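One possible schema for Quiz#5; the table and column names, and the sample row, are illustrative only:
#Quiz#5 sketch : a table for the scraped posts
import sqlite3

conn = sqlite3.connect('posts.db')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS posts
             (post_id text primary key, post_datetime text,
              likes integer, fetched_at text)''')
# each scraped post becomes one row; fetched_at records when it was collected
c.execute("INSERT OR REPLACE INTO posts VALUES (?,?,?,?)",
          ('123_456', '2016-11-08T14:38:41+0000', 42, '2016-11-08T15:00:00'))
conn.commit()
conn.close()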
72. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Processing : pandas
pip install pandas
high-performance, easy-to-use data structures and
data analysis tools
73. USING DATABASES WITH PYTHON
Pandas : Series
#create series with Array-like
import pandas as pd
from numpy.random import rand
s = pd.Series(rand(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
a 0.690232
b 0.738294
c 0.153817
d 0.619822
e 0.4347
74. USING DATABASES WITH PYTHON
Pandas : Series
#create series with dictionary
import pandas as pd
from numpy.random import rand
d = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(d) #with dictionary
print(s)
a 0
b 1
c 2
dtype: float64
75. USING DATABASES WITH PYTHON
Pandas : Series
#create series with Scalar
import pandas as pd
from numpy.random import rand
s = pd.Series(5., index=['a', 'b', 'a', 'd', 'a']) #index can duplicate
print(s['a'])
a 5
a 5
a 5
dtype: float64
76. USING DATABASES WITH PYTHON
Pandas : Series
#access series data
import pandas as pd
from numpy.random import rand
s = pd.Series(5., index=['a', 'b', 'a', 'd', 'a']) #index can duplicate
print(s[0])
print(s[:3])
5.0
a 5
b 5
a 5
dtype: float64
77. USING DATABASES WITH PYTHON
Pandas : Series
#series operations
import pandas as pd
from numpy.random import rand
import numpy as np
s = pd.Series(rand(10)) #ten random values
s = s + 2
s = s * s
s = np.exp(s)
print(s)
0 187.735606
1 691.660752
2 60.129741
3 595.438606
4 769.479456
5 397.052123
6 4691.926483
7 1427.593520
8 180.001824
9 410.994395
dtype: float64
78. USING DATABASES WITH PYTHON
Pandas : Series
#series filtering
import pandas as pd
from numpy.random import rand
import numpy as np
s = pd.Series(rand(10)) #ten random values
s = s[s > 0.1]
print(s)
1 0.708700
2 0.910090
3 0.380613
6 0.692324
7 0.508440
8 0.763977
9 0.470675
dtype: float64
79. USING DATABASES WITH PYTHON
Pandas : Series
#series incomplete data
import pandas as pd
from numpy.random import rand
import numpy as np
s1 = pd.Series(rand(10))
s2 = pd.Series(rand(8))
s = s1 + s2
print(s)
0 0.813747
1 1.373839
2 1.569716
3 1.624887
4 1.515665
5 0.526779
6 1.544327
7 0.740962
8 NaN
9 NaN
dtype: float64
81. USING DATABASES WITH PYTHON
Pandas : DataFrame
2-dimensional labeled data
structure with columns
of potentially different types
82. USING DATABASES WITH PYTHON
Pandas : DataFrame
#create dataframe with dict
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
one two
a 1 1
b 2 2
c 3 3
d NaN 4
83. USING DATABASES WITH PYTHON
Pandas : DataFrame
#create dataframe with dict list
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df)
one two
0 1 4
1 2 3
2 3 2
3 4 1
85. USING DATABASES WITH PYTHON
Pandas : DataFrame
#access dataframe row
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.iloc[:3])
one two
0 1 4
1 2 3
2 3 2
86. USING DATABASES WITH PYTHON
Pandas : DataFrame
#add new column
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df['three'] = [1,2,3,2]
print(df)
one two three
0 1 4 1
1 2 3 2
2 3 2 3
3 4 1 2
87. USING DATABASES WITH PYTHON
Pandas : DataFrame
#show data : head() and tail()
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df['three'] = [1,2,3,2]
print(df.head())
print(df.tail())
one two three
0 1 4 1
1 2 3 2
2 3 2 3
3 4 1 2
88. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe summary
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.describe())
89. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe function
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.mean())
one 2.5
two 2.5
dtype: float64
90. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe function
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.corr()) #calculate correlation
one two
one 1 -1
two -1 1
91. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe filtering
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df[(df['one'] > 1) & (df['one'] < 3)])
one two
1 2 3
92. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe filtering with isin
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df[df['one'].isin([2,4])])
one two
1 2 3
3 4 1
93. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe with row data
d = [ [1., 2., 3., 4.], [4., 3., 2., 1.]]
df = pd.DataFrame(d)
df.columns = ["one","two","three","four"]
print(df)
one two three four
0 1 2 3 4
1 4 3 2 1
94. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe sort values
d = [ [2., 1., 3., 4.], [1., 3., 2., 4.]]
df = pd.DataFrame(d)
df.columns = ["one","two","three","four"]
df = df.sort_values(["one","two"], ascending=[1,0])
print(df)
one two three four
1 1 3 2 4
0 2 1 3 4
95. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe from csv file
df = pd.read_csv('file.csv')
print(df)
one two three
0 1 2 3
1 1 2 3
2 1 2 3
file.csv
one,two,three
1,2,3
1,2,3
1,2,3
96. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe from csv file without a header row
df = pd.read_csv('file.csv', header=None)
print(df)
0 1 2
0 1 2 3
1 1 2 3
2 1 2 3
file.csv
1,2,3
1,2,3
1,2,3
98. USING DATABASES WITH PYTHON
Pandas : DataFrame
#dataframe from html, need to install lxml first (pip install lxml)
df = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(df[0])
Abbreviation State Name Capital Became a State
1 AL Alabama Montgomery December 14, 1819
2 AK Alaska Juneau January 3, 1959
3 AZ Arizona Phoenix February 14, 1912
99. USING DATABASES WITH PYTHON
Quiz#6 : Data Exploration
1. Go to https://archive.ics.uci.edu/ml/datasets/Adult to read the data description
2. Parse the data into pandas using read_csv() and set the column names
3. Explore the data to answer the following questions (a starting point is sketched below):
- find the number of people in each education level
- find the correlation and covariance between the continuous fields
- find the average age of the United-States population where income >50K
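A starting point for Quiz#6, assuming the adult.data file has been downloaded locally; the column names follow the UCI data description:
#Quiz#6 sketch : load adult.data and explore it
import pandas as pd

cols = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
        'marital-status', 'occupation', 'relationship', 'race', 'sex',
        'capital-gain', 'capital-loss', 'hours-per-week',
        'native-country', 'income']
df = pd.read_csv('adult.data', header=None, names=cols, skipinitialspace=True)

print(df['education'].value_counts())          # people per education level
numeric = df.select_dtypes(include='number')
print(numeric.corr())                          # correlation of the continuous fields
print(numeric.cov())                           # covariance of the continuous fields
mask = (df['native-country'] == 'United-States') & (df['income'] == '>50K')
print(df.loc[mask, 'age'].mean())              # average age, US, income >50K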
101. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
pip install seaborn
visualization library based on matplotlib
102. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
seaborn : set inline plot for jupyter
%matplotlib inline
import numpy as np
import seaborn as sns
# Generate some sequential data
x = np.array(list("ABCDEFGHI"))
y1 = np.arange(1, 10)
sns.barplot(x=x, y=y1)
104. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
seaborn : set layout
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f,ax = plt.subplots(1,1,figsize=(10, 10))
sns.barplot(x=[1,2,3,4,5],y=[3,2,3,4,2])
106. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
seaborn : set layout
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f,ax = plt.subplots(2,2,figsize=(10, 10))
sns.barplot(x=[1,2,3,4,5],y=[3,2,3,4,2],ax=ax[0,0])
sns.distplot([3,2,3,4,2],ax=ax[0,1])
112. PROCESSING AND VISUALIZING DATA WITH PYTHON
▸ Visualizing : seaborn
seaborn : plot types
http://seaborn.pydata.org/examples/index.html
113. USING DATABASES WITH PYTHON
Quiz#7 : Adult Plot
1. Go to https://archive.ics.uci.edu/ml/datasets/Adult to read the data description
2. Parse the data into pandas using read_csv() and set the column names
3. Plot five charts (one example chart is sketched below).
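One example chart for Quiz#7, reusing the loading code from the Quiz#6 sketch; the remaining charts follow the same pattern with other seaborn plot types:
#Quiz#7 sketch : one example chart for the Adult data
%matplotlib inline
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

cols = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
        'marital-status', 'occupation', 'relationship', 'race', 'sex',
        'capital-gain', 'capital-loss', 'hours-per-week',
        'native-country', 'income']
df = pd.read_csv('adult.data', header=None, names=cols, skipinitialspace=True)

f, ax = plt.subplots(figsize=(10, 6))
sns.countplot(y='education', data=df, ax=ax)   # people per education level
plt.show()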