Web crawling and database tables
• We want to crawl/scrape web
pages and extract the relevant
content to build standardized
database tables.
What can we use?
Google Search Tools
* Google uses structured data that it finds on the web to
understand the content of the page, as well as to gather
information about the web and the world in general.
* Structured data is a standardized format for providing
information about a page and classifying the page content;
for example, on a recipe page, what are the ingredients,
the cooking time and temperature, the calories, and so on.
Google search tools
• https://schema.org
Schema.org is a collaboration
between Google, Microsoft,
Yahoo! and Yandex, the large search
engines that use this marked-up
data from web pages.
* schema.org provides a standardized
vocabulary of types, properties, and
descriptions for structured data
markup.
• The Google Structured Data
Testing Tool is an easy and
useful tool for validating your
structured data, and in some
cases, previewing a feature in
Google Search.
https://search.google.com/structured-data/testing-tool/
@type
@id
url
name
image
dateModified
totalTime
recipeYield
recipeIngredient
recipeInstructions
recipeCategory
keywords
recipeCuisine
cookTime
prepTime
"recipeIngredient": [
  "1 (15 ounce) package double crust ready-to-use pie crust",
  "6 cups thinly sliced, peeled apples (6 medium)",
  "3/4 cup sugar",
  "2 tablespoons all-purpose flour",
  "3/4 teaspoon ground cinnamon",
  "1/4 teaspoon salt",
  "1/8 teaspoon ground nutmeg",
  "1 tablespoon lemon juice"
]
These are examples of the structured data format and
properties for a recipe.
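As a rough illustration of how such properties can be read from a page, the sketch below pulls JSON-LD blocks out of HTML with Python's standard library and keeps the objects whose @type is Recipe. It assumes the site embeds its structured data in a `<script type="application/ld+json">` element. The names `JsonLdParser`, `find_recipes`, and `SAMPLE_HTML` are illustrative, and the recipe name in the sample page is made up.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def find_recipes(html):
    """Return every JSON-LD object in the page whose @type is Recipe."""
    parser = JsonLdParser()
    parser.feed(html)
    recipes = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            # Some sites wrap several objects in an "@graph" list.
            for candidate in item.get("@graph", [item]):
                if not isinstance(candidate, dict):
                    continue
                types = candidate.get("@type")
                types = types if isinstance(types, list) else [types]
                if "Recipe" in types:
                    recipes.append(candidate)
    return recipes


# Made-up sample page; the ingredient strings come from the slide above.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Recipe", "name": "Apple Pie",
 "recipeIngredient": ["3/4 cup sugar", "1/4 teaspoon salt"]}
</script>
</head><body></body></html>
"""

for recipe in find_recipes(SAMPLE_HTML):
    print(recipe["name"], recipe["recipeIngredient"])
```

A real page may carry several JSON-LD blocks, so the function returns a list rather than a single object.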
Inspecting source code for structured data with the Google Structured Data Testing Tool
• Google search results for ‘yemek tarif’ (Turkish for "food recipe").
Websites on the first results page (03.03.2020, 14:00):
1. Yemek.com
2. Lezzet.com.tr
3. Refikaninmutfagi.com
4. Nefisyemektarifleri.com
Inspecting each web page’s source code
** Common issue with yemek.com, nefisyemektarifleri.com, and lezzet.com.tr: there is
no match on the main page, only on individual recipe pages, and the page's (JavaScript) code may need to run first.
Searching the page source (Ctrl+F):
https://yemek.com/ // no match ‘recipeIngredient’
https://yemek.com/tarif/narenciyeli-hashasli-kek/ // match ‘recipeIngredient’
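The manual Ctrl+F check can be automated with a plain substring search over the raw page source. The sketch below is only an assumption about how one might do it. `fetch_source` downloads a page with `urllib` (no JavaScript is executed, so data injected by scripts would not be found), and the sample sources are hypothetical stand-ins so the example runs without network access.

```python
from urllib.request import Request, urlopen


def fetch_source(url, timeout=10):
    """Download the raw HTML of a page (no JavaScript is executed)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def has_property(page_source, prop="recipeIngredient"):
    """Plain-text search, equivalent to Ctrl+F on the source view."""
    return prop in page_source


def pages_with_property(sources, prop="recipeIngredient"):
    """Given {url: page_source}, return the URLs whose source mentions prop."""
    return [url for url, source in sources.items() if has_property(source, prop)]


# Hypothetical stand-ins for downloaded sources; real pages would come
# from fetch_source(url).
SAMPLE_SOURCES = {
    "https://example.com/": "<html><body><a href='/recipe/1'>Recipes</a></body></html>",
    "https://example.com/recipe/1": (
        '<script type="application/ld+json">'
        '{"@type": "Recipe", "recipeIngredient": ["3/4 cup sugar"]}'
        "</script>"
    ),
}

print(pages_with_property(SAMPLE_SOURCES))
```

With the sample data, only the recipe page is reported, mirroring the home page versus recipe page result above.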
Website                    Useful structured data
Yemek.com                  +
Lezzet.com.tr              +
Nefisyemektarifleri.com    +
Refikaninmutfagi.com       -
** yemek.com, nefisyemektarifleri.com, and lezzet.com.tr have useful structured
data.
We crawl/scrape these sites with the same settings and send the results to a JSON file, a CSV file, or a
database.
** refikaninmutfagi.com does not have useful structured data. We set up a site-specific
crawl format for it.
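One possible way to turn the shared structured data into a standardized table is sketched below, assuming each scraped recipe is already a Python dict keyed by schema.org property names. The column selection, the `recipes` table name, and the sample records are assumptions made for illustration. List or dict values are stored as JSON text so every row has the same flat shape.

```python
import csv
import io
import json
import sqlite3

# Assumed subset of schema.org Recipe properties used as table columns.
COLUMNS = ["url", "name", "recipeYield", "prepTime", "cookTime",
           "totalTime", "recipeIngredient"]


def to_row(recipe):
    """Flatten one recipe dict into a row with exactly the COLUMNS keys."""
    row = {}
    for column in COLUMNS:
        value = recipe.get(column)  # missing properties become None
        if isinstance(value, (list, dict)):
            value = json.dumps(value, ensure_ascii=False)
        row[column] = value
    return row


def write_csv(rows, fileobj):
    """Write rows to an open text file object as CSV with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def write_sqlite(rows, connection):
    """Insert rows into a 'recipes' table, creating it if needed."""
    columns = ", ".join(COLUMNS)
    placeholders = ", ".join(f":{column}" for column in COLUMNS)
    connection.execute(f"CREATE TABLE IF NOT EXISTS recipes ({columns})")
    connection.executemany(
        f"INSERT INTO recipes ({columns}) VALUES ({placeholders})", rows
    )
    connection.commit()


# Made-up sample records standing in for scraped structured data.
SAMPLE_RECIPES = [
    {"url": "https://example.com/recipe/1", "name": "Apple Pie",
     "prepTime": "PT20M", "recipeIngredient": ["3/4 cup sugar"]},
    {"url": "https://example.com/recipe/2", "name": "Lemon Cake",
     "recipeYield": "8", "recipeIngredient": ["1 tablespoon lemon juice"]},
]

rows = [to_row(recipe) for recipe in SAMPLE_RECIPES]

buffer = io.StringIO()
write_csv(rows, buffer)

connection = sqlite3.connect(":memory:")
write_sqlite(rows, connection)
```

Because every site is mapped onto the same column list, a site that lacks a property simply produces an empty cell instead of a differently shaped record.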
yemek.com lezzet.com.tr nefisyemektarifleri.com refikaninmutfagi.com
@type @type @type @type
@id name @id @id
url image url url
name description mainEntityOfPage inLanguage
image recipeYield name
image recipeIngredient name datePublished
image recipeInstructions headline dateModified
dateModified prepTime description description
totalTime cookTime datePublished isPartOf
recipeYield author dateModified
recipeIngredient aggregateRating url
recipeInstructions keywords mainEntityOfPage
recipeCategory nutrition recipeYield
keywords recipeCategory prepTime
recipeCuisine recipeCuisine cookTime
cookTime video totalTime
prepTime recipeIngredient
description ingredients
author recipeInstructions
aggregateRating author
nutrition aggregateRating
keywords
nutrition
recipeCategory
recipeCuisine
video
• We extract (schema.org) microdata using Scrapy.
https://blog.scrapinghub.com/2014/06/18/extracting-schema-org-microdata-using-scrapy-selectors-and-xpath
* Alternative ways to scrape websites (Schema.org Microdata, JSON Linked Data (JSON-LD), internal JavaScript variables, and XHRs).
https://blog.apify.com/web-scraping-in-2018-forget-html-use-xhrs-metadata-or-javascript-variables-8167f252439c
• End-to-end Scrapy tutorial, parts I-IV (Sep 2019).
https://towardsdatascience.com/a-minimalist-end-to-end-scrapy-tutorial-part-i-11e350bcdec0
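The first link covers microdata extraction with Scrapy selectors and XPath. As a dependency-free stand-in (not the Scrapy approach itself), the sketch below collects `itemprop` values with Python's built-in `html.parser`. It is deliberately flat: nested `itemscope` items are not separated, and an element's text is used only when it has no `content`, `src`, `href`, or `datetime` attribute. The class name and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Attributes that carry the property value on elements such as meta, img, a, time.
VALUE_ATTRS = ("content", "src", "href", "datetime")


class MicrodataParser(HTMLParser):
    """Collects itemprop values into a dict of lists (flat, no nesting)."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._prop = None   # property currently being captured from text
        self._tag = None    # tag name that opened the capture
        self._depth = 0     # nesting depth of that tag name
        self._text = []

    def _add(self, prop, value):
        self.props.setdefault(prop, []).append(value.strip())

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._prop is not None:
            # Already capturing text: only track nesting of the same tag.
            if tag == self._tag:
                self._depth += 1
            return
        prop = attrs.get("itemprop")
        if prop is None:
            return
        for name in VALUE_ATTRS:
            if attrs.get(name):
                self._add(prop, attrs[name])
                return
        self._prop, self._tag, self._depth, self._text = prop, tag, 1, []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prop is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._add(self._prop, "".join(self._text))
                self._prop = None


# Made-up microdata markup; the ingredient strings come from the earlier slide.
SAMPLE_MICRODATA = """
<div itemscope itemtype="https://schema.org/Recipe">
  <h1 itemprop="name">Apple Pie</h1>
  <meta itemprop="prepTime" content="PT20M">
  <ul>
    <li itemprop="recipeIngredient">3/4 cup sugar</li>
    <li itemprop="recipeIngredient">1/4 teaspoon salt</li>
  </ul>
</div>
"""

parser = MicrodataParser()
parser.feed(SAMPLE_MICRODATA)
print(parser.props)
```

In a Scrapy spider the same idea would typically be expressed with XPath queries on `@itemprop`, as the linked article describes.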
