DATA COLLECTION FOR
SOCIAL MEDIA PLATFORMS
Eng/ Mahmoud Yasser Hammam
WHAT IS A SOCIAL MEDIA API?
• API stands for application programming interface. A social media API is a piece of
code that allows social media networks to integrate with third-party apps and tools.
HOW DO SOCIAL MEDIA APIS WORK?
• A social media API works by connecting social media platforms with external tools and apps.
It gives external developers access to certain kinds of data that social media-related tools
require to work.
• All popular social media networks have APIs that developers can use to create social media
management tools. You can dig into details on each network’s site for developers:
• Instagram APIs
• Facebook APIs
• YouTube APIs
• Twitter APIs
• LinkedIn APIs
• Pinterest API
• TikTok APIs
WHAT ARE THE DIFFERENT TYPES OF
SOCIAL MEDIA APIS? 1- OPEN APIS
• Open APIs are publicly available interfaces. They are also sometimes called public
APIs or free social media APIs.
• Generally, these don’t provide access to proprietary or copyrighted data. Instead,
they are designed to help developers make use of data that is publicly available.
PARTNER APIS
• Partner APIs are only available to approved business partners. Before developers get
access to the API, they need to apply for approval. Then, they are granted a specific
type of access in the form of a license or a rights agreement.
• Since these APIs provide access to data that is not publicly available, they are much
more limited in use. They are usually restricted to performing one specific task.
Authentication is usually required in the form of an access token.
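As a minimal sketch of token-based authentication, the snippet below builds (but does not send) an HTTP request that carries an access token in an Authorization header. The endpoint URL, the token value, and the build_request helper are hypothetical placeholders; each platform documents its own authentication scheme.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and token, for illustration only.
API_URL = "https://api.example.com/v1/posts"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"


def build_request(url: str, token: str, params: dict) -> Request:
    """Return a GET request that presents the token as a Bearer credential."""
    full_url = f"{url}?{urlencode(params)}"
    request = Request(full_url, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request


request = build_request(API_URL, ACCESS_TOKEN, {"limit": 10})
print(request.full_url)
print(request.get_header("Authorization"))
```

Nothing is sent over the network here; passing the request to urllib.request.urlopen would perform the actual call.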
INTERNAL APIS
• Internal APIs are used to help different systems within one social network work
better together. They provide access to backend data for developers who either
work for the company or have been contracted by the company. These are not
accessible to outside developers. These are also sometimes called private APIs.
1. SOCIAL MEDIA APIS POWER
INTERACTIVE CHATBOTS
• APIs are the connecting force that allows personalized, interactive, and AI-powered
chatbots to run on social media platforms. Have you ever interacted with a chatbot
on Facebook Messenger? That conversation was made possible by the Facebook
Messenger API.
2. SOCIAL MEDIA APIS FROM DIFFERENT
PLATFORMS CAN BE USED TOGETHER
• Each social network has its own API(s). But developers can use those APIs in
combination to create tools that provide functionality for multiple social networks.
This makes life much easier for anyone who has more than one social media
account.
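One way to picture combining APIs is a small adapter layer that maps each platform's response into a shared record type. The sketch below assumes two made-up payload shapes (platform_a and platform_b are hypothetical, not real API formats) and merges them into one feed.

```python
from dataclasses import dataclass


@dataclass
class Post:
    platform: str
    author: str
    text: str


def from_platform_a(item: dict) -> Post:
    # Hypothetical payload shape: {"user": {"name": ...}, "message": ...}
    return Post("platform_a", item["user"]["name"], item["message"])


def from_platform_b(item: dict) -> Post:
    # Hypothetical payload shape: {"author": ..., "caption": ...}
    return Post("platform_b", item["author"], item["caption"])


# Stand-ins for responses that two different APIs might return.
response_a = [{"user": {"name": "alice"}, "message": "Hello from A"}]
response_b = [{"author": "bob", "caption": "Hello from B"}]

feed = [from_platform_a(i) for i in response_a] + [from_platform_b(i) for i in response_b]
for post in feed:
    print(post.platform, post.author, post.text)
```

Keeping one adapter function per platform means the rest of the tool only ever deals with Post records.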
3. SOCIAL MEDIA APIS HAVE GENERALLY
BEEN FREE—BUT TWITTER RECENTLY
CHANGED THAT
4. SOCIAL MEDIA APIS COULD HELP
PREVENT CHRONIC DISEASE OR
IMPROVE DISASTER RESPONSE
• Social media APIs provide valuable information for researchers that can be used for the public good,
such as preventing chronic disease or getting early warning about natural disasters.
• For example, the World Health Organization created a pilot project to gather COVID-19 data from
public online conversations, including through social media monitoring APIs. Other researchers analyzed
data related to nicotine poisoning using TikTok hashtags and APIs.
WHAT IS WEB SCRAPING?
• Web scraping is one of the most effective methods to collect qualitative data
because it automates most of the process. Instead of manually navigating multiple
web pages and copying the required information, web scraping tools extract and
organize data in a structured and usable format.
DEVELOPING A WEB SCRAPING
STRATEGY FOR QUALITATIVE DATA
COLLECTION
1- Know your objective and the data you need
2- Identify data sources (e.g., blogs)
3- Select the right web scraping tool, such as Selenium
4- Design the scraping logic
5- Data cleaning and preprocessing
6- Analyzing the data
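The last two steps of the strategy (cleaning, then analysis) can be sketched with the standard library alone. The raw_posts list is invented sample data, and the word count is only a placeholder for whatever analysis the objective calls for.

```python
import re
from collections import Counter

# Hypothetical raw text snippets, as a scraper might return them.
raw_posts = [
    "  Great   product!!  ",
    "great product!!",
    "Shipping was SLOW\n",
    "",
]


def clean(text: str) -> str:
    """Collapse whitespace, strip surrounding blanks, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


# Step 5: clean, drop empties, and remove duplicates while keeping order.
cleaned = list(dict.fromkeys(c for c in map(clean, raw_posts) if c))

# Step 6: a very simple analysis, counting word frequencies.
words = Counter(w for post in cleaned for w in re.findall(r"[a-z']+", post))

print(cleaned)
print(words.most_common(3))
```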
WEB SCRAPING VS. APIS
In terms of data extraction, web scraping is typically
automated, while APIs can involve manual or
automated data retrieval.
The data format obtained through web scraping varies,
often requiring additional processing, whereas APIs
provide structured data.
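To make the difference in data format concrete, the sketch below reads the same invented record twice: once from a JSON string, as an API might return it, and once from an HTML fragment, as a scraper might see it. Both payloads and the SpanCollector class are hypothetical.

```python
import json
from html.parser import HTMLParser

# The same hypothetical record, once as an API response and once as HTML.
api_body = '{"author": "alice", "likes": 42}'
html_body = '<div class="post"><span class="author">alice</span><span class="likes">42</span></div>'

# API: one call yields a typed, structured record.
api_record = json.loads(api_body)


class SpanCollector(HTMLParser):
    """Collect the text of each <span>, keyed by its class attribute."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current = dict(attrs).get("class")

    def handle_data(self, data):
        if self.current:
            self.fields[self.current] = data

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None


# Scraping: parse the markup, then convert strings to the types you need.
parser = SpanCollector()
parser.feed(html_body)
scraped_record = {"author": parser.fields["author"], "likes": int(parser.fields["likes"])}

print(api_record)
print(scraped_record)
```

The scraped path needs a parser and an explicit type conversion, which is the "additional processing" the comparison refers to.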
OVERVIEW OF WEB SCRAPING
TOOLS
Scraping tools
Automated Web Scraping Software: Tools like Octoparse and ParseHub offer a user-friendly interface for non-technical users to extract web data.
Programming Libraries: Python libraries such as BeautifulSoup and Scrapy are popular among developers for custom web scraping tasks.
Cloud-Based Web Scraping Services: Platforms like PromptCloud provide end-to-end managed web scraping services, ideal for large-scale and complex data extraction needs.
WEB SCRAPING TECHNIQUES
HTML Parsing
Description: This is the most fundamental technique, where scrapers parse HTML code to extract data. Tools like
BeautifulSoup in Python are used to navigate the structure of HTML and extract relevant information.
Use Case: Ideal for scraping static websites where data is embedded directly in the HTML.
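A minimal sketch of HTML parsing follows. It uses Python's standard-library html.parser as a stand-in so that it runs without extra installs; BeautifulSoup offers a more convenient interface for the same job. The PAGE content and the LinkExtractor class are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical static page content.
PAGE = """
<html><body>
  <h1>Blog</h1>
  <ul>
    <li><a href="/posts/1">First post</a></li>
    <li><a href="/posts/2">Second post</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


extractor = LinkExtractor()
extractor.feed(PAGE)
print(extractor.links)
```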
AJAX and JavaScript Rendering
Description: Many modern websites load their content dynamically using AJAX and JavaScript. Scraping these
sites requires tools that can execute JavaScript and retrieve data loaded asynchronously.
Use Case: Useful for extracting data from web applications and sites that rely heavily on JavaScript for content
rendering.
WEB SCRAPING TECHNIQUES
Handling Pagination and Infinite Scroll
• Description: Techniques to navigate through multiple pages of content, either by
following pagination links or handling infinite scroll functionalities.
• Use Case: Essential for e-commerce sites, online directories, or any site where content
spans across several pages.
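Following pagination links can be sketched as a loop that keeps requesting the "next" page until none is left. In the sketch below, FAKE_SITE and fetch are hypothetical stand-ins for real HTTP requests and page parsing, and max_pages is an assumed safety limit.

```python
from typing import Callable, Iterator, Optional

# Hypothetical site: each "page" lists items and names the next page, if any.
FAKE_SITE = {
    "/items?page=1": {"items": ["a", "b"], "next": "/items?page=2"},
    "/items?page=2": {"items": ["c"], "next": "/items?page=3"},
    "/items?page=3": {"items": ["d", "e"], "next": None},
}


def fetch(url: str) -> dict:
    """Stand-in for an HTTP request plus parsing of the returned page."""
    return FAKE_SITE[url]


def crawl(start_url: str, fetch_page: Callable[[str], dict], max_pages: int = 100) -> Iterator[str]:
    """Yield items page by page, following 'next' links until none remain."""
    url: Optional[str] = start_url
    seen = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)
        yield from page["items"]
        url = page["next"]


print(list(crawl("/items?page=1", fetch)))
```

The seen set guards against pages that link back to each other, and max_pages caps the crawl on sites with effectively endless content such as infinite scroll.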
CAPTCHA Solving and Proxy Rotation
• Description: Advanced techniques involving the use of proxy servers to mask scraping
activities and algorithms to solve CAPTCHAs, allowing the scraper to mimic human
browsing behavior and avoid detection.
• Use Case: Necessary for scraping websites with strict anti-bot measures.
WEB SCRAPING TECHNIQUES
Headless Browsers
• Description: Tools like Selenium or Puppeteer use headless browsers to interact with
web pages programmatically, simulating human browsing patterns, including clicking
and scrolling.
• Use Case: Ideal for complex scraping tasks where direct HTML parsing is insufficient,
especially in websites requiring user interaction.
API Extraction
• Description: Extracting data by making requests to public or private APIs, often
returning data in a structured format like JSON or XML.
• Use Case: Effective for social media platforms, mobile applications, or any service
offering a data API.
WEB SCRAPING TECHNIQUES
Regular Expressions (Regex)
• Description: Using pattern matching to extract specific text or data points from a larger text
corpus.
• Use Case: Useful for extracting specific information like phone numbers, email addresses, or
any standardized data format.
• Each of these techniques addresses specific challenges in web scraping, ranging from basic
data extraction to navigating complex dynamic sites and evading anti-scraping
technologies. The choice of technique largely depends on the structure and complexity of
the target website.
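The regular-expression technique can be illustrated with Python's re module. The sample text and both patterns below are deliberately simple assumptions; real email and phone formats vary far more than these patterns cover.

```python
import re

# Hypothetical scraped text.
TEXT = """
Contact sales at sales@example.com or support@example.org.
Call +1 555-010-1234 or 555-010-9876 during office hours.
"""

# Deliberately simple patterns; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+\d{1,3}\s)?\d{3}-\d{3}-\d{4}")

emails = EMAIL_RE.findall(TEXT)
phones = PHONE_RE.findall(TEXT)

print(emails)
print(phones)
```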
PYTHON WEB SCRAPING
1- BeautifulSoup:
- Used for parsing HTML and XML documents.
- Ideal for extracting data from static websites.
- Works well with the third-party requests library to fetch web page content.
2- Scrapy:
- An open-source and collaborative web crawling framework.
- Allows you to write rules to extract data from web pages.
- Can handle more complex and large-scale web scraping tasks.
3- Selenium:
- Primarily used for automating web applications for testing purposes.
- Can be used for scraping dynamic content that requires interaction, like clicking buttons or filling forms.
- Drives a real web browser, enabling it to execute JavaScript just like a regular browser.
LAB 1
• Twitter Data Collection using Tweepy
Step 1: Create a Twitter developer account at https://developer.twitter.com/
Step 2: Install Tweepy
Step 3: Set up your Twitter API keys
Step 4: Authorize Tweepy
Step 5: Extract tweets
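The lab steps might look roughly like the sketch below, assuming Tweepy 4.x and its Client class with a bearer token. The environment variable name TWITTER_BEARER_TOKEN, the helper names, and the fake_client stand-in are assumptions for illustration, and whether the recent-search endpoint is available depends on your developer plan.

```python
import os
from types import SimpleNamespace


def make_client():
    """Steps 3-4: build an authorized Tweepy client from an environment variable."""
    import tweepy  # Step 2: pip install tweepy

    # TWITTER_BEARER_TOKEN is an assumed variable name; use your own key storage.
    return tweepy.Client(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])


def extract_tweets(client, query: str, limit: int = 10) -> list:
    """Step 5: return recent tweets matching the query as plain dictionaries."""
    response = client.search_recent_tweets(query=query, max_results=limit)
    return [{"id": tweet.id, "text": tweet.text} for tweet in (response.data or [])]


# Offline stand-in so the sketch runs without credentials or network access.
fake_client = SimpleNamespace(
    search_recent_tweets=lambda query, max_results: SimpleNamespace(
        data=[SimpleNamespace(id=1, text=f"sample tweet about {query}")]
    )
)

print(extract_tweets(fake_client, "data collection"))
# With real credentials: extract_tweets(make_client(), "data collection")
```

Because extract_tweets accepts any object with a search_recent_tweets method, the same function can be exercised offline with the stand-in and then pointed at a real client.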
RESOURCES
• 1- https://blog.hootsuite.com/social-media-api/
• 2- https://scrapingrobot.com/blog/methods-to-collect-qualitative-data/
• 3- https://sjinnovation.com/best-way-extract-data
• 4- https://www.promptcloud.com/blog/the-ultimate-guide-to-web-scraping-tools-techniques-and-use-cases/
