Web Scraping and Healthcare
Presented By:
Avanish Kumar Giri
BMSCE, Bangalore
Contents
• Introduction
• Research paper-1: Big data analytics in healthcare: promise and potential
• Research paper-2: Big data analytics in healthcare
• Research paper-3: Improving Healthcare Using Big Data Analytics
• Research paper-4: Web Scraping: Data Extraction from websites
• Research paper-5: A Dive into Web Scraper World
• Research paper-6: Automated scraping of structured data records from health discussion forums using semantic analysis
• Comparison of all the research papers
• Proposed Framework
• Conclusion
• References and Bibliography
Introduction
• The internet is a massive storehouse of health-related information.
• Much of it, however, is not freely available because of:
▫ Individual privacy concerns
▫ The risk of data leaks
▫ Website restrictions
• Health discussion forums, by contrast, are easily accessible, so they can be targeted to scrape data and analyze it for various purposes.
• Web scraping, also known as web extraction or harvesting, is a technique to extract data from the World Wide Web (WWW) and save it to a file system or database for later retrieval or analysis.
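To make the definition concrete, here is a minimal sketch that assumes a page has already been downloaded: it extracts post titles with Python's standard-library `html.parser` and saves them to an SQLite database for later retrieval. The sample HTML, the `post-title` class, the `posts` table and the helper names are hypothetical, not the markup of any real forum.

```python
import sqlite3
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text of <a class="post-title"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and ("class", "post-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def scrape_and_store(html, conn):
    """Extract titles from `html` and save them in the `posts` table of `conn`."""
    parser = TitleExtractor()
    parser.feed(html)
    conn.execute("CREATE TABLE IF NOT EXISTS posts (title TEXT)")
    conn.executemany("INSERT INTO posts (title) VALUES (?)",
                     [(t.strip(),) for t in parser.titles])
    conn.commit()


# Hypothetical page content standing in for a downloaded forum page
sample_html = """
<ul>
  <li><a class="post-title" href="/p/1">Trouble sleeping</a></li>
  <li><a class="post-title" href="/p/2">Side effects of medication</a></li>
</ul>
"""

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data on disk
scrape_and_store(sample_html, conn)
rows = conn.execute("SELECT title FROM posts").fetchall()
print(rows)
```

Storing to a database rather than keeping results in memory is what makes the "later retrieval or analysis" part of the definition possible.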
Research Paper-1
• Big data analytics in healthcare: promise and potential
▫ The healthcare industry historically has generated large amounts of data,
driven by record keeping, compliance, regulatory requirements, and
patient care.
▫ While most of this data has been stored in hard-copy form, the current trend is toward its rapid digitization.
▫ Big data for U.S. healthcare is expected to soon reach the zettabyte (10^21 bytes) scale.[1]
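For scale, with decimal SI prefixes (1 GB = 10^9 bytes), a zettabyte works out to a trillion gigabytes:

```latex
1\,\text{ZB} = 10^{21}\,\text{bytes} = \frac{10^{21}\,\text{bytes}}{10^{9}\,\text{bytes/GB}} = 10^{12}\,\text{GB}
```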
Research Paper-2
• Big data analytics in healthcare
▫ Image processing
▪ Computed tomography (CT)
▪ Magnetic resonance imaging (MRI)
▪ X-ray, molecular imaging, ultrasound, etc.
▫ Signal processing
▫ Data analytics in disease detection
▫ Data analytics in medical diagnosis
Research Paper-3
• Improving Healthcare Using Big Data Analytics
▫ According to an infographic from Oracle, the healthcare sector held about 50 petabytes of data, a volume predicted to grow to 25,000 petabytes by 2020.[2]
▫ Data analytics in public health research
▪ With the rapid expansion of public health information, data analytics techniques can be used to crawl and filter varied types of public health data from sources such as:
▪ Hospital information system (HIS), which includes the electronic medical record system (EMRS)
▪ Laboratory information system (LIS)
▪ Radiology information system (RIS)
▪ Clinical decision support system (CDSS), etc.
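The Oracle figures quoted on this slide imply roughly a 500-fold increase in healthcare data volume:

```latex
\frac{25{,}000\ \text{PB}}{50\ \text{PB}} = 500
```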
Research Paper-4
• Web Scraping: Data Extraction from websites
▫ Web scraping, also known as web extraction or harvesting, is a technique
to extract data from the World Wide Web (WWW) and save it to a file
system or database for later retrieval or analysis.
▫ The paper lists scraping as one of the sources for big data collection. The definition also introduces a related term: the web crawler.
Fig 1: Web crawler vs. web scraping
Source: https://www.quora.com/What-are-the-biggest-differences-between-web-crawling-and-web-scraping
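The distinction in Fig 1 can be sketched in Python: the crawler only discovers which pages exist by following links, while the scraper pulls specific fields out of a given page. This is an illustrative sketch, not the method of the cited paper. The in-memory `PAGES` dictionary stands in for real HTTP requests, and all helper names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser


class LinkAndHeadingParser(HTMLParser):
    """Collects href values of <a> tags and the text of <h1> tags."""

    def __init__(self):
        super().__init__()
        self.links, self.headings = [], []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1":
            self._in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headings[-1] += data


def parse(html):
    parser = LinkAndHeadingParser()
    parser.feed(html or "")  # treat a missing page as empty
    return parser


def crawl(start, fetch, max_pages=10):
    """Crawler: discovers URLs by following links breadth-first."""
    seen, queue, order = {start}, deque([start]), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        # A real crawler would resolve relative links with urllib.parse.urljoin
        for link in parse(fetch(url)).links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def scrape(url, fetch):
    """Scraper: extracts specific data (here, the first <h1>) from one page."""
    headings = parse(fetch(url)).headings
    return {"url": url, "title": headings[0].strip() if headings else None}


# Hypothetical in-memory "website" so the example runs without network access
PAGES = {
    "/forum": '<h1>Forum</h1><a href="/t/1">t1</a><a href="/t/2">t2</a>',
    "/t/1": '<h1>Headache after exercise</h1><a href="/forum">back</a>',
    "/t/2": "<h1>Managing anxiety</h1>",
}
fetch = PAGES.get

urls = crawl("/forum", fetch)
records = [scrape(u, fetch) for u in urls]
print(urls)
print(records)
```

Keeping the two roles in separate functions means the link-following logic can be reused unchanged when the fields being scraped change.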
Research Paper-4 (contd.)
• Available Frameworks
• Scrapy
▫ It is one of the most advanced web scraping frameworks available.
▫ The framework is written in Python.
• FMiner
▫ It combines visual configuration with scripting features.
Research Paper-5
• A Dive into Web Scraper World
▫ Is web scraping legal?
▪ This question has never been answered conclusively.
▪ Opinions differ widely on the legal and illegal aspects of scraping the web.
▫ Crawling policies
▪ Selection policy: states which pages to download
▪ Re-visit policy: states when to check the pages for changes
▫ Designing a custom scraper
▪ A web scraper is broadly composed of two parts:
▪ A web crawler that crawls links
▪ A data extractor that pulls data from the crawled links
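The two crawling policies can be expressed as small checks that a crawler consults before each download. The sketch below is one simple, assumed design: the selection policy restricts downloads to one domain and a set of path patterns, and the re-visit policy uses a fixed interval. The `CrawlPolicy` class and its parameters are hypothetical; real crawlers use many other policies.

```python
import fnmatch
import time
from urllib.parse import urlparse


class CrawlPolicy:
    """Hypothetical crawl policy combining a selection rule and a re-visit rule."""

    def __init__(self, allowed_domain, path_patterns, revisit_seconds):
        self.allowed_domain = allowed_domain
        self.path_patterns = path_patterns      # e.g. ["/forums/discuss/*"]
        self.revisit_seconds = revisit_seconds  # minimum age before re-fetching
        self.last_visit = {}                    # url -> timestamp of last fetch

    def select(self, url):
        """Selection policy: should this page be downloaded at all?"""
        parts = urlparse(url)
        return parts.netloc == self.allowed_domain and any(
            fnmatch.fnmatch(parts.path, pattern) for pattern in self.path_patterns
        )

    def due_for_revisit(self, url, now=None):
        """Re-visit policy: has enough time passed since the last fetch?"""
        now = time.time() if now is None else now
        last = self.last_visit.get(url)
        return last is None or now - last >= self.revisit_seconds

    def record_visit(self, url, now=None):
        self.last_visit[url] = time.time() if now is None else now


policy = CrawlPolicy("patient.info", ["/forums/discuss/*"], revisit_seconds=24 * 3600)
listing = "https://patient.info/forums/discuss/browse/anxiety-disorders-70"
print(policy.select(listing))                          # True
print(policy.select("https://example.com/forums/x"))   # False (other domain)
policy.record_visit(listing, now=0)
print(policy.due_for_revisit(listing, now=3600))       # False (only 1 hour old)
print(policy.due_for_revisit(listing, now=90000))      # True (older than 24 h)
```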
Research Paper-6
• Automated scraping of structured data records from health discussion forums using semantic analysis
▫ In the context of healthcare, web scraping is gaining a foothold gradually, but in a qualitatively significant way.
▫ Several factors have led to the use of web scraping in healthcare:
▪ The data is too complex to be analysed by traditional techniques.
▪ Web scraping combined with data extraction can improve decision making.
What is a health discussion forum?
Source: https://patient.info/forums/discuss/browse/abdominal-disorders-3321
General framework for data extraction[6]
Fig 2: General framework of data extraction
Source: https://www.sciencedirect.com/science/article/pii/S2352914817302253
Comparison
| Paper | Process | Advantages | Disadvantages | Future work |
|---|---|---|---|---|
| 1 | Big data concepts in healthcare | Improved record keeping and patient care | Large amount of data | Analyzing the size of the data |
| 2 | Image and signal processing in big data | Easy processing of reports | Domain knowledge is important to process such data | Many other types of reports can be processed |
| 3 | Data analytics in PHR | Easy processing of reports | Research knowledge required | Easy definition of reports |
| 4 | Web crawler and available frameworks | Frameworks are easily available | Illegal activities could take place | Limitations in frameworks |
| 5 | Legal aspects and crawling policies | No proper legal definition | Depends on the website | Defining a clear legal structure |
| 6 | Scraping web discussion forums | Easy to get data from such forums | Website policies | Less frequent requests |
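Row 6 lists website policies as a disadvantage and less frequent requests as future work. One common way to address both is to check a site's robots.txt before fetching and to space out requests, as in this sketch built on Python's standard `urllib.robotparser`. Following robots.txt does not by itself settle the legal question raised in paper 5. The `PoliteFetcher` class, the bot name and the sample rules are hypothetical; in real use the rules would be loaded with `set_url(...)` followed by `read()`.

```python
import time
import urllib.robotparser


class PoliteFetcher:
    """Checks robots.txt rules and spaces out requests to the same site."""

    def __init__(self, robots_lines, user_agent="health-forum-research-bot", delay=5.0):
        self.rules = urllib.robotparser.RobotFileParser()
        self.rules.parse(robots_lines)   # normally: set_url(...) then read()
        self.user_agent = user_agent     # hypothetical bot name
        # Honour a Crawl-delay directive if present, otherwise use our own delay
        self.delay = self.rules.crawl_delay(user_agent) or delay
        self._last_request = None

    def allowed(self, url):
        return self.rules.can_fetch(self.user_agent, url)

    def wait_time(self, now):
        """Seconds to wait before the next request is allowed."""
        if self._last_request is None:
            return 0.0
        return max(0.0, self.delay - (now - self._last_request))

    def fetch(self, url, download):
        """Call `download(url)` only if robots.txt allows it, after any required wait."""
        if not self.allowed(url):
            return None
        time.sleep(self.wait_time(time.monotonic()))
        self._last_request = time.monotonic()
        return download(url)


robots_txt = ["User-agent: *", "Disallow: /private/", "Crawl-delay: 3"]  # hypothetical
fetcher = PoliteFetcher(robots_txt)
print(fetcher.allowed("https://example.org/forums/topic-1"))   # True
print(fetcher.allowed("https://example.org/private/profile"))  # False
```

Passing the download function in as a parameter keeps the politeness logic independent of whether `requests`, Scrapy or another client does the actual HTTP work.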
Proposed Framework
• Fetching data from health discussion forums
• Parsing it with BeautifulSoup (a Python library)
• Storing the data in JSON format
• Processing the data for decision making
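A minimal sketch of the last two steps, storing records as JSON and then processing them, might look like this. The record fields (`title`, `replies`) and the keyword list are hypothetical, and a real decision-making analysis would be considerably richer than a keyword count.

```python
import json
from collections import Counter

# Hypothetical records as they might come out of the parsing step
records = [
    {"title": "Anxiety and poor sleep", "replies": 12},
    {"title": "New medication side effects", "replies": 4},
    {"title": "Sleep problems after surgery", "replies": 7},
]

# Store: serialise the records to JSON (json.dump with a file object writes to disk)
payload = json.dumps(records, indent=2)

# Process: reload and count how many thread titles mention each keyword
keywords = ["sleep", "anxiety", "medication"]  # hypothetical terms of interest
loaded = json.loads(payload)
mentions = Counter(
    kw for rec in loaded for kw in keywords if kw in rec["title"].lower()
)
busiest = max(loaded, key=lambda rec: rec["replies"])  # most-discussed thread

print(mentions.most_common())
print(busiest["title"])
```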
Fetching the data
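• Involves finding the endpoint: the URL or URLs to fetch
• Sending HTTP requests to the server
• Using the requests library:
```python
import requests

# Forum listing page used as the example endpoint
url = "https://patient.info/forums/discuss/browse/anxiety-disorders-70"
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an error for 4xx/5xx responses
html = response.content      # raw bytes of the page body
```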
Use BeautifulSoup for parsing
• Provides simple methods to:
▫ Search
▫ Navigate
▫ Select
• Export the data to:
▫ Database (relational or non-relational)
▫ CSV
▫ JSON
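A sketch of the parsing and export steps with BeautifulSoup (the `bs4` package): it uses `select` (CSS selectors), `find` (search) and `find_next_sibling` (navigation), then exports the records as JSON and CSV. The markup and class names are hypothetical; the real forum pages would need to be inspected to choose selectors.

```python
import csv
import io
import json

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the `html` fetched in the previous step
html = """
<div class="thread"><a class="title" href="/t/1">Panic attacks at night</a>
  <span class="replies">15</span></div>
<div class="thread"><a class="title" href="/t/2">Coping with exam stress</a>
  <span class="replies">3</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

records = []
for thread in soup.select("div.thread"):       # select: CSS selector query
    link = thread.find("a", class_="title")    # search within one record
    replies = link.find_next_sibling("span")   # navigate to a sibling tag
    records.append({
        "title": link.get_text(strip=True),
        "url": link["href"],
        "replies": int(replies.get_text(strip=True)),
    })

# Export as JSON
json_text = json.dumps(records, indent=2)

# Export as CSV (StringIO here; use open("posts.csv", "w", newline="") for a file)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "url", "replies"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

print(json_text)
print(csv_text)
```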
Impact of web scraping on healthcare
• Healthcare is not a sector that depends entirely on person-to-person interaction.
• In the current system, where most options are data-centric, healthcare web scraping can improve lives, educate people, and raise awareness. Because people no longer rely only on doctors and pharmacists, healthcare web scraping can help by providing balanced solutions.
• With data extraction and web scraping methods, healthcare organizations may reduce fraud attempts, doctors can discover effective cures and best practices, and patients can receive better and more affordable healthcare services.
Conclusion
• From the points discussed across all the research papers, we can conclude that the use of web scraping in healthcare can improve the traditional decision-making process.
• Targeting web discussion forums to detect risks can be helpful for healthcare organizations as well as for individuals.
• Scraping opens up another way of retrieving information without using an API, and it is mostly done anonymously.
• However, anyone doing scraping should make sure they are not breaking any law that could make them liable for an offence.
References
1. https://link.springer.com/journal/13755
2. https://www.3idatascraping.com/web-scraping-for-healthcare-companies.php
3. https://www.elsevier.com/locate/imu
4. https://patient.info/forums/discuss/browse/abdominal-disorders-3321
5. https://www.quora.com/What-are-the-biggest-differences-between-web-crawling-and-web-scraping
6. https://www.sciencedirect.com/science/article/pii/S2352914817302253
Bibliography
• Research Paper 1:
▫ Title: Big data analytics in healthcare: promise and potential
▫ Authors: Wullianallur Raghupathi and Viju Raghupathi
▫ https://link.springer.com/journal/13755
▫ Pages: 1-10
• Research Paper 2:
▫ Title: Big Data Analytics in Healthcare
▫ Authors: Ashwin Belle, Raghuram Thiagarajan, S. M. Reza Soroushmehr and Kayvan Najarian
▫ Biomedicine and Biotechnology, January 2015
▫ Pages: 1-38
• Research Paper 3:
▫ Title: Big Data Analytics in Healthcare
▫ Author: Revanth Sonnati
▫ Academia
▫ Pages: 1-5
• Research Paper 4:
▫ Title: Web Scraping: Data Extraction from websites
▫ Author: Vojtech Draxl
▫ ScienceDirect
▫ Pages: 1-38
• Research Paper 5:
▫ Title: A Dive into Web Scraper World
▫ Authors: Deepak Kumar Mahto and Lisha Singh
▫ 2016 International Conference on Computing for Sustainable Global Development
▫ Pages: 1-5
• Research Paper 6:
▫ Title: Automated scraping of structured data records from health discussion forums using semantic analysis
▫ Authors: Umamageswari Baskaran and Kalpana Ramanujam
▫ https://www.elsevier.com/locate/imu
▫ ScienceDirect
▫ Pages: 1-10
Thank You

More Related Content

Similar to Web scraping and healthcare

A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchRobert Grossman
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Introduction to Big Data and its Potential for Dementia Research
Introduction to Big Data and its Potential for Dementia ResearchIntroduction to Big Data and its Potential for Dementia Research
Introduction to Big Data and its Potential for Dementia ResearchDavid De Roure
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...Christopher Hart
 
Big Data Mining Methods in Medical Applications [Autosaved].pptx
Big Data Mining Methods in Medical Applications [Autosaved].pptxBig Data Mining Methods in Medical Applications [Autosaved].pptx
Big Data Mining Methods in Medical Applications [Autosaved].pptxHemaSenthil5
 
Electronic Data Capture (EDC) Systems: Streamlining Data Collection and Manag...
Electronic Data Capture (EDC) Systems: Streamlining Data Collection and Manag...Electronic Data Capture (EDC) Systems: Streamlining Data Collection and Manag...
Electronic Data Capture (EDC) Systems: Streamlining Data Collection and Manag...ClinosolIndia
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applicationsPadma Metta
 
Hospital Cloud Forum - thoughts for panel
Hospital Cloud Forum - thoughts for panelHospital Cloud Forum - thoughts for panel
Hospital Cloud Forum - thoughts for panelKent State University
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipICPSR
 
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...dkNET
 
The art of depositing social science data: maximising quality and ensuring go...
The art of depositing social science data: maximising quality and ensuring go...The art of depositing social science data: maximising quality and ensuring go...
The art of depositing social science data: maximising quality and ensuring go...Louise Corti
 

Similar to Web scraping and healthcare (20)

Innovative project1
Innovative project1Innovative project1
Innovative project1
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
 
Shifting the goal post – from high impact journals to high impact data
 Shifting the goal post – from high impact journals to high impact data Shifting the goal post – from high impact journals to high impact data
Shifting the goal post – from high impact journals to high impact data
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Preparing Research Data for Sharing
Preparing Research Data for SharingPreparing Research Data for Sharing
Preparing Research Data for Sharing
 
Big Data in Clinical Research
Big Data in Clinical ResearchBig Data in Clinical Research
Big Data in Clinical Research
 
BLC & Digital Science: Mark Hahnel, Figshare
BLC & Digital Science: Mark Hahnel, FigshareBLC & Digital Science: Mark Hahnel, Figshare
BLC & Digital Science: Mark Hahnel, Figshare
 
Introduction to Big Data and its Potential for Dementia Research
Introduction to Big Data and its Potential for Dementia ResearchIntroduction to Big Data and its Potential for Dementia Research
Introduction to Big Data and its Potential for Dementia Research
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
 
Big Data Mining Methods in Medical Applications [Autosaved].pptx
Big Data Mining Methods in Medical Applications [Autosaved].pptxBig Data Mining Methods in Medical Applications [Autosaved].pptx
Big Data Mining Methods in Medical Applications [Autosaved].pptx
 
Harbinger Tech Session in cloud Expo 2015- Harnessing the power of linked ope...
Harbinger Tech Session in cloud Expo 2015- Harnessing the power of linked ope...Harbinger Tech Session in cloud Expo 2015- Harnessing the power of linked ope...
Harbinger Tech Session in cloud Expo 2015- Harnessing the power of linked ope...
 
Electronic Data Capture (EDC) Systems: Streamlining Data Collection and Manag...
Electronic Data Capture (EDC) Systems: Streamlining Data Collection and Manag...Electronic Data Capture (EDC) Systems: Streamlining Data Collection and Manag...
Electronic Data Capture (EDC) Systems: Streamlining Data Collection and Manag...
 
Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case
Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use caseEnabling simultaneous analysis of multiple cohort studies: A BRISSKit use case
Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applications
 
Hospital Cloud Forum - thoughts for panel
Hospital Cloud Forum - thoughts for panelHospital Cloud Forum - thoughts for panel
Hospital Cloud Forum - thoughts for panel
 
Barbara Bierer, "Clinical Trial Data Sharing"
Barbara Bierer, "Clinical Trial Data Sharing"Barbara Bierer, "Clinical Trial Data Sharing"
Barbara Bierer, "Clinical Trial Data Sharing"
 
Information Security Forum (ISF) Congress 2013
Information Security Forum (ISF) Congress 2013 Information Security Forum (ISF) Congress 2013
Information Security Forum (ISF) Congress 2013
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data Stewardship
 
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
 
The art of depositing social science data: maximising quality and ensuring go...
The art of depositing social science data: maximising quality and ensuring go...The art of depositing social science data: maximising quality and ensuring go...
The art of depositing social science data: maximising quality and ensuring go...
 

Recently uploaded

College Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
College Call Girls Mumbai Alia 9910780858 Independent Escort Service MumbaiCollege Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
College Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbaisonalikaur4
 
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...delhimodelshub1
 
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...High Profile Call Girls Chandigarh Aarushi
 
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service HyderabadCall Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabaddelhimodelshub1
 
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service HyderabadCall Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabaddelhimodelshub1
 
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsiindian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana TulsiHigh Profile Call Girls Chandigarh Aarushi
 
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunDehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunNiamh verma
 
Call Girls Gurgaon Parul 9711199012 Independent Escort Service Gurgaon
Call Girls Gurgaon Parul 9711199012 Independent Escort Service GurgaonCall Girls Gurgaon Parul 9711199012 Independent Escort Service Gurgaon
Call Girls Gurgaon Parul 9711199012 Independent Escort Service GurgaonCall Girls Service Gurgaon
 
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goa
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service GoaRussian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goa
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goanarwatsonia7
 
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...High Profile Call Girls Chandigarh Aarushi
 
Call Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any TimeCall Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any Timedelhimodelshub1
 
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...ggsonu500
 
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...High Profile Call Girls Chandigarh Aarushi
 
VIP Call Girls Hyderabad Megha 9907093804 Independent Escort Service Hyderabad
VIP Call Girls Hyderabad Megha 9907093804 Independent Escort Service HyderabadVIP Call Girls Hyderabad Megha 9907093804 Independent Escort Service Hyderabad
VIP Call Girls Hyderabad Megha 9907093804 Independent Escort Service Hyderabaddelhimodelshub1
 
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...delhimodelshub1
 

Recently uploaded (20)

College Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
College Call Girls Mumbai Alia 9910780858 Independent Escort Service MumbaiCollege Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
College Call Girls Mumbai Alia 9910780858 Independent Escort Service Mumbai
 
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
 
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
 
Call Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service Guwahati
Call Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service GuwahatiCall Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service Guwahati
Call Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service Guwahati
 
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service HyderabadCall Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
 
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service HyderabadCall Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
 
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsiindian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
 
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
 
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunDehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
 
Call Girls Gurgaon Parul 9711199012 Independent Escort Service Gurgaon
Call Girls Gurgaon Parul 9711199012 Independent Escort Service GurgaonCall Girls Gurgaon Parul 9711199012 Independent Escort Service Gurgaon
Call Girls Gurgaon Parul 9711199012 Independent Escort Service Gurgaon
 
VIP Call Girls Lucknow Isha 🔝 9719455033 🔝 🎶 Independent Escort Service Lucknow
VIP Call Girls Lucknow Isha 🔝 9719455033 🔝 🎶 Independent Escort Service LucknowVIP Call Girls Lucknow Isha 🔝 9719455033 🔝 🎶 Independent Escort Service Lucknow
VIP Call Girls Lucknow Isha 🔝 9719455033 🔝 🎶 Independent Escort Service Lucknow
 
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
 
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goa
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service GoaRussian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goa
Russian Call Girls in Goa Samaira 7001305949 Independent Escort Service Goa
 
Call Girl Lucknow Gauri 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
Call Girl Lucknow Gauri 🔝 8923113531  🔝 🎶 Independent Escort Service LucknowCall Girl Lucknow Gauri 🔝 8923113531  🔝 🎶 Independent Escort Service Lucknow
Call Girl Lucknow Gauri 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
 
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
 
Call Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any TimeCall Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any Time
 
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 90 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
 
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
Call Girl Chandigarh Mallika ❤️🍑 9907093804 👄🫦 Independent Escort Service Cha...
 
VIP Call Girls Hyderabad Megha 9907093804 Independent Escort Service Hyderabad
VIP Call Girls Hyderabad Megha 9907093804 Independent Escort Service HyderabadVIP Call Girls Hyderabad Megha 9907093804 Independent Escort Service Hyderabad
VIP Call Girls Hyderabad Megha 9907093804 Independent Escort Service Hyderabad
 
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
 

Web scraping and healthcare

  • 1. Web Scraping and Healthcare Presented By: Avanish Kumar Giri BMSCE, Bangalore
  • 2. Contents • Introduction • Research paper-1: Big data analytics in healthcare • Research paper-2: Big data analytics in healthcare: promise and potential • Research paper-3: Improving Healthcare Using Big Data Analytics • Research paper-4: Big Data Analytics: Solution to healthcare • Research paper-5: A Dive into Web Scraper World • Research paper-6: Automated scraping of structured data records from health discussion forums using semantic analysis • Comparison of all the research papers • Proposed Framework • Conclusion • References and Bibliography
  • 3. Introduction • The internet is a massive storehouse of health related information. • But it is not freely available because ▫ Individual privacy issues ▫ Data leak ▫ Website restrictions • So, we can target health discussion forums which are easily accessible to scrape the data and analyze it for various purposes. • Web scraping, • Also known as web extraction or harvesting, is a technique to extract data from the World Wide Web (WWW) and save it to a file system or database for later retrieval or analysis.
  • 4. Research Paper-1 • Big data analytics in healthcare: promise and potential ▫ The healthcare industry historically has generated large amounts of data, driven by record keeping, compliance, regulatory requirements, and patient care. ▫ While most data is stored in hard copy form, the current trend is toward rapid digitization of these large amounts of data. ▫ Big data for U.S. healthcare will soon reach the zettabyte (1021 gigabytes) scale.[1]
  • 5. Research Paper-2 • Big data analytics in healthcare ▫ Image processing  Computed tomography (CT),  Magnetic resonance imaging (MRI),  X ray, molecular imaging, ultrasound, etc. ▫ Signal processing ▫ Data analytics in disease detection ▫ Data Analytics in medical diagnosis
  • 6. Research Paper-3 • Improving Healthcare Using Big Data Analytics ▫ 50 Petabytes of data in the health care realm, predicted to grow to 25,000 Petabytes by 2020, reported by a new info-graphic from Oracle.[2] ▫ Data analytics in public health research  With the wild expansion of public health information, we can use data analytic technique to crawl and filter out varied types of public health info data.  Hospital Information system (HIS), which includes electronic medical record system (EMRS)  Laboratory Information system (LIS)  Radiology Information system (RIS)  Clinical decision support system (CDSS), etc.
  • 7. Research Paper-4 • Web Scraping: Data Extraction from websites ▫ Web scraping, also known as web extraction or harvesting, is a technique to extract data from the World Wide Web (WWW) and save it to a file system or database for later retrieval or analysis. ▫ Scraping is mentioned as one of sources for big data collection. In the definition is also mentioned another term – Web Crawler.
  • 8. Fig 1: Web crawler Vs. Web Scraping Source: https://www.quora.com/What-are-the- biggest-differences-between-web-crawling-and- web-scraping
  • 9. Research Paper-4(cntd.) • Available Frameworks • Scrapy ▫ It is one of the advanced web scraping frameworks available. ▫ The Framework is written in Python. • FMiner ▫ It combines visual configuration with scripting features.
  • 10. Research Paper-5 • A Dive into Web Scraper World ▫ Is Web Scraping Legal?  This question is always left unanswered properly.  There are lots of different views of different people on the legal and illegal aspects of Scraping the Web. ▫ Crawling policies  Selection policies- It states the pages to download  Re-Visits Policy- It states when to check for changes to the pages ▫ Designing a custom scrapper  A Web Scraper broadly composed of two parts:  Web crawler for crawling links  Data extractor from crawled link
  • 11. Research Paper-6 • Automated scraping of structured data records from health discussion forums using semantic analysis ▫ In the context of healthcare, web scraping is gaining foothold gradually but qualitatively. ▫ Several factors have led to the use of web scraping in healthcare.  Too complex to be analysed by traditional techniques.  Web scraping along with data extraction can improve decision making.
  • 13. General Framework for data extraction[6] Fig 2: General Framework of data extraction Source:https://www.sciencedirect.com/science/article/pii/S2352914817302253
  • 14. Comparison
• Each paper is compared by process, advantages, disadvantages and future work
▫ Paper 1: Big data concepts in healthcare
▪ Advantages: Improved record keeping and patient care
▪ Disadvantages: Large amount of data
▪ Future work: Analyzing the size of the data
▫ Paper 2: Image and signal processing in big data
▪ Advantages: Easy processing of reports
▪ Disadvantages: Domain knowledge is needed to process such data
▪ Future work: Many other types of reports can be processed
▫ Paper 3: Data analytics in public health research (PHR)
▪ Advantages: Easy processing of reports
▪ Disadvantages: Research knowledge required
▪ Future work: Easy definition of reports
▫ Paper 4: Web crawlers and available frameworks
▪ Advantages: Frameworks are easily available
▪ Disadvantages: Illegal activities could take place
▪ Future work: Limitations in frameworks
▫ Paper 5: Legal aspects and crawling policies
▪ Advantages: No proper legal definition
▪ Disadvantages: Depends on the website
▪ Future work: Defining a clear legal structure
▫ Paper 6: Scraping web discussion forums
▪ Advantages: Easy to get data from such forums
▪ Disadvantages: Website policies
▪ Future work: Less frequent requests
  • 15. Proposed Framework
• Fetch data from health discussion forums.
• Parse it with BeautifulSoup (a Python library).
• Store the data in JSON format.
• Process the data for decision making.
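The framework leaves its last step, processing the stored data, open. One hypothetical form of that step is to scan the stored JSON records and count how many threads mention particular symptoms. The record fields (`title`, `body`) and the keyword list below are assumptions. Keyword counting is also much cruder than the semantic analysis used in Research Paper-6.

```python
import json
import re
from collections import Counter

# Hypothetical keywords an analyst might track across forum threads.
SYMPTOMS = ["panic", "insomnia", "chest pain", "nausea"]


def symptom_counts(json_text, symptoms=SYMPTOMS):
    """Count how many threads mention each symptom at least once."""
    records = json.loads(json_text)
    counts = Counter()
    for record in records:
        text = f"{record.get('title', '')} {record.get('body', '')}".lower()
        for symptom in symptoms:
            # Whole-word match, so "panic" does not match inside "hispanic".
            if re.search(rf"\b{re.escape(symptom)}\b", text):
                counts[symptom] += 1
    return counts


if __name__ == "__main__":
    stored = json.dumps([
        {"title": "Panic attacks at night", "body": "Also insomnia."},
        {"title": "Insomnia again", "body": "No panic this time, just nausea."},
    ])
    print(symptom_counts(stored).most_common())
```

The second sample thread shows a limitation. "No panic this time" still counts as a mention of panic. Negations like this are one reason real analyses need more than keyword matching.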
  • 16. Fetching the data
• Find the endpoint: the URL or URLs to scrape.
• Send HTTP requests to the server.
• Using the requests library:
    import requests
    # Fetch one forum listing page; data.content holds the raw response bytes
    data = requests.get('https://patient.info/forums/discuss/browse/anxiety-disorders-70')
    html = data.content
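The single `requests.get` call above can be extended into a more polite fetch loop. This loop adds a timeout, an identifying User-Agent, an error check and a pause between requests, which matches the "less frequent requests" point in the comparison slide. This is a sketch, not the presenter's code. The header value and the 2-second default delay are hypothetical choices. The function only assumes a `requests.Session`-like object with a `.get()` method, so the sketch itself makes no network calls.

```python
import time

# Hypothetical identification header; a descriptive User-Agent with contact
# details lets forum administrators see who is crawling.
HEADERS = {"User-Agent": "health-forum-research/0.1 (contact: researcher@example.org)"}


def fetch_pages(session, urls, delay=2.0, timeout=10, sleep=time.sleep):
    """Download each URL in turn and return {url: html}.

    `session` should behave like requests.Session: .get() returns a response
    with .raise_for_status() and .text. A fixed pause between requests keeps
    the load on the forum low.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # wait before every request except the first
        response = session.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        pages[url] = response.text  # decoded str; response.content would give raw bytes
    return pages
```

With the requests library installed, the function could be called as `fetch_pages(requests.Session(), ['https://patient.info/forums/discuss/browse/anxiety-disorders-70'])`. The `sleep` parameter is injectable only so the delay can be checked without actually waiting.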
  • 17. Use BeautifulSoup for parsing
• Provides simple methods to:
▫ Search
▫ Navigate
▫ Select
• Export the data to:
▫ A database (relational or non-relational)
▫ CSV
▫ JSON
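A minimal sketch of the parse-and-export steps with BeautifulSoup follows. It uses the built-in `"html.parser"` backend and writes the result to JSON and CSV. The listing markup and its class names (`thread-title`, `author`, `replies`) are hypothetical, and real forum pages must be inspected to find the actual selectors. The loop shows one use each of selecting (`select`), navigating (`.parent`) and searching (`find`).

```python
import csv
import io
import json

from bs4 import BeautifulSoup

# Hypothetical listing markup standing in for a downloaded forum page.
SAMPLE_HTML = """
<ul class="thread-list">
  <li class="thread">
    <a class="thread-title" href="/forums/discuss/feeling-anxious-101">Feeling anxious</a>
    <span class="author">user1</span> <span class="replies">4 replies</span>
  </li>
  <li class="thread">
    <a class="thread-title" href="/forums/discuss/cannot-sleep-102">Cannot sleep</a>
    <span class="author">user2</span> <span class="replies">0 replies</span>
  </li>
</ul>
"""


def parse_threads(html):
    """Turn a thread listing into a list of dict records."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for link in soup.select("a.thread-title"):  # Select: CSS selector
        item = link.parent  # Navigate: up to the enclosing <li>
        author = item.find("span", class_="author")  # Search: within that item
        replies = item.find("span", class_="replies")
        records.append({
            "title": link.get_text(strip=True),
            "url": link["href"],  # relative link as it appears in the page
            "author": author.get_text(strip=True),
            "replies": int(replies.get_text().split()[0]),  # "4 replies" -> 4
        })
    return records


def to_json(records):
    """Export records as a JSON array (the format chosen in the proposed framework)."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records):
    """Export the same records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "url", "author", "replies"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(parse_threads(SAMPLE_HTML)))
```

The sketch assumes every thread item has an author and a reply count. On real pages, check `find()` results for `None` before calling `get_text()`.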
  • 18. Impact of web scraping on healthcare
• Healthcare is not a sector that depends entirely on person-to-person interactions.
• In the current system, where most services are data-centric, healthcare web scraping can affect lives, educate people and raise awareness. Because people no longer rely solely on doctors and pharmacists, healthcare web scraping can improve lives by providing balanced solutions.
• With data extraction and web scraping methods, healthcare organizations may reduce fraud attempts, doctors can discover effective cures and best practices, and patients can get better and more affordable healthcare services.
  • 19. Conclusion
• From the points discussed across all the research papers, we conclude that using web scraping in healthcare can improve the traditional decision-making process.
• Targeting web discussion forums to detect risks can help healthcare organizations as well as individuals.
• A scraper opens up a way of retrieving information without an API, and the data is mostly accessed anonymously.
• However, anyone scraping should make sure they are not breaking any law that could make them liable for an offence.
  • 20. References
1. https://link.springer.com/journal/13755
2. https://www.3idatascraping.com/web-scraping-for-healthcare-companies.php
3. https://www.elsevier.com/locate/imu
4. https://patient.info/forums/discuss/browse/abdominal-disorders-3321
5. https://www.quora.com/What-are-the-biggest-differences-between-web-crawling-and-web-scraping
6. https://www.sciencedirect.com/science/article/pii/S2352914817302253
  • 21. Bibliography
• Research Paper 1:
▫ Title: Big data analytics in healthcare: promise and potential
▫ Authors: Wullianallur Raghupathi and Viju Raghupathi
▫ https://link.springer.com/journal/13755
▫ Pages: 1-10
• Research Paper 2:
▫ Title: Big Data Analytics in Healthcare
▫ Authors: Ashwin Belle, Raghuram Thiagarajan, S.M. Reza Soroushmehr and Kayvan Najarian
▫ Biomedicine and Biotechnology, January 2015
▫ Pages: 1-38
• Research Paper 3:
▫ Title: Improving Healthcare Using Big Data Analytics
▫ Author: Revanth Sonnati
▫ Academia
▫ Pages: 1-5
  • 22. Bibliography (contd.)
• Research Paper 4:
▫ Title: Web Scraping: Data Extraction from Websites
▫ Author: Vojtech Draxl
▫ Science Direct
▫ Pages: 1-38
• Research Paper 5:
▫ Title: A Dive into Web Scraper World
▫ Authors: Deepak Kumar Mahto, Lisha Singh
▫ 2016 International Conference on Computing for Sustainable Global Development
▫ Pages: 1-5
• Research Paper 6:
▫ Title: Automated scraping of structured data records from health discussion forums using semantic analysis
▫ Authors: Umamageswari Baskaran, Kalpana Ramanujam
▫ www.elsevier.com/locate/imu
▫ Science Direct
▫ Pages: 1-10