Web scraping and healthcare
1. Web Scraping and Healthcare
Presented By:
Avanish Kumar Giri
BMSCE, Bangalore
2. Contents
• Introduction
• Research paper-1: Big data analytics in healthcare: promise and potential
• Research paper-2: Big data analytics in healthcare
• Research paper-3: Improving Healthcare Using Big Data Analytics
• Research paper-4: Web Scraping: Data Extraction from websites
• Research paper-5: A Dive into Web Scraper World
• Research paper-6: Automated scraping of structured data records from
health discussion forums using semantic analysis
• Comparison of all the research papers
• Proposed Framework
• Conclusion
• References and Bibliography
3. Introduction
• The internet is a massive storehouse of health-related information.
• But much of it is not freely available because of:
▫ Individual privacy issues
▫ Data leaks
▫ Website restrictions
• So we can target easily accessible health discussion forums, scrape their
data, and analyze it for various purposes.
• Web scraping, also known as web extraction or harvesting, is a technique to
extract data from the World Wide Web (WWW) and save it to a file system or
database for later retrieval or analysis.
4. Research Paper-1
• Big data analytics in healthcare: promise and potential
▫ The healthcare industry historically has generated large amounts of data,
driven by record keeping, compliance, regulatory requirements, and
patient care.
▫ While most data is stored in hard copy form, the current trend is toward
rapid digitization of these large amounts of data.
▫ Big data for U.S. healthcare will soon reach the zettabyte (10^21 bytes)
scale.[1]
5. Research Paper-2
• Big data analytics in healthcare
▫ Image processing
Computed tomography (CT),
Magnetic resonance imaging (MRI),
X ray, molecular imaging, ultrasound, etc.
▫ Signal processing
▫ Data analytics in disease detection
▫ Data Analytics in medical diagnosis
6. Research Paper-3
• Improving Healthcare Using Big Data Analytics
▫ An infographic from Oracle reports 50 petabytes of data in the healthcare
realm, predicted to grow to 25,000 petabytes by 2020.[2]
▫ Data analytics in public health research
With the rapid expansion of public health information, data analytics
techniques can be used to crawl and filter varied types of public health
data.
Hospital Information system (HIS), which includes electronic
medical record system (EMRS)
Laboratory Information system (LIS)
Radiology Information system (RIS)
Clinical decision support system (CDSS), etc.
7. Research Paper-4
• Web Scraping: Data Extraction from websites
▫ Web scraping, also known as web extraction or harvesting, is a technique
to extract data from the World Wide Web (WWW) and save it to a file
system or database for later retrieval or analysis.
▫ Scraping is mentioned as one of the sources for big data collection. The
definition also mentions another term: web crawler.
8. Fig 1: Web crawler vs. web scraping
Source: https://www.quora.com/What-are-the-biggest-differences-between-web-crawling-and-web-scraping
9. Research Paper-4 (contd.)
• Available Frameworks
• Scrapy
▫ It is one of the advanced web scraping frameworks available.
▫ The Framework is written in Python.
• FMiner
▫ It combines visual configuration with scripting features.
10. Research Paper-5
• A Dive into Web Scraper World
▫ Is Web Scraping Legal?
This question has never been answered properly.
People hold many different views on the legal and illegal aspects of
scraping the web.
▫ Crawling policies
Selection policy: states which pages to download
Re-visit policy: states when to check for changes to the pages
▫ Designing a custom scraper
A web scraper is broadly composed of two parts:
Web crawler for crawling links
Data extractor for the crawled links
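The two-part design and the selection policy above can be sketched as follows. This is a minimal illustration, not the method of the paper: it uses only Python's standard library and in-memory HTML instead of live pages. The names LinkCrawler, TitleExtractor, select_links and the host forum.example are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCrawler(HTMLParser):
    """Part 1 (crawler): collects absolute links from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, href))


class TitleExtractor(HTMLParser):
    """Part 2 (extractor): pulls the text of the first <h1> on a page."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self._in_h1 = True

    def handle_data(self, data):
        if self._in_h1:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.title = "".join(self._parts).strip()


def select_links(links, allowed_host, path_prefix):
    """Selection policy: keep unique same-host links under a path prefix."""
    selected = []
    for link in links:
        parts = urlparse(link)
        if parts.netloc == allowed_host and parts.path.startswith(path_prefix):
            if link not in selected:
                selected.append(link)
    return selected


# Hypothetical index page and thread page, held in memory for the demo.
INDEX_HTML = """
<a href="/forums/discuss/thread-1">Thread 1</a>
<a href="/forums/discuss/thread-2">Thread 2</a>
<a href="/about">About</a>
<a href="https://other.example/forums/discuss/x">External</a>
"""
THREAD_HTML = "<html><body><h1> Coping with anxiety </h1><p>...</p></body></html>"

crawler = LinkCrawler("https://forum.example")
crawler.feed(INDEX_HTML)
to_visit = select_links(crawler.links, "forum.example", "/forums/discuss/")

extractor = TitleExtractor()
extractor.feed(THREAD_HTML)
thread_title = extractor.title
```

A re-visit policy would sit on top of this, for example by recording when each selected link was last fetched and skipping links fetched too recently.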
11. Research Paper-6
• Automated scraping of structured data records from
health discussion forums using semantic analysis
▫ In healthcare, web scraping is gradually gaining a foothold.
▫ Several factors have led to the use of web scraping in healthcare:
The data is too complex to be analysed by traditional techniques.
Web scraping along with data extraction can improve decision
making.
13. General Framework for data extraction[6]
Fig 2: General Framework of data extraction
Source: https://www.sciencedirect.com/science/article/pii/S2352914817302253
14. Comparison
Paper | Process | Advantages | Disadvantages | Future work
1 | Big data concepts in healthcare | Improved record keeping and patient care | Large amount of data | Analyzing size of the data
2 | Image and signal processing in big data | Easy processing of reports | Domain knowledge is important to process such data | Many other types of reports can be processed
3 | Data analytics in public health research (PHR) | Easy processing of reports | Research knowledge required | Easy definition of reports
4 | Web crawler and available frameworks | Frameworks are easily available | Illegal activities could take place | Limitations in frameworks
5 | Legal aspects and crawling policies | No proper legal definition | Depends on the website | Defining clear legal structure
6 | Scraping web discussion forums | Easy to get data from such forums | Website policies | Less frequent requests
15. Proposed Framework
• Fetching data from health discussion forums.
• Use of BeautifulSoup for parsing (Python Library)
• Store data in JSON Format
• Process data for decision making
16. Fetching the data
• Involves finding the endpoint: the URL or URLs
• Sending HTTP requests to the server
• Using the requests library:
▫ import requests
▫ data = requests.get('https://patient.info/forums/discuss/browse/anxiety-disorders-70')
▫ html = data.content
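The fetching step can also be wrapped in a small function that fails loudly on HTTP errors. This is a hedged sketch, not the presenter's code: the function name fetch_html, the User-Agent string and the 10-second timeout are assumptions made for illustration; requests.get, its headers and timeout parameters, and raise_for_status are part of the requests library.

```python
import requests

# Forum listing URL taken from the slide.
FORUM_URL = "https://patient.info/forums/discuss/browse/anxiety-disorders-70"


def fetch_html(url, timeout=10.0):
    """Send one GET request and return the page body as text.

    The User-Agent value is a hypothetical placeholder; a real scraper
    would identify itself according to the target site's policies.
    """
    headers = {"User-Agent": "health-forum-research-bot/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling fetch_html(FORUM_URL) would perform the actual network request; the call is left out here so that the sketch has no side effects when loaded.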
17. Use BeautifulSoup for parsing
• Provides simple methods to:
▫ Search
▫ Navigate
▫ Select
• Export the data
▫ Database (relational or non-relational)
▫ CSV
▫ JSON
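The parse-and-export steps might look like the sketch below, which searches, navigates and selects within a small in-memory page and then serializes the result as JSON. The HTML structure, the class names (post, author) and the function names are assumptions for illustration only; the markup of a real forum will differ and has to be inspected first.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical forum listing markup, used instead of a live page.
SAMPLE_HTML = """
<ul>
  <li class="post">
    <h3>Panic attacks at night</h3>
    <span class="author">user_a</span>
  </li>
  <li class="post">
    <h3>Side effects of medication</h3>
    <span class="author">user_b</span>
  </li>
</ul>
"""


def parse_posts(html):
    """Return a list of {"title", "author"} dicts, one per post."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    # Search: find every list item marked as a post.
    for item in soup.find_all("li", class_="post"):
        # Navigate: the first <h3> inside the item.
        title_tag = item.h3
        # Select: a CSS selector for the author span.
        author_tag = item.select_one("span.author")
        posts.append({
            "title": title_tag.get_text(strip=True) if title_tag else None,
            "author": author_tag.get_text(strip=True) if author_tag else None,
        })
    return posts


def posts_to_json(posts):
    """Serialize the extracted records as a JSON string."""
    return json.dumps(posts, indent=2, ensure_ascii=False)


posts = parse_posts(SAMPLE_HTML)
json_text = posts_to_json(posts)
```

JSON is used here because it is the storage format named in the proposed framework; writing the same records to CSV or a database would only change the last step.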
18. Impact of web scraping on healthcare
• Healthcare is not a sector that depends entirely on person-to-person
interactions.
• In today's data-centric systems, healthcare web scraping can affect lives,
educate people, and raise awareness. Because people no longer rely solely on
doctors and pharmacists, it can improve lives by providing balanced
solutions.
• With data extraction and web scraping methods, healthcare organizations
may reduce fraud attempts, doctors can discover effective cures and best
practices, and patients can get better and more affordable healthcare
services.
19. Conclusion
• From the points discussed across the research papers, we can conclude
that the use of web scraping in healthcare can improve the traditional
process of decision making.
• Targeting web discussion forums to detect risks can help healthcare
organizations as well as individuals.
• A scraper opens up another way of retrieving information without the use
of an API, and it is mostly accessed anonymously.
• But people who scrape should make sure that they are not breaking any law
that could make them liable for an offence.
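One partial, practical step toward respecting website restrictions is to check a site's robots.txt before requesting a page. The sketch below uses Python's standard urllib.robotparser on a hypothetical robots.txt held in memory; the rules, the host forum.example and the user agent name are assumptions. Passing this check is not the same as being legally compliant, since a site's terms of use and local law still apply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one is served at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical name the scraper would announce itself with.
USER_AGENT = "health-forum-research-bot"


def build_policy(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network call."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(SAMPLE_ROBOTS_TXT)

# True when the rules allow this user agent to fetch the URL.
allowed = policy.can_fetch(USER_AGENT, "https://forum.example/forums/discuss/thread-1")
blocked = policy.can_fetch(USER_AGENT, "https://forum.example/private/members")

# Seconds to wait between requests, if the site specifies a delay.
delay = policy.crawl_delay(USER_AGENT)
```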
21. Bibliography
• Research Paper 1:
▫ Title: Big data analytics in healthcare: promise and potential
▫ Authors: Wullianallur Raghupathi and Viju Raghupathi
▫ https://link.springer.com/journal/13755
▫ Pages: 1-10
• Research Paper 2:
▫ Title: Big Data Analytics in Healthcare
▫ Authors: Ashwin Belle, Raghuram Thiagarajan, S. M. Reza Soroushmehr and
Kayvan Najarian
▫ Biomedicine and Biotechnology, January 2015
▫ Pages: 1-38
• Research Paper 3:
▫ Title: Improving Healthcare Using Big Data Analytics
▫ Author: Revanth Sonnati
▫ Academia
▫ Pages: 1-5
22. Bibliography
• Research Paper 4:
▫ Title: Web Scraping: Data Extraction from websites
▫ Author: Vojtech Draxl
▫ Science Direct
▫ Pages: 1-38
• Research Paper 5:
▫ Title: A Dive into Web Scraper World
▫ Authors: Deepak Kumar Mahto, Lisha Singh
▫ 2016 International Conference on Computing for Sustainable Global
Development
▫ Pages: 1-5
• Research Paper 6:
▫ Title: Automated scraping of structured data records from health discussion
forums using semantic analysis
▫ Authors: Umamageswari Baskaran, Kalpana Ramanujam
▫ www.elsevier.com/locate/imu
▫ Science Direct
▫ Pages: 1-10