BIG DATA ANALYTICS FOR THE HEALTHCARE INDUSTRY:
IMPACT, APPLICATIONS, AND TOOLS
Submitted By,
Ajith M Jose
MCA 2022-2024
MCA2206
INTRODUCTION
Various applications, devices, and geographical research activities generate data every day for weather
forecasting, disaster evaluation, crime detection, and the health industry, to name a few.
Big data is currently associated with core technologies and various enterprises, such as Google, Facebook,
and IBM, that extract valuable information from vast data collections. A new era of open communication in
healthcare has begun. Big data is being generated at an alarming rate in every industry, including healthcare,
for patient care, compliance, and various regulatory requirements. As the global population grows along with
the human lifespan, treatment delivery models are rapidly evolving, and some of the decisions underlying
these changes are data-driven. Big data, so called because of its volume, complexity, and breadth, promises new insights to healthcare investors. Pharmaceutical industry experts and shareholders have begun to
routinely analyse big data to gain insight. These activities, however, are still in their early stages and must be
coordinated in order to address healthcare delivery issues and improve healthcare quality. Early systems for
big-data analytics in healthcare informatics were established in a variety of scenarios, such as the investigation
of patient characteristics and the determination of treatment cost and outcomes to pinpoint the best and most
cost-effective treatments. Health informatics is the study of healthcare information that combines healthcare
sciences, computing sciences, and information sciences. It encompasses the acquisition, storage, and retrieval of data in order to provide better results to healthcare providers. Data in the healthcare system is
distinguished by its heterogeneity and variety as a result of the integration of a wide range of biomedical data
sources, such as sensor data, imagery, gene arrays, laboratory tests, free text, and demographics. The majority
of data in the healthcare system (for example, doctor's notes, lab test results, and clinical data) is unstructured
and not stored electronically, so it exists only in hard copies, and its volume is rapidly increasing. Currently,
there is a strong emphasis on digitising these massive stores of hard-copy data, but the sheer growth in data size makes this goal difficult to achieve. The various terminologies and models developed to solve the
problems associated with big data focus on four issues known as the four Vs: volume, variety, velocity, and
veracity. Electronic Health Records (EHR), machine-generated/sensor data, health information exchanges,
patient registries, portals, genetic databases, and public records are some of the data classes used in healthcare
applications. In the healthcare industry, public records are significant sources of big data that require efficient
data analytics to resolve their associated healthcare problems. According to a 2012 survey, healthcare data
totalled nearly 550 petabytes and was expected to reach nearly 26,000 petabytes by 2020. Given the diverse data
formats, massive volumes, and associated uncertainties in major data sources, transforming raw data into
actionable information is a daunting task. Because health features in medical data are so complex, identifying
them and selecting class attributes for health analytics requires highly sophisticated and architecturally specific
techniques and tools.
BIG DATA ANALYTICS IN HEALTH INFORMATICS
The way data processing is executed is the primary distinction between traditional health analysis
and big-data health analytics. In the past, the healthcare industry relied on other industries for large-scale data
analysis. Many healthcare investors trust information technology because it produces meaningful
results: their operating systems are functional and can process data into standardised formats. The healthcare
industry is currently confronted with the challenge of dealing with rapidly expanding big healthcare data. Big
data analytics is expanding and has the potential to provide valuable insights to the healthcare system. As
previously stated, the vast majority of the massive amounts of data generated by this system are saved in hard
copies that must then be digitised. Big data has the potential to improve healthcare delivery and lower costs
while also enabling advanced patient care, improving patient outcomes, and avoiding unnecessary costs. Big
data analytics is currently being used to predict the outcomes of physician decisions, such as the outcome of
a heart operation for a condition based on the patient's age, current situation, and health status. In essence, the
role of big data in the health sector is to manage data sets related to healthcare that are complex and difficult
to manage with current hardware, software, and management tools. Aside from the growing volume of
healthcare data, reimbursement methods are also evolving.
As a result, meaningful use and performance-based pay have emerged as critical factors in the healthcare sector.
In 2011, healthcare organisations generated more than 150 exabytes of data, all of which must be efficiently
analysed in order to be useful to the healthcare system. Healthcare-related data is stored in EHRs in a variety
of ways. A surge in data related to healthcare informatics has also been observed in bioinformatics, where
genomic sequencing generates many terabytes of data. There are numerous analytical techniques for
interpreting medical data, which can then be used to provide patient care. The disparities in the origins and
formats of big data pose a challenge to the healthcare informatics community in developing data processing
technologies. There is a high demand for a method that combines disparate data sources. Several conceptual
approaches can be used to detect irregularities in large amounts of data from various datasets. The following
frameworks are available for analysing healthcare data:
 Predictive Analytics in Healthcare: Predictive analysis has been recognised in recent years as a significant business intelligence approach, but its real-world applications extend far beyond the business context. Big-data analytics methodologies include text analytics and multimedia analytics; one of the most important categories, however, is predictive analytics, which uses statistical approaches such as data mining and machine learning to analyse current and historical data and forecast future outcomes. Predictive methods are now used in hospitals to determine whether a patient is at risk of readmission, and this information can help clinicians make critical patient care decisions (a minimal readmission-risk sketch follows this list). Predictive analysis therefore relies heavily on an understanding and application of machine learning.
 Machine Learning in Healthcare: Machine learning is similar to data mining in that both processes
scan data for patterns. Unlike data mining applications, which extract data based on human
understanding, machine learning uses that data to improve the program's performance. Machine
learning recognises data patterns and modifies programme functions accordingly.
 Electronic Health Records: The most common application of big data in healthcare is the EHR.
Each patient has a medical record that includes information such as their medical history, allergies,
diagnosis, symptoms, and lab test results. A secure information system is used to share patient records
with healthcare providers in the public and private sectors. These files are customizable, allowing
doctors to make changes over time and add new medical test results without the need for paperwork
or data duplication.
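To make the predictive-analytics pathway above concrete, here is a minimal, illustrative readmission-risk sketch in Python. It assumes the NumPy and scikit-learn libraries; the features, synthetic labels, and the example patient are hypothetical and stand in for the curated EHR data a real model would be trained on.

```python
# Minimal readmission-risk sketch (illustrative only, not a clinical model).
# Assumes NumPy and scikit-learn; all features and data below are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical patient features: age, length of stay (days),
# prior admissions in the last year, number of chronic conditions.
X = np.column_stack([
    rng.integers(20, 90, n),
    rng.integers(1, 30, n),
    rng.integers(0, 6, n),
    rng.integers(0, 4, n),
])
# Synthetic label: readmitted within 30 days (1) or not (0).
y = (0.02 * X[:, 0] + 0.10 * X[:, 2] + 0.20 * X[:, 3]
     + rng.normal(0, 1, n) > 2.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
print("readmission probability for a new patient (age 72, 14-day stay, 3 prior, 2 chronic):",
      model.predict_proba([[72, 14, 3, 2]])[0, 1])
```

A production model would of course be trained and validated on real, curated clinical data; the sketch only shows the mechanics of fitting a classifier on historical records and scoring a new patient.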
FOUR Vs OF BIG DATA IN HEALTHCARE
Big data has four primary characteristics: volume, velocity, variety, and veracity.
 Volume: Big data refers to the massive amounts of data collected. There is no fixed threshold above which data counts as big; the term is typically used for amounts of data that cannot easily be managed, stored, and analysed using traditional databases and data processing architectures.
Because of lower costs of data storage and processing architectures, as well as the need to extract
valuable insights from data to improve business processes, efficiencies, and consumer services, the
volume of data generated by modern IT and the healthcare system is increasing.
 Velocity: Velocity refers to how quickly data is collected and is a primary driver of the exponential growth of data. Healthcare systems are producing data at ever-increasing rates. Given the volume and variety of structured and unstructured data collected, the data must often be processed, and decisions made on its output, at a comparable speed.
 Variety: The term "variety" refers to the data's format, such as unstructured or structured text, medical
imagery, audio, video, and sensor data. Clinical data (patient record data) is an example of structured data, which must be collected, stored, and processed in a fixed, machine-readable format. Structured data
accounts for only 5% to 10% of healthcare data. Emails, photos, videos, audio, and other health-related
data such as hospital medical reports, physician's notes, paper prescriptions, and radiograph films are
examples of unstructured or semi-structured data.
 Veracity: Veracity is the degree of assurance that data is accurate and that its meaning is consistent, in other words, data integrity. The level of data credibility and reliability varies between data sources. Big-data analytics results must be credible and error-free; yet in healthcare, unsupervised machine learning algorithms are used by automated systems to make decisions based on data that may be useless or misleading. Healthcare analytics is responsible for gleaning useful insights from this data so that patients can be treated and the best possible decisions made.
IMPACT OF BIG DATA ON THE HEALTHCARE SYSTEM
Big data has the potential to revolutionise outcomes in terms of the most appropriate or accurate patient
diagnosis and the accuracy of information used in the health informatics system. As a result, analysing massive
amounts of data will have a significant impact on the medical services framework in five ways, or "pathways."
Improving patient outcomes along these pathways, as described below, will be the healthcare system's priority,
with a direct impact on the patient.
 Right Living: The term "right living" refers to the patient leading a better and healthier lifestyle.
Patients can manage themselves better by making informed decisions, using mined information to identify better choices and enhance their well-being. Patients can take an active role
in leading a healthy lifestyle by selecting the best path for their daily health, which includes diet,
preventive care, exercise, and other daily activities.
 Right Care: This pathway ensures that patients receive the best available treatment and that all
providers use the same data and have the same goals to avoid planning and effort redundancy. In the
age of big data, this aspect has become more viable.
 Right Provider: By combining data from various sources such as medical equipment, public health
statistics, and socioeconomic data, healthcare providers in this pathway can obtain an overall view of
their patients. Because this information is easily accessible, health service providers can conduct
targeted investigations and develop the skills and abilities needed to identify and provide better
treatment options to patients.
 Right Innovation: This pathway recognises that new disease conditions, treatments, and medical
devices will emerge in the future. Similarly, advancements in the delivery of patient services, such as
medication upgrades and the effectiveness of research and development efforts, will open up new
avenues for promoting well-being and patient health through a national social insurance system. The
availability of early trial data is critical for stakeholders. This data can be used to identify high-potential
targets and ways to improve traditional clinical treatment methods.
 Right Value: To improve the quality and value of health-care services, providers must pay close
attention to their patients on an ongoing basis. Patients must obtain the most advantageous outcomes
as determined by their social insurance system. Identifying and eliminating data misrepresentation, manipulation, and waste, as well as improving resource utilisation, are examples of measures that could be taken to ensure the intelligent use of data.
HADOOP-BASED APPLICATIONS FOR THE HEALTH INDUSTRY
Because healthcare data is primarily in printed form, active digitization of print-form data is required.
Because the majority of this data is unstructured, this industry faces significant challenges in extracting
meaningful information about patient care, clinical operations, and research. The Hadoop ecosystem, a
collection of software utilities, can assist the healthcare sector in managing this massive amount of data. The
following are some of the Hadoop ecosystem's applications in the healthcare sector:
 Treatment of Cancer and Genomics: Three billion base pairs make up human DNA. Large amounts
of data must be efficiently organised in order to fight cancer. Cancer mutation patterns and reactions
differ depending on individual genetics, which is one reason some cancers remain difficult to cure. Oncologists have therefore determined that it is critical to tailor treatment to specific cancers based on the patient's genetic makeup and the mutation patterns it produces. Hadoop's MapReduce technology allows
for the mapping of three billion DNA base pairs in order to determine the best cancer treatment for
each individual patient. Arizona State University is working on a healthcare model that will use
personal genomic data to choose a treatment based on the patient's cancer gene. This model serves as
the foundation for therapy by analysing large amounts of data in order to improve the chances of saving
patients' lives.
 Monitoring of Patient Vitals: Big-data technology allows hospital staff around the world to pool and analyse the data their work generates. Various hospitals use Hadoop-based components built around the Hadoop Distributed File System (HDFS), such as the Impala, HBase, Hive, Spark, and Flume frameworks, to store and process the massive amounts of unstructured data generated by sensors that measure patient vital signs: heart rate, blood pressure, blood sugar level, and respiratory rate. Without Hadoop, these healthcare professionals would struggle to analyse the unstructured data generated by patient monitoring systems. One paediatric healthcare provider in Atlanta, Georgia, cares for some 6,200 children in its Intensive Care Units (ICUs), where children can stay for more than a month depending on their condition. These intensive care units are outfitted with sensor technology that monitors each child's heartbeat, blood pressure, and other vital signs; if a problem arises, an alert is automatically sent to medical personnel to ensure the child's safety (a simplified sketch of this kind of threshold check follows this list).
 Hospital Network: Several hospitals use the NoSQL database in the Hadoop ecosystem to collect and
manage massive amounts of real-time data from various sources related to patient care, finances, and
payroll, allowing them to identify high-risk patients while reducing day-to-day expenses.
 Healthcare Intelligence: Hadoop technology also helps hospitals and insurance companies with their
healthcare intelligence applications. Pig, Hive, and MapReduce technologies in the Hadoop ecosystem
process large datasets related to medicines, diseases, symptoms, opinions, geographic regions, and
other factors to extract meaningful information (for example, desired age) for insurance companies.
 Prevention and Detection of Frauds: In the early days of big data analytics, health-based insurance
companies used a variety of methods to detect fraud and develop methods to prevent medical fraud.
Companies use Hadoop applications with a prediction model to identify fraudsters based on data from
previous health claims, voice recordings, wages, and demographics. Real-time Hadoop-based health applications that combine authentic medical claim bills, weather data, voice recordings, and other data sources with Hadoop's NoSQL databases also aid in the early detection of medical claim fraud.
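The vitals-monitoring scenario above boils down to applying simple range checks to each incoming sensor record and alerting on anything abnormal. The sketch below illustrates that idea in plain Python; the record format and alert thresholds are hypothetical, and a real deployment would run such logic inside a streaming pipeline (for example, Flume feeding Spark over HDFS) rather than a single script.

```python
# Illustrative rule-based vital-signs check; thresholds and record format are hypothetical.
from typing import Iterable, Iterator

# Hypothetical "normal" ranges per vital sign.
NORMAL_RANGES = {
    "heart_rate": (60, 100),        # beats per minute
    "systolic_bp": (90, 140),       # mmHg
    "blood_glucose": (70, 140),     # mg/dL
    "respiratory_rate": (12, 20),   # breaths per minute
}

def alerts(readings: Iterable[dict]) -> Iterator[str]:
    """Yield an alert message for every reading outside its normal range."""
    for r in readings:
        lo, hi = NORMAL_RANGES[r["vital"]]
        if not lo <= r["value"] <= hi:
            yield f"ALERT patient={r['patient_id']} {r['vital']}={r['value']} (expected {lo}-{hi})"

# Example stream of sensor readings.
stream = [
    {"patient_id": "P001", "vital": "heart_rate", "value": 142},
    {"patient_id": "P002", "vital": "blood_glucose", "value": 95},
    {"patient_id": "P001", "vital": "respiratory_rate", "value": 9},
]
for message in alerts(stream):
    print(message)
```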
BIG DATA ANALYTICS ARCHITECTURE FOR HEALTH INFORMATICS
Currently, the primary focus of big-data analytics is to gain a comprehensive understanding and insight
into big data rather than to collect it. Data analytics is the development and application of algorithms for
analysing various complex data sets in order to extract meaningful knowledge, patterns, and information.
Researchers have recently begun to consider the appropriate architectural framework for big-data analytics-
enabled healthcare systems, one of which employs a four-layer architecture comprising a data-source layer, transformation layer, big-data platform layer, and analytical layer. Each layer has its own data-
processing functionality for using the MapReduce processing model to perform specific tasks on the HDFS.
Other layers are responsible for tasks such as report generation, query passing, data mining processing, and
online analytical processing. The main requirement in big-data analytical processing is to bundle the data as quickly as possible, reducing bundling time. The next priority is to update and transform queries efficiently in real time. The third requirement is to use and manage storage space efficiently. The final requirement is to adapt efficiently to rapidly changing workloads. In terms of how big data is processed, big-data analytics
frameworks differ from traditional healthcare processing systems.
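As a rough illustration of the four-layer flow just described, the toy Python pipeline below passes hypothetical records from a data-source step through a transformation step, an aggregation step standing in for the big-data platform layer, and a reporting step standing in for the analytical layer. The record format and field names are invented for the example; in a real deployment the aggregation would be a MapReduce or Spark job running over HDFS.

```python
# Toy end-to-end illustration of the four-layer flow; all records are hypothetical.

# 1. Data-source layer: raw records arriving from different sources and formats.
raw_records = [
    "ehr|P001|diabetes|cost=1200",
    "lab|P002|asthma|cost=300",
    "ehr|P003|diabetes|cost=900",
]

# 2. Transformation layer: parse heterogeneous text into a uniform structure.
def transform(line):
    source, patient_id, condition, cost = line.split("|")
    return {"source": source, "patient": patient_id,
            "condition": condition, "cost": int(cost.split("=")[1])}

# 3. Big-data platform layer: aggregate, here total treatment cost per condition
#    (in practice a MapReduce or Spark job distributed over HDFS).
def aggregate(records):
    totals = {}
    for r in records:
        totals[r["condition"]] = totals.get(r["condition"], 0) + r["cost"]
    return totals

# 4. Analytical layer: turn the aggregates into a simple report or query answer.
def report(totals):
    for condition, cost in sorted(totals.items()):
        print(f"{condition}: total treatment cost = {cost}")

report(aggregate(transform(line) for line in raw_records))
```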
In the current healthcare system, data is processed using traditional tools installed in a single stand-
alone system, such as a desktop computer. Big data, on the other hand, is processed on a cluster, with the work spread across multiple nodes in the network. To handle large medical data sets, this processing is parallelised. Health-related data sets can be processed by freely available frameworks such as Hadoop, MapReduce, Pig, Sqoop,
Hive, HBase, and Avro. Such an architecture is commonly described in terms of four components. The first component is the set of significant data sources to be processed. The second is the processing infrastructure: clusters with centralised big-data processing capability, in which most big-data analytics tools use the MapReduce paradigm to provide data security, scalability, and manageability. The third component is a storage domain that integrates the databases accessed by different applications. The fourth component, which includes reports, Online Analytical Processing (OLAP), queries, and data mining, contains the most popular big-data analytics applications in healthcare systems. EHRs, genome databases, genome data files, text and imagery
(unstructured data sources), clinical decision support systems, government-related sources, medical test labs
and pharmacies, and health insurance companies are all sources of healthcare data. These data are frequently
available in various schema tables, are in ASCII/text form, and are stored in a variety of locations. The following
section describes the various big data Hadoop-based processing tools that aid in the development of health-
based applications for the healthcare industry.
HADOOP’S TOOLS AND TECHNIQUES FOR BIG DATA
Special tools are required to manage unstructured big data that does not fit into any database. The IT
sector examines this massive dataset using the Hadoop platform and various methods developed to record,
organise, and analyse this type of data. To extract meaningful output from big data, more efficient tools are
required. Most of these tools, including MapReduce, Mahout, Hive, and others, are built on the Apache Hadoop
architecture. The following sections go over the various tools used in the processing of large healthcare
datasets.
[Figure: Big data analytics conceptual architecture for health informatics]
[Figure: Hadoop system architecture]
 Apache Hadoop: Hadoop has come to mean a variety of things. It began as a single software project
to support a web search engine in 2002. It has since evolved into an ecosystem of tools and applications
used to analyse large amounts and types of data. Hadoop is no longer a single project, but rather a data
processing approach that is fundamentally different from the traditional relational database model. A
more practical definition of the Hadoop ecosystem and framework is: open source tools, libraries, and
methodologies for "big data" analysis in which several data sets are collected from various sources,
such as Internet images, audio, videos, and sensor records, as both structured and unstructured data to
be processed.
 HDFS: HDFS was created to handle large data sets. It is intended for streaming access, which involves reading large amounts of data from disk in bulk. HDFS blocks are typically 64 MB or 128 MB in size. Nodes are classified into two types: name nodes and data nodes. A single name node manages all of the metadata needed to store and retrieve data from the data nodes; the name node itself holds no file data. Files are split into fixed-size blocks that are stored in order across data nodes (a small block-layout sketch appears after this list). The distributed nature and dependability of HDFS are its distinguishing characteristics. Metadata and file data are stored separately: the name node stores metadata, while the data nodes store application data.
 MapReduce: Apache Hadoop is frequently associated with MapReduce computing. The MapReduce computation model is a powerful tool that is used more often than most users realise, including in many health applications. Its basic idea is straightforward: a job has a mapping stage and a reducing stage. During the mapping stage, a mapping procedure is applied to the input data; once mapping is finished, the reducing phase begins. The mapping stage accepts key-value pairs as input and generates key-value pairs as output, and the reducing stage likewise consumes key-value pairs and emits key-value pairs. In Hadoop, the input data is first divided into fixed-size segments known as input splits. The Map function emits key-value pairs, and all values sharing the same key are grouped together before being passed to the reducers (a minimal Python simulation of this pattern follows the list).
 Apache Hive: Hive is a data warehousing layer built on top of Hadoop that allows data to be queried and analysed with a SQL-like language. Apache Hive can run ad hoc queries, summarise data, and analyse
it. Hive is widely regarded as the de facto standard for SQL-based queries over petabytes of Hadoop
data, with features such as simple data extraction, transformation, and access to HDFS data files or
other HBase storage systems.
 Apache Pig: Apache Pig is one of the open-source platforms available for better big data analysis. Pig
is a high-level scripting platform whose programs are compiled into MapReduce jobs. Pig, which was created as a research project by
the Yahoo web service provider, allows users to create user-defined functions and supports many
traditional data operations such as join, sort, filter, and so on.
 Apache HBase: HBase is a column-oriented NoSQL database that is used in Hadoop to store large
numbers of rows and columns. Random read/write operations are supported by HBase. It also supports
record-level updates, which HDFS does not. HBase stores data in parallel across commodity servers
using the underlying distributed file systems. Because of the tight integration of HBase and HDFS, the
file system of choice is typically HDFS. If low-latency, structured access to Hadoop-stored data at high scale is required, HBase is the right choice. Its open-source code scales linearly to handle petabytes of data
on thousands of nodes.
 Apache Oozie: A sophisticated tool such as Apache Oozie is required when running a complex or tightly coupled system, or when there are several interconnected jobs with data dependencies. Apache
Oozie is capable of handling and running multiple Hadoop-related jobs. Oozie is divided into two
parts: workflow engines that store and execute Hadoop-based workflow collections, and a coordinator
engine that processes workflow jobs in accordance with the process schedule. Oozie is designed to
create and manage Hadoop jobs as a workflow, with one job's output serving as the input for the next.
Oozie is not a substitute for the Yarn scheduler. Action-based Directed Acyclic Graphs (DAGs) are
used to represent Oozie workflow jobs. In the cluster, Oozie serves as a service, and clients submit
their workflow jobs for immediate or scheduled execution.
 Apache Avro: Avro is a serialisation format for exchanging data between programmes written in any
language. It is frequently used to link Flume data flows. The Avro system is schema-based, with the
role of a scheme being to perform read-and-write operations while remaining language-independent.
Avro serialises data with a built-in schema. It's a framework for serialising persistent data as well as
remote procedure calls between Hadoop nodes and client programmes and Hadoop services.
 Apache Zookeeper: ZooKeeper is a centralised service used by distributed applications, including healthcare systems, to provide coordination and synchronisation on and between nodes. It manages the
common objects found in large cluster environments, such as configuration data and the hierarchical
naming space. These services can be used by various applications to coordinate the distributed
processing of Hadoop clusters. Zookeeper also ensures the application's dependability. When an
application master dies, ZooKeeper coordinates the election of a new master to take over its tasks.
 Apache Yarn: Hadoop YARN is Hadoop's cluster resource-management layer; the distributed shell that ships with it is an example of a non-MapReduce application running on YARN. YARN is made up of two parts: a Resource Manager (RM), which
handles all of the resources required for tasks within a cluster, and a Node Manager (NM), which is
located on each host in the cluster and handles the available resources on that individual host. Both
components manage the containers, memory management, CPU throughput, and I/O system, which
run the dedicated application code, as well as the scheduling of jobs.
 Apache Sqoop: Apache Sqoop is a powerful tool that extracts data from a Relational Database
Management System (RDBMS) and loads it into the Hadoop architecture for query processing. To
accomplish this, the MapReduce paradigm or other standard-level tools, such as Hive, are used. Data
can be used by Hadoop applications once it is stored in HDFS.
 Apache Flume: Apache Flume is a highly dependable service for collecting and moving large amounts
of data from disparate machines to HDFS. Data transport frequently involves several flume agents
traversing a series of devices and locations. Flume is commonly used for log files, social media data,
and email messages.
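Two small sketches follow to make the HDFS and MapReduce descriptions above concrete. First, the block-layout arithmetic HDFS performs: files are cut into fixed-size blocks whose metadata the name node tracks while the blocks themselves live on data nodes. The file name and sizes here are arbitrary examples.

```python
# Illustrative HDFS block-split arithmetic; file name and sizes are made up.
BLOCK_SIZE_MB = 128

def block_layout(file_name, file_size_mb):
    """Return (block_index, size_mb) entries, i.e. the metadata the name node
    would keep for one file; the blocks themselves are stored on data nodes."""
    blocks = []
    remaining = file_size_mb
    index = 0
    while remaining > 0:
        blocks.append((index, min(BLOCK_SIZE_MB, remaining)))
        remaining -= BLOCK_SIZE_MB
        index += 1
    return blocks

print(block_layout("imaging_archive.dat", 300))
# [(0, 128), (1, 128), (2, 44)] -> three blocks, the last only partially filled
```

Second, a minimal, self-contained simulation of the MapReduce pattern: a map stage emits key-value pairs, identical keys are grouped (the shuffle), and a reduce stage aggregates each group. The patient and diagnosis-code records are hypothetical; a real Hadoop job would distribute the map and reduce functions across cluster nodes instead of running them in one process.

```python
# Pure-Python simulation of map -> shuffle -> reduce; records are hypothetical.
from collections import defaultdict

records = [
    "P001 I10",   # hypothetical patient-id / diagnosis-code pairs
    "P002 E11",
    "P003 I10",
    "P004 J45",
    "P005 E11",
    "P006 I10",
]

def map_phase(record):
    """Emit (diagnosis_code, 1) for each input record."""
    _, code = record.split()
    yield code, 1

def reduce_phase(code, counts):
    """Sum the counts for one diagnosis code."""
    return code, sum(counts)

# Shuffle: group all mapped values by key, as the framework does between stages.
grouped = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        grouped[key].append(value)

for code, counts in sorted(grouped.items()):
    print(reduce_phase(code, counts))   # e.g. ('E11', 2)
```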
CONCLUSION
In this seminar, we provided an overview of big data in general and in the healthcare system in particular, covering its role in healthcare informatics, its impact on the healthcare system, and the four Vs of big data in healthcare. We also proposed using a conceptual
architecture for solving healthcare problems in big data using Hadoop-based terminologies, which entails
using big data generated by various levels of medical data as well as developing methods for analysing this
data and obtaining answers to medical questions. The combination of big data and healthcare analytics can
lead to effective treatments for specific patients by allowing doctors to prescribe medications that are
appropriate for each individual rather than those that work for the majority of people. Big data analytics, however, is still in its early stages, and current tools and methods cannot yet solve all of the problems that come
with big data. Big data is a complex system that poses enormous challenges. As a result, extensive research in
this field will be required to address the issues confronting the healthcare system.
More Related Content

Similar to Ajith M Jose_Report1.docx

Health Informatics- Module 1-Chapter 1.pptx
Health Informatics- Module 1-Chapter 1.pptxHealth Informatics- Module 1-Chapter 1.pptx
Health Informatics- Module 1-Chapter 1.pptxArti Parab Academics
 
Application of Big Data in Medical Science brings revolution in managing heal...
Application of Big Data in Medical Science brings revolution in managing heal...Application of Big Data in Medical Science brings revolution in managing heal...
Application of Big Data in Medical Science brings revolution in managing heal...IJEEE
 
Please respond to each of the 3 posts with 3 APA sources no older th
Please respond to each of the 3 posts with 3 APA sources no older thPlease respond to each of the 3 posts with 3 APA sources no older th
Please respond to each of the 3 posts with 3 APA sources no older thmaple8qvlisbey
 
Please respond to each of the 3 posts with 3.docx
Please respond to each of the 3 posts with 3.docxPlease respond to each of the 3 posts with 3.docx
Please respond to each of the 3 posts with 3.docxbkbk37
 
Health information management system by dr. protik.pptx
Health information management system by dr. protik.pptxHealth information management system by dr. protik.pptx
Health information management system by dr. protik.pptxProtik Banik
 
Data-driven Healthcare for Payers
Data-driven Healthcare for PayersData-driven Healthcare for Payers
Data-driven Healthcare for PayersLindaWatson19
 
Data-driven Healthcare for Manufacturers
Data-driven Healthcare for ManufacturersData-driven Healthcare for Manufacturers
Data-driven Healthcare for ManufacturersLindaWatson19
 
Data-Driven Healthcare for Manufacturers
Data-Driven Healthcare for Manufacturers Data-Driven Healthcare for Manufacturers
Data-Driven Healthcare for Manufacturers Amit Mishra
 
Healthcare transformation with next BI.pdf
Healthcare transformation with next BI.pdfHealthcare transformation with next BI.pdf
Healthcare transformation with next BI.pdfSparity1
 
Healthcare transformation with next BI.pdf
Healthcare transformation with next BI.pdfHealthcare transformation with next BI.pdf
Healthcare transformation with next BI.pdfSparity1
 
IRJET- Integration of Big Data Analytics in Healthcare Systems
IRJET- Integration of Big Data Analytics in Healthcare SystemsIRJET- Integration of Big Data Analytics in Healthcare Systems
IRJET- Integration of Big Data Analytics in Healthcare SystemsIRJET Journal
 
Big Data Analytics using in Healthcare Management System
Big Data Analytics using in Healthcare Management SystemBig Data Analytics using in Healthcare Management System
Big Data Analytics using in Healthcare Management Systemijtsrd
 
Health Informatics- Module 3-Chapter 1.pptx
Health Informatics- Module 3-Chapter 1.pptxHealth Informatics- Module 3-Chapter 1.pptx
Health Informatics- Module 3-Chapter 1.pptxArti Parab Academics
 
Use of data analytics in health care
Use of data analytics in health careUse of data analytics in health care
Use of data analytics in health careAkanshabhushan
 
POST EACH DISCUSSION SEPARATELYThe way patient data is harvested.docx
POST EACH DISCUSSION SEPARATELYThe way patient data is harvested.docxPOST EACH DISCUSSION SEPARATELYThe way patient data is harvested.docx
POST EACH DISCUSSION SEPARATELYThe way patient data is harvested.docxLacieKlineeb
 
Health information technology (Health IT)
Health information technology (Health IT)Health information technology (Health IT)
Health information technology (Health IT)Mohammad Yeakub
 
Data science in healthcare-Assignment 2.pptx
Data science in healthcare-Assignment 2.pptxData science in healthcare-Assignment 2.pptx
Data science in healthcare-Assignment 2.pptxArpitaDebnath20
 

Similar to Ajith M Jose_Report1.docx (20)

Health Informatics- Module 1-Chapter 1.pptx
Health Informatics- Module 1-Chapter 1.pptxHealth Informatics- Module 1-Chapter 1.pptx
Health Informatics- Module 1-Chapter 1.pptx
 
Application of Big Data in Medical Science brings revolution in managing heal...
Application of Big Data in Medical Science brings revolution in managing heal...Application of Big Data in Medical Science brings revolution in managing heal...
Application of Big Data in Medical Science brings revolution in managing heal...
 
Please respond to each of the 3 posts with 3 APA sources no older th
Please respond to each of the 3 posts with 3 APA sources no older thPlease respond to each of the 3 posts with 3 APA sources no older th
Please respond to each of the 3 posts with 3 APA sources no older th
 
Please respond to each of the 3 posts with 3.docx
Please respond to each of the 3 posts with 3.docxPlease respond to each of the 3 posts with 3.docx
Please respond to each of the 3 posts with 3.docx
 
Health information management system by dr. protik.pptx
Health information management system by dr. protik.pptxHealth information management system by dr. protik.pptx
Health information management system by dr. protik.pptx
 
Big data for health
Big data for healthBig data for health
Big data for health
 
Data-driven Healthcare for Payers
Data-driven Healthcare for PayersData-driven Healthcare for Payers
Data-driven Healthcare for Payers
 
Data-driven Healthcare for Manufacturers
Data-driven Healthcare for ManufacturersData-driven Healthcare for Manufacturers
Data-driven Healthcare for Manufacturers
 
Data-Driven Healthcare for Manufacturers
Data-Driven Healthcare for Manufacturers Data-Driven Healthcare for Manufacturers
Data-Driven Healthcare for Manufacturers
 
Healthcare transformation with next BI.pdf
Healthcare transformation with next BI.pdfHealthcare transformation with next BI.pdf
Healthcare transformation with next BI.pdf
 
Healthcare transformation with next BI.pdf
Healthcare transformation with next BI.pdfHealthcare transformation with next BI.pdf
Healthcare transformation with next BI.pdf
 
IRJET- Integration of Big Data Analytics in Healthcare Systems
IRJET- Integration of Big Data Analytics in Healthcare SystemsIRJET- Integration of Big Data Analytics in Healthcare Systems
IRJET- Integration of Big Data Analytics in Healthcare Systems
 
Big Data Analytics using in Healthcare Management System
Big Data Analytics using in Healthcare Management SystemBig Data Analytics using in Healthcare Management System
Big Data Analytics using in Healthcare Management System
 
Health Informatics- Module 3-Chapter 1.pptx
Health Informatics- Module 3-Chapter 1.pptxHealth Informatics- Module 3-Chapter 1.pptx
Health Informatics- Module 3-Chapter 1.pptx
 
Use of data analytics in health care
Use of data analytics in health careUse of data analytics in health care
Use of data analytics in health care
 
CSC_HealthcareJourney
CSC_HealthcareJourneyCSC_HealthcareJourney
CSC_HealthcareJourney
 
POST EACH DISCUSSION SEPARATELYThe way patient data is harvested.docx
POST EACH DISCUSSION SEPARATELYThe way patient data is harvested.docxPOST EACH DISCUSSION SEPARATELYThe way patient data is harvested.docx
POST EACH DISCUSSION SEPARATELYThe way patient data is harvested.docx
 
Health information technology (Health IT)
Health information technology (Health IT)Health information technology (Health IT)
Health information technology (Health IT)
 
Big Data in He
Big Data in HeBig Data in He
Big Data in He
 
Data science in healthcare-Assignment 2.pptx
Data science in healthcare-Assignment 2.pptxData science in healthcare-Assignment 2.pptx
Data science in healthcare-Assignment 2.pptx
 

Recently uploaded

Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 

Recently uploaded (20)

Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 

Ajith M Jose_Report1.docx

  • 1. BIG DATA ANALYTICS FOR THE HEALTHCARE INDUSTRY: IMPACT, APPLICATIONS, AND TOOLS Submitted By, Ajith M Jose MCA 2022-2024 MCA2206
  • 3. INTRODUCTION Various applications, devices, and geographical research activities generate data every day for weather forecasting, weather prediction, disaster evaluation, crime detection, and the health industry, to name a few. Big data is currently associated with core technologies and various enterprises, such as Google, Facebook, and IBM, that extract valuable information from vast data collections. A new era of open communication in healthcare has begun. Big data is being generated at an alarming rate in every industry, including healthcare, for patient care, compliance, and various regulatory requirements. As the global population grows along with the human lifespan, treatment delivery models are rapidly evolving, and some of the decisions underlying these changes are data-driven. Healthcare investors are promised new ideas from big data, which is so-called due to its volume, complexity, and breadth. Pharmaceutical industry experts and shareholders have begun to routinely analyse big data to gain insight. These activities, however, are still in their early stages and must be coordinated in order to address healthcare delivery issues and improve healthcare quality. Early systems for big-data analytics in healthcare informatics were established in a variety of scenarios, such as the investigation of patient characteristics and the determination of treatment cost and outcomes to pinpoint the best and most cost-effective treatments. Health informatics is the study of healthcare information that combines healthcare sciences, computing sciences, and information sciences. Health informatics is the acquisition, storage, and retrieval of data in order to provide better results to healthcare providers. Data in the healthcare system is distinguished by its heterogeneity and variety as a result of the integration of a wide range of biomedical data sources, such as sensor data, imagery, gene arrays, laboratory tests, free text, and demographics. The majority of data in the healthcare system (for example, doctor's notes, lab test results, and clinical data) is unstructured and not stored electronically, so it exists only in hard copies, and its volume is rapidly increasing. Currently, there is a strong emphasis on digitising these massive stores of hard-copy data. The revolutions in data size are making it difficult to achieve this goal. The various terminologies and models developed to solve the problems associated with big data focus on four issues known as the four Vs: volume, variety, velocity, and veracity. Electronic Health Records (EHR), machine-generated/sensor data, health information exchanges, patient registries, portals, genetic databases, and public records are some of the data classes used in healthcare applications. In the healthcare industry, public records are significant sources of big data that require efficient data analytics to resolve their associated healthcare problems. According to a 2012 survey, healthcare data totaled nearly 550 petabytes and is expected to reach nearly 26000 petabytes by 2020. Given the diverse data formats, massive volumes, and associated uncertainties in major data sources, transforming raw data into actionable information is a daunting task. Because health features in medical data are so complex, identifying them and selecting class attributes for health analytics requires highly sophisticated and architecturally specific techniques and tools. 
BIG DATA ANALYTICS IN HEALTH INFORMATICS The execution of computer programming is the primary distinction between traditional health analysis and big-data health analytics. In the past, the healthcare industry relied on other industries for extensive data analysis. Many healthcare investors believe in information technology because it produces meaningful results—their operating systems are functional and can process data into standardised formats. The healthcare industry is currently confronted with the challenge of dealing with rapidly expanding big healthcare data. Big data analytics is expanding and has the potential to provide valuable insights to the healthcare system. As previously stated, the vast majority of the massive amounts of data generated by this system are saved in hard copies that must then be digitised. Big data has the potential to improve healthcare delivery and lower costs while also enabling advanced patient care, improving patient outcomes, and avoiding unnecessary costs. Big data analytics is currently being used to predict the outcomes of physician decisions, such as the outcome of a heart operation for a condition based on the patient's age, current situation, and health status. In essence, the role of big data in the health sector is to manage data sets related to healthcare that are complex and difficult
  • 4. to manage with current hardware, software, and management tools. Aside from the growing volume of healthcare data, reimbursement methods are also evolving. As a result, purposeful use and performance-based pay have emerged as critical factors in the healthcare sector. In 2011, healthcare organisations generated more than 150 exabytes of data, all of which must be efficiently analysed in order to be useful to the healthcare system. Healthcare-related data is stored in EHRs in a variety of ways. A surge in data related to healthcare informatics has also been observed in bioinformatics, where genomic sequencing generates many terabytes of data. There are numerous analytical techniques for interpreting medical data, which can then be used to provide patient care. The disparities in the origins and formats of big data pose a challenge to the healthcare informatics community in developing data processing technologies. There is a high demand for a method that combines disparate data sources. Several conceptual approaches can be used to detect irregularities in large amounts of data from various datasets. The following frameworks are available for analysing healthcare data:  Predictive Analytics in Healthcare: Predictive analysis has been recognised as a significant business intelligence approach for the past two years, but its real-world applications extend far beyond the business context. Extensive data analytics methodologies include text analytics and multimedia analytics. However, one of the most important categories is predictive analytics, which includes statistical approaches such as data mining and machine learning that analyse current and historical data to forecast the future. Predictive methods are now used in hospitals to determine if a patient is at risk of readmission. This data can help clinicians make critical patient care decisions. Predictive analysis requires an understanding of and application of machine learning, which is commonly used in this technique.  Machine Learning in Healthcare: Machine learning is similar to data mining in that both processes scan data for patterns. Unlike data mining applications, which extract data based on human understanding, machine learning uses that data to improve the program's performance. Machine learning recognises data patterns and modifies programme functions accordingly.  Electronic Health Records: The most common health application of big data in healthcare is EHR. Each patient has a medical record that includes information such as their medical history, allergies, diagnosis, symptoms, and lab test results. A secure information system is used to share patient records with healthcare providers in the public and private sectors. These files are customizable, allowing doctors to make changes over time and add new medical test results without the need for paperwork or data duplication. FOUR Vs OF BIG DATA IN HEALTHCARE Big data has four primary characteristics: volume, velocity, variety, and veracity.  Volume: Big data refers to massive amounts of data collected. There is no set threshold for the significance of this information. Typically, time is used when dealing with large amounts of data that must be managed, stored, and analysed using traditional databases and data processing architecture. 
Because of the lower costs of data storage and processing architectures, as well as the need to extract valuable insights from data to improve business processes, efficiency, and consumer services, the volume of data generated by modern IT and the healthcare system keeps increasing.

• Velocity: Velocity refers to how quickly data is generated and collected, and it is a primary driver of the exponential growth of data. Healthcare systems produce data at ever-increasing rates, and because of the volume and variety of structured and unstructured data being collected, decisions must often be made on the processed output soon after the data arrives.

• Variety: Variety refers to the data's format, such as structured or unstructured text, medical imagery, audio, video, and sensor data. Clinical data (patient-record data) is an example of structured data that must be collected, stored, and processed by a specific device. Structured data accounts for only 5% to 10% of healthcare data. Emails, photos, videos, audio, and other health-related material such as hospital medical reports, physicians' notes, paper prescriptions, and radiograph films are examples of unstructured or semi-structured data.

• Veracity: Veracity is the degree of assurance that data is trustworthy and that its meaning is consistent. The level of credibility and reliability varies between data sources, yet big-data analytics results must be credible and error-free. In healthcare, automated systems sometimes apply unsupervised machine-learning algorithms to data that may be useless or misleading; healthcare analytics is responsible for extracting reliable insights from such data so that patients can be treated and the best possible decisions made.

IMPACT OF BIG DATA ON THE HEALTHCARE SYSTEM

Big data has the potential to revolutionise outcomes by enabling more appropriate and accurate patient diagnoses and by improving the accuracy of the information used in the health-informatics system. Analysing massive amounts of data will therefore affect the medical-services framework along five "pathways". Improving patient outcomes along these pathways, described below, is the healthcare system's priority, with a direct impact on the patient.

• Right Living: "Right living" refers to the patient leading a better and healthier lifestyle. Patients can manage their own health more effectively when data mining helps them make better choices and enhance their well-being, and they can take an active role in a healthy lifestyle by selecting the best options for diet, preventive care, exercise, and other daily activities.

• Right Care: This pathway ensures that patients receive the best available treatment and that all providers work from the same data towards the same goals, avoiding duplicated planning and effort. In the age of big data, this has become far more feasible.

• Right Provider: By combining data from sources such as medical equipment, public-health statistics, and socioeconomic data, healthcare providers in this pathway can obtain an overall view of their patients. Because this information is readily accessible, providers can conduct targeted investigations and develop the skills and abilities needed to identify and offer better treatment options to patients.

• Right Innovation: This pathway recognises that new disease conditions, treatments, and medical devices will emerge in the future. Similarly, advances in the delivery of patient services, such as medication upgrades and more effective research and development, will open new avenues for promoting well-being and patient health through a national social-insurance system. The early availability of trial data is critical for stakeholders, since it can be used to identify high-potential targets and ways to improve traditional clinical treatment methods.

• Right Value: To improve the quality and value of healthcare services, providers must pay close attention to their patients on an ongoing basis, and patients must obtain the most advantageous outcomes their social-insurance system can deliver. Identifying and eliminating data misrepresentation, manipulation, and waste, as well as making better use of resources, are examples of measures that support the intelligent use of data.
HADOOP-BASED APPLICATIONS FOR THE HEALTH INDUSTRY

Because healthcare data is still largely held in printed form, active digitisation of this print-form data is required, and because most of it is unstructured, the industry faces significant challenges in extracting meaningful information about patient care, clinical operations, and research. The Hadoop ecosystem, a collection of software utilities, can help the healthcare sector manage this massive amount of data. Some of the Hadoop ecosystem's applications in the healthcare sector are described below.

• Treatment of Cancer and Genomics: Human DNA comprises three billion base pairs, and large amounts of data must be organised efficiently in order to fight cancer. Cancer mutation patterns and treatment responses differ with individual genetics, which helps explain why some cancers remain incurable. Oncologists have therefore concluded that it is critical to tailor treatment for a specific cancer to the patient's genetic makeup when recognising cancer patterns. Hadoop's MapReduce technology makes it possible to process the three billion DNA base pairs when determining the best cancer treatment for each individual patient; a minimal sketch of this style of processing appears after this list. Arizona State University is working on a healthcare model that uses personal genomic data to choose a treatment matched to the patient's cancer genes. This model serves as a foundation for therapy, analysing large amounts of data to improve the chances of saving patients' lives.

• Monitoring of Patient Vitals: Big-data technology lets hospital staff around the world connect and share their monitoring output. Various hospitals use Hadoop-based components built around the Hadoop Distributed File System (HDFS), such as Impala, HBase, Hive, Spark, and Flume, to process the massive amounts of unstructured data generated by sensors that measure patients' vital signs: heartbeats per minute, blood pressure, blood-sugar level, and respiratory rate. Without Hadoop, healthcare professionals would be unable to analyse the unstructured data these patient-monitoring systems generate. In Atlanta, Georgia, there are 6,200 paediatric Intensive Care Units (ICUs), where children can stay for more than a month depending on their condition. These units are outfitted with sensor technology that monitors each child's status in terms of heartbeat, blood pressure, and other vital signs; if a problem arises, an alert is automatically sent to medical personnel to ensure the child's safety.

• Hospital Network: Several hospitals use NoSQL databases in the Hadoop ecosystem to collect and manage massive amounts of real-time data from various sources related to patient care, finances, and payroll, allowing them to identify high-risk patients while reducing day-to-day expenses.

• Healthcare Intelligence: Hadoop technology also helps hospitals and insurance companies with their healthcare-intelligence applications. Pig, Hive, and MapReduce process large datasets related to medicines, diseases, symptoms, opinions, geographic regions, and other factors to extract information that is meaningful to insurance companies (for example, a desired age group).

• Prevention and Detection of Fraud: From the early days of big-data analytics, health-insurance companies have used a variety of methods to detect fraud and to develop ways of preventing medical fraud. Companies use Hadoop applications with prediction models to identify fraudsters based on data from previous health claims, voice recordings, wages, and demographics. By combining real-time Hadoop-based health applications with authentic medical-claim bills, weather-forecasting data, voice recordings, and other data sources, Hadoop's NoSQL databases also aid in the early detection of medical-claim fraud.
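To illustrate the MapReduce pattern referred to in the genomics example above, the sketch below counts how often each gene/mutation pair appears across a set of simple variant records. It is only a minimal illustration of the programming model: the input layout (one "patientId,gene,mutation" line per record), the file paths, and the class names are hypothetical, and the code is not the method used by any particular hospital or project.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example: count occurrences of each gene/mutation pair
// in CSV variant records of the form "patientId,gene,mutation".
public class MutationCount {

    public static class MutationMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length == 3) {
                // Emit (gene:mutation, 1) for every well-formed record.
                outKey.set(fields[1] + ":" + fields[2]);
                context.write(outKey, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            // One output line per gene/mutation pair with its total count.
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mutation count");
        job.setJarByClass(MutationCount.class);
        job.setMapperClass(MutationMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /data/variants
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /out/mutation-counts
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged as a JAR and submitted with hadoop jar, a job of this shape runs its map tasks in parallel across the cluster nodes holding the input blocks and merges the per-key counts in the reduce stage.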
BIG DATA ANALYTICS ARCHITECTURE FOR HEALTH INFORMATICS

Currently, the primary focus of big-data analytics is to gain a comprehensive understanding of and insight into big data rather than merely to collect it. Data analytics is the development and application of algorithms for analysing complex data sets in order to extract meaningful knowledge, patterns, and information. Researchers have recently begun to consider the appropriate architectural framework for big-data-analytics-enabled healthcare systems, one of which employs a four-layer architecture comprising a transformation layer, a data-source layer, a big-data platform layer, and an analytical layer. Each layer has its own data-processing functionality, using the MapReduce processing model to perform specific tasks on the HDFS, while other layers handle tasks such as report generation, query passing, data-mining processing, and online analytical processing. The main requirement in big-data analytical processing is to bundle the data as quickly as possible so that bundling time is reduced; the second is to update and transform queries efficiently in real time; the third is to use and manage storage space efficiently; and the final requirement is to adapt efficiently to rapidly evolving workloads.

Big-data analytics frameworks differ from traditional healthcare processing systems in how data is processed. In the current healthcare system, data is processed with traditional tools installed on a single stand-alone system, such as a desktop computer. Big data, in contrast, is processed across clusters, with scans running over multiple cluster nodes in the network, and this processing is parallelised to handle large medical data sets. Health-related data sets can be processed by freely available frameworks such as Hadoop, MapReduce, Pig, Sqoop, Hive, HBase, and Avro. Four components are commonly identified in such architectures. The first is the set of significant data sources to be processed. The second is the centralised big-data processing infrastructure, where clusters deliver the highest performance; most big-data analytics processing tools have been found to use the MapReduce paradigm to provide data security, scalability, and manageability. The third component is a storage domain that integrates the databases accessed by different applications. The fourth component, which includes reports, Online Analytical Processing (OLAP), queries, and data mining, contains the most popular big-data analytics applications in healthcare systems. EHRs, genome databases, genome data files, text and imagery (unstructured data sources), clinical decision-support systems, government-related sources, medical test labs and pharmacies, and health-insurance companies are all sources of healthcare data. These data are often spread across different schema tables, held as ASCII/text, and stored in a variety of locations. The following section describes the Hadoop-based big-data processing tools that aid in the development of health-based applications for the healthcare industry.

HADOOP'S TOOLS AND TECHNIQUES FOR BIG DATA

Special tools are required to manage unstructured big data that does not fit into any conventional database. The IT sector examines these massive datasets using the Hadoop platform and various methods developed to record, organise, and analyse this type of data, and more efficient tools are still needed to extract meaningful output from big data. Most tools, including MapReduce, Mahout, Hive, and others, are built on the Apache Hadoop architecture. The following sections go over the various tools used in processing large healthcare datasets, beginning with a small query-layer example.
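To make the query/OLAP component mentioned above concrete, the sketch below issues a hypothetical ad hoc query over a claims table through Hive's JDBC interface (Hive itself is described in the tool list that follows). The host name, database, table, and column names are placeholders for illustration only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical query-layer example: an ad hoc HiveQL query over a
// claims table, issued through Hive's JDBC interface (HiveServer2).
public class ClaimsQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-server.example.org:10000/healthcare");
             Statement stmt = conn.createStatement();
             // Total claimed amount per diagnosis code, highest first.
             ResultSet rs = stmt.executeQuery(
                     "SELECT diagnosis_code, SUM(claim_amount) AS total " +
                     "FROM claims GROUP BY diagnosis_code ORDER BY total DESC")) {
            while (rs.next()) {
                System.out.println(rs.getString("diagnosis_code")
                        + "\t" + rs.getDouble("total"));
            }
        }
    }
}
```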
[Figure: Big data analytics conceptual architecture for health informatics]
[Figure: Hadoop system architecture]

• Apache Hadoop: Hadoop has come to mean a variety of things. It began in 2002 as a single software project supporting a web search engine and has since evolved into an ecosystem of tools and applications used to analyse large volumes and varieties of data. Hadoop is no longer a single project but rather a data-processing approach that is fundamentally different from the traditional relational-database model. A more practical definition of the Hadoop ecosystem and framework is: open-source tools, libraries, and methodologies for "big data" analysis in which data sets are collected from various sources, such as Internet images, audio, video, and sensor records, as both structured and unstructured data to be processed.

• HDFS: The HDFS was created to handle large data sets. It is intended for data streaming, which involves reading large amounts of data from disk in bulk. HDFS blocks are 64 MB or 128 MB in size. Nodes are classified into two types: name nodes and data nodes. A single name node manages all of the metadata needed to store and retrieve data from the data nodes; the name node itself holds no file data. Files are stored in order as blocks of equal size. The distributed nature and dependability of HDFS are its distinguishing characteristics. Metadata and file data are stored separately: the name node stores metadata, while the data nodes store application data. A minimal write/read sketch using the HDFS Java API follows.
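As a small illustration of how a client interacts with the name node and data nodes, the sketch below writes and then reads back a file through the HDFS Java API (org.apache.hadoop.fs.FileSystem). The cluster address and file path are placeholders invented for this example.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example: writing and reading a small file through HDFS.
public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.org:8020");

        Path file = new Path("/healthcare/demo/vitals.csv");
        try (FileSystem fs = FileSystem.get(conf)) {
            // Write: the client asks the name node for metadata, while the
            // bytes themselves are stored as blocks on the data nodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("patient-001,72,118/76\n".getBytes(StandardCharsets.UTF_8));
            }
            // Read the file back as a stream.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
}
```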
• MapReduce: Apache Hadoop is frequently associated with MapReduce computing. The MapReduce computation model is a powerful tool used, often invisibly, in many health applications, and its basic idea is straightforward: there are two stages, a mapping stage and a reducing stage. A mapping procedure is applied to the input data during the mapping stage, and once it finishes, the reducing stage begins. Programmatically, the map step accepts key-value pairs as input and generates key-value pairs as output, and the reduce step likewise accepts key-value pairs as input and output. In Hadoop, the input is first divided into fixed-size segments known as input splits; the map function produces the key-value pairs held by the mapper, and any identical keys are merged (grouped) before reduction. The mutation-counting sketch shown earlier, in the section on Hadoop-based applications, is an example of this pattern.

• Apache Hive: Hive is a data-warehousing layer built on top of Hadoop that allows SQL-like querying and procedural analysis. Apache Hive can run ad hoc queries, summarise data, and analyse it. Hive is widely regarded as the de facto standard for SQL-based queries over petabytes of Hadoop data, with features such as simple data extraction, transformation, and access to HDFS data files or other storage systems such as HBase. The JDBC sketch shown just before this tool list illustrates one way such queries are issued.

• Apache Pig: Apache Pig is one of the open-source platforms available for better big-data analysis. Pig is a programming tool comparable to MapReduce. Created as a research project at the web service provider Yahoo, Pig allows users to create user-defined functions and supports many traditional data operations such as join, sort, and filter.

• Apache HBase: HBase is a column-oriented NoSQL database used in Hadoop to store large numbers of rows and columns. HBase supports random read/write operations, as well as record-level updates, which HDFS does not. HBase stores data in parallel across commodity servers using the underlying distributed file system; because of the tight integration between HBase and HDFS, HDFS is typically the file system of choice. If a structured, low-latency view of Hadoop-stored, high-scale data is required, HBase is the right choice. Its open-source code scales linearly to handle petabytes of data on thousands of nodes. A short client sketch follows the Oozie entry below.

• Apache Oozie: A more sophisticated technique is required to run a complex pipeline, or when several interconnected stages have data dependencies; Apache Oozie fills this role by handling and running multiple interdependent Hadoop jobs. Oozie has two parts: a workflow engine that stores and executes collections of Hadoop-based workflows, and a coordinator engine that runs workflow jobs according to a schedule. Oozie is designed to create and manage Hadoop jobs as a workflow in which one job's output serves as the input for the next. Oozie is not a substitute for the Yarn scheduler. Oozie workflow jobs are represented as Directed Acyclic Graphs (DAGs) of actions. Oozie runs as a service in the cluster, and clients submit their jobs for immediate or later execution.
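The sketch below illustrates the record-level, random read/write access described in the HBase entry above, using the standard HBase Java client. The table name, column family, and row-key layout are invented for this example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example: record-level writes and reads against an HBase
// table of vital-sign readings.
public class VitalsStore {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("patient_vitals"))) {

            // Write one reading: row key = patientId + timestamp.
            Put put = new Put(Bytes.toBytes("patient-001#2024-01-15T10:30"));
            put.addColumn(Bytes.toBytes("v"), Bytes.toBytes("heart_rate"), Bytes.toBytes("72"));
            put.addColumn(Bytes.toBytes("v"), Bytes.toBytes("blood_pressure"), Bytes.toBytes("118/76"));
            table.put(put);

            // Random read of the same row.
            Result result = table.get(new Get(Bytes.toBytes("patient-001#2024-01-15T10:30")));
            String heartRate = Bytes.toString(
                    result.getValue(Bytes.toBytes("v"), Bytes.toBytes("heart_rate")));
            System.out.println("heart_rate = " + heartRate);
        }
    }
}
```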
• Apache Avro: Avro is a serialisation format for exchanging data between programmes written in any language, and it is frequently used to link Flume data flows. The Avro system is schema-based: the schema allows read and write operations to remain language-independent, and Avro serialises data with the schema built in. It is a framework both for serialising persistent data and for remote procedure calls between Hadoop nodes, and between client programmes and Hadoop services. A small serialisation sketch appears after the Yarn entry below.

• Apache Zookeeper: Zookeeper is a centralised service used by applications to keep a healthcare system coordinated, providing naming, configuration, and other coordination facilities on and between nodes. It manages the shared objects found in large cluster environments, such as configuration data and the hierarchical naming space, and applications use these services to coordinate the distributed processing of Hadoop clusters. Zookeeper also contributes to an application's dependability: when an application master dies, Zookeeper enables a new master to be created to take over its tasks.

• Apache Yarn: Yarn is Hadoop's resource-management layer; a distributed shell application built on Yarn is one example of a non-MapReduce Hadoop application. Yarn is made up of two parts: a Resource Manager (RM), which handles all of the resources required for tasks within a cluster, and a Node Manager (NM), which runs on each host in the cluster and handles the resources available on that host. Both components manage the containers that run the dedicated application code, along with memory management, CPU throughput, the I/O system, and the scheduling of jobs.
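The sketch below illustrates Avro's schema-based, language-independent serialisation using Avro's generic Java API: a small lab-result record is written to bytes and read back with the same schema. The schema and field names are invented for this example.

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Hypothetical example: serialising and deserialising a lab-result record
// with Avro's schema-based format.
public class LabResultAvro {
    private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"LabResult\",\"fields\":["
            + "{\"name\":\"patientId\",\"type\":\"string\"},"
            + "{\"name\":\"test\",\"type\":\"string\"},"
            + "{\"name\":\"value\",\"type\":\"double\"}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Build a record that conforms to the schema.
        GenericRecord record = new GenericData.Record(schema);
        record.put("patientId", "patient-001");
        record.put("test", "glucose");
        record.put("value", 5.4);

        // Serialise to bytes; the same schema is needed to read them back.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bytes, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();

        // Deserialise and print one field.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes.toByteArray(), null);
        GenericRecord decoded = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        System.out.println(decoded.get("test") + " = " + decoded.get("value"));
    }
}
```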
• Apache Sqoop: Apache Sqoop is a powerful tool that extracts data from a Relational Database Management System (RDBMS) and loads it into the Hadoop architecture for query processing. To accomplish this, the MapReduce paradigm or higher-level tools such as Hive are used. Once the data is stored in HDFS, it can be used by Hadoop applications.

• Apache Flume: Apache Flume is a highly dependable service for collecting and moving large amounts of data from disparate machines into HDFS. Data transport frequently involves several Flume agents traversing a series of devices and locations. Flume is commonly used for log files, social media data, and email messages.

CONCLUSION

In this seminar we provided an in-depth description and overview of big data in general and in the healthcare system, where it plays an important role in healthcare informatics, has a significant impact on the healthcare system, and is characterised by the four Vs. We also proposed a conceptual architecture for solving healthcare problems with big data using Hadoop-based technologies, which entails using the big data generated at various levels of medical care and developing methods for analysing this data to obtain answers to medical questions. The combination of big data and healthcare analytics can lead to effective treatments for specific patients by allowing doctors to prescribe medications that suit each individual rather than those that work for the majority of people. Big-data analytics is still in its early stages, and current tools and methods cannot solve all of the problems that come with big data. Big data is a complex system that poses enormous challenges; as a result, extensive research in this field will be required to address the issues confronting the healthcare system.