ISSN (online) 2583-455X
BOHR International Journal of Computer Science
2021, Vol. 1, No. 1, pp. 6–10
https://doi.org/10.54646/bijcs.002
www.bohrpub.com
Implementation of Web Application for Disease Prediction Using AI
Manasvi Srivastava, Vikas Yadav and Swati Singh∗
IILM Academy of Higher Learning, College of Engineering and Technology, Greater Noida, Uttar Pradesh, India
∗Corresponding author: swati.singh@iilm.edu
Abstract. The Internet is the largest source of information created by humanity. It contains a variety of materials available in various formats, such as text, audio, video, and much more. Web scraping is one way to collect this information: a set of strategies for obtaining data from websites automatically instead of copying it manually. Many web-based data extraction methods are designed to solve specific problems and work on ad hoc domains. Various tools and technologies have been developed to facilitate web scraping; unfortunately, the appropriateness and ethics of using these web scraping tools are often overlooked. There are hundreds of web scraping software packages available today, most of them designed for Java, Python, and Ruby, and both open-source and commercial software exist. Web-based tools such as Yahoo Pipes, Google Web Scrapers, and the OutWit extension for Firefox are good starting points for beginners in web scraping. Web extraction is basically used to replace this manual extraction and editing process and to provide an easy and better way to collect data from a web page, convert it into the desired format, and save it to a local or archive directory. In this study, among other kinds of scraping, we focus on techniques that extract the content of a web page. In particular, we apply scraping techniques to a collection of diseases with their own symptoms and precautions.
Keywords: Web Scraping, Disease, Legality, Software, Symptoms.
INTRODUCTION
Web scraping is a process for downloading and extracting important data by scanning a web page. Web scrapers work best when page content must be transferred, searched, or modified. The collected information is then copied to a spreadsheet or stored in a database for further analysis. For the ultimate purpose of analysis, the data need to be processed through progressively different stages, for example, collection according to a specification, editing, cleaning, remodeling, and the application of different models and algorithms to obtain the end result. There are two ways to extract data from websites: the first is the manual extraction process and the second is the automatic extraction process. Web scrapers compile site information in the same way that a person would: by accessing a web page of the site, finding relevant information, and moving on to the next web page. Each website has a different structure, which is why web scrapers are usually designed to search through a particular website. Web scraping can help in finding any kind of targeted information; we then have the opportunity to find, analyze, and use that information in the way we need it. Web scraping therefore paves the way for data acquisition, speeds up automation, and makes extracted data easier to access by rendering it in comma-separated values (CSV) format. Web scraping often extracts large amounts of data from websites, for example, for monitoring consumer interests, price monitoring (e.g., price checking), advancing AI models, data collection, and issue tracking. So, there is no doubt that web scraping is a systematic way to get more data from websites. It requires two stages, mainly crawling and extraction. A crawler is an algorithm, designed by a person, that goes through the web looking for the specific information needed by following links. A scraper is a specific tool designed to extract data from sites.
The proposed web scraper works as follows: if a patient is suffering from any kind of illness, he adds his symptoms and problems, and when the crawl starts, the system scans the disease database provided on the website and returns the disease that best matches the patient's symptoms. When those specific diseases show up, the system also shows the precautionary measures that the patient needs to take in order to overcome them and treat the infection.
OVERVIEW OF WEB SCRAPING
Web scraping is a great way to extract unstructured data from websites and convert it into organized data that can be stored and analyzed in a database. Web scraping is also known as web data extraction, web harvesting, or screen scraping, and it is a form of data mining. The whole purpose of the web scraping process is to extract information from websites and convert it into an understandable format such as a spreadsheet, a database, or a CSV file, as shown in Figure 1. Data such as item prices, stock prices, various reports, market prices, and product details can be collected through web scraping. Extracting website-based information helps you make effective decisions for your business.
Figure 1. Web scraping structure.
PRACTICES OF WEB SCRAPING
• Data scraping
• Research
• Web mashup: integrating data from multiple sources
• Extracting business details from business directory websites, such as Yelp and Yellow Pages
• Collecting government data
• Market analysis
In the web data scraping process, a software agent, also known as a web robot, mimics the browsing interaction between web servers and a human using a conventional web browser. Step by step, the robot accesses as many websites as needed, parses their contents to find and extract interesting data, and structures that content as desired. Web scraping APIs and frameworks address the most common tasks involved in achieving specific retrieval goals, as described in the following text.
Hypertext Transfer Protocol (HTTP)
This method is used to extract data from both static and dynamic web pages. Data can be retrieved by sending HTTP requests to a remote web server, either through a library or directly over a socket.
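As an illustration, the sketch below issues a plain HTTP GET request over a raw TCP socket in Python; the host and path are illustrative, and in practice a higher-level library such as requests would usually be preferred.

    import socket

    def http_get(host, path="/"):
        # Open a TCP connection to the web server (port 80, plain HTTP).
        with socket.create_connection((host, 80), timeout=10) as sock:
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            chunks = []
            while True:  # read until the server closes the connection
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

    raw = http_get("example.com")  # illustrative host
    print(raw.split(b"\r\n\r\n", 1)[0].decode("ascii", "replace"))  # response headers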
Hyper Text Markup Language (HTML)
Query languages for data, such as XQuery and the Hyper Text Query Language (HTQL), can be used to parse HTML pages and to retrieve and modify their content.
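Since XQuery and HTQL engines are less common today, a minimal sketch of the same idea using the XPath support in Python's lxml library is given below; the HTML snippet and class names are illustrative.

    from lxml import html

    # Illustrative HTML standing in for a downloaded page.
    page = """
    <html><body>
      <div class="disease"><h2>Influenza</h2><p class="symptom">fever</p></div>
      <div class="disease"><h2>Migraine</h2><p class="symptom">headache</p></div>
    </body></html>
    """

    tree = html.fromstring(page)
    # XPath query: the text of every symptom paragraph inside a disease block.
    symptoms = tree.xpath('//div[@class="disease"]/p[@class="symptom"]/text()')
    print(symptoms)  # ['fever', 'headache']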
Output Structuring
The main purpose of this step is to convert the extracted content into a formal representation for further analysis and storage. Although this last step lies at the boundary of web scraping, some tools take care of post-processing the results, providing in-memory data formats and text-based outputs, such as strings or files (XML or CSV files).
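A minimal sketch of this structuring step, assuming scraped records already parsed into dictionaries in memory and written out as a CSV file:

    import csv

    # Scraped records already parsed into dictionaries (illustrative values).
    records = [
        {"disease": "Influenza", "symptom": "fever"},
        {"disease": "Migraine", "symptom": "headache"},
    ]

    # Persist the records in a text-based format (CSV) for later analysis.
    with open("diseases.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["disease", "symptom"])
        writer.writeheader()
        writer.writerows(records)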
LITERATURE SURVEY
Python has a rich set of libraries available for downloading
digital content online. Among the libraries available, the
following three are the most popular ones: BeautifulSoup,
LXml, and RegEx. Statistical research performed on the available data sets indicated that RegEx was able to deliver the requested information in an average time of 153.6 ms. However, RegEx has limitations when extracting data from web pages with nested HTML tags. Because of this demerit, RegEx is suited only to simpler extraction tasks. Libraries such as BeautifulSoup and LXml are able to extract content from web pages in such complex environments, yielding average response times of 457.66 ms and 203 ms, respectively.
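To make the trade-off concrete, the hedged sketch below extracts the same value with RegEx and with BeautifulSoup; the HTML snippet is illustrative, and timings will vary by page and machine.

    import re
    from bs4 import BeautifulSoup

    page = '<html><body><span id="price">42</span></body></html>'

    # RegEx: fast, but brittle once tags are nested or malformed.
    match = re.search(r'<span id="price">(.*?)</span>', page)
    print(match.group(1))  # 42

    # BeautifulSoup: slower, but tolerant of messy, nested HTML.
    soup = BeautifulSoup(page, "html.parser")
    print(soup.find("span", id="price").text)  # 42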
The main purpose of data analysis is to get useful
information from data and make decisions based on that
analysis. Web scraping refers to the collection of data from the web; it is also known as data scraping. Data analysis can be divided into several steps, such as cleaning and editing. Scrapy is one of the most widely used tools for obtaining the information needed by the user. The main purpose of using Scrapy is to extract data from its sources. Scrapy, a web crawler based on the Python programming language, is very helpful in finding the data we need, starting from the URLs from which the data are to be scraped. A web scraper is a useful API for retrieving data from a website. Scrapy provides all the necessary tools to extract data from a website, process the data according to user needs, and store the data in a specific format defined by the user.
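A minimal Scrapy spider sketch along these lines is shown below; the start URL and CSS selectors are hypothetical, and the actual selectors depend on the target site's markup.

    import scrapy

    class DiseaseSpider(scrapy.Spider):
        # Hypothetical start URL and selectors; adapt them to the target site.
        name = "diseases"
        start_urls = ["https://example.com/diseases"]

        def parse(self, response):
            for row in response.css("div.disease"):
                yield {
                    "name": row.css("h2::text").get(),
                    "symptoms": row.css("p.symptom::text").getall(),
                }
            # Follow the pagination link, if the page has one.
            next_page = response.css("a.next::attr(href)").get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)

Such a spider can be run with "scrapy runspider spider.py -o diseases.csv", which exports the yielded items directly to a CSV file.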
The Internet consists largely of web pages that include a great number of descriptive elements, including text, audio, graphics, video, etc. The process called web scraping is mainly responsible for collecting raw data from such websites. It is a process that extracts data automatically and very quickly, enabling us to extract the specific data requested by the user. The most popular method is to build custom web data extractors in any well-known language.
EXPERIMENTAL WORK
Technology Used
Firebase
For the database, we have used Cloud Firestore from Firebase. It is a real-time NoSQL database that stores data as key-value pairs organized into collections and documents.
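A minimal sketch of writing one disease record to Cloud Firestore with the firebase_admin Python SDK, assuming a service-account key file is available; the collection and field names are illustrative.

    import firebase_admin
    from firebase_admin import credentials, firestore

    # Assumed service-account key file; path is illustrative.
    cred = credentials.Certificate("serviceAccountKey.json")
    firebase_admin.initialize_app(cred)
    db = firestore.client()

    # Each disease is a document inside the "diseases" collection.
    db.collection("diseases").document("influenza").set({
        "symptoms": ["fever", "cough", "fatigue"],
        "precautions": ["rest", "hydration"],
    })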
TensorFlow
TensorFlow is used to train the model on the database and to make predictions. Of the various algorithms available for model training, our project uses linear regression.
JavaScript Frameworks
• Node.js
Node.js is an open-source, cross-platform JavaScript runtime environment that runs on the V8 engine and executes JavaScript code outside a web browser.
Our application code is written for Node.js, as it is a fast, cross-platform environment.
• ElectronJS
Electron is a framework for building native applications with web technologies such as JavaScript, HTML, and CSS. As Electron lets us ship our web application as a desktop application, it helps us reuse our code and thus reduces the development time.
• ReactJS
React makes it painless to create interactive UIs. Design simple views for each state in your app, and React will efficiently update and render just the right components when your data changes.
React can also render on the server using Node and can power mobile applications using React Native.
PYTHON
Python is a high-level, interpreted programming language.
In this project, various libraries, such as pandas, NumPy, and BeautifulSoup, are used to create our database. Pandas and NumPy are used to filter and process the data needed to train our model, extracting and cleaning them from separate data sources.
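As a sketch of this filtering step, assuming a hypothetical raw CSV file with a symptoms column:

    import pandas as pd

    # Hypothetical raw CSV with one row per disease; column names are illustrative.
    df = pd.read_csv("raw_diseases.csv")

    # Drop duplicates and rows missing symptoms, then normalize the text.
    df = df.drop_duplicates().dropna(subset=["symptoms"])
    df["symptoms"] = df["symptoms"].str.lower().str.strip()

    df.to_csv("clean_diseases.csv", index=False)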
COMPATIBILITY
OS X
Only 64-bit binaries are provided for OS X, and the minimum supported version is OS X 10.9.
Windows
Electron supports Windows 7 and later; older versions of the OS are not supported.
Both x86 and amd64 (x64) binaries are provided for Windows; the ARM version of Windows is not supported.
Software Used
– VSCode
Visual Studio Code is a freeware source-code editor developed by Microsoft for Windows, Linux, and macOS. Features include debugging support, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git.
– Google Colab Notebook
Colaboratory, or Colab for short, is a product of Google Research that allows developers to write and run Python code through their browsers. Google Colab is an excellent tool for deep learning tasks. It is a hosted Jupyter notebook that needs no setup and has an excellent free version, providing free access to Google computing resources such as GPUs and TPUs.
– PyCharm
PyCharm is an integrated development environment (IDE) for computer programming, specifically for the Python language, developed by the Czech company JetBrains.
Data Source
As we could not find an existing dataset covering more than 40 diseases, we created our own. The dataset that we used for our training and testing process was taken from various sources; one of them is listed below.
– https://github.com/DiseaseOntology/HumanDiseaseOntology
Use of Scrapy
Scrapy is a framework for crawling websites and extracting structured data that can be used for a wide range of supporting applications, such as data mining, information processing, or archival of reported data. Although Scrapy was originally intended for scraping the web, it can equally be used to extract data through APIs (for example, Amazon AWS) or serve as a general-purpose web crawler. Scrapy is written in Python. Let us take a wiki example related to one of the problems crawlers face. A simple online photo gallery may offer three options to users, specified through HTTP GET parameters in the URL. If there are four ways to sort images, three thumbnail-size options, two file formats, and an option to disable user-provided content, then the same set of content can be accessed through 4 × 3 × 2 × 2 = 48 different URLs, all of which may be linked on the site. This carefully crafted combination creates a problem for crawlers, as they have to work through an endless combination of parameter variations to retrieve distinct content.
Methodology
The method used in the project is to collect all the required data from various sources, such as the CDC's database and Kaggle. The extracted data are then analyzed using scripts written in the Python language according to the project requirements. Pandas and NumPy are widely used to perform various operations on the dataset.
After sorting the data according to each need, they are uploaded to the database. For the database, we have used Cloud Firestore, as it is a real-time NoSQL database with extensive API support.
Furthermore, TensorFlow is used to train our model according to our needs.
In this project, we predict the disease from the given symptoms.
Training data set: 70%
Test data set: 30%
TensorFlow supports linear regression, which is used to predict diseases based on the given indicators.
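A minimal sketch of this split-and-train step, assuming binary symptom indicators as features; the data here are randomly generated placeholders, and a single dense unit in tf.keras stands in for the linear regression model.

    import numpy as np
    import tensorflow as tf
    from sklearn.model_selection import train_test_split

    # Placeholder data: one row per record, one 0/1 column per symptom.
    X = np.random.randint(0, 2, size=(200, 10)).astype("float32")
    y = np.random.rand(200, 1).astype("float32")

    # 70% of the data for training, 30% for testing, as described above.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

    # A single dense unit is a linear model: h(x) = w^T x + b.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    model.compile(optimizer="sgd", loss="mse")  # mean squared error cost
    model.fit(X_train, y_train, epochs=50, verbose=0)
    print("test MSE:", model.evaluate(X_test, y_test, verbose=0))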
Coding
The project front end is written using ReactJS and TypeScript. In addition, we have used the Material-UI kit, which implements Google's Material Design for ReactJS, to speed up our development process.
To ship our app, Electron is used. Our application supports macOS and Windows. Many of today's desktop web apps are written with the help of ElectronJS.
Testing
The project is tested using Spectron, a testing framework built for Electron applications.
The project runs in the browser. The generated output turns out to be completely consistent, and the generated analysis is a close approximation.
A standard Electron workflow with Spectron involves engineers writing unit tests in the usual TDD style and then writing integration tests to ensure that acceptance criteria are met before a feature is approved for use. Continuous integration servers can ensure that all these tests pass before the changes are incorporated into production.
Algorithm Used
Linear regression is a standard statistical method that allows us to learn a function or relationship from a given set of continuous data. For example, we are given some corresponding x and y data points, and we need to learn the relationship between them, which is called a hypothesis.
In the case of linear regression, the hypothesis is a straight line, i.e.,
$h(x) = w^{T}x + b$,
where the vector $w$ is called the weights and the scalar $b$ is called the bias. The weights and bias are called the model parameters.
All we need to do is estimate the values of $w$ and $b$ from the given data set such that the resulting hypothesis produces the minimum cost $J$, defined by the following cost function:
$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x_{i}) - y_{i} \right)^{2}$,
where $m$ is the number of data points in the given data set. This cost function is also called the mean squared error.
To find the values of the parameters that minimize $J$, we use the widely used optimizer algorithm called gradient descent. The following is a sketch of the gradient descent procedure:
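A minimal NumPy version for one-variable linear regression, assuming a fixed learning rate and toy data:

    import numpy as np

    def gradient_descent(x, y, lr=0.01, epochs=1000):
        # Fit h(x) = w*x + b by repeatedly stepping opposite the gradient of J.
        w, b = 0.0, 0.0
        for _ in range(epochs):
            y_hat = w * x + b          # current hypothesis
            error = y_hat - y
            dw = (error * x).mean()    # dJ/dw for J = (1/2m) * sum(error^2)
            db = error.mean()          # dJ/db
            w -= lr * dw
            b -= lr * db
        return w, b

    # Toy data roughly following y = 2x + 1.
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.1, 4.9, 7.2, 8.8])
    print(gradient_descent(x, y))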
RESULT DISCUSSION
The overall results of the project are useful in predicting
diseases with the given symptoms. The script that was
written to extract data can be used later to compile and
format it according to needs.
Users can enter symptoms by typing them themselves or by selecting them from the given options. The trained model then predicts the disease accordingly. Users are able to create their own medical profile, where they can submit their medical records and prescribed medication; this greatly helps us to feed our database and better predict disease over time, as some of these diseases occur seasonally.
Moreover, the analysis performed produced very similar diseases, but the training model is limited by the size of the database.
CONCLUSIONS AND FUTURE SCOPE
The use of the Python program also emphasizes understanding of pattern matching and regular expressions for web scraping. The database is compiled from factual reports and official government media outlets, where the information is considered reliable. A team of experts and analysts validating the information against a continuously updated list of more than 5,000 items makes it likely that the site collects data effectively. User-provided inputs are analyzed and matched against the scraped website data, and the output is produced as the user interacts with the user interface. Output is generated in the form of text. This method is a simple and straightforward way to identify a disease from its symptoms and provides vigilance against that disease.
For future work, we plan a feature that shows the medication a patient can take for treatment. In addition, we are looking to link this website to various hospitals and pharmacies for ease of use.
REFERENCES
[1] S. Thivaharan, G. Srivatsun, and S. Sarathambekai, "A Survey on Python Libraries Used for Social Media Content Scraping," Proceedings of the International Conference on Smart Electronics and Communication (ICOSEC 2020), IEEE, ISBN: 978-1-7281-5461-9, pp. 361–366.
[2] Shreya Upadhyay, Vishal Pant, Shivansh Bhasin, and Mahantesh K. Pattanshetti, "Articulating the Construction of a Web Scraper for Massive Data Extraction," IEEE, 2017.
[3] Amruta Kulkarni, Deepa Kalburgi, and Poonam Ghuli, "Design of Predictive Model for Healthcare Assistance Using Voice Recognition," 2nd IEEE International Conference on Computational Systems and Information Technology for Sustainable Solutions, 2017, pp. 61–64.
[4] Dimitri Dojchinovski, Andrej Ilievski, and Marjan Gusev, "Interactive Home Healthcare System with Integrated Voice Assistant," MIPRO 2019, pp. 284–288.
[5] Mohammad Shahnawaz, Prashant Singh, Prabhat Kumar, and Anuradha Konidena, "Grievance Redressal System," International Journal of Data Mining and Big Data, vol. 1, no. 1, 2020, pp. 1–4.