SlideShare a Scribd company logo
1 of 4
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1369
Time -Travel on the Internet
Pranita Pingale1, Prithvi Prathapan2, Sahil Mulay3, Siddhesh Patil4, Dhananjay Sharma5
1Assistant Professor, Dept. of Computer Engineering, Pillai College of Engineering, Maharashtra, India.
2-6UG Student, Dept. of Computer Engineering, Pillai College of Engineering, Maharashtra, India.
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract – The World Wide Web is a unique, widely used
information tool that is used worldwide. Content is
disappearing at an alarming rate, not only endangering our
digital cultural memory but also organizational
accountability. An effective technical solution to such a
problem is the introduction of the Internet Archiving System
aka, The “Digital Preservation Handbook”. Web archiving is a
process of gathering parts of the World Wide Web to ensure
that information is archived by Digital Forensic archives,
future researchers, historians, and the public. We focus on
building a robust and software-based tool that stores and
evaluates certain web clips / videos, using Web Crawling and
data analysis to help individuals monitor and analyze online
data. In Layman's terms, our software visits a set of URLs
called seeds and identifies all hyperlinks to that seed, copying
and saving information as its "crawls" seeds. This information
is stored as an overview of the WARC file format so when it is
played again with the WARC viewer, it appears as captured.
With this project we strive to meet the primary goal of
providing a consistent, easy-to-use, manageable and secure
system that allows users with limited technical knowledge to
easily download web content for archivingpurposes. Software
will be available free of charge for the benefit of the
international community that maintains web history.
Key Words: Web crawler, Information retrieval, WARC, ,
Web Scraping, Data gathering.
1.INTRODUCTION
Web archive is a process of collecting data recorded on the
World Wide Web, storing it, ensuring that the archives are
archived, and making the collected data available for future
research. The purpose of a web archive is to compile web
content and store it for longer to deliver valuable data for
generations. The volume of content from the digital channel,
including the web, is growing exponentially, and the need to
suspend digital media storage was needed. The space data
consulting committee (CCSDS) has developed an open
reference archive (OAIS) reference system, which is ISO
14,721 for long-term digital record keeping, and many web
archiving programs are up to standard. However, the OAIS
reference model does not provide a method of operationbut
imposes a requirement to ensure OAIS-compliance. There is
a weakness in content integrity against unauthorized
practices, and there is no way to guarantee long-term
content integrity, although the OAIS reference model states
long-term preservation of digital records. Also, the user is
able to track, monitor and analyze data.
2. Literature Survey
A. An overview of Web Archiving: Jinfang Niu shared his
views on some of the methods used, such as how the
traditional concepts of archive management and theory can
be applied to the organization and the definition of archived
web resources. This approach is a review of the methods
used in various universities, as well as international
government libraries and archives, selecting, acquiring,
defining and accessing web resources for their archives.
B. Accessing web archives at scale through a cloudbased
interface: This paper introduces Archives UnleashedCloud,
a web-based visual interface and web archives on average.
The current access paradigms, largely driven by the breadth
and scope of web archives, often involve the use of a
command line and a coding code.
C. Investigating websites and webpages: This chapter
aims to show the searcher how websites work, howtheycan
help with research and how to properly document website
evidence. This chapter introduces the readertotheconcepts
of HTML, and its application in research. It also discusses
what can be found on the website and how to view metadata
in embedded images, texts, and videos.
D. The Wayback machine: Benefits of Web Storage: This
paper focuses on Web archiving, awebarchivecampaignand
shows how lost websites can be recovered using a back-up
machine.
E. Design of an Enhanced Web Archiving System: The
project proposes a BClinked archive program to maintain
content integrity using blockchain technology and attempts
to extend WARC file details.
2.1 Summary of Related Work
The summary of methods used in literature is given in Table
1.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1370
Table 1 Summary of literature survey
3. Proposed Work
We focus on building a robust and software-based tool that
stores and evaluates snippets / videos of specific websites,
using Web Crawling and data analysis to help individuals
monitor and analyze online data. In addition, a web archive
system with authentic content authentication can provide a
reliable web archive. We suggest a new blockchain-based
web archive to solve this challenge, and records webarchive
and content archive activity in blockchain databases.
Literature Advantages Disadvantages
Jinfang Niu, 2012. [1] This overviewis
based on a
comprehensive
review of
literature that
explains how
web
archiving is
being done.
However, none of
the literature
directly addresses
the knowledge and
skills required by
the professionals
in the field who
perform the daily
routine of
selecting,
acquiring and
cataloging web
archives.
Nick Ruest , Samantha
Fritz and Ryan
Deschamps.
January 2021
[2]
If the
Archives
Unleashed
Cloud, unlocks
the potential of
web archives by
developing and
providing the
tools, via a
cloud service,
for scholars
with limited –
but not none
technical
expertise to
explore
archived web
content.
The problem
encountered,
like many
digital
humanities
projects, is that of
project
sustainability
B T Sampath Kumar,
January 2015. [4]
Due to the loss
of web
informationdue
to
several
reasons like
network
failures, URL
failure, etc.
wayback
machine can be
used to recover
some
percentage of
the missing web
site.
N/A
ToddG. Shipley,
Art Bowker,2017.
(An Introduction to
Solving Crimes in
Cyberspace).
[3]
With a little
understanding
the
investigator
can identify
who owns
the site,
information
about the
contents, and
potentially
useful
metadata from
the source
code as well as
images and
other items
embedded on
the site.
But scraping and
using
information of
social media
platforms or an
individual’s
webpage
without the
owner’s
permission is
not permissible
by law.
Hwang, Hyun C., Jin G.
Shon, and Ji S.
Park 2020[5] Blockchain
technology
is that
connecting the
previous node
by using
cryptograph y
to secure all
data in a
blockchain,
and it is the
decentralizat
ion system
which can be
shared via
peer-topeer
network.
However, in the
BCLinked web
archiving system
because all
blockchain node
data should be
rebuilt even if
only one single
content, it takes
longer time.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1371
3.1Proposed System Architecture
The system architecture is given in Figure 1. Each block is
described in this Section.
Content crawling: This block covers all crawling activities
on the web. Crawler is a computer program that
automatically searches documents on the web. Searches are
scheduled for repeated actions to make browsing default. In
our case, the searcher picks up a target URL and begins to
identify the URL and all the hyperlinksembeddedinit. While
you visit the links, the searcherfindssummariesof eachpage
visited, and generates a WARC file for pages. All this
information is stored in a web archive, and is later pushed to
the blockchain domain.
Domain information storing: The first level of the
Blockchain database is the Domain blockchain. This block
enables domain information storage in the domain
blockchain. Whenever a target URL is posted, a new node or
block of Domain blockchainisgeneratedwiththesame name
as the URL domain. If the target domain already exists in the
domain blockchain, incoming content is stored in the
existing block. If the target domain is not available, a new
block is created and added during the domain information
storage. The blockchain blocks / node in this category
contains only the domain name and general URL
information. Nodes are linked to key content stored in Web
Content Nodes.
Content information storing: The second level of
Blockchain database is theWebContent blockchain.Building
a new domain block in the previous phase, also leads to the
creation of a web-based content block or node that stores
domain-related content. The need for this block arises as all
content in the WARC file can be too large to place in a
domain blockchain due to block size limitations. Therefore
only the amount that can identify content in a web content
block is added to the domain block. Snapshots and WARC
files found in the first section are stored in content blockson
the web.
Requirement Analysis
The implementation detail is given in this section.
3.1 Software
Operating System Windows 10
Programming
Language
Python
3.2 Hardware
Processor 3.5 GHz Intel
HDD 1 TB
RAM 8 GB
Implementation:
Acknowledge
It is our privilege to express our sincerest regards to our
supervisor Professor Pranita Pingale for thevaluableinputs,
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1372
able guidance, encouragement, whole-hearted cooperation
and constructive criticism throughout the duration of this
work. We deeply express our sincere thanks to our Head of
the Department Dr. Sharvari Govilkar and our Principal Dr.
Sandeep M. Joshi for encouraging and allowing us to present
this work.
Conclusion
In this paper, we have conveyed the benefits of webscraping
and analyzing the context of a single research center of
interest, small firms innovating in the raw materials
industry. We have provided a six-step approach to facilitate
duplication of our processes through website analysis and
social science communities. We hope this work provides a
useful approach to the wider community of social science
who is interested but who is not yet free to use online data.
Although the Wayback Machine is an important resourcefor
scholars in all fields, we believe that social scientists in
particular have not yet been able to tap the full potential of
online data resources. Going forward, there are significant
opportunities to incorporate additional analytical tools and
statistical models while simultaneously evaluating, refining,
and reporting on the appropriateness of measures and
results. However, one should keep in mind that both
automatic and manual effort is required to create high
quality information that can provide these benefits and
overcome the limitations of archived information.
References
1. Niu, Jinfang, "An Overview of Web Archiving"
(2012). School of Information Faculty Publications.
308.
2. Ruest, Nick & Fritz, Samantha & Deschamps,Ryan&
Lin, Jimmy & Milligan, Ian. (2021). From archive to
analysis: accessing web archives at scale through a
cloud-based interface. International Journal of
Digital Humanities. 10.1007/s42803-020-00029-6.
3. Yiu-ming Todd G. Shipley, Art Bowker, Chapter 13 -
Investigating Websites and Webpages, Editor(s):
Todd G. Shipley, Art Bowker, Investigating Internet
Crimes, Syngress, 2014.
4. Sampath Kumar, B T. (2015). Wayback machine: A
Boon for Preserving Web Sources.
5. Hwang, H.C.; Shon, J.G.; Park, J.S. Design of an
Enhanced Web Archiving System for Preserving
Content Integrity withBlockchain. Electronics2020,
9, 1255.
https://doi.org/10.3390/electronics9081255

More Related Content

Similar to Time -Travel on the Internet

Web tech chapter 1 (1)
Web tech chapter 1 (1)Web tech chapter 1 (1)
Web tech chapter 1 (1)Mohammad Faizan
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesSawood Alam
 
pWeb: A P2P Web Hosting Framework
pWeb: A P2P Web Hosting FrameworkpWeb: A P2P Web Hosting Framework
pWeb: A P2P Web Hosting Frameworkdegarden
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
IRJET- A Two-Way Smart Web Spider
IRJET- A Two-Way Smart Web SpiderIRJET- A Two-Way Smart Web Spider
IRJET- A Two-Way Smart Web SpiderIRJET Journal
 
Case Study for Ego-centric Citation Network
Case Study for Ego-centric Citation NetworkCase Study for Ego-centric Citation Network
Case Study for Ego-centric Citation NetworkMike Taylor
 
Review on an automatic extraction of educational digital objects and metadata...
Review on an automatic extraction of educational digital objects and metadata...Review on an automatic extraction of educational digital objects and metadata...
Review on an automatic extraction of educational digital objects and metadata...IRJET Journal
 
WEB BASED AND BLOCKCHAIN APPLICATION FOR EDUCATIONAL INSTITUTION
WEB BASED AND BLOCKCHAIN APPLICATION FOR EDUCATIONAL INSTITUTIONWEB BASED AND BLOCKCHAIN APPLICATION FOR EDUCATIONAL INSTITUTION
WEB BASED AND BLOCKCHAIN APPLICATION FOR EDUCATIONAL INSTITUTIONIRJET Journal
 
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy IssuesIWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy IssuesIWMW
 
A_Cooperative_PoW_and_Incentive_Mechanism_for_Blockchain_in_Edge_Computing.pdf
A_Cooperative_PoW_and_Incentive_Mechanism_for_Blockchain_in_Edge_Computing.pdfA_Cooperative_PoW_and_Incentive_Mechanism_for_Blockchain_in_Edge_Computing.pdf
A_Cooperative_PoW_and_Incentive_Mechanism_for_Blockchain_in_Edge_Computing.pdfnilesh405711
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine ScrapperIRJET Journal
 
HIGWGET-A Model for Crawling Secure Hidden WebPages
HIGWGET-A Model for Crawling Secure Hidden WebPagesHIGWGET-A Model for Crawling Secure Hidden WebPages
HIGWGET-A Model for Crawling Secure Hidden WebPagesijdkp
 
Web-Application Framework for E-Business Solution
Web-Application Framework for E-Business SolutionWeb-Application Framework for E-Business Solution
Web-Application Framework for E-Business SolutionIRJET Journal
 
Study on Web Content Extraction Techniques
Study on Web Content Extraction TechniquesStudy on Web Content Extraction Techniques
Study on Web Content Extraction Techniquesijtsrd
 
D033017020
D033017020D033017020
D033017020ijceronline
 
IRJET- Secure Data Deduplication for Cloud Server using HMAC Algorithm
IRJET- Secure Data Deduplication for Cloud Server using HMAC AlgorithmIRJET- Secure Data Deduplication for Cloud Server using HMAC Algorithm
IRJET- Secure Data Deduplication for Cloud Server using HMAC AlgorithmIRJET Journal
 

Similar to Time -Travel on the Internet (20)

Web tech chapter 1 (1)
Web tech chapter 1 (1)Web tech chapter 1 (1)
Web tech chapter 1 (1)
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web Bundles
 
pWeb: A P2P Web Hosting Framework
pWeb: A P2P Web Hosting FrameworkpWeb: A P2P Web Hosting Framework
pWeb: A P2P Web Hosting Framework
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
IRJET- A Two-Way Smart Web Spider
IRJET- A Two-Way Smart Web SpiderIRJET- A Two-Way Smart Web Spider
IRJET- A Two-Way Smart Web Spider
 
Case Study for Ego-centric Citation Network
Case Study for Ego-centric Citation NetworkCase Study for Ego-centric Citation Network
Case Study for Ego-centric Citation Network
 
Review on an automatic extraction of educational digital objects and metadata...
Review on an automatic extraction of educational digital objects and metadata...Review on an automatic extraction of educational digital objects and metadata...
Review on an automatic extraction of educational digital objects and metadata...
 
WEB BASED AND BLOCKCHAIN APPLICATION FOR EDUCATIONAL INSTITUTION
WEB BASED AND BLOCKCHAIN APPLICATION FOR EDUCATIONAL INSTITUTIONWEB BASED AND BLOCKCHAIN APPLICATION FOR EDUCATIONAL INSTITUTION
WEB BASED AND BLOCKCHAIN APPLICATION FOR EDUCATIONAL INSTITUTION
 
Unit 1
Unit 1 Unit 1
Unit 1
 
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy IssuesIWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues
 
A_Cooperative_PoW_and_Incentive_Mechanism_for_Blockchain_in_Edge_Computing.pdf
A_Cooperative_PoW_and_Incentive_Mechanism_for_Blockchain_in_Edge_Computing.pdfA_Cooperative_PoW_and_Incentive_Mechanism_for_Blockchain_in_Edge_Computing.pdf
A_Cooperative_PoW_and_Incentive_Mechanism_for_Blockchain_in_Edge_Computing.pdf
 
Df25632640
Df25632640Df25632640
Df25632640
 
Unit 1 Webtechnology
Unit 1  WebtechnologyUnit 1  Webtechnology
Unit 1 Webtechnology
 
Web storage
Web storage Web storage
Web storage
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine Scrapper
 
HIGWGET-A Model for Crawling Secure Hidden WebPages
HIGWGET-A Model for Crawling Secure Hidden WebPagesHIGWGET-A Model for Crawling Secure Hidden WebPages
HIGWGET-A Model for Crawling Secure Hidden WebPages
 
Web-Application Framework for E-Business Solution
Web-Application Framework for E-Business SolutionWeb-Application Framework for E-Business Solution
Web-Application Framework for E-Business Solution
 
Study on Web Content Extraction Techniques
Study on Web Content Extraction TechniquesStudy on Web Content Extraction Techniques
Study on Web Content Extraction Techniques
 
D033017020
D033017020D033017020
D033017020
 
IRJET- Secure Data Deduplication for Cloud Server using HMAC Algorithm
IRJET- Secure Data Deduplication for Cloud Server using HMAC AlgorithmIRJET- Secure Data Deduplication for Cloud Server using HMAC Algorithm
IRJET- Secure Data Deduplication for Cloud Server using HMAC Algorithm
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 

Recently uploaded (20)

Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 

Time -Travel on the Internet

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1369 Time -Travel on the Internet Pranita Pingale1, Prithvi Prathapan2, Sahil Mulay3, Siddhesh Patil4, Dhananjay Sharma5 1Assistant Professor, Dept. of Computer Engineering, Pillai College of Engineering, Maharashtra, India. 2-6UG Student, Dept. of Computer Engineering, Pillai College of Engineering, Maharashtra, India. ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract – The World Wide Web is a unique, widely used information tool that is used worldwide. Content is disappearing at an alarming rate, not only endangering our digital cultural memory but also organizational accountability. An effective technical solution to such a problem is the introduction of the Internet Archiving System aka, The “Digital Preservation Handbook”. Web archiving is a process of gathering parts of the World Wide Web to ensure that information is archived by Digital Forensic archives, future researchers, historians, and the public. We focus on building a robust and software-based tool that stores and evaluates certain web clips / videos, using Web Crawling and data analysis to help individuals monitor and analyze online data. In Layman's terms, our software visits a set of URLs called seeds and identifies all hyperlinks to that seed, copying and saving information as its "crawls" seeds. This information is stored as an overview of the WARC file format so when it is played again with the WARC viewer, it appears as captured. With this project we strive to meet the primary goal of providing a consistent, easy-to-use, manageable and secure system that allows users with limited technical knowledge to easily download web content for archivingpurposes. Software will be available free of charge for the benefit of the international community that maintains web history. Key Words: Web crawler, Information retrieval, WARC, , Web Scraping, Data gathering. 1.INTRODUCTION Web archive is a process of collecting data recorded on the World Wide Web, storing it, ensuring that the archives are archived, and making the collected data available for future research. The purpose of a web archive is to compile web content and store it for longer to deliver valuable data for generations. The volume of content from the digital channel, including the web, is growing exponentially, and the need to suspend digital media storage was needed. The space data consulting committee (CCSDS) has developed an open reference archive (OAIS) reference system, which is ISO 14,721 for long-term digital record keeping, and many web archiving programs are up to standard. However, the OAIS reference model does not provide a method of operationbut imposes a requirement to ensure OAIS-compliance. There is a weakness in content integrity against unauthorized practices, and there is no way to guarantee long-term content integrity, although the OAIS reference model states long-term preservation of digital records. Also, the user is able to track, monitor and analyze data. 2. Literature Survey A. An overview of Web Archiving: Jinfang Niu shared his views on some of the methods used, such as how the traditional concepts of archive management and theory can be applied to the organization and the definition of archived web resources. This approach is a review of the methods used in various universities, as well as international government libraries and archives, selecting, acquiring, defining and accessing web resources for their archives. B. Accessing web archives at scale through a cloudbased interface: This paper introduces Archives UnleashedCloud, a web-based visual interface and web archives on average. The current access paradigms, largely driven by the breadth and scope of web archives, often involve the use of a command line and a coding code. C. Investigating websites and webpages: This chapter aims to show the searcher how websites work, howtheycan help with research and how to properly document website evidence. This chapter introduces the readertotheconcepts of HTML, and its application in research. It also discusses what can be found on the website and how to view metadata in embedded images, texts, and videos. D. The Wayback machine: Benefits of Web Storage: This paper focuses on Web archiving, awebarchivecampaignand shows how lost websites can be recovered using a back-up machine. E. Design of an Enhanced Web Archiving System: The project proposes a BClinked archive program to maintain content integrity using blockchain technology and attempts to extend WARC file details. 2.1 Summary of Related Work The summary of methods used in literature is given in Table 1.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1370 Table 1 Summary of literature survey 3. Proposed Work We focus on building a robust and software-based tool that stores and evaluates snippets / videos of specific websites, using Web Crawling and data analysis to help individuals monitor and analyze online data. In addition, a web archive system with authentic content authentication can provide a reliable web archive. We suggest a new blockchain-based web archive to solve this challenge, and records webarchive and content archive activity in blockchain databases. Literature Advantages Disadvantages Jinfang Niu, 2012. [1] This overviewis based on a comprehensive review of literature that explains how web archiving is being done. However, none of the literature directly addresses the knowledge and skills required by the professionals in the field who perform the daily routine of selecting, acquiring and cataloging web archives. Nick Ruest , Samantha Fritz and Ryan Deschamps. January 2021 [2] If the Archives Unleashed Cloud, unlocks the potential of web archives by developing and providing the tools, via a cloud service, for scholars with limited – but not none technical expertise to explore archived web content. The problem encountered, like many digital humanities projects, is that of project sustainability B T Sampath Kumar, January 2015. [4] Due to the loss of web informationdue to several reasons like network failures, URL failure, etc. wayback machine can be used to recover some percentage of the missing web site. N/A ToddG. Shipley, Art Bowker,2017. (An Introduction to Solving Crimes in Cyberspace). [3] With a little understanding the investigator can identify who owns the site, information about the contents, and potentially useful metadata from the source code as well as images and other items embedded on the site. But scraping and using information of social media platforms or an individual’s webpage without the owner’s permission is not permissible by law. Hwang, Hyun C., Jin G. Shon, and Ji S. Park 2020[5] Blockchain technology is that connecting the previous node by using cryptograph y to secure all data in a blockchain, and it is the decentralizat ion system which can be shared via peer-topeer network. However, in the BCLinked web archiving system because all blockchain node data should be rebuilt even if only one single content, it takes longer time.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1371 3.1Proposed System Architecture The system architecture is given in Figure 1. Each block is described in this Section. Content crawling: This block covers all crawling activities on the web. Crawler is a computer program that automatically searches documents on the web. Searches are scheduled for repeated actions to make browsing default. In our case, the searcher picks up a target URL and begins to identify the URL and all the hyperlinksembeddedinit. While you visit the links, the searcherfindssummariesof eachpage visited, and generates a WARC file for pages. All this information is stored in a web archive, and is later pushed to the blockchain domain. Domain information storing: The first level of the Blockchain database is the Domain blockchain. This block enables domain information storage in the domain blockchain. Whenever a target URL is posted, a new node or block of Domain blockchainisgeneratedwiththesame name as the URL domain. If the target domain already exists in the domain blockchain, incoming content is stored in the existing block. If the target domain is not available, a new block is created and added during the domain information storage. The blockchain blocks / node in this category contains only the domain name and general URL information. Nodes are linked to key content stored in Web Content Nodes. Content information storing: The second level of Blockchain database is theWebContent blockchain.Building a new domain block in the previous phase, also leads to the creation of a web-based content block or node that stores domain-related content. The need for this block arises as all content in the WARC file can be too large to place in a domain blockchain due to block size limitations. Therefore only the amount that can identify content in a web content block is added to the domain block. Snapshots and WARC files found in the first section are stored in content blockson the web. Requirement Analysis The implementation detail is given in this section. 3.1 Software Operating System Windows 10 Programming Language Python 3.2 Hardware Processor 3.5 GHz Intel HDD 1 TB RAM 8 GB Implementation: Acknowledge It is our privilege to express our sincerest regards to our supervisor Professor Pranita Pingale for thevaluableinputs,
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1372 able guidance, encouragement, whole-hearted cooperation and constructive criticism throughout the duration of this work. We deeply express our sincere thanks to our Head of the Department Dr. Sharvari Govilkar and our Principal Dr. Sandeep M. Joshi for encouraging and allowing us to present this work. Conclusion In this paper, we have conveyed the benefits of webscraping and analyzing the context of a single research center of interest, small firms innovating in the raw materials industry. We have provided a six-step approach to facilitate duplication of our processes through website analysis and social science communities. We hope this work provides a useful approach to the wider community of social science who is interested but who is not yet free to use online data. Although the Wayback Machine is an important resourcefor scholars in all fields, we believe that social scientists in particular have not yet been able to tap the full potential of online data resources. Going forward, there are significant opportunities to incorporate additional analytical tools and statistical models while simultaneously evaluating, refining, and reporting on the appropriateness of measures and results. However, one should keep in mind that both automatic and manual effort is required to create high quality information that can provide these benefits and overcome the limitations of archived information. References 1. Niu, Jinfang, "An Overview of Web Archiving" (2012). School of Information Faculty Publications. 308. 2. Ruest, Nick & Fritz, Samantha & Deschamps,Ryan& Lin, Jimmy & Milligan, Ian. (2021). From archive to analysis: accessing web archives at scale through a cloud-based interface. International Journal of Digital Humanities. 10.1007/s42803-020-00029-6. 3. Yiu-ming Todd G. Shipley, Art Bowker, Chapter 13 - Investigating Websites and Webpages, Editor(s): Todd G. Shipley, Art Bowker, Investigating Internet Crimes, Syngress, 2014. 4. Sampath Kumar, B T. (2015). Wayback machine: A Boon for Preserving Web Sources. 5. Hwang, H.C.; Shon, J.G.; Park, J.S. Design of an Enhanced Web Archiving System for Preserving Content Integrity withBlockchain. Electronics2020, 9, 1255. https://doi.org/10.3390/electronics9081255