Data Lakes: A Tool for Minimizing Expenditure on Storage – A Survey
Muvvala Sai Phanindra
3rd year B.Tech, Department of Information Technology,
Hindustan Institute of Technology and Science,
#1,IT Expressway, Bay Range Campus, Padur, Chennai–
603103, Tamil Nadu, India.
18132003@student.hindustanuniv.ac.in
Dr. C.V. Suresh Babu
Professor, Department of Information Technology,
Hindustan Institute of Technology and Science,
#1,IT Expressway, Bay Range Campus, Padur, Chennai–
603103, Tamil Nadu, India.
pt.cvsuresh@hindustanuniv.ac.in
Abstract— This paper presents a survey of data lakes and of how they have been used to minimize expenditure in the face of a growing need for storage.
Keywords—Cost Benefit Analysis, Economics, Data Lakes,
Storage.
I. INTRODUCTION
System failures often disrupt the process of storing data and make the data more difficult to work with. As a solution to this problem, data lakes can be created to provide a permanent connection between the device that sends the data and the system that receives it.
II. RATIONALE BACKGROUND
The basic need for this study is the increase in expenditure on storage devices caused by rapid data growth.
III. OBJECTIVES
Primary objective: To show that expenditure on storage can be minimized using data lakes.
Secondary objective: To minimize expenditure on storage devices using data lakes without compromising the security of the raw data being stored.
IV. REVIEW OF LITERATURE
• Although implementing real-time analytics tools may be expensive, it eventually saves a lot of money. Some of these tools, such as Hadoop and cloud-based analytics, can bring cost advantages to businesses that need to store large amounts of data, and they also help identify more efficient ways of doing business (Abdelrahman Elsharawy, January 1, 2019).
• Modern tools allow analysts to analyze more data more quickly, which increases their personal productivity. In addition, the insights gained from that analytics often allow organizations to increase productivity more broadly throughout the company (Amy ElMahalawy, January 1, 2019).
• Data scientists and big data experts are among the most highly coveted and highly paid workers in the IT field. Respondents ranked skills and staffing as the second biggest challenge when creating a data lake. Hiring or training staff can increase costs considerably, and acquiring the necessary skills can take considerable time (Khaled Gad, January 1, 2019).
• Today, organizations mostly use incompatible tools. Hadoop is the most commonly used tool for analytics; however, the standard version of Hadoop is not currently able to handle real-time analysis (Islam Mousa, January 1, 2019).
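Hadoop's standard processing model, MapReduce, runs as a batch job over data that has already been stored, which is one reason it is not suited to real-time analysis. The minimal Python sketch below imitates the map, shuffle, and reduce phases of a word-count job on an in-memory list. It does not use Hadoop itself, and the function names are hypothetical illustrations.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the grouped values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def run_batch_job(lines: Iterable[str]) -> Dict[str, int]:
    """The whole input is consumed before the job produces any output (batch, not streaming)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(run_batch_job(["Data lake storage", "raw data"]))
```

Because no result exists until every input record has passed through all three phases, new data arriving mid-job has to wait for the next run; real-time systems process records as they arrive instead.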
• Many of today’s tools rely on open-source technology, which dramatically reduces software costs, but enterprises still face significant expenses related to staffing, hardware, maintenance, and related services. It’s not uncommon for big data analytics initiatives to run significantly over budget and to take more time to deploy than IT managers had originally anticipated (Mostafa Elshahawy, January 1, 2019).
• A large part of the new data that researchers work with belongs to companies, which aggregate it from their clientele, and the benefit these companies gain from researchers' knowledge of the data does not always outweigh the cost of disclosing it. The unstructured nature of the data is also a challenge in econometric terms: separating the dependencies between the series studied is the most important technical challenge with this type of data, and it requires the development of new regression tools (Alex Bekker, March 21, 2018).
• Economists need to develop new skills, specifically in advanced software and languages (SQL, R, and XLSTAT) and in machine learning algorithms, in order to combine the conceptual framework of economic research with the ability to apply ideas to massive databases. The highly publicized profession of "data scientist", which consists of analyzing data to find empirical models, sits exactly at the crossroads of computer science and econometric analysis. Extracting and synthesizing the various variables and searching for relations between them will therefore become important parts of economists' work and will require new skills in computer science and databases (Alex Bekker, March 21, 2018).
• User-level algorithms have difficulty answering “why”. Broadly speaking, there are only two ways to analyze user-level data: one is to aggregate it into a “smaller” data set in some way and then apply statistical or heuristic analysis; the other is to analyze the data set directly using algorithmic methods. Both can produce predictions and recommendations (e.g. move spend from campaign A to B), but algorithmic analyses tend to have difficulty answering “why” questions (e.g. why should we move spend) in a way the average marketer can understand. Certain types of algorithms, such as neural networks, are black boxes even to the data scientists who designed them. This leads to the next limitation (Balar Khalid, T., 2017).
• User data is not suited to producing learnings. This will probably strike you as counter-intuitive: big data = big insights = big learnings, right? Wrong! For example, suppose you apply big data to personalize your website and increase overall conversion rates by 20%. While certainly a fantastic result, the only learning you get from the exercise is that you should indeed personalize your website. This result raises the bar on marketing, but it does nothing to raise the bar for marketers (matthew.aslett, 2016).
• Actionable learnings that require user-level data (for instance, applying a look-alike model to discover previously untapped customer segments) are relatively few and far between, and they require a great deal of effort to uncover. Boring ol’ small data remains far more efficient at producing practical, real-world learnings that you can apply to execution today (matthew.aslett, 2016).
• Big data is the realization of competitive advantage based on the fact that it is now more economically feasible to store and process data that was previously ignored because traditional data management technologies were too costly and too limited to handle its volume, velocity, and variety (matthew.aslett, 2016).
• Storing and analyzing the large volumes of data that are crucial to a company's operations requires a vast and complex hardware infrastructure. The more data (and the more complex the data) a company stores, the more hardware it needs (Alexandru Adrian TOLE, 2013).
• A hardware system can only be reliable over a certain period of time. Intensive use and, rarely, production faults will almost certainly result in a system malfunction. Companies cannot afford to lose the data they gathered in past years, nor to lose their clients. To avoid such catastrophic events, they use a backup system that performs the simple operation of storing all data. By doing this, companies obtain continuity, even if they are set back temporarily. The challenge is to maintain the level of service they provide when a server malfunctions, for example while a client is uploading files to it (Alexandru Adrian TOLE, 2013).
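The backup described above amounts to copying all data to a second location. A minimal Python sketch, assuming the data lives in a local directory and that a full copy per run is acceptable; real backup systems usually add incremental copies, retention rules, and off-site storage. The function `full_backup` and the folder names are hypothetical.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def full_backup(source: Path, backup_root: Path) -> Path:
    """Copy every file under `source` into a new, timestamped folder under `backup_root`."""
    # Microsecond precision in the folder name makes clashes between runs unlikely.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_root / f"backup-{stamp}"
    # copytree refuses to overwrite an existing folder and creates backup_root if needed.
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "orders.csv").write_text("id,amount\n1,100\n", encoding="utf-8")
        copy = full_backup(data, Path(tmp) / "backups")
        print(sorted(p.name for p in copy.iterdir()))
```

Each run produces a separate, complete copy, so a later fault in the primary data cannot damage an earlier backup.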
• To achieve continuity when a server malfunctions, hardware systems are backed by software solutions that respond by redirecting traffic to another system. When a fault occurs, the user is usually not affected and continues working without even noticing that something has happened. When a system fails, the flow of data must not be interrupted if accurate information is to be obtained. For example, Google sends one search request to multiple servers rather than to only one. By doing this, the response time is shortened and there is no inconsistency in the data that users send and receive (Alexandru Adrian TOLE, 2013).
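Redirecting traffic away from a failed server can be sketched as sequential failover. This is a minimal Python illustration under stated assumptions: servers are modelled as plain functions, a `ConnectionError` stands for a malfunction, and `send_with_failover`, `primary`, and `replica` are hypothetical names. It shows simple failover only, not the parallel fan-out of one request to several servers described for Google.

```python
from typing import Callable, List, Sequence

Server = Callable[[str], str]  # hypothetical: a server is modelled as a function


class AllServersFailed(Exception):
    """Raised when no server in the list could handle the request."""


def send_with_failover(request: str, servers: Sequence[Server]) -> str:
    """Send to the first server; on a connection failure, redirect to the next one."""
    errors: List[Exception] = []
    for server in servers:
        try:
            return server(request)
        except ConnectionError as exc:  # treat a connection error as a malfunction
            errors.append(exc)
    raise AllServersFailed(errors)


def primary(request: str) -> str:
    raise ConnectionError("primary server is down")


def replica(request: str) -> str:
    return f"stored {request}"


if __name__ == "__main__":
    print(send_with_failover("upload report.pdf", [primary, replica]))
```

From the caller's point of view the upload simply succeeds, which matches the idea that the user continues working without noticing the fault.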
• To prevent such inconsistencies, the sender must generate a “key” for any content it transmits. This key is then transferred to the receiver, which compares it with the key it generates from the data it received. If both keys are identical, the send-receive process was completed successfully. This solution is similar to the MD5 hash that is generated over compressed content, except that here the keys are compared automatically (Alexandru Adrian TOLE, 2013).
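The key comparison described above is what a checksum does. A minimal Python sketch, assuming the key travels to the receiver alongside the data: SHA-256 from the standard `hashlib` module is used in place of MD5, since MD5 is no longer considered collision-resistant, although either would detect accidental corruption. The function names are hypothetical. A plain hash guards against accidental damage only; protecting against deliberate tampering would need a keyed construction such as an HMAC.

```python
import hashlib
import hmac


def make_key(payload: bytes) -> str:
    """Derive a key (hex digest) from the exact bytes being sent or received."""
    return hashlib.sha256(payload).hexdigest()


def transfer_ok(sent_key: str, received: bytes) -> bool:
    """Receiver side: recompute the key over the received bytes and compare automatically."""
    return hmac.compare_digest(sent_key, make_key(received))


if __name__ == "__main__":
    message = b"invoice 1998-04-01: 250.00"
    key = make_key(message)  # computed by the sender
    print(transfer_ok(key, message))  # True: intact transfer
    print(transfer_ok(key, message.replace(b"250", b"205")))  # False: corrupted in transit
```

Changing even one byte of the payload produces a completely different digest, so the receiver can reject the transfer and request the data again.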
• Losing data is not always a hardware problem. Software can also malfunction and cause irreparable and more dangerous data loss. If one hard drive fails, there is usually another one to back it up, so no harm is done to the data; but when software fails because of a programming “bug” or a design flaw, data can be lost forever. To overcome this problem, programmers have developed a series of tools that reduce the impact of a software failure. A simple example is Microsoft Word, which periodically saves the user's work to prevent its loss in case of a hardware or software failure. This is the basic idea behind preventing complete data loss (Alexandru Adrian TOLE, 2013).
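Microsoft Word's actual autosave mechanism is not described in the source, so the following is only a generic Python sketch of the idea: keep the work in memory, save it at most once per interval, and write each save atomically so that a crash mid-write cannot corrupt the last good copy. The class `AutoSavingDocument`, the 60-second default, and the injectable `clock` (used only to make the behaviour testable) are assumptions.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Callable


def atomic_save(path: Path, text: str) -> None:
    """Write to a temporary file next to `path`, then swap it in with os.replace."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, path)  # the old file is replaced in a single step
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # never leave a half-written temp file behind
        raise


class AutoSavingDocument:
    """Holds the user's work in memory and saves it at most once per `interval` seconds."""

    def __init__(self, path: Path, interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.path = path
        self.interval = interval
        self.clock = clock
        self.text = ""
        self._last_save = clock()

    def edit(self, new_text: str) -> bool:
        """Record an edit and autosave if the interval has elapsed; return True if saved."""
        self.text = new_text
        if self.clock() - self._last_save >= self.interval:
            atomic_save(self.path, self.text)
            self._last_save = self.clock()
            return True
        return False
```

If the program crashes between saves, the user loses at most one interval of work, and the file on disk is always either the previous save or the new one, never a mixture.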
• Analytics can be hard to scale as an organization and the amount of data it collects grow. Collecting information and creating reports become increasingly complex, so a system that can grow with the organization is crucial to managing this issue. While overcoming these challenges may take some time, the benefits of data analysis are well worth the effort, which makes investing in a data analytics system worthwhile (Rebecca Webb, November 25, 2020).
• There is a skills shortage for data scientists. Closing this gap, however, is proving to be extremely difficult. It’s not just a matter of training people to work with big data analytics solutions, either. “The data science field has an experience shortage,” explains Daniel Zhao, a senior economist at Glassdoor. “There are plenty of recent grads who can throw a hodgepodge of models at a data set, but there’s a serious shortage of experienced and qualified workers who have the full combination of technical skills, business expertise, and domain knowledge.” (Justin Reynolds, February 3, 2020)
• Many organizations ease the data science skills gap by using automated machine learning (AutoML), which automates repetitive tasks. With AutoML, data scientists can spend their time on business problems instead of getting bogged down in code. AutoML isn’t the complete answer to the data science skills crisis, but it can help analytics teams accomplish more when they lack experienced personnel (Justin Reynolds, February 3, 2020).
• Capgemini's report found that 37% of companies have trouble finding skilled data analysts to make use of their data. Their best option is to form one common data analysis team for the company, either by re-skilling current workers or by recruiting new workers specialized in big data. Companies need employees who not only understand data from a scientific perspective but also understand the business and its customers, and how their data findings apply directly to them (Ewout Meyns, January 31, 2020).
• If you capture data through multiple channels, such as your website, customer care centre, and marketing leads, you run the risk of collecting duplicate information. There are tools to help remove duplicate data; for instance, if you work with Google Contacts, you can merge your contacts (Ewout Meyns, January 31, 2020).
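Merging duplicates captured from several channels can be sketched in a few lines of Python. This is a hypothetical illustration, not how Google Contacts decides what to merge: it assumes that two records with the same e-mail address (after trimming spaces and ignoring case) describe the same person, keeps the first record seen, and remembers every channel that captured it.

```python
from typing import Any, Dict, List


def merge_duplicates(contacts: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Merge contacts that share an e-mail address, recording every channel that captured them."""
    merged: Dict[str, Dict[str, Any]] = {}
    for contact in contacts:
        key = contact["email"].strip().lower()  # normalise so " A@x.com" matches "a@x.com"
        if key in merged:
            merged[key]["sources"].append(contact["source"])  # duplicate: keep first record
        else:
            merged[key] = {"name": contact["name"], "email": key,
                           "sources": [contact["source"]]}
    return list(merged.values())


if __name__ == "__main__":
    captured = [  # hypothetical example records
        {"name": "Asha Rao", "email": "Asha.Rao@example.com", "source": "website"},
        {"name": "A. Rao", "email": " asha.rao@example.com ", "source": "customer care centre"},
        {"name": "Ravi Kumar", "email": "ravi@example.com", "source": "marketing lead"},
    ]
    for record in merge_duplicates(captured):
        print(record)
```

Choosing the matching key is the real design decision: e-mail is simple but misses people who use several addresses, so production tools often combine several fields.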
Summary of Review of Literature
From the review of literature, we learned how the volume of stored data can be minimized using methods such as compression, deduplication, and tiering. It also helped us integrate our technology-related problem statement with economics and move forward.
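The three methods named above can be illustrated together. A minimal Python sketch, assuming an in-memory store: identical content is kept only once (deduplication by SHA-256 digest), each unique blob is compressed with zlib, and a separate rule assigns data to a storage tier according to how recently it was accessed. The class name, the tier names, and the 30/365-day thresholds are hypothetical choices for illustration, not values from the literature reviewed.

```python
import hashlib
import zlib
from datetime import date
from typing import Dict


class ReducedStore:
    """Hypothetical store that deduplicates by content hash and compresses each unique blob."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # digest -> compressed bytes (stored once)
        self.index: Dict[str, str] = {}    # file name -> digest

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:  # deduplication: identical content is stored once
            self.blobs[digest] = zlib.compress(data, 9)  # compression at the maximum level
        self.index[name] = digest

    def get(self, name: str) -> bytes:
        return zlib.decompress(self.blobs[self.index[name]])

    def stored_bytes(self) -> int:
        """Bytes actually kept after deduplication and compression."""
        return sum(len(blob) for blob in self.blobs.values())


def choose_tier(last_accessed: date, today: date,
                hot_days: int = 30, warm_days: int = 365) -> str:
    """Tiering: move data that is rarely accessed onto cheaper, slower storage."""
    age = (today - last_accessed).days
    if age <= hot_days:
        return "hot"
    if age <= warm_days:
        return "warm"
    return "archive"


if __name__ == "__main__":
    store = ReducedStore()
    report = b"quarterly sales figures\n" * 1000
    store.put("report.txt", report)
    store.put("copy_of_report.txt", report)  # duplicate content adds no new blob
    print(len(report) * 2, store.stored_bytes())
    print(choose_tier(date(1998, 3, 1), date(2020, 1, 1)))
```

Deduplication and compression reduce how many bytes are kept, while tiering reduces what each kept byte costs; the savings in practice depend on how repetitive the data is and on the price gap between tiers.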
V. FUTURE SCOPE OF THE STUDY
Because of rapid data growth throughout the world, we need more efficient, more secure, cheaper, and easier ways to store this huge volume of data. Further research could examine all the possible storage types and raise the efficiency of storage further.
VI. CONCLUSION
Nowadays, many companies store huge volumes of data in the cloud and process it using big data techniques. Buying this cloud storage costs them a lot of money. One alternative to cloud storage is the data lake.
Data lakes: a data lake is a large storage repository that can be bought at an affordable price and that stores data in its raw form.
The disadvantage of data lakes is that the data stored in them cannot be processed directly, because it is kept in the form of raw data.
Solution for the data lake disadvantage, with an example:
Suppose a company named X has existed for the past 50 years. To store its data, it normally uses cloud-based storage and processes the data using a database or big data tools.
Our solution can decrease the amount of money spent on data storage. Company X can buy a data lake at a comparatively cheaper price and store all 50 years of its data in it. If the company later needs to process some data from, say, 1998, it can import that year's data from the data lake to local storage and process it with big data tools.
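The workflow in this example (keep every year's raw data in the lake, then copy only the needed year to local storage for processing) can be sketched in Python. This is a minimal illustration under stated assumptions: the lake is modelled as a local directory, raw files are grouped into hypothetical `year=YYYY` folders, and `total_sales` stands in for whatever big data processing the company would actually run. A real data lake would typically sit on object storage accessed through its own client library.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def land_raw_file(lake: Path, year: int, name: str, text: str) -> Path:
    """Store a file in the lake exactly as received, grouped by year."""
    partition = lake / f"year={year}"
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / name
    path.write_text(text, encoding="utf-8")
    return path


def import_year(lake: Path, year: int, local: Path) -> Path:
    """Copy only one year's raw files from the lake to local storage."""
    source = lake / f"year={year}"
    target = local / f"year={year}"
    shutil.copytree(source, target, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+
    return target


def total_sales(folder: Path) -> float:
    """Example processing step on the local copy: sum the 'amount' column of every CSV."""
    total = 0.0
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            total += sum(float(row["amount"]) for row in csv.DictReader(handle))
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake, local = Path(tmp) / "lake", Path(tmp) / "local"
        land_raw_file(lake, 1998, "sales.csv", "id,amount\n1,10.5\n2,4.5\n")
        land_raw_file(lake, 1999, "sales.csv", "id,amount\n1,100\n")
        work = import_year(lake, 1998, local)
        print(total_sales(work))  # 15.0: only the 1998 data was imported and processed
```

Grouping the raw files by year is what keeps the import cheap: the company copies and processes one folder instead of scanning all 50 years of data.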
ACKNOWLEDGMENT
We thank the faculty members of our department, our classmates, and the anonymous reviewers for their valuable comments on our draft paper.
DISCLOSURE STATEMENT
No potential conflict of interest was reported by the
authors.
REFERENCES
[1] Abdelrahman Elsharawy (January 1, 2019). Advantages and disadvantages of big data. https://www.vapulus.com/en/advantages-and-disadvantages-of-big-data/
[2] Balar Khalid (August 2019). Big data in economic analysis: Advantages and challenges. https://www.researchgate.net/publication/335234998_BIG_DATA_IN_ECONOMIC_ANALYSIS_ADVANTAGES_AND_CHALLENGES
[3] Kulraj Smagh (October 7, 2017). Limitations of big data analytics. https://www.ciklum.com/blog/limitations-of-big-data-analytics/
[4] Liran Einav, Jonathan Levin (November 7, 2014). Economics in the age of big data. Science, Vol. 346, Issue 6210, 1243089. DOI: 10.1126/science.1243089. https://science.sciencemag.org/content/346/6210/1243089
[5] Mila Slesar (January 2020). Pros and cons of big data for businesses. https://onix-systems.com/blog/the-pros-and-cons-of-big-data-for-businesses
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Data lakes a tool for minimizing expenditure on storage

Data Lakes: A Tool for Minimizing Expenditure on Storage – A Survey

Muvvala Sai Phanindra
3rd year B.Tech, Department of Information Technology,
Hindustan Institute of Technology and Science,
#1, IT Expressway, Bay Range Campus, Padur, Chennai – 603103, Tamil Nadu, India.
18132003@student.hindustanuniv.ac.in

Dr. C.V. Suresh Babu
Professor, Department of Information Technology,
Hindustan Institute of Technology and Science,
#1, IT Expressway, Bay Range Campus, Padur, Chennai – 603103, Tamil Nadu, India.
pt.cvsuresh@hindustanuniv.ac.in

Abstract: This paper is a survey of data lakes and of how they have been used to minimize expenditure in the face of a growing need for storage.

Keywords: Cost Benefit Analysis, Economics, Data Lakes, Storage.

I. INTRODUCTION
System failures often disrupt the process of storing data and make the data more difficult to work with. As a solution to this problem, a data lake can be created to provide a permanent connection between the device that sends the data and the system that receives it.

II. RATIONALE BACKGROUND
The basic need for this study is the increase in expenditure on storage devices caused by rapid data growth.

III. OBJECTIVES
Primary objective: To show how expenditure on storage can be minimized using data lakes.
Secondary objective: To show how data lakes can minimize expenditure on storage devices without compromising the security of the raw data being stored.

IV. REVIEW OF LITERATURE
• Although implementing real-time analytics tools may be expensive, it will eventually save a lot of money. Tools such as Hadoop and cloud-based analytics can bring cost advantages to a business when large amounts of data must be stored, and they also help identify more efficient ways of doing business (Abdelrahman Elsharawy, January 1, 2019).
• Modern tools allow analysts to analyze more data more quickly, which increases their personal productivity. In addition, the insights gained from that analysis often allow organizations to increase productivity more broadly across the company (Amy ElMahalawy, January 1, 2019).
• Data scientists and experts are among the most highly coveted and highly paid workers in the IT field. Respondents ranked skills and staffing as the second-biggest challenge in creating a data lake. Hiring or training staff can increase costs considerably, and acquiring the necessary skills can take considerable time (Khaled Gad, January 1, 2019).
• Many of the tools in use today are incompatible with one another. Hadoop is the most commonly used tool for analytics; however, the standard version of Hadoop is not currently able to handle real-time analysis (Islam Mousa, January 1, 2019).
• Many of today's tools rely on open-source technology, which dramatically reduces software costs, but enterprises still face significant expenses for staffing, hardware, maintenance, and related services. It is not uncommon for big data analytics initiatives to run significantly over budget and to take longer to deploy than IT managers originally anticipated (Mostafa Elshahawy, January 1, 2019).
• A large part of the new data that researchers work with belongs to companies, which aggregate it from their clientele, and the benefit these companies gain from researchers' knowledge of the data does not always outweigh the cost of disclosing it. The unstructured nature of the data is also a challenge in econometric terms: separating the dependencies between the series studied is the most important technical challenge with this type of data, and it requires the development of new regression tools (Alex Bekker, March 21, 2018).
• Economists need to develop new skills, specifically in advanced software and languages (SQL, R, and XLSTAT) and in machine learning algorithms, so that they can combine the conceptual framework of economic research with the ability to apply ideas to massive databases. The highly publicized profession of "data scientist", which consists of analyzing data to find empirical models, sits exactly at the crossroads of computer science and econometric analysis. Extracting and synthesizing the various variables and searching for relations between them will therefore become important parts of economists' work and will require new skills in computer science and databases (Alex Bekker, March 21, 2018).
• User-level algorithms have difficulty answering "why". Broadly speaking, there are only two ways to analyze user-level data: one is to aggregate it into a "smaller" data set and then apply statistical or heuristic analysis; the other is to analyze the data set directly using algorithmic methods. Both can produce predictions and recommendations (e.g., move spend from campaign A to campaign B), but algorithmic analyses tend to have difficulty answering "why" questions (e.g., why should we move spend?) in a way the average marketer can understand. Certain types of algorithms, such as neural networks, are black boxes even to the data scientists who designed them. This leads to the next limitation (Balar Khalid, T., 2017).
• User data is not suited to producing learnings. This will probably strike you as counter-intuitive: big data = big insights = big learnings, right? Wrong! For example, suppose you apply big data to personalize your website and increase overall conversion rates by 20%. While that is certainly a fantastic result, the only learning you get from the exercise is that you should indeed personalize your website. The result raises the bar on marketing, but it does nothing to raise the bar for marketers (Matthew Aslett, 2016).
• Actionable learnings that require user-level data, for instance applying a look-alike model to discover previously untapped customer segments, are relatively few and far between, and they take a great deal of effort to uncover. Boring old small data remains far more efficient at producing practical, real-world learnings that can be applied to execution today (Matthew Aslett, 2016).
• Big data is the realization of competitive advantage based on the fact that it is now more economically feasible to store and process data that was previously ignored because of the cost and functional limitations of traditional data management technologies in handling its volume, velocity, and variety (Matthew Aslett, 2016).
• Storing and analyzing the large volumes of data that are crucial to a company's operation requires a vast and complex hardware infrastructure. The more data, and the more complex the data, that is stored, the more hardware systems are needed (Alexandru Adrian TOLE, 2013).
• A hardware system can only be reliable over a certain period of time. Intensive use and, more rarely, manufacturing faults will almost certainly result in a system malfunction. Companies can afford to lose neither the data they have gathered over past years nor their clients. To avoid such catastrophic events, they use a backup system that performs the simple operation of storing all data. By doing this, companies obtain continuity, even if they are temporarily set back. The challenge is to maintain the level of service they provide when a server malfunctions just as a client is uploading files to it (Alexandru Adrian TOLE, 2013).
• To achieve continuity, hardware systems are backed by software solutions that maintain fluency by redirecting traffic to another system. When a fault occurs, a user is usually not affected and continues working without noticing that anything has happened. System failure: the flow of data must not be interrupted if accurate information is to be obtained. For example, Google sends a single search request to multiple servers rather than to only one. This shortens the response time and avoids inconsistency between the data a user sends and receives (Alexandru Adrian TOLE, 2013).
• To prevent such inconsistency, the sender must generate a "key" for any content that is transmitted. The key is transferred to the receiver, which compares it with the key it generates from the data it received. If both keys are identical, the send-receive process completed successfully. This solution is similar to the MD5 hash generated over compressed content, except that here the keys are compared automatically (Alexandru Adrian TOLE, 2013).
• Losing data is not always a hardware problem. Software can also malfunction and cause irreparable and more dangerous data loss. If one hard drive fails, there is usually another to back it up, so no harm is done to the data; but when software fails because of a programming "bug" or a design flaw, data can be lost forever. To overcome this problem, programmers have developed a series of tools that reduce the impact of a software failure. A simple example is Microsoft Word, which periodically saves the user's work to prevent its loss in case of hardware or software failure. This is the basic idea behind preventing complete data loss (Alexandru Adrian TOLE, 2013).
• Analytics can be hard to scale as an organization and the amount of data it collects grow. Collecting information and creating reports become increasingly complex, so a system that can grow with the organization is crucial to managing this issue. While overcoming these challenges may take some time, the benefits of data analysis are well worth the effort, and organizations should consider investing in a data analytics system (Rebecca Webb, November 25, 2020).
• There is a skills shortage for data scientists, and closing this gap is proving to be extremely difficult. It is not just a matter of training people to work with big data analytics solutions. "The data science field has an experience shortage," explains Daniel Zhao, a senior economist at Glassdoor. "There are plenty of recent grads who can throw a hodgepodge of models at a data set, but there's a serious shortage of experienced and qualified workers who have the full combination of technical skills, business expertise, and domain knowledge." (Justin Reynolds, February 3, 2020)
• Many organizations reduce the pain of the data science skills gap by using automated machine learning (AutoML), which automates repetitive tasks. With AutoML, data scientists can spend their time on business problems instead of getting bogged down in code. AutoML is not a complete answer to the data science skills crisis, but it can help analytics teams accomplish more when they lack experienced personnel (Justin Reynolds, February 3, 2020).
• Capgemini's report found that 37% of companies have trouble finding skilled data analysts to make use of their data. Their best option is to form one common data analysis team for the company, either by re-skilling current workers or by recruiting new workers who specialize in big data. They need employees who not only understand data from a scientific perspective but also understand the business and its customers, and how their data findings apply directly to them (Ewout Meyns, January 31, 2020).
• If you use multiple channels to capture data, such as your website, customer care centre, and marketing leads, you run the risk of collecting duplicate information. There are tools to help remove duplicate data; for instance, if you work with Google Contacts, you can merge your contacts (Ewout Meyns, January 31, 2020).
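The send-receive "key" check described by Tole (2013) can be sketched in Python as follows. This is a minimal illustration, not Tole's implementation: it assumes SHA-256 from the standard hashlib module as the key function (MD5, which the source mentions, is also in hashlib but is no longer considered collision-resistant), and the helper names make_key and verify_transfer and the sample payload are hypothetical.

```python
import hashlib


def make_key(data: bytes) -> str:
    """Sender side: derive a fixed-length "key" (a SHA-256 hex digest) from the content."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(received: bytes, sender_key: str) -> bool:
    """Receiver side: recompute the key over the bytes that arrived and compare it."""
    return make_key(received) == sender_key


payload = b"sensor batch 42: 17.3, 17.5, 17.4"
key = make_key(payload)  # sent to the receiver alongside the payload

print(verify_transfer(payload, key))       # True: send-receive completed
print(verify_transfer(payload[:-1], key))  # False: last byte lost in transit
```

Because the comparison is automatic, a truncated or altered transfer is rejected without anyone inspecting the data itself; only the short key has to travel alongside the content.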
Summary of Review of Literature
From the review of literature, we learned how the volume of stored data can be reduced using methods such as compression, deduplication, and tiering. The review also helped us integrate our technology-related problem statement with economics and take it further.

V. FUTURE SCOPE OF THE STUDY
Because of rapid data growth throughout the world, we need more efficient, more secure, cheaper, and easier ways to store this huge volume of data. Further research can examine all the possible storage types and raise the efficiency of storage further.

VI. CONCLUSION
Nowadays, companies use large amounts of cloud storage to hold their data and process it using big data techniques, and buying this cloud storage costs them a lot of money. One alternative to this cloud storage is the data lake.
Data lakes: A data lake is a large storage repository that can be bought at an affordable price and that holds raw data. The disadvantage of data lakes is that the data stored in them is kept as raw data and is not processed in place.
Solution for the data lake disadvantage, with an example: Suppose a company named X has existed for the past 50 years. To store its data, such a company would normally use some cloud-based storage and process the data using a database or big data tools. Our solution can decrease the amount of money spent on data storage: company X can buy a data lake at a comparatively cheaper price and store all 50 years of its data in it. If the company later needs to process some data from, say, 1998, it can import that year's data from the data lake into local storage and process it there using big data techniques.

ACKNOWLEDGMENT
We thank all the faculty members of our department, our classmates, and the anonymous reviewers for their valuable comments on our draft paper.

DISCLOSURE STATEMENT
No potential conflict of interest was reported by the authors.
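The retrieval step in the company X example from the Conclusion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the paper does not specify a storage layout, so the sketch assumes one gzip-compressed JSON Lines file per year under a Hive-style year=YYYY folder (gzip stands in for the compression mentioned in the Summary), uses a temporary local directory in place of a real data lake, and generates hypothetical records. The names write_partition and import_year are illustrative.

```python
import gzip
import json
import tempfile
from pathlib import Path


def write_partition(lake_root: Path, year: int, records: list[dict]) -> Path:
    """Store one year's raw records as gzip-compressed JSON Lines (hypothetical layout)."""
    part_dir = lake_root / f"year={year}"
    part_dir.mkdir(parents=True, exist_ok=True)
    path = part_dir / "part-0000.jsonl.gz"
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def import_year(lake_root: Path, year: int) -> list[dict]:
    """Load only the requested year's partition; other years' files are never opened."""
    records: list[dict] = []
    for path in sorted((lake_root / f"year={year}").glob("*.jsonl.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f if line.strip())
    return records


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    # 50 years (1971-2020) of hypothetical raw order records, three per year.
    for year in range(1971, 2021):
        write_partition(lake, year, [{"year": year, "order_id": i} for i in range(3)])
    # "Import" just the 1998 data for local processing.
    rows_1998 = import_year(lake, 1998)

print(len(rows_1998), rows_1998[0])  # 3 {'year': 1998, 'order_id': 0}
```

Organizing the raw data by year is what keeps the import cheap in this sketch: a request for 1998 touches a single folder, while the other 49 years stay compressed and untouched.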
REFERENCES
[1] Abdelrahman Elsharawy (January 1, 2019). Advantages and disadvantages of big data. https://www.vapulus.com/en/advantages-and-disadvantages-of-big-data/
[2] Balar Khalid (August 2019). Big data in economic analysis: advantages and challenges. https://www.researchgate.net/publication/335234998_BIG_DATA_IN_ECONOMIC_ANALYSIS_ADVANTAGES_AND_CHALLENGES
[3] Kulraj Smagh (October 7, 2017). Limitations of big data analytics. https://www.ciklum.com/blog/limitations-of-big-data-analytics/
[4] Liran Einav, Jonathan Levin (November 7, 2014). Economics in the age of big data. Science, Vol. 346, Issue 6210, 1243089. DOI: 10.1126/science.1243089. https://science.sciencemag.org/content/346/6210/1243089
[5] Mila Slesar (January 2020). Pros and cons of big data for businesses. https://onix-systems.com/blog/the-pros-and-cons-of-big-data-for-businesses