https://www.irjet.net/archives/V9/i7/IRJET-V9I7412.pdf
International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 09 Issue: 07 | July 2022 | www.irjet.net
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 2241

An Overview of Data Lake

Pragati Kumai¹, Smitha G R²
¹,² R.V College of Engineering, Bengaluru, India

Abstract - Data Lake is one of the contentious concepts that emerged during the Big Data era. The idea for Data Lake came from the business world rather than the academic world. As Data Lake is a newly conceived concept with revolutionary ideas, its adoption presents numerous challenges. Its potential to change the data landscape, on the other hand, makes Data Lake research worthwhile. The data lake is a highly flexible repository that can store both structured and unstructured data and employs a schema-on-read strategy. It is an effective approach to today's problem of Big Data storage. However, it has a few defects, such as inadequate security and authentication mechanisms. Apache Hadoop is usually recognized as a data lake industry standard. Its parallel processing mechanisms enable rapid processing of huge amounts of data. Many businesses have attempted to create wrappers for Hadoop in order to address issues with its raw state and poor data security. These include platforms such as Amazon Web Services (AWS) Data Lake and Azure Data Lake. AWS Data Lakes offer a simple solution with safeguards to prevent data loss, whereas Azure Data Lakes offer greater adaptability and organisation-level security.

Key Words: Big Data, Data Warehouse, OLAP, OLTP, Data Lake, Apache Hadoop

1. INTRODUCTION

A data lake offers businesses a scalable and safe platform that enables them to: ingest any data from any system at any speed, whether it originates from on-premises, cloud, or edge computing systems; store any type or volume of data in full fidelity; process data in real time or batch mode; and analyse data using SQL, Python, R, or any other language, third-party data, or analytics applications.

Businesses that successfully derive business value from their data will perform better than their competitors. According to an Aberdeen study, businesses that used data lakes outperformed comparable businesses in organic revenue growth by 9%. These leaders were able to use fresh data from sources including log files, click-stream data, social media, and internet-connected devices housed in the data lake to perform new forms of analytics such as machine learning. This made it easier for them to recognise and take advantage of business growth prospects by attracting and retaining clients, increasing productivity, maintaining equipment proactively, and making informed decisions.

A typical organisation will need both a data warehouse and a data lake, depending on its requirements, as they fulfil different needs and use cases. A data warehouse is a database designed for the analysis of relational data from corporate applications and transactional systems. In order to optimise for quick SQL queries, the data structure and schema are set in advance. The results are often used for operational reporting and analysis. In order for the data to serve as the "single source of truth" that users can rely on, information is cleansed, enriched, and transformed.

2. DATA LAKE CONCEPTS

The fundamental goal of a data lake is to ingest unprocessed data and process it later. Data lakes therefore keep all the information and offer good flexibility.
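The ingest-now, process-later idea above is what the abstract calls schema-on-read. As a rough illustration only, the following Python sketch stores records untouched and applies a schema when a reader asks for a view. The sample records and the helper names `ingest` and `read_with_schema` are hypothetical and are not taken from the paper or from any data lake product.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.90", "channel": "web"}',
    '{"user": "u2", "amount": "5", "device": {"os": "android"}}',
    '{"user": "u3"}',
]


def ingest(raw_lines):
    """Store records without validation or transformation; nothing is rejected."""
    return list(raw_lines)


def read_with_schema(stored, schema):
    """Apply a schema only at read time (schema-on-read).

    `schema` maps a field name to a (converter, default) pair. Fields the
    reader does not ask for are ignored; missing fields get the default.
    """
    rows = []
    for line in stored:
        record = json.loads(line)
        row = {}
        for field_name, (convert, default) in schema.items():
            if field_name in record:
                row[field_name] = convert(record[field_name])
            else:
                row[field_name] = default
        rows.append(row)
    return rows


stored = ingest(RAW_EVENTS)

# One possible reader: a revenue view that only needs two typed fields.
revenue_view = read_with_schema(
    stored, {"user": (str, ""), "amount": (float, 0.0)}
)
total = sum(row["amount"] for row in revenue_view)
```

A different reader could apply a different schema to the same stored records, which is the flexibility the section describes; a data warehouse would instead fix the schema before loading.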
However, data lakes, which hold a large number of datasets lacking precise models or descriptions, are prone to quickly becoming invisible, impenetrable, and inaccessible. Establishing a metadata control system for a data lake (DL) is therefore required. In fact, many articles have underlined the value of metadata [10].

Big data can also refer to technological advancements in data processing and storage that enable handling of exponential growth in data volume in any type of format [3]. The 3-V model [7], which describes three dimensions of data growth (volume, velocity, and variety), is the foundation for another widely accepted definition of big data. Volume refers to the increasing amount of data. Velocity refers to the rate at which new data are generated and made available for further analysis. Variety characterises the range of data types and sources.

Ref. [9] proposed further Data Lake specifications, particularly from the perspective of the business domain rather than the scientific community. All data is loaded from the source systems. No data is rejected. At the leaf level, data is stored in an untransformed or nearly untransformed state.

A DL can be accessed by multiple people, and it ingests and stores a variety of data kinds. Numerous problems with accessing, searching, and analysing data may arise if proper management techniques are not in place [2]. Governance for data lakes is necessary in this situation. Some partial solutions appear in recent work. [6] suggests "just-enough governance" for DLs, with policies for data quality, data on-boarding, metadata management, and compliance and audit. According to [4], data governance must guarantee data availability and quality throughout the whole data life-cycle. [5] provided data governance with data life-cycle management, data lineage, data quality, and security.

2.1 Data Lake (Architectural Implementation)

The architecture of a Data Lake can be divided into five layers:
1. Ingestion Layer
2. Distillation Layer
3. Processing Layer
4. Insights Layer
5. Unified Operations Layer

The goal of the Ingestion Layer is to bring Raw Data into the Data Lake. No data is altered in this layer. The layer can take in Raw Data in batches or in real time, and it organises the data into a logical folder structure. The Ingestion Layer can retrieve data from a variety of external sources, including social networking websites, wearable technology, Internet of Things (IoT) devices, and data streaming devices. This layer's advantage is that it can swiftly assimilate any kind of data, such as video streams from surveillance cameras; real-time data from health monitoring equipment; many types of telemetry data; and geographical information, pictures, and videos taken with mobile devices.

The second layer strengthens the capacity for data transformation and analysis. Companies employ the tool that best suits their skill set at this point. More data are collected, and applications are developed. The capabilities of the enterprise data warehouse and the data lake are combined here.

In the third layer, data can be kept in files or tables after being processed into usable data sets. At this point, both the data's purpose and its structure are known. Purging and transformations should be expected before data reaches this layer. Denormalization and consolidation of various objects are also frequent.
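The purging, transformation, and denormalization steps mentioned for the third layer can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the sample records, the field names, and the choices to drop duplicate orders and orders with no matching customer are not prescribed by the paper.

```python
# Hypothetical raw records as they might sit in the lake after ingestion.
raw_customers = [
    {"customer_id": "c1", "name": " Asha ", "city": "Bengaluru"},
    {"customer_id": "c2", "name": "Ravi", "city": None},
]
raw_orders = [
    {"order_id": "o1", "customer_id": "c1", "amount": "120.50"},
    {"order_id": "o2", "customer_id": "c2", "amount": "80"},
    {"order_id": "o3", "customer_id": "c9", "amount": "15"},      # unknown customer
    {"order_id": "o1", "customer_id": "c1", "amount": "120.50"},  # duplicate
]


def build_order_dataset(customers, orders):
    """Purge, transform, and denormalize raw records into one wide data set."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    seen_order_ids = set()
    wide_rows = []
    for order in orders:
        # Purge: skip duplicate orders (an illustrative rule, not the paper's).
        if order["order_id"] in seen_order_ids:
            continue
        seen_order_ids.add(order["order_id"])

        # Purge: skip orders that reference a customer we do not know.
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            continue

        # Transform and denormalize: typed amount plus customer attributes
        # copied onto the order row, so readers need no join.
        wide_rows.append({
            "order_id": order["order_id"],
            "amount": float(order["amount"]),
            "customer_name": customer["name"].strip(),
            "customer_city": customer["city"] or "unknown",
        })
    return wide_rows


orders_wide = build_order_dataset(raw_customers, raw_orders)
```

The raw records stay untouched in the lake; only the derived data set has a fixed structure and purpose, which matches the description of this layer.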
Because of all this, it is the most complicated part of the entire Data Lake solution. How the data is organised, however, is plain and easy to understand, for instance by files, type, and purpose. End users typically have access only to this layer.

The fourth layer consists of the Data Lake's query interface and output interface. It requests or retrieves data from the Data Lake using SQL and NoSQL queries. Typically, enterprise users who require access to the data run the queries. This layer also displays the data to the user after it has been fetched from the Data Lake. Reports and dashboards are frequently the results of queries, and they make it simple for users to draw conclusions from the underlying data.

The final phase of the general data flow in a data lake design is the consumption zone. Through the analytic consumption tools and the SQL and non-SQL query capabilities at this layer, the findings and business insights from analytic projects are made available to the targeted users, be they technical decision-makers or business analysts.

2.2 Proposal for Improvement of Data Lake

Because no thorough examination of the data has been performed, analysts in data lakes sometimes struggle to assess the quality of the data. Additionally, since there is no record of the conclusions reached by earlier analysts, it is impossible to incorporate insights from others who have worked with the data. Finally, security and access control are two of the main risks associated with data lakes. Without any control, data can be thrown into a lake, some of which may carry privacy and legal restrictions that other data does not.

The Swamp Problem: A massive data lake will accept any data you offer it. This is harmless only if all the data in your present systems is flawless and of the highest possible quality: there are no formatting errors, no missing values, no typos, and no data ever placed into the wrong field.
In addition, the information in all of your present systems would have to be flawlessly consistent. Only then would there be no issue.

Solution: There is more to a data lake than just having one. The goal is to transform data into knowledge and knowledge into action, guided by the individuals who are most familiar with your data. Metadata management tells the story of the data by supplying context and explanations about the data's origin, usefulness, quality, and significance. Even with relatively small volumes of data, most organisations still have trouble managing metadata. Machine learning has the potential to change the game because it can extract tacit knowledge from those who understand the data best and transform it into algorithms that can be used to scale up automated data processing.
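As a minimal sketch of what recording a data set's origin and quality could look like, the Python below profiles a batch for missing values and stores the result next to basic provenance fields. The `DatasetMetadata` fields, the `profile` helper, and the sample telemetry rows are all hypothetical; real metadata catalogues track far more than this.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetMetadata:
    """A hypothetical catalogue entry: provenance plus a small quality profile."""
    name: str
    source: str
    owner: str
    ingested_on: date
    row_count: int = 0
    missing_values: dict = field(default_factory=dict)


def profile(name, source, owner, rows, expected_fields, ingested_on):
    """Count missing values per expected field and wrap them in a metadata entry."""
    missing = {field_name: 0 for field_name in expected_fields}
    for row in rows:
        for field_name in expected_fields:
            # A value counts as missing if it is absent, None, or an empty string.
            if row.get(field_name) in (None, ""):
                missing[field_name] += 1
    return DatasetMetadata(name, source, owner, ingested_on, len(rows), missing)


rows = [
    {"sensor_id": "s1", "temperature": 21.5},
    {"sensor_id": "s2", "temperature": None},
    {"sensor_id": "", "temperature": 19.0},
]
entry = profile(
    "plant_telemetry", "iot-gateway", "ops-team",
    rows, ["sensor_id", "temperature"], date(2022, 7, 1),
)
```

An analyst who later finds `plant_telemetry` in the lake can read `entry.source` and `entry.missing_values` before trusting the data, which addresses part of the swamp problem described above.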
The development of data quality systems is the topic of another type of framework. In order to understand best practices in mature businesses and pinpoint areas for growth, these frameworks evaluate the maturity level of data quality (DQ) management. Popular examples include Capability Maturity Model Integration (CMMI) and Total Data Quality Management (TDQM). An efficient data quality model gives direction to business data integrity actions so that they are all built upon the same comprehensive understanding. It lessens complexity without limiting the ability to iterate and evolve as comprehension of the data advances.

3. COMPARISON BETWEEN DATA LAKE & DATA WAREHOUSE

Both data lakes and data warehouses are often used for holding huge volumes of data; however, the terms are not interchangeable. A data lake is a large pool of raw data whose purpose has not yet been defined. A data warehouse is a storage location for organized, filtered data that has already been processed for a particular purpose. The two methods of data storage are frequently confused, yet they are far more dissimilar than they are similar. The only true resemblance between them is the high-level purpose of storing data. The distinction is significant because they perform different functions and need different sets of eyes to be properly optimized. A data lake may be appropriate for one firm, whereas a data warehouse may be more appropriate for another. The main differences between them are as follows:

Data structure: raw vs. processed. Most of the data in data lakes is unprocessed and raw. Raw data is data that has not yet been processed for a purpose. Raw data is convenient for machine learning and is simple to examine. Data warehouses, on the other hand, hold processed data.
In contrast to raw data, processed data is easily understood by a wide range of people. It also saves money on costly storage space. The concern with raw data is that a large volume of it may turn into a data swamp, with some data never being used.

Purpose: undetermined vs. operational. Data lakes apply less structure and filtering to their data than data warehouses do, and the purpose of the data is often not yet determined. Raw data that has been put to a specific use is referred to as processed data. Because data warehouses hold processed data, the data that is stored is not squandered and will be put to use inside the company.

Users: data scientists vs. business professionals. People who are not used to working with raw data frequently find it challenging to explore data lakes. Comprehending and transforming raw, unstructured data for a specific commercial application often takes a data scientist and specialised tools. Processed data is presented in charts, spreadsheets, tables, and other formats so that most, if not all, of a company's personnel can read it. Processed data, such as that kept in data warehouses, only requires the user to be knowledgeable about the subject matter.

Accessibility: flexible vs. secure. The lack of structure in data lake design makes it flexible and simple to modify. Data lakes have very few restrictions, so modifications to the data can be made quickly. Data warehouses are highly structured by design, which makes them harder and more expensive to alter. One of the main benefits of data warehouse design is that the data is processed and structured in a way that makes it easier to understand.

4. CHALLENGES OF DATA LAKE

Data lakes can be quite expensive to set up and manage. Although managed cloud data platforms are simpler to implement, they are still challenging to administer and have high costs. Data lakes may be challenging to maintain even for experienced engineers.
Whether a basic open-source cloud data platform or a managed service is used, making sure the host infrastructure has the capacity for the data lake to keep expanding, dealing with duplicate data, safeguarding all of the data, and other such chores are all challenging jobs. Traditional cloud data platforms are effective at storing data but not so good at safeguarding it or enforcing data governance policies. Security and governance have to be added separately, which means extra time, money, and management headaches. All members of the organization have access to the data lake. Its practical use, however, is not equally available to everyone. Since the data lake also contains unstructured data, non-technical users may find it challenging to understand the data.

5. DISCUSSION AND CONCLUSION

The data lake is a data storage repository that offers companies strong potential due to its capacity to handle massive amounts of all sorts of data. This ability should be treated seriously in order to avoid a data swamp of outdated, streaming, and forgotten information. The ideal way to maximise the benefits of this type of repository is to choose a solution in which the stream goes both ways: the group's data is streamed in to be consolidated, backed up, and protected from a single central hub, while AI applications allow transparency and business insights to flow back out. Data Lake is a relatively new notion that is expanding in parallel with the popularity of cloud, data science, and artificial intelligence applications. It has gained popularity in the market due to its adaptable architecture and the range of applications and data types it supports, which enables businesses to gather a comprehensive
picture of patterns in data. Data lake solutions are useful for record keeping because they offer enterprises a centralised location to back up and secure their data. Each data element in the lake is labelled with a set of metadata tags, allowing users to derive the most information from whatever is saved. Rather than having separate data collections, all data to be stored is collected in the data lake (DL), which addresses the old issue of data silos. Data silos are very likely if data produced or developed by various departments within the enterprise is only stored in their own data stores. The new problem is dealing with the difficulties of the big data era, i.e., data lakes attempt to overcome the challenges posed by the big data V's: value, volume, variety, veracity, and velocity.

ACKNOWLEDGEMENT

This work would not have been possible without the contribution of Smitha G R, Assistant Professor, RV College of Engineering, and Aggarwal Akash, Data Analytics Lead, Koch Business Solutions. I am very thankful for all the help they provided throughout the course of this work.

REFERENCES

[1] Brian Stein, Alan Morrison, "The enterprise data lake: Better integration and deeper analytics", Technology Forecast: Rethinking integration, Issue 1, 2014. Retrieved Aug. 25, 2017.
[2] Timothy King, "The Emergence of Data Lake: Pros and Cons", March 3, 2016. Retrieved Sep. 15, 2017.
[3] Timothy King, "The Emergence of Data Lake: Pros and Cons", March 3, 2016. Retrieved Sep. 15, 2017.
[4] Huang Fang, "Managing Data Lakes in Big Data Era: What's a data lake and why has it became popular in data management ecosystem", The 5th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems, June 8-12, 2015, Shenyang, China.
[5] Huang Fang, "Managing Data Lakes in Big Data Era: What's a data lake and why has it became popular in data management ecosystem", The 5th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems, June 8-12, 2015, Shenyang, China.
[6] Huang Fang, "Managing Data Lakes in Big Data Era: What's a data lake and why has it became popular in data management ecosystem", The 5th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems, June 8-12, 2015, Shenyang, China.
[7] Huang Fang, "Managing Data Lakes in Big Data Era: What's a data lake and why has it became popular in data management ecosystem", The 5th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems, June 8-12, 2015, Shenyang, China.