Handling and Processing Big Data
Big Data & IoT
Lecture #2
Umair Shafique (21015956-003)
Scholar MS Information Technology - University of Gujrat
Recap
• What is Big Data?
• Why Is Big Data Important?
• Big Data Analytics
• Benefits of Big Data Analytics
• Types of Big Data
• Characteristics of Big Data
• Source of Big Data
• Big Data Tools and Software
What is Big Data?
• Big data is the big buzz nowadays, and there are no second thoughts on that.
• Basically, big data is data that is generated at high volume, velocity, and variety. There are many other concepts, theories, and facts related to big data and its popularity.
What Is Big Data?
• In simple words, big data is defined as massive amounts of data that may involve complex, unstructured data as well as semi-structured data.
• Previously, it was too difficult to interpret huge data sets accurately and efficiently with traditional database management systems, but big data tools like Apache Hadoop and Apache Spark make it easier. For example, a human genome, which once took about ten years to process, can now be processed in just about one week.
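To make the tooling claim concrete, here is a minimal PySpark sketch, assuming a running Spark installation; the file path and column names (events.csv, event_date) are hypothetical. It shows how an aggregation over a very large file is expressed once and executed in parallel across partitions:

```python
# Minimal PySpark sketch: aggregating a large CSV that would overwhelm a
# single-machine RDBMS. The file path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("big-data-intro").getOrCreate()

# Spark reads the file in partitions and distributes the work across the cluster.
events = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)

# A simple aggregation that Spark executes in parallel across partitions.
daily_counts = events.groupBy("event_date").agg(F.count("*").alias("events"))
daily_counts.show()
spark.stop()
```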
How Big Is Big Data?
• It's not possible to put a number on what qualifies as big data, but it generally refers to figures around petabytes and exabytes. It includes vast amounts of data gathered from a given company, its customers, its channel partners, and its suppliers, as well as from external data sources.
• Big data analytics is the often complex process of examining big data to uncover information -- such as hidden patterns, correlations, market trends, and customer preferences -- that can help organizations make informed business decisions.
Characteristics of Big Data
Handling and Processing Big Data
• Big Data management is the systematic organization, administration, and governance of massive amounts of data.
• The process includes management of both unstructured and structured data.
• The primary objective is to ensure the data is of high quality and accessible for business intelligence and big data analytics applications.
• To contend with rapidly growing data pools, government agencies, corporations, and other large organizations have begun implementing Big Data management solutions.
• The data involves several terabytes or even petabytes saved in a broad range of file formats.
• Effective Big Data management enables an organization to find valuable information with ease, irrespective of how large or unstructured the data is. The data is gathered from different sources such as call records, system logs, and social media sites.
Handling Big Data
• Here are some ways to effectively handle Big Data:
1. Outline Your Goals
• The first tick on the checklist when it comes to handling Big Data is knowing what data to gather and what data need not be collected. To do this, one has to determine clearly defined goals. Failing to do so leads to gathering large amounts of data that are not aligned with the business's ongoing requirements.
• Many enterprises end up collecting unnecessary data because they lack clearly defined goals and well-mapped strategies for achieving them. It is of paramount importance that organizations collect data with a laser focus on business objectives.
2. Do Not Ignore Audit Regulations
• Offsite database managers should maintain the right database components, especially when an audit is at hand. Whether the data is payment data, credit scores, or data of lesser importance, it should be managed accordingly. One should steer clear of liability and progressively earn the client's trust.
Handling Big Data
3. Secure the Data
• The next step in managing Big Data is to ensure the relevant data collected is secured with a broad range of measures. To ensure the data is both accessible and secure, it must be protected by firewall security measures, spam filtering, malware scanning and elimination, and, most importantly, team permission control.
• Data has the immense power to drive your business to new heights of success or crash it into oblivion, so it is wise not to take data management lightly; securing organizational data is the highest priority in Big Data management.
4. Keep the Data Protected
• A database is susceptible not only to threats from human influences and synthetic anomalies, but also to damage from the elements of nature such as heat, humidity, and extreme cold, all of which can easily corrupt data. Whenever data is damaged, system failures are bound to follow, leading to expensive downtimes and related overheads.
• Organizations have to safeguard databases against adverse environmental conditions that would damage data, and put forth considerable effort to protect their data. It is essential to create and maintain a backup of the database elsewhere, in addition to implementing safety features. The backup updates should be planned at frequent intervals.
Handling Big Data
5. Data Has to Be Interlinked
• Since organizational databases are bound to be accessed through a number of channels, it is not recommended to use different software for the required solutions. In essence, all organizational data must be able to talk to each other; communication hassles between applications and data, in either direction, can lead to huge problems.
• A cloud storage solution is a good answer to the data interlinking issue. Also useful in this circumstance would be a remote database administrator, among other tools. The objective is seamless data synchronization, which is needed all the more when more than one team will be accessing and working on the same data simultaneously.
6. Know the Data You Need to Capture
• The key to successful Big Data management is knowing which data will suit a particular solution. This means being aware of which data needs to be collected for different situations.
• Organizations are required to know which data has to be collected, and also when. To do this correctly, objectives must be clearly known and a plan formulated for how to accomplish them.
7. Adapt to the New Changes
• One of the most important aspects of Big Data management is keeping up with the latest trends. Software and data in all their forms change constantly, almost daily, across the globe. Keeping up with the newest technologies and adoption strategies will enable organizations to stay ahead of the curve and build highly optimized and efficient databases. Being flexible and open to new trends and technologies will go a long way in giving you an edge over the competition.
Metadata for Big Data Handling and Processing
• Traditionally, in the world of data management, metadata has often been ignored and implemented as a post-implementation process.
• When you start looking at Big Data, you need to create a strong metadata library, as you will often have no idea about the content or format of the data you need to process. Remember: in the Big Data world, we ingest and process data, then tag it, and only after these steps consume it for processing.
• Fundamentally, there are nine types of metadata that are useful for information technology and data management:
Metadata for Big Data Handling and Processing
• Technical metadata
• Data transformation rules, data storage structures, semantic layers, and interface layers
• Business metadata
• Data describing the content, i.e., the structure, values, etc. of attributes
• Contextual metadata
• Context for large objects like text, images, and videos
• Process design–level metadata
• Source and target tables, algorithms, business rules, etc.
• Program-level metadata
• ETL information
• Infrastructure metadata
• Source and target platforms, network, contacts, etc.
• Core business metadata
• Frequency of update, valid entries, basic business metadata, etc.
• Operational metadata
• Usage, record count, processing time, security, etc.
• Business intelligence metadata
• BI metadata contains information about how data is queried, filtered, analyzed, and displayed in business intelligence and analytics tools
• Data mining metadata (data sets, algorithms, and queries)
• OLAP metadata (dimensions, cubes, measures (metrics), hierarchies, levels, and drill paths)
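As an illustration of what an entry in such a metadata library might look like, here is a Python sketch; the class, field names, and sample values are invented for illustration and do not come from any particular metadata tool:

```python
# Illustrative sketch of a metadata-library entry for one data set.
# Field names and values are hypothetical, not from any specific product.
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    # Technical metadata: storage structure and transformation rules
    storage_format: str
    transformation_rules: list = field(default_factory=list)
    # Business metadata: what the attributes mean
    attribute_descriptions: dict = field(default_factory=dict)
    # Operational metadata: usage and processing statistics
    record_count: int = 0
    processing_time_sec: float = 0.0
    # Infrastructure metadata: where the data lives
    source_platform: str = ""
    target_platform: str = ""

call_records = DatasetMetadata(
    storage_format="parquet",
    transformation_rules=["mask caller_id", "normalize timestamps to UTC"],
    attribute_descriptions={"duration": "call length in seconds"},
    record_count=1_200_000_000,
    source_platform="telecom switch logs",
    target_platform="Hadoop cluster",
)
```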
Big Data Processing Requirements
• What is unique about Big Data processing?
• What makes it different or mandates new thinking?
• To understand this better, let us look at the underlying requirements.
• We can classify Big Data requirements based on its five main characteristics:
1. Volume:
● The size of the data to be processed is large—it needs to be broken into manageable chunks.
● Data needs to be processed in parallel across multiple systems.
● Data needs to be processed across several program modules simultaneously.
● Data needs to be processed once, and to completion, due to the volumes involved.
● Data needs to be reprocessable from any point of failure, since the data set is too large for the process to restart from the beginning (a sketch of these requirements follows).
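A minimal Python sketch of the volume requirements above, assuming the data has already been split into hypothetical chunk files; it processes chunks in parallel and uses a checkpoint file so a failed run resumes from where it stopped rather than restarting from the beginning:

```python
# Sketch only: process data in manageable chunks, in parallel, and resume
# from the last completed chunk after a failure. The chunk file names and
# the body of process_chunk() are hypothetical.
import os
from multiprocessing import Pool

CHECKPOINT = "done_chunks.txt"

def process_chunk(path):
    # ... real work on one chunk would go here ...
    with open(CHECKPOINT, "a") as f:  # record completion for restartability
        f.write(path + "\n")
    return path

def completed():
    # Chunks already finished in a previous (possibly failed) run.
    return set(open(CHECKPOINT).read().split()) if os.path.exists(CHECKPOINT) else set()

if __name__ == "__main__":
    chunks = [f"chunk_{i:04d}.dat" for i in range(1000)]
    pending = [c for c in chunks if c not in completed()]  # resume from failure point
    with Pool(processes=8) as pool:                        # parallel across workers
        pool.map(process_chunk, pending)
```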
Big Data Processing Requirements
2. Velocity:
● Data needs to be processed at streaming speeds during data collection.
● Data needs to be processed from multiple acquisition points.
3. Variety:
● Data of different formats needs to be processed.
● Data of different types needs to be processed.
● Data of different structures needs to be processed.
● Data from different regions needs to be processed.
4. Ambiguity:
● Big Data is ambiguous by nature due to the lack of relevant metadata and context in many cases. An example is the use of M and F in a sentence—it can mean, respectively, Monday and Friday, male and female, or mother and father (a small disambiguation sketch follows this list).
● Big Data within the corporation also exhibits this ambiguity, to a lesser degree. For example, employment agreements have standard and custom sections, and the latter is ambiguous without the right context.
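A small sketch of how contextual metadata can resolve the M/F ambiguity above; the mapping tables and the resolve() helper are illustrative, not a standard API:

```python
# Sketch: resolving the M/F ambiguity by consulting contextual metadata
# about the field the value came from. The mapping tables are illustrative.
CONTEXT_MAPS = {
    "gender":  {"M": "male",   "F": "female"},
    "weekday": {"M": "Monday", "F": "Friday"},
    "parent":  {"M": "mother", "F": "father"},
}

def resolve(value, context):
    """Disambiguate a coded value using the context supplied by metadata."""
    return CONTEXT_MAPS.get(context, {}).get(value, value)

print(resolve("M", "weekday"))  # -> Monday
print(resolve("M", "gender"))   # -> male
```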
Big Data Processing Requirements
5. Complexity:
● Big Data's complexity requires many algorithms to process data quickly and efficiently.
● Several types of data need multi-pass processing, and scalability is extremely important.
• Processing large-scale data requires an extremely high-performance computing environment that can be managed with the greatest ease and can be performance-tuned with linear scalability.
Processing Limitations
• There are a couple of processing limitations for processing Big Data:
● Write-once model—with Big Data there is no update processing logic, due to the intrinsic nature of the data being processed. Data with changes is processed as new data (a minimal sketch of this append-only pattern follows below).
● Data fracturing—due to the intrinsic storage design, data can be fractured
across the Big Data infrastructure. Processing logic needs to understand the
appropriate metadata schema used in loading the data. If this match is missed,
then errors could creep into processing the data.
• Big Data processing can have combinations of these limitations and
complexities, which will need to be accommodated in the processing
of the data.
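A minimal sketch of the write-once model above, using an in-memory list to stand in for an append-only store; the record schema is hypothetical:

```python
# Sketch of the write-once model: updates are never applied in place;
# a changed record is appended as a new version. Schema is hypothetical.
log = []  # stands in for an append-only store such as files on HDFS

def write_record(key, value):
    # Every change is appended as a new version; nothing is updated in place.
    log.append({"key": key, "value": value, "version": len(log)})

def current_value(key):
    # The highest appended version wins when reading.
    versions = [r for r in log if r["key"] == key]
    return max(versions, key=lambda r: r["version"])["value"] if versions else None

write_record("user:42", {"city": "Lahore"})
write_record("user:42", {"city": "Gujrat"})   # an "update" is just a new record
print(current_value("user:42"))               # -> {'city': 'Gujrat'}
```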
Processing Big Data
• Big Data processing involves steps
very similar to processing data in
the transactional or data
warehouse environments.
• The figure shows the different stages involved in the processing of Big Data; the approach to processing Big Data is:
● Gather the data.
● Analyze the data.
● Process the data.
● Distribute the data.
(Figure: Processing Big Data.)
Processing Big Data
• While the stages are similar to traditional data processing, the key differences are:
● Data is first analyzed and then processed.
● Data standardization occurs in the analyze stage, which forms the foundation for the distribute stage, where data warehouse integration happens.
● There is no special emphasis on data quality beyond the use of metadata, master data, and semantic libraries to enhance and enrich the data.
● Data is prepared in the analyze stage for further processing and integration.
Processing Big Data
1. Gather stage
• Data is acquired from multiple
sources including real-time
systems, near-real-time systems,
and batch-oriented applications.
The data is collected and loaded into a storage environment like Hadoop or a NoSQL database.
• Another option is to process the
data through a knowledge
discovery platform and store the
output rather than the whole data
set.
2. Analysis stage
• The analysis stage is the data discovery stage for processing Big Data and preparing it for integration into the structured analytical platforms or the data warehouse.
• The analysis stage consists of tagging, classification, and categorization of data, which closely resembles the subject-area creation and data-model definition stage in the data warehouse (a small tagging sketch follows).
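A small, keyword-based sketch of the tagging and categorization work done in the analysis stage; the tag rules and the tag_record() helper are illustrative, and a real system would use far richer classifiers:

```python
# Sketch of the analysis stage: tagging and categorizing raw records so they
# can later be linked to warehouse subject areas. Keywords are illustrative.
TAG_RULES = {
    "billing":   ["invoice", "payment", "refund"],
    "support":   ["complaint", "ticket", "outage"],
    "marketing": ["campaign", "promo", "discount"],
}

def tag_record(text):
    """Attach subject-area tags to one unstructured record."""
    text = text.lower()
    tags = [tag for tag, words in TAG_RULES.items()
            if any(w in text for w in words)]
    return {"text": text, "tags": tags or ["uncategorized"]}

print(tag_record("Customer raised a ticket about a failed payment"))
# -> tags: ['billing', 'support']
```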
Processing Big Data
3. Process stage
• Processing Big Data has several substages, and the data transformation at each substage determines whether the output is correct or incorrect.
Context processing
• Context processing relates to exploring the context in which data occurs within the unstructured or Big Data environment. The relevancy of the context helps identify the appropriate metadata and master data to process with the Big Data.
Metadata, master data, and semantic linkage
• The most important step in integrating Big Data into a data warehouse is the ability to use metadata, semantic libraries, and master data as the integration links.
Standardize
• Preparing and processing Big Data for integration with the data warehouse requires standardization of the data, which improves its quality (a small standardization sketch follows).
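A minimal sketch of the standardize substage, normalizing a raw attribute against a hypothetical master-data lookup table:

```python
# Sketch of the standardize substage: normalizing raw values against master
# data before warehouse integration. The master-data table is hypothetical.
MASTER_COUNTRIES = {"pk": "Pakistan", "pakistan": "Pakistan",
                    "uk": "United Kingdom", "u.k.": "United Kingdom"}

def standardize(record):
    country = record.get("country", "").strip().lower()
    record["country"] = MASTER_COUNTRIES.get(country, "UNKNOWN")
    return record

print(standardize({"name": "Ali", "country": " PK "}))
# -> {'name': 'Ali', 'country': 'Pakistan'}
```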
4. Distribute stage
• Big Data is distributed to downstream systems by
processing it within analytical applications and
reporting systems. Using the data processing
outputs from the processing stage where the
metadata, master data, and metatags are available,
the data is loaded into these systems for further
processing.
• Another distribution technique involves exporting the data as flat files for use in other applications like web reporting and content management platforms (a small export sketch follows).
• From here, big data analytics begins.
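A small sketch of the flat-file distribution option above, exporting processed records as CSV for downstream reporting; the file name and fields are hypothetical:

```python
# Sketch of the distribute stage's flat-file option: exporting processed,
# tagged records as CSV for downstream reporting tools. Paths are hypothetical.
import csv

records = [{"key": "user:42", "tags": "billing;support", "country": "Pakistan"}]

with open("export_for_reporting.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["key", "tags", "country"])
    writer.writeheader()
    writer.writerows(records)
```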
Technologies for Big Data Processing
• There are various technologies that form the foundations of Big Data processing.
• The evolution and implementation of these technologies revolve around:
● Data movement
● Data storage
● Data management
Technologies for Big Data Processing
• Hadoop
• Hadoop has taken the world by storm by providing a solution architecture for Big Data processing on cheap commodity platforms with fast scalability and parallel processing.
• Google file system
• Google discovered that its requirements could not be met by traditional file systems, and thus was born the need to create a file system that could meet the demands and rigor of an extremely high-performance file system for large-scale data processing on commodity hardware clusters.
• MapReduce
• MapReduce is a programming model for processing extremely large data sets; it was originally developed by Google in the early 2000s to solve the scalability of search computation (a minimal word-count sketch follows this list).
• Zookeeper
• Zookeeper is an open-source, in-memory, distributed NoSQL database that is used for coordination services for managing distributed applications. It consists of a simple set of functions that can be used to build services for synchronization, configuration maintenance, groups, and naming.
• Pig
• Analyzing large data sets introduces dataflow complexities that become harder to implement in a MapReduce program as data volumes and processing complexities increase; Pig addresses this with a higher-level dataflow language that compiles down to MapReduce jobs.
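The following is a minimal, single-machine Python sketch of the MapReduce model using word count, the canonical example; real MapReduce distributes the map, shuffle/sort, and reduce phases across a cluster, which this sketch only imitates:

```python
# Single-machine sketch of the MapReduce programming model (word count).
# A real framework would distribute these phases across cluster nodes.
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    for word in line.split():
        yield (word.lower(), 1)           # emit (key, value) pairs

def reduce_phase(word, counts):
    return (word, sum(counts))            # combine all values for one key

lines = ["big data needs new thinking", "big data is big"]
pairs = sorted(kv for line in lines for kv in map_phase(line))  # shuffle/sort
result = [reduce_phase(word, (c for _, c in group))
          for word, group in groupby(pairs, key=itemgetter(0))]
print(result)  # e.g. [('big', 3), ('data', 2), ...]
```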
Technologies for Big Data Processing
• HBase
• HBase is an open-source, nonrelational, column-oriented, multidimensional, distributed database developed on Google's BigTable architecture. It is designed with high availability and high performance as drivers to support storage and processing of large data sets on the Hadoop framework.
• Hive
• Hive is an open-source data warehousing solution built on top of Hadoop (a small query sketch follows this list).
• Chukwa
• Chukwa is an open-source data collection system for monitoring large distributed systems. Chukwa is built on top of the HDFS (Hadoop Distributed File System) and MapReduce frameworks. It also includes a flexible and powerful toolkit for displaying, monitoring, and analyzing results, to make the best use of the collected data.
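As a sketch of how Hive's SQL-like access can be driven from code, here is a PySpark example; the call_records table and its columns are hypothetical, and it assumes a Spark build with Hive support enabled:

```python
# Sketch of querying a Hive table from PySpark. The table name and columns
# are hypothetical; this assumes Spark was built with Hive support enabled.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-query-sketch")
         .enableHiveSupport()
         .getOrCreate())

# HiveQL is close to SQL; Spark plans and executes it on the cluster.
top_callers = spark.sql("""
    SELECT caller_id, COUNT(*) AS calls
    FROM call_records
    GROUP BY caller_id
    ORDER BY calls DESC
    LIMIT 10
""")
top_callers.show()
spark.stop()
```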
Big Data Career
Big Data Recent Research Trends
• Big Data in Retail
• Big Data in Healthcare
• Big Data in Education
• Big Data in E-commerce
• Big Data in Media and
Entertainment
• Big Data in Finance
• Big Data in Travel Industry
• Big Data in Telecom
• Big Data in Automobile
References
• Big Data Databases: the Essence
https://www.scnsoft.com/analytics/big-data/databases
• Big Data Applications – A manifestation of the hottest buzzword
https://data-flair.training/blogs/big-data-applications/
• Big Data Tutorial For Beginners | What Is Big Data? https://www.softwaretestinghelp.com/big-data-tutorial/#Big_Data_Benefits_Over_Traditional_Database
• Healthcare Big Data and the Promise of Value-Based Care
https://catalyst.nejm.org/doi/full/10.1056/CAT.18.0290
Traditional Data vs. Big Data

| S.No. | Traditional Data | Big Data |
|-------|------------------|----------|
| 01 | Generated at the enterprise level. | Generated both outside and at the enterprise level. |
| 02 | Volume ranges from gigabytes to terabytes. | Volume ranges from petabytes to zettabytes or exabytes. |
| 03 | Traditional database systems deal with structured data. | Big data systems deal with structured, semi-structured, and unstructured data. |
| 04 | Generated per hour, per day, or less often. | Generated far more frequently, often every second. |
| 05 | The data source is centralized and managed in centralized form. | The data source is distributed and managed in distributed form. |
| 06 | Data integration is very easy. | Data integration is very difficult. |
| 07 | A normal system configuration can process the data. | A high-end system configuration is required to process the data. |
| 08 | The size of the data is very small. | The size is far greater than traditional data. |
| 09 | Traditional database tools suffice for any database operation. | Special database tools are required for database operations. |
| 10 | Normal functions can manipulate the data. | Special functions are needed to manipulate the data. |
| 11 | The data model is strict-schema-based and static. | The data model is flat-schema-based and dynamic. |
| 12 | Data is stable, with known interrelationships. | Data is unstable, with unknown relationships. |
| 13 | Data volume is manageable. | Volume is huge and can become unmanageable. |
| 14 | Easy to manage and manipulate. | Difficult to manage and manipulate. |
| 15 | Sources include ERP transaction data, CRM transaction data, financial data, organizational data, web transaction data, etc. | Sources include social media, device data, sensor data, video, images, audio, etc. |
Editor's Notes
1. Structured: data that is already organized and convenient to work with, e.g., data in Excel or SQL databases that is tagged in a standardized format and can be easily sorted, updated, and extracted. Unstructured: data without any pre-defined order; Google search results (articles, e-books, videos, images) are an example. Semi-structured: data that has been pre-processed but does not look like a "normal" SQL database; it can contain some tags, such as data formats (JSON or XML files are examples), and some data analytics tools can work with it. Quasi-structured: something between unstructured and semi-structured data, e.g., textual content with erratic formats, such as the record of which web pages a user visited and in what order.
2. Secure data: while most organizations gather data from customers via interactions with their websites and products, not many spend time on measures to guarantee the security of the collected data. If collected data is damaged, it can damage the relationship with the customer through loss of trust, or even bankrupt the business due to the loss of essential customer data.
3. EDW = Enterprise Data Warehouse.
4. Linking different units of data from multiple data sets is not a new concept in itself. The process can be repeated multiple times for a given data set, as the business rule for each component is different.
5. Related Hadoop ecosystem tools: Flume, Oozie, HCatalog, Sqoop.