Architectural Aspects and Design Hypothesis of the Data Ingestion Pipeline
Introduction
We stand at the cusp of a technological revolution that is driven by data.
How well systems and processes function depends on how we process and handle
data, from ingestion through to execution. A data ingestion pipeline spans
several stages, from data collection to data analytics. It takes raw data from
different platforms and databases and turns it into useful information with the
help of business intelligence tools.
Architectural Aspects
● The architecture of a data pipeline is designed so that cleansing and transforming
data is as simple as possible.
● We need to extract data from warehouses and data lakes and distil it into concise,
useful facts that can be turned into information. This information in turn forms the
base of knowledge engineering systems.
● One of the defining features of a data pipeline is the speed at which it processes
data. This depends primarily on three critical factors.
● The first is throughput, which is the amount of data that can be processed in a
given amount of time.
● The second is data reliability, which requires an effective validation mechanism
in the data pipeline to maintain high data quality.
● The third is latency. To keep the response rate high and the volume of data
processed large, latency must be low. Low latency means that the delay in
processing is as small as possible.
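The three factors can be made concrete with a small measurement sketch. This is only an illustration under stated assumptions: the function names (`is_valid`, `run_batch`), the record fields (`id`, `value`) and the validation rule are hypothetical and do not come from the slides. It reports throughput as records per second, reliability as the share of records that pass validation, and latency as the mean time spent per record.

```python
import time
from statistics import mean


def is_valid(record):
    """Hypothetical validation rule: a non-empty 'id' and a numeric 'value'."""
    return bool(record.get("id")) and isinstance(record.get("value"), (int, float))


def run_batch(records, clock=time.perf_counter):
    """Validate records one by one and report the three pipeline metrics.

    `clock` is injectable so the timing can be replaced in tests.
    """
    latencies = []
    valid = 0
    start = clock()
    for record in records:
        t0 = clock()
        if is_valid(record):
            valid += 1
        latencies.append(clock() - t0)  # per-record processing delay
    elapsed = clock() - start
    return {
        # Throughput: records processed per unit of time.
        "throughput_per_s": len(records) / elapsed if elapsed > 0 else float("inf"),
        # Reliability: share of records that passed validation.
        "valid_ratio": valid / len(records) if records else 0.0,
        # Latency: mean delay per record.
        "mean_latency_s": mean(latencies) if latencies else 0.0,
    }


if __name__ == "__main__":
    sample = [{"id": "a", "value": 1}, {"id": "", "value": 2}, {"id": "c", "value": 3.5}]
    print(run_batch(sample))
```

In a real pipeline these numbers would normally come from a metrics system rather than in-process timers; the sketch only shows how the three quantities differ from one another.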
The Design Hypotheses
● There are many ways to design a data pipeline. Here we list the main stages into
which the data pipeline architecture can be layered.
● The first stage is data extraction, which involves mining data across data
warehouses and data lakes. It is at this stage that we validate data sets and ensure
quality control.
● The next stage is ingestion. Here we read data from data sources through an
application programming interface (API). We also select the data sets of interest
with the help of data profiling: we examine the characteristics of the data and
evaluate it from a business point of view.
● The next stage is data transformation. Data passes through a series of filters
that yield a high-quality output. This output can then be used for analytics and
business intelligence.
● After all the stages have been completed, it is important to monitor the data
against various parameters and fix any issues that arise. For this purpose, data
quality engineers keep a constant watch over the data pipeline.
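The stages listed above can be sketched as a chain of small functions. This is a minimal illustration, not a reference design: the function names, the `id` and `amount` fields, and the use of in-memory lists in place of warehouses, data lakes and an API are all assumptions made for the example.

```python
from collections import Counter


def extract(sources):
    """Extraction: gather rows from every source and validate them.

    Hypothetical rule: a row is kept only if it has an 'id'.
    """
    rows = [row for source in sources for row in source]
    return [row for row in rows if row.get("id") is not None]


def profile(rows):
    """Ingestion helper: count how often each field is populated."""
    filled = Counter()
    for row in rows:
        for key, value in row.items():
            if value is not None:
                filled[key] += 1
    return dict(filled)


def transform(rows, filters):
    """Transformation: pass the rows through each filter in order."""
    for keep in filters:
        rows = [row for row in rows if keep(row)]
    return rows


def monitor(rows_in, rows_out):
    """Monitoring: report how many rows were dropped along the way."""
    return {"rows_in": rows_in, "rows_out": rows_out, "dropped": rows_in - rows_out}


def run_pipeline(sources, filters):
    """Run the stages in sequence and return the output, profile and report."""
    total = sum(len(source) for source in sources)
    extracted = extract(sources)
    stats = profile(extracted)
    cleaned = transform(extracted, filters)
    return cleaned, stats, monitor(total, len(cleaned))


if __name__ == "__main__":
    sources = [
        [{"id": 1, "amount": 10}, {"id": None, "amount": 5}],
        [{"id": 2, "amount": None}, {"id": 3, "amount": -4}],
    ]
    filters = [
        lambda row: row["amount"] is not None,  # drop incomplete rows
        lambda row: row["amount"] >= 0,         # drop negative amounts
    ]
    print(run_pipeline(sources, filters))
```

Keeping each stage as a separate function means the monitoring report can be compared before and after any single stage changes, which is roughly the job the slides assign to data quality engineers.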
Concluding remarks
The architectural pathways of a data pipeline may be diverse, but they follow a certain
hierarchy of steps. From ingestion through to analytics, the aim is to produce
state-of-the-art analytics that support transformative business intelligence.
