Introduction to Data
Architecture
Lecture # 1
Dr. Saif Ur Rehman Malik
Introduction (cont…)
• Corporate data include everything found in the corporation in the way of data.
• The most basic division of corporate data is between structured data and unstructured data.
• As a rule, there are far more unstructured data than structured data.
• Unstructured data have two basic divisions: repetitive data and nonrepetitive data.
• Big data is made up of unstructured data.
Introduction (cont…)
• Nonrepetitive big data has a fundamentally different form than repetitive
unstructured big data.
• The differences between nonrepetitive big data and repetitive big data are
so large that they can be called the boundaries of the “great divide.”
• As a rule, nonrepetitive big data has MUCH greater business value than
repetitive big data.
Data Architecture
• Data architecture is about the larger picture of data and how it fits together in a typical organization.
Subdividing Data
Corporate Data
Structured Data
• Structured data is data in a standardized format: it has a well-defined structure, conforms to a data model, follows a persistent order, and is easily accessed by humans and programs. This type of data is generally stored in a database.
• Examples: tables in a relational database queried with SQL, or Excel spreadsheets.
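As a rough illustration of what "conforms to a data model" means in practice, the sketch below uses Python's built-in sqlite3 module. The table and column names (account, owner, balance_cents) and the sample rows are assumptions made up for this example, not part of the lecture.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance_cents INTEGER NOT NULL)"
)

# Every row follows the same predefined structure.
conn.executemany(
    "INSERT INTO account (account_id, owner, balance_cents) VALUES (?, ?, ?)",
    [(1, "Ayesha", 125_000), (2, "Bilal", 8_050)],
)

# Because the structure is known in advance, a program can query it directly.
rows = conn.execute(
    "SELECT owner, balance_cents FROM account"
    " WHERE balance_cents > ? ORDER BY owner",
    (10_000,),
).fetchall()

# A row that violates the data model (missing balance) is rejected on write.
try:
    conn.execute("INSERT INTO account (account_id, owner) VALUES (3, 'Chen')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The point of the sketch is only that the structure is declared before any data arrive, so both the query and the rejection follow from the model rather than from inspecting each record.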
Unstructured Data
Unstructured data is information that is not arranged according to a preset data model or schema, and
therefore does not fit naturally into the rows and columns of a traditional relational database (RDBMS).
Text and multimedia are two common types of unstructured content.
Repetitive Unstructured
• A typical form of repetitive unstructured data in the corporation might be the data generated by an
analog machine.
• For example, a farmer has a machine that reads the identification of railroad cars as the railroad
cars pass through the farmer's property. Trains pass through the property night and day. The
electronic eye reads and records the passage of each car on the track.
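The railroad example can be sketched in a few lines. The record layout ("timestamp,car_id"), the CarReading class, and the sample readings below are all hypothetical; the slides do not say what the machine actually records. The sketch only shows that when every record has the same shape, one parsing rule covers all of them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CarReading:
    timestamp: str  # time of the reading (assumed ISO 8601 format)
    car_id: str     # identification read from the railroad car


def parse_reading(line: str) -> CarReading:
    """Parse one 'timestamp,car_id' line; the same rule fits every record."""
    timestamp, car_id = line.strip().split(",")
    return CarReading(timestamp, car_id)


# Made-up sample readings; real sensor output would look different.
raw_lines = [
    "2024-01-01T02:15:00,CAR-1001",
    "2024-01-01T02:15:04,CAR-1002",
    "2024-01-01T14:40:10,CAR-1001",
]

readings = [parse_reading(line) for line in raw_lines]
passes_per_car = Counter(r.car_id for r in readings)
```

Each individual record says very little on its own, which fits the slides' point that repetitive data are high in volume but usually limited in business value per record.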
Nonrepetitive Unstructured Data
• Nonrepetitive unstructured data are data in which each record differs in form and content from the next, such as e-mails.
• Each e-mail can be long or short. It can be in English or Spanish (or some other language). The author can say anything he or she pleases. It is pure accident if the contents of any e-mail are identical to the contents of any other e-mail.
• There are many other forms of nonrepetitive unstructured data: voice recordings, contracts, customer feedback messages, etc.
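A small sketch of the e-mail point, using three invented messages: their lengths and languages differ, and a content fingerprint shows that no two are identical. The messages and the fingerprint helper are assumptions for illustration only.

```python
import hashlib

# Invented messages standing in for real e-mails.
emails = [
    "Hi team, the shipment is delayed until Friday.",
    "Hola, ¿puede enviarme la factura de marzo?",
    "Thanks!",
]


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the message body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


lengths = [len(e) for e in emails]
distinct_bodies = len({fingerprint(e) for e in emails})
```

Unlike the repetitive case, there is no single record layout to parse here; any structure has to be extracted from the text itself.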
The Great Divide of Data
It is hardly obvious why there should be this great divide of data, but there are some very good reasons for the divide:
• Repetitive data usually have very limited business value, while
nonrepetitive data are rich in business value.
• Repetitive data can be handled one way; nonrepetitive data are
handled very differently.
• Repetitive data can be analyzed one way, while nonrepetitive
data can be analyzed in a very different manner.
Textual/Nontextual Data
• Nonrepetitive unstructured data can be divided into textual and nontextual data.
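The divisions introduced so far form a small tree. One way to write it down, purely as an illustrative sketch (the TAXONOMY dictionary and the leaf_paths helper are names invented here), is a nested dictionary:

```python
# Nested dict mirroring the divisions described in the slides.
TAXONOMY = {
    "corporate data": {
        "structured": {},
        "unstructured": {
            "repetitive": {},
            "nonrepetitive": {
                "textual": {},
                "nontextual": {},
            },
        },
    }
}


def leaf_paths(tree, prefix=()):
    """Yield the path to every leaf category, depth first."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


paths = [" > ".join(p) for p in leaf_paths(TAXONOMY)]
```

Printing paths lists the four leaf categories: structured, unstructured repetitive, unstructured nonrepetitive textual, and unstructured nonrepetitive nontextual.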
Business Value
The Data Infrastructure

Editor's Notes

  1. The structure of something is its architecture.
  2. Depicst, emails, all transactions, telephone conversations, chats, etc
  3. There are many ways to subdivide the data shown in Fig. 1.1.1. The way that is shown is only one of many ways data can be understood. One way to understand the data found in the corporation is to look at structured data and nonstructured data. Fig. 1.1.2 shows this subdivision of data.
  4. Structured data is highly specific and is stored in a predefined format, whereas unstructured data is a conglomeration of many varied types of data that are stored in their native formats. This means that structured data takes advantage of schema-on-write and unstructured data employs schema-on-read.
  5. It is not obvious at all, but the dividing line in unstructured data between unstructured repetitive data and unstructured nonrepetitive data is very significant. In fact, the dividing line between unstructured repetitive data and unstructured nonrepetitive data is so important that the division can be called the “great divide” of data.
  6. Tools and techniques that work in one world simply are not applicable to the other world and vice versa.
  7. The basic divisions of data that are shown in Fig. 1.1.6 are important for a lot of reasons. Each of the divisions of data requires its own infrastructure, its own technology, and its own treatment. Even though all forms of data exist in the same corporation, the forms of data may as well exist on different planets. Each simply requires its own treatment and its own unique infrastructure.
  8. There is a very high degree of business value for structured data. As an example of the value of structured data, it is really important, both to the bank and to the customer, to have the correct bank account balance. Textual data contain even more highly valued business data. When customers talk to an agent of the company through a call center, everything the customer says is valuable. There is significantly less business value in nonrepetitive nontextual data and unstructured repetitive data.
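The schema-on-write and schema-on-read distinction mentioned in note 4 can be sketched as follows. This is a minimal illustration under assumed names: REQUIRED_FIELDS, write_structured, and read_with_schema are hypothetical helpers, and the sample records are invented.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"account_id": int, "balance_cents": int}


def write_structured(store: list, record: dict) -> None:
    """Schema-on-write: check the record against the schema before storing."""
    for field, field_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), field_type):
            raise ValueError(f"record does not match schema: {field}")
    store.append(record)


def read_with_schema(raw_items: list) -> list:
    """Schema-on-read: items are stored as-is and interpreted when read."""
    account_ids = []
    for raw in raw_items:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON; this particular reader skips it
        if isinstance(doc, dict) and "account_id" in doc:
            account_ids.append(doc["account_id"])
    return account_ids


structured_store = []
write_structured(structured_store, {"account_id": 1, "balance_cents": 500})

# Raw items kept in their native form; structure is applied only on read.
raw_store = [
    '{"account_id": 7, "note": "call back"}',
    "free text from a chat",
    '{"other": 1}',
]
account_ids = read_with_schema(raw_store)
```

In the first case a nonconforming record never enters the store; in the second, everything is stored and each reader decides what it can make sense of.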