Introduction to Data
Architecture
Lecture # 1
Dr. Saif Ur Rehman Malik
Introduction (cont…)
• Corporate data include everything found in the corporation in the way of
data.
• The most basic division of corporate data is by structured data and
unstructured data.
• As a rule, there are far more unstructured data than structured data.
• Unstructured data have two basic divisions: repetitive data and
nonrepetitive data.
• Big data is made up of unstructured data.
Introduction (cont…)
• Nonrepetitive big data has a fundamentally different form than repetitive
unstructured big data.
• The differences between nonrepetitive big data and repetitive big data are
so large that they can be called the boundaries of the “great divide.”
• As a rule, nonrepetitive big data has MUCH greater business value than
repetitive big data.
Data Architecture
• Data architecture is about the larger picture of data and how it fits together in a typical organization.
Subdividing Data
Corporate Data
Structured Data
• Structured data is data in a standardized format: it has a well-defined
structure, conforms to a data model, follows a persistent order, and is
easily accessed by humans and programs. This type of data is generally
stored in a database.
• Examples: tables in an SQL database, Excel spreadsheets, or any relational database.
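To make the definition concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical, chosen only for illustration; the point is that every row conforms to one data model, so a program can query it precisely.

```python
import sqlite3

# Hypothetical table and columns, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "Ana", 120.50), (2, "Bilal", 75.00)],
)

# Every row follows the same data model, so the query can name
# columns and apply a typed comparison.
rows = conn.execute(
    "SELECT owner, balance FROM account WHERE balance > ? ORDER BY id",
    (100,),
).fetchall()
print(rows)
```

The same question asked of free-form text would first require deciding what counts as an "owner" or a "balance"; here the schema has already settled that.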
Unstructured Data
Unstructured data is information that is not arranged according to a preset data model or schema, and
therefore cannot be stored in a traditional relational database or RDBMS. Text and multimedia are two
common types of unstructured content.
Repetitive Unstructured
• A typical form of repetitive unstructured data in the corporation might be the data generated by an
analog machine.
• For example, a farmer has a machine that reads the identification of railroad cars as the railroad
cars pass through the farmer's property. Trains pass through the property night and day. The
electronic eye reads and records the passage of each car on the track.
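The railroad example can be sketched as follows. The log format, the car identifiers, and the function name are assumptions made for illustration, not something the slides specify. What the sketch shows is the defining property of repetitive data: every record has the same shape, so one pattern parses all of them.

```python
import re
from collections import Counter

# Assumed record format: "<ISO timestamp> CAR <car id>" (hypothetical).
RECORD = re.compile(r"^(?P<ts>\S+) CAR (?P<car_id>[A-Z]{4}\d{6})$")

raw_log = """\
2024-05-01T02:14:07 CAR ABCD123456
2024-05-01T02:14:09 CAR ABCD123457
2024-05-01T14:40:31 CAR WXYZ000042
"""


def parse(log: str) -> list[dict[str, str]]:
    """Parse every line that matches the single repeating record shape."""
    records = []
    for line in log.splitlines():
        match = RECORD.match(line)
        if match:
            records.append(match.groupdict())
    return records


records = parse(raw_log)
# Count car passages per calendar day (first 10 characters of the timestamp).
cars_per_day = Counter(r["ts"][:10] for r in records)
print(cars_per_day)
```

Any single record says very little on its own; whatever value there is tends to come from aggregates such as the daily count.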
Nonrepetitive Unstructured Data
• Nonrepetitive unstructured data are data whose records do not repeat in structure or content, such as e-mails.
• Each e-mail can be long or short. The e-mail can be in English or Spanish (or some other
language). The author of the e-mail can say anything that he or she pleases. It is only a pure accident
if the contents of any e-mail are identical to the contents of any other e-mail.
• And there are many forms of nonrepetitive unstructured data. There are voice recordings, there are
contracts, there are customer feedback messages, etc.
The Great Divide of Data
It is hardly obvious why there should be this great divide of data,
but there are some very good reasons for the divide:
• Repetitive data usually have very limited business value, while
nonrepetitive data are rich in business value.
• Repetitive data can be handled one way; nonrepetitive data are
handled very differently.
• Repetitive data can be analyzed one way, while nonrepetitive
data can be analyzed in a very different manner.
Textual/Nontextual Data
• Nonrepetitive unstructured data can be divided into textual and nontextual data.
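The subdivisions introduced across these slides can be summarized as a small tree. The sketch below is one possible representation, assuming a nested dictionary and a hypothetical helper named leaf_paths; the category names come from the slides.

```python
# Divisions of corporate data as named in the slides; the nested-dict
# representation is an illustrative choice.
TAXONOMY = {
    "corporate data": {
        "structured": {},
        "unstructured": {
            "repetitive": {},
            "nonrepetitive": {"textual": {}, "nontextual": {}},
        },
    }
}


def leaf_paths(tree, prefix=()):
    """Yield the path from the root to every leaf category."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


paths = [" > ".join(p) for p in leaf_paths(TAXONOMY)]
for p in paths:
    print(p)
```

Each of the four leaf categories is a candidate for its own infrastructure and treatment.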
Business Value
The Data Infrastructure

Editor's Notes

  1. The structure of something is its architecture.
  2. Depicts e-mails, all transactions, telephone conversations, chats, etc.
  3. There are many ways to subdivide the data shown in Fig. 1.1.1. The way that is shown is only one of many ways data can be understood. One way to understand the data found in the corporation is to look at structured data and nonstructured data. Fig. 1.1.2 shows this subdivision of data.
  4. Structured data is highly specific and is stored in a predefined format, whereas unstructured data is a conglomeration of many varied types of data that are stored in their native formats. This means that structured data takes advantage of schema-on-write and unstructured data employs schema-on-read.
  5. It is not obvious at all, but the dividing line in unstructured data between unstructured repetitive data and unstructured nonrepetitive data is very significant. In fact, the dividing line between unstructured repetitive data and unstructured nonrepetitive data is so important that the division can be called the “great divide” of data.
  6. Tools and techniques that work in one world simply are not applicable to the other world and vice versa.
  7. The basic divisions of data that are shown in Fig. 1.1.6 are important for a lot of reasons. Each of the divisions of data requires its own infrastructure, its own technology, and its own treatment. Even though all forms of data exist in the same corporation, each of the forms of data may as well exist on a different planet.
  8. There is a very high degree of business value for structured data. As an example, it is really important to both the bank and the customer to have the correct bank account balance. Textual data contain even more highly valued business data. When customers talk to an agent of the company through a call center, everything the customer says is valuable. There is significantly less business value in nonrepetitive nontextual data and unstructured repetitive data.
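
Note 4 contrasts schema-on-write with schema-on-read. The following is a minimal sketch of that contrast, assuming a hypothetical sensor-reading table and a hypothetical list of raw stored lines; it uses only Python's standard sqlite3 and json modules.

```python
import json
import sqlite3

# Schema-on-write: the structure is enforced when data is stored.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (sensor TEXT NOT NULL, value REAL NOT NULL)")
try:
    # A row that violates the schema is rejected at write time.
    conn.execute("INSERT INTO reading (sensor, value) VALUES (?, ?)", ("s1", None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Schema-on-read: anything is stored as-is in its native form, and
# structure is imposed only when the data is read.
raw_store = ['{"sensor": "s1", "value": 3.5}', '{"sensor": "s2"}', "free-form note"]


def read_values(lines):
    """Interpret stored lines at read time, keeping only numeric values."""
    values = []
    for line in lines:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON; this particular reader ignores it
        if isinstance(doc, dict) and isinstance(doc.get("value"), (int, float)):
            values.append(doc["value"])
    return values


values = read_values(raw_store)
print(rejected, values)
```

With schema-on-write the cost of structure is paid once, on the way in; with schema-on-read every reader decides for itself what the stored data means.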