Intro to Big Data & Applications, Day 2
Oct 2020
Presented by: Parviz Vakili
parviz.vakili@gmail.com
References
[1]. DAMA-DMBOK (2017) Data Management Body of Knowledge (Second Edition), DAMA International
[2]. Data Strategy (2017) How to profit from a world of big data, analytics and the internet of things, by Bernard Marr, Kogan Page
[3]. Big Data Analytics for Entrepreneurial Success (2019), by Soraya Sedkaoui, IGI Global
[4]. https://www.eckerson.com/
[5]. https://www.lightsondata.com/
[6]. https://www.dataedo.com/
[7]. https://www.linkedin.com/in/denise-harders-4908a967/
[8]. http://www.fabak.ir/
[9]. https://www.sap.com/products/powerdesigner-data-modeling-tools.html
CREDITS: This presentation template was created by
Slidesgo, including icons by Flaticon, infographics &
images by Freepik, and illustrations by Stories.
Please inform me if any references are missing.
Big Data classification
Data Architecture
Simplified Zachman Framework
• What (the inventory column): Entities used to build the architecture
• How (the process column): Activities performed
• Where (the distribution column): Business location and technology location
• Who (the responsibility column): Roles and organizations
• When (the timing column): Intervals, events, cycles, and schedules
• Why (the motivation column): Goals, strategies, and means
Simplified Zachman Framework
• The executive perspective (business context): Lists of business elements defining scope in identification models.
• The business management perspective (business concepts): Clarification of the relationships between business concepts defined by Executive Leaders as Owners in definition models.
• The architect perspective (business logic): System logical models detailing system requirements and unconstrained design, represented by Architects as Designers in representation models.
• The engineer perspective (business physics): Physical models optimizing the design for implementation for specific use under the constraints of specific technology, people, costs, and timeframes, specified by Engineers as Builders in specification models.
• The technician perspective (component assemblies): A technology-specific, out-of-context view of how components are assembled and operate, configured by Technicians as Implementers in configuration models.
• The user perspective (operations classes): Actual functioning instances used by Workers as Participants. There are no models in this perspective.
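The six columns and six perspectives listed above form a 6 x 6 matrix. As a rough illustration only, the matrix can be sketched as a lookup table. The names COLUMNS, PERSPECTIVES, and describe_cell are hypothetical and are not part of the Zachman Framework itself; the cell labels are simply composed from the words used on the slides.

```python
# Hypothetical sketch: the simplified Zachman matrix as a lookup table.
# Column and perspective names come from the slides; everything else is illustrative.
COLUMNS = {
    "What": "inventory",
    "How": "process",
    "Where": "distribution",
    "Who": "responsibility",
    "When": "timing",
    "Why": "motivation",
}

# (perspective, focus, kind of model produced); the user perspective has no models.
PERSPECTIVES = [
    ("Executive", "business context", "identification"),
    ("Business management", "business concepts", "definition"),
    ("Architect", "business logic", "representation"),
    ("Engineer", "business physics", "specification"),
    ("Technician", "component assemblies", "configuration"),
    ("User", "operations classes", None),
]


def describe_cell(perspective: str, question: str) -> str:
    """Return a label for one cell of the 6 x 6 matrix."""
    column = COLUMNS[question]
    for name, focus, model_kind in PERSPECTIVES:
        if name == perspective:
            if model_kind is None:
                return f"{name} ({focus}) / {question}: functioning {column} instances"
            return f"{name} ({focus}) / {question}: {column} {model_kind} model"
    raise KeyError(perspective)


# Build every cell: one entry per (perspective, question) pair.
matrix = {(p[0], q): describe_cell(p[0], q) for p in PERSPECTIVES for q in COLUMNS}
print(matrix[("Architect", "What")])
```

Reading a row gives one stakeholder's view across all six questions; reading a column follows one question from scope down to running instances.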
Data Architecture
Architecture refers to the art and science of building
things (especially habitable structures) and to the results
of the process of building – the buildings themselves. In
a more general sense, architecture refers to an organized
arrangement of component elements intended to
optimize the function, performance, feasibility, cost, and
aesthetics of an overall structure or system.
Data Architecture is fundamental to data management.
Because most organizations have more data than
individual people can comprehend, it is necessary to
represent organizational data at different levels of
abstraction so that it can be understood and management
can make decisions about it.
Data Architecture Definition
Identifying the data needs of the enterprise
(regardless of structure), and designing
and maintaining the master blueprints to
meet those needs. Using master blueprints
to guide data integration, control data
assets, and align data investments with
business strategy.
Context Diagram: Data Architecture
Conceptual DW/BI and Big Data Architecture
UDAP Architecture
Big Data Analytics reference architecture
Data extraction
Data extracted from data sources may be stored
temporarily in a temporary data store, or transferred
directly and loaded into a Raw data store. Streaming
data may also be extracted and stored temporarily.
Data loading and pre-processing
Data are transferred, loaded, and processed (for example,
compressed). The Raw data store contains unprocessed
data.
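As a minimal sketch of the loading step, assuming the raw store is simply a compressed blob of JSON lines, extracted records can be compressed on load without changing their content. The function names load_into_raw_store and read_from_raw_store are hypothetical; a real platform would write to a distributed file system or object store instead of an in-memory value.

```python
import gzip
import json


def load_into_raw_store(records):
    """Serialize records as JSON lines and gzip-compress them for storage."""
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return gzip.compress(payload)


def read_from_raw_store(blob):
    """Decompress a stored blob and parse it back into records."""
    lines = gzip.decompress(blob).decode("utf-8").splitlines()
    return [json.loads(line) for line in lines]


# Illustrative extracted data; the field names are assumptions.
extracted = [{"id": 1, "reading": 20.5}, {"id": 2, "reading": 21.0}]
raw_blob = load_into_raw_store(extracted)

# Compression changes the representation only; the records stay unprocessed.
print(read_from_raw_store(raw_blob) == extracted)
```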
Data processing
Data from the Raw data store may be cleaned or
combined and saved into a new Preparation data
store, which temporarily holds processed data.
Cleaning and combining improve the quality of
the raw, unprocessed data. Raw and prepared data
may be replicated between data stores. New
information may also be extracted from the Raw
data store for Deep Analytics. Information
extraction refers to storing raw data in a
structured format. The Enterprise data store
holds cleaned and processed data. The Sand-box
store holds data for experimental data analysis.
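The cleaning and combining steps can be illustrated with a small sketch. The record layout, the clean and combine functions, and the in-memory stores below are all assumptions made for illustration; they stand in for whatever quality rules and storage a real pipeline would use.

```python
# Hypothetical raw store: unprocessed records from two sources.
raw_store = [
    {"source": "crm", "customer": " Alice ", "amount": "10.5"},
    {"source": "web", "customer": "alice", "amount": "4"},
    {"source": "web", "customer": "Bob", "amount": None},  # incomplete record
]


def clean(record):
    """Normalize one record, or return None if it cannot be used."""
    if record["amount"] is None:
        return None
    return {
        "customer": record["customer"].strip().lower(),
        "amount": float(record["amount"]),
    }


def combine(records):
    """Merge cleaned records from all sources into one total per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


cleaned = [c for c in map(clean, raw_store) if c is not None]
preparation_store = combine(cleaned)
print(preparation_store)  # {'alice': 14.5}
```

The raw store is left untouched, so the same raw records remain available for other uses such as information extraction or sand-box experiments.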
Data analysis
Deep Analytics refers to the execution of
batch-processing jobs on in situ data. Results of
the analysis may be stored back into the original
data stores, into a separate Analysis results
store, or into a Publish & subscribe store. The
Publish & subscribe store enables analysis results
to be stored and retrieved indirectly between
publishers and subscribers in the system. Stream
processing refers to the processing of extracted
streaming data, which may be saved temporarily
before analysis. Stream analysis refers to the
analysis of streaming data, with results saved
into Stream analysis results.
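The indirect exchange through a Publish & subscribe store can be sketched in a few lines. The class name PublishSubscribeStore, its methods, and the topic name are hypothetical; production systems would normally use a message broker rather than an in-process object.

```python
from collections import defaultdict


class PublishSubscribeStore:
    """Toy store: publishers and subscribers only know topics, not each other."""

    def __init__(self):
        self._results = defaultdict(list)      # topic -> stored analysis results
        self._subscribers = defaultdict(list)  # topic -> callbacks to notify

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, result):
        # Store the result, then notify every subscriber of the topic.
        self._results[topic].append(result)
        for callback in self._subscribers[topic]:
            callback(result)

    def retrieve(self, topic):
        # Results stay available for later retrieval as well.
        return list(self._results.get(topic, []))


store = PublishSubscribeStore()
received = []
store.subscribe("daily_sales", received.append)
store.publish("daily_sales", {"total": 14.5})
print(received)
```

Because results are both stored and pushed, a subscriber that registers late can still call retrieve to read earlier results.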
Data loading and transformation
Results of the data analysis may also be
transformed and loaded into a Serving data store,
which serves interfacing and visualization
applications. A typical use of transformation and
the Serving data store is serving Online
Analytical Processing (OLAP) queries.
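A minimal sketch of this step, assuming an in-memory SQLite database stands in for both the analysis results and the Serving data store: results are pre-aggregated into a serving table so that an OLAP-style query becomes a simple lookup. The table and column names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical analysis results at the finest grain.
conn.execute("CREATE TABLE analysis_results (region TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO analysis_results VALUES (?, ?, ?)",
    [("north", "widget", 100.0), ("north", "gadget", 50.0), ("south", "widget", 70.0)],
)

# Transformation: pre-aggregate by region into a serving table.
conn.execute(
    "CREATE TABLE serving_revenue_by_region AS "
    "SELECT region, SUM(revenue) AS total_revenue "
    "FROM analysis_results GROUP BY region"
)

# A dashboard or BI tool would now read the serving table directly.
rows = conn.execute(
    "SELECT region, total_revenue FROM serving_revenue_by_region ORDER BY region"
).fetchall()
print(rows)  # [('north', 150.0), ('south', 70.0)]
```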
Interfacing and visualization
Analyzed data may be visualized in several
ways. A Dashboarding application is a simple UI
that typically visualizes key information
without user control. A Visualization
application provides detailed visualization and
control functions and, in the enterprise domain,
is realized with a Business Intelligence tool.
An End user application has a limited set of
control functions and could be realized as a
mobile application for end users.
Job and model specification
Batch-processing jobs may be
specified in the user interface.
The jobs may be saved and
scheduled with job scheduling
tools. Models/algorithms may
also be specified in the user
interface (Model specification).
Machine learning tools may be
utilized for training of the
models based on new extracted
data.
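As a rough sketch of job specification and scheduling, a job can be described as data (a name, an interval, and a task) and a scheduler can pick the jobs that are due. JobSpec, due_jobs, and the minute-counter scheme are hypothetical simplifications; real deployments would rely on a dedicated job scheduling tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JobSpec:
    """A batch job as it might be specified in a user interface."""
    name: str
    interval_minutes: int
    task: Callable[[], str]


def due_jobs(jobs: List[JobSpec], minute: int) -> List[JobSpec]:
    """Return the jobs whose interval evenly divides the minute counter."""
    return [job for job in jobs if minute % job.interval_minutes == 0]


jobs = [
    JobSpec("hourly_aggregation", 60, lambda: "aggregated"),
    JobSpec("daily_model_retraining", 1440, lambda: "retrained"),
]

# At minute 120 only the hourly job is due; at minute 1440 both are.
results = {job.name: job.task() for job in due_jobs(jobs, minute=120)}
print(results)
```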
Data GOVERNANCE
Data Governance
Data Governance (DG) is defined as the exercise of
authority and control (planning, monitoring, and
enforcement) over the management of data assets. All
organizations make decisions about data, regardless of
whether they have a formal Data Governance function.
Those that establish a formal Data Governance program
exercise authority and control with greater intentionality
(Seiner, 2014). Such organizations are better able to
increase the value they get from their data assets. The
Data Governance function guides all other data
management functions. The purpose of Data
Governance is to ensure that data is managed properly,
according to policies and best practices.
Data Governance Definition
The exercise of authority, control, and
shared decision-making (planning,
monitoring, and enforcement) over the
management of data assets.
Context Diagram: Data Governance
Data Governance and Data Management
Data Governance Organization Parts
Typical Data Governance Committees/ Bodies
An Example of an Operating Framework
Maturity Model
- Stanford’s Maturity Model (https://lnkd.in/gs-Qsp4)
- IBM’s Maturity Model (https://lnkd.in/gPArsvH)
- Kalido Maturity Model (https://lnkd.in/gg3J7aJ)
- DataFlux’s Maturity Model (https://lnkd.in/gSBeRzx)
- Gartner’s Maturity Model (https://lnkd.in/gc9gckZ)
- Oracle’s Maturity Model (https://lnkd.in/gmJ7tBF)
- Open Universiteit Nederland Maturity Model (https://lnkd.in/gDd2Hd8)
Data Governance
reference:
www.fabak.ir
Data Development (Modeling & Design)
Modeling & Design
Data modeling is the process of discovering, analyzing,
and scoping data requirements, and then representing
and communicating these data requirements in a precise
form called the data model. Data modeling is a critical
component of data management. The modeling process
requires that organizations discover and document how
their data fits together. The modeling process itself
designs how data fits together (Simsion, 2013). Data
models depict and enable an organization to understand
its data assets.
Data Modeling Definition
Data modeling is the process of
discovering, analyzing, and scoping data
requirements, and then representing and
communicating these data requirements in
a precise form called the data model. This
process is iterative and may include a
conceptual, logical, and physical model.
Context Diagram: Data Modeling
Different Schemes
A number of different schemes are used to
represent data. The six most commonly used
schemes are Relational, Dimensional,
Object-Oriented, Fact-Based, Time-Based, and
NoSQL. Models of these schemes exist at three
levels of detail: conceptual, logical, and
physical. Each model contains a set of
components, such as entities, relationships,
facts, keys, and attributes. Once a model is
built, it needs to be reviewed and, once
approved, maintained.
Entity
Outside of data modeling, an entity is defined
as a thing that exists separately from other
things. Within data modeling, an entity is a
thing about which an organization collects
information.
Commonly Used Entity Categories
Modeling Schemes and Notations
CDM, LDM, PDM
Conceptual Data Model
The conceptual Data Model (CDM) helps you analyze the conceptual structure of an
information system and identify the major entities that need to be described, the
attributes of those entities, and the relationships between them. Conceptual data
models are more abstract than logical or physical data models.
Logical Data Model
The logical Data Model (LDM) helps you analyze the structure of the information system,
independent of any specific physical database implementation. An LDM already includes entity
identifiers, so it is less abstract than a CDM, but it does not let you design views,
indexes, and other elements that belong to the more specific physical data model.
Physical Data Model
The physical Data Model (PDM) helps you analyze tables, views, and other database objects,
including the multidimensional objects required by the data warehouse. A PDM is more specific
than a CDM or an LDM. You can model, reverse engineer, and generate for all the most popular
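The difference between the three levels can be illustrated by describing the same small domain three times. This is only a sketch under assumed names: the Customer and Order entities, their attributes, and the SQLite DDL are invented for illustration and do not come from any particular modeling tool.

```python
import sqlite3

# Conceptual level: major entities and the relationships between them.
conceptual = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}

# Logical level: attributes and identifiers, still independent of any DBMS.
logical = {
    "Customer": {"identifier": "customer_id", "attributes": ["customer_id", "name"]},
    "Order": {
        "identifier": "order_id",
        "attributes": ["order_id", "customer_id", "order_date"],
    },
}

# Physical level: concrete tables, an index, and a view for one specific DBMS (SQLite).
physical_ddl = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
CREATE INDEX idx_order_customer ON customer_order(customer_id);
CREATE VIEW orders_per_customer AS
    SELECT customer_id, COUNT(*) AS order_count
    FROM customer_order GROUP BY customer_id;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(physical_ddl)

# List the database objects that exist only at the physical level.
objects = sorted(
    (row[0], row[1]) for row in conn.execute("SELECT type, name FROM sqlite_master")
)
print(objects)
```

Indexes and views appear only in the physical model, which matches the point above that the logical model does not cover them.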
Scheme to Database Cross Reference
THANKS
Does anyone have any questions?
parviz.vakili@gmail.com
+98 912 444 2418
https://www.linkedin.com/in/parvizvakili/

Intro to big data and applications - day 2

  • 1. Intro to Big Data & Applications, Day 2, Oct 2020. Presented by: Parviz Vakili, parviz.vakili@gmail.com
  • 2. References: [1] DAMA-DMBOK (2017), Data Management Body of Knowledge (Second Edition), DAMA International. [2] Data Strategy (2017), How to profit from a world of big data, analytics and the internet of things, by Bernard Marr, Kogan Page. [3] Big Data Analytics for Entrepreneurial Success (2019), by Soraya Sedkaoui, IGI Global. [4] https://www.eckerson.com/ [5] https://www.lightsondata.com/ [6] https://www.dataedo.com/ [7] https://www.linkedin.com/in/denise-harders-4908a967/ [8] http://www.fabak.ir/ [9] https://www.sap.com/products/powerdesigner-data-modeling-tools.html CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, infographics & images by Freepik, and illustrations by Stories. Please inform me if any references are missing.
  • 12. Simplified Zachman Framework: • What (the inventory column): Entities used to build the architecture • How (the process column): Activities performed • Where (the distribution column): Business location and technology location • Who (the responsibility column): Roles and organizations • When (the timing column): Intervals, events, cycles, and schedules • Why (the motivation column): Goals, strategies, and means
  • 13. Simplified Zachman Framework: • The executive perspective (business context): Lists of business elements defining scope in identification models. • The business management perspective (business concepts): Clarification of the relationships between business concepts defined by Executive Leaders as Owners in definition models. • The architect perspective (business logic): System logical models detailing system requirements and unconstrained design represented by Architects as Designers in representation models. • The engineer perspective (business physics): Physical models optimizing the design for implementation for specific use under the constraints of specific technology, people, costs, and timeframes specified by Engineers as Builders in specification models. • The technician perspective (component assemblies): A technology-specific, out-of-context view of how components are assembled and operate configured by Technicians as Implementers in configuration models. • The user perspective (operations classes): Actual functioning instances used by Workers as Participants. There are no models in this perspective.
  • 14. Data Architecture: Architecture refers to the art and science of building things (especially habitable structures) and to the results of the process of building, that is, the buildings themselves. In a more general sense, architecture refers to an organized arrangement of component elements intended to optimize the function, performance, feasibility, cost, and aesthetics of an overall structure or system. Data Architecture is fundamental to data management. Because most organizations have more data than individual people can comprehend, it is necessary to represent organizational data at different levels of abstraction so that it can be understood and management can make decisions about it.
  • 15. Data Architecture Definition: Identifying the data needs of the enterprise (regardless of structure), and designing and maintaining the master blueprints to meet those needs. Using master blueprints to guide data integration, control data assets, and align data investments with business strategy.
  • 20. Data extraction: Data extracted from data sources may be held in a temporary data store, or transferred directly and loaded into a Raw data store. Streaming data may also be extracted and stored temporarily.
  • 22. Data loading and pre-processing: Data are transferred, loaded, and pre-processed (for example, compressed). The Raw data store contains unprocessed data.
  • 24. Data processing: Data from the Raw data store may be cleaned or combined, and saved into a new Preparation data store, which temporarily holds processed data. Cleaning and combining improve the quality of the raw, unprocessed data. Raw and prepared data may be replicated between data stores. New information may also be extracted from the Raw data store for Deep Analytics. Information extraction refers to storing raw data in a structured format. The Enterprise data store holds cleaned and processed data. The Sand-box store holds data used for experimental data analysis.
  • 26. Data analysis: Deep Analytics refers to the execution of batch-processing jobs on in situ data. Results of the analysis may be stored back into the original data stores, into a separate Analysis results store, or into a Publish & subscribe store. The Publish & subscribe store lets publishers and subscribers in the system store and retrieve analysis results indirectly. Stream processing refers to processing of extracted streaming data, which may be saved temporarily before analysis. Stream analysis refers to analysis of streaming data, whose output is saved into Stream analysis results.
  • 28. Data loading and transformation: Results of the data analysis may also be transformed and loaded into a Serving data store, which serves interfacing and visualization applications. A typical use of transformation and the Serving data store is servicing Online Analytical Processing (OLAP) queries.
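
The stages on slides 20 to 28 can be sketched as a chain of small functions. This is a minimal illustration only, not the reference architecture's implementation: the function names, the `region` and `amount` fields, and the use of plain Python lists and dicts as stand-ins for the Raw, Enterprise, Analysis results, and Serving data stores are all assumptions made for the example.

```python
from collections import defaultdict


def extract(source_rows):
    """Data extraction: copy source records unchanged into a Raw data store."""
    return list(source_rows)


def process(raw_store):
    """Data processing: clean raw records for the Enterprise data store."""
    enterprise_store = []
    for row in raw_store:
        if row.get("amount") is None:  # drop records with a missing measure
            continue
        enterprise_store.append({
            "region": row["region"].strip().lower(),  # normalize the key
            "amount": float(row["amount"]),
        })
    return enterprise_store


def analyze(enterprise_store):
    """Data analysis: a batch job that totals amounts per region."""
    totals = defaultdict(float)
    for row in enterprise_store:
        totals[row["region"]] += row["amount"]
    return dict(totals)  # stands in for the Analysis results store


def transform_for_serving(analysis_results):
    """Loading and transformation: reshape results for query-style access."""
    return {
        "by_region": sorted(analysis_results.items()),
        "grand_total": sum(analysis_results.values()),
    }


def run_pipeline(source_rows):
    raw_store = extract(source_rows)
    enterprise_store = process(raw_store)
    analysis_results = analyze(enterprise_store)
    return transform_for_serving(analysis_results)


if __name__ == "__main__":
    sample = [
        {"region": " North ", "amount": "10.5"},
        {"region": "south", "amount": 4},
        {"region": "north", "amount": None},
        {"region": "North", "amount": 2},
    ]
    print(run_pipeline(sample))
```

Keeping each stage as a separate function mirrors the way the slides keep each data store separate: the raw records are never modified, so the cleaning and analysis steps can be rerun from them.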
  • 30. Interfacing and visualization: Analyzed data may be visualized in several ways. A dashboarding application is a simple UI that typically visualizes key information without user control. A visualization application provides detailed visualization and control functions, and in the enterprise domain is realized with a Business Intelligence tool. An end user application has a limited set of control functions and could be realized as a mobile application for end users.
  • 32. Job and model specification: Batch-processing jobs may be specified in the user interface. The jobs may be saved and scheduled with job scheduling tools. Models/algorithms may also be specified in the user interface (Model specification). Machine learning tools may be used to train the models on newly extracted data.
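
As a rough sketch of what "specified, saved, and scheduled" can mean in practice, the example below describes jobs as small records and hands them to Python's standard `sched` module. The `JobSpec` fields, the job names, and the `FakeClock` helper (used so the example finishes instantly instead of really waiting) are assumptions for illustration; the slides do not name a particular scheduling tool.

```python
import sched
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobSpec:
    """A saved job specification: what to run and when (hypothetical fields)."""
    name: str
    start_after: float          # seconds after the scheduler starts
    action: Callable[[], str]   # the batch job itself


class FakeClock:
    """Simulated clock so the example does not actually sleep."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def run_jobs(specs):
    clock = FakeClock()
    scheduler = sched.scheduler(clock.time, clock.sleep)
    log = []

    def execute(spec):
        # Record when the job ran (simulated time) and what it returned.
        log.append((clock.time(), spec.name, spec.action()))

    for spec in specs:
        scheduler.enter(spec.start_after, 1, execute, argument=(spec,))
    scheduler.run()  # runs jobs in time order, regardless of insertion order
    return log


if __name__ == "__main__":
    jobs = [
        JobSpec("retrain_model", 60.0, lambda: "model retrained"),
        JobSpec("nightly_totals", 5.0, lambda: "totals computed"),
    ]
    for entry in run_jobs(jobs):
        print(entry)
```

Replacing `clock.time` and `clock.sleep` with `time.monotonic` and `time.sleep` would make the same specifications run in real time.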
  • 44. Data Governance: Data Governance (DG) is defined as the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets. All organizations make decisions about data, regardless of whether they have a formal Data Governance function. Those that establish a formal Data Governance program exercise authority and control with greater intentionality (Seiner, 2014). Such organizations are better able to increase the value they get from their data assets. The Data Governance function guides all other data management functions. The purpose of Data Governance is to ensure that data is managed properly, according to policies and best practices.
  • 45. Data Governance Definition: The exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data assets.
  • 47. Data Governance and Data Management
  • 48. Data Governance and Data Management
  • 50. Typical Data Governance Committees/ Bodies
  • 51. An Example of an Operating Framework
  • 52. Maturity Model: - Stanford’s Maturity Model (https://lnkd.in/gs-Qsp4) - IBM’s Maturity Model (https://lnkd.in/gPArsvH) - Kalido Maturity Model (https://lnkd.in/gg3J7aJ) - DataFlux’s Maturity Model (https://lnkd.in/gSBeRzx) - Gartner’s Maturity Model (https://lnkd.in/gc9gckZ) - Oracle’s Maturity Model (https://lnkd.in/gmJ7tBF) - Open Universiteit Nederland Maturity Model (https://lnkd.in/gDd2Hd8)
  • 61. Modeling & Design: Data modeling is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model. Data modeling is a critical component of data management. The modeling process requires that organizations discover and document how their data fits together. The modeling process itself designs how data fits together (Simsion, 2013). Data models depict an organization's data assets and enable the organization to understand them.
  • 62. Data Modeling Definition: Data modeling is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model. This process is iterative and may include a conceptual, logical, and physical model.
  • 64. Different schemes: There are a number of different schemes used to represent data. The six most commonly used schemes are: Relational, Dimensional, Object-Oriented, Fact-Based, Time-Based, and NoSQL. Models of these schemes exist at three levels of detail: conceptual, logical, and physical. Each model contains a set of components. Examples of components are entities, relationships, facts, keys, and attributes. Once a model is built, it needs to be reviewed and, once approved, maintained.
  • 65. Entity: Outside of data modeling, the definition of entity is a thing that exists separate from other things. Within data modeling, an entity is a thing about which an organization collects information.
  • 68. CDM, LDM, PDM. Conceptual Data Model: The conceptual data model (CDM) helps you analyze the conceptual structure of an information system and identify the major entities that need to be described, the attributes of those entities, and the relationships between them. Conceptual data models are more abstract than logical or physical data models. Logical Data Model: The logical data model (LDM) helps you analyze the structure of the information system, independent of any specific physical database implementation. An LDM already includes entity identifiers and is therefore less abstract than a CDM, but it does not let you design views, indexes, and other elements that belong to the more specific physical data model. Physical Data Model: The physical data model (PDM) helps you analyze tables, views, and other database objects, including the multidimensional objects required by a data warehouse. A PDM is more specific than a CDM or an LDM. You can model, reverse engineer, and generate for all the most popular
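
To make the three levels on slide 68 concrete, the sketch below writes one tiny model at each level. Everything in it is assumed for illustration: the Customer and Order entities, the dictionary layout, the `SQLITE_TYPES` mapping, and the choice of SQLite as the target database. It is not how PowerDesigner or any other modeling tool represents its models.

```python
import sqlite3

# Conceptual level: only the major entities and how they relate.
conceptual = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}

# Logical level: attributes and identifiers, still independent of any DBMS.
logical = {
    "Customer": {
        "attributes": ["customer_id", "name"],
        "identifier": "customer_id",
    },
    "Order": {
        "attributes": ["order_id", "customer_id", "total"],
        "identifier": "order_id",
        "references": {"customer_id": "Customer"},
    },
}

# Physical level: column types, table names, and indexes for one specific DBMS.
SQLITE_TYPES = {
    "customer_id": "INTEGER",
    "order_id": "INTEGER",
    "name": "TEXT",
    "total": "REAL",
}


def table_name(entity):
    return entity.lower() + "s"  # Customer -> customers, Order -> orders


def to_physical(logical_model):
    """Generate SQLite DDL statements from the logical model."""
    statements = []
    for entity, spec in logical_model.items():
        columns = []
        for attr in spec["attributes"]:
            column = f"{attr} {SQLITE_TYPES[attr]}"
            if attr == spec["identifier"]:
                column += " PRIMARY KEY"
            columns.append(column)
        for attr, target in spec.get("references", {}).items():
            target_id = logical_model[target]["identifier"]
            columns.append(
                f"FOREIGN KEY ({attr}) REFERENCES {table_name(target)} ({target_id})"
            )
        statements.append(
            f"CREATE TABLE {table_name(entity)} ({', '.join(columns)})"
        )
    # Indexes exist only at the physical level.
    statements.append("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    return statements


def build_database():
    connection = sqlite3.connect(":memory:")
    for statement in to_physical(logical):
        connection.execute(statement)
    return connection


if __name__ == "__main__":
    for statement in to_physical(logical):
        print(statement)
```

The index appears only in `to_physical`, which matches the slide's point that views, indexes, and similar objects are designed in the physical model and not in the logical one.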
  • 70. THANKS Does anyone have any questions? parviz.vakili@gmail.com +98 912 444 2418 https://www.linkedin.com/in/parvizvakili/