Introduction to Big Data & Big Data 1.0 System
Petr Novotný
Big Data is a recent phenomenon. Everyone talks about it, but do you really know what Big Data is? Join our four-part series on Big Data and you will get answers to your questions!
We will cover an introduction to Big Data and the platforms available for dealing with it. At the end, we will give you an insight into the possible future of dealing with Big Data.
Today we start with a brief introduction to Big Data. We will talk about how Big Data is generated, where it can be applied, and about the first world-famous platform of the Big Data 1.0 System: Hadoop.
#CHEDTEB
www.chedteb.eu
Course in Big Data Analytics in association with IBM
Every day, a huge amount of data is created. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is Big Data.
Big Data is a blanket term for any collection of data sets so large and complex that they become difficult to process using on-hand data management tools or traditional data processing applications. The challenges include capture, storage, search, sharing, transfer, analysis, and visualization. Anyone with knowledge of Java, basic UNIX, and basic SQL can opt for a Big Data training course.
This talk clarifies what a system integrator or vendor must know about Big Data and how to implement it in developing countries such as Indonesia.
This is a very lightweight introduction; some animations don't work in this presentation, so it is best viewed as a PPTX.
This deck covers the difference between data and Big Data; how Big Data is generated; opportunities with Big Data; problems that occur with Big Data and their solutions; Big Data tools; what data science is and how it relates to Big Data; and data scientists vs. data analysts. Finally, it walks through one real-life scenario where Big Data, data scientists, and data analysts work together.
SUM TWO is making 'serious investments' in big data, cloud, and mobility. Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. Another common definition: "Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures." Hence the 3 Vs of big data.
Apache Hadoop is 100% open source and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and it can scale without limits. With Hadoop, no data is too big. In today's hyper-connected world, where more and more data is created every day, Hadoop's breakthrough advantages mean that businesses and organizations can now find value in data that was recently considered useless.
Hadoop's cost advantages over legacy systems redefine the economics of data. Legacy systems, while fine for certain workloads, simply were not engineered with the needs of Big Data in mind and are far too expensive for general-purpose use with today's largest data sets. One of Hadoop's cost advantages is that, because it relies on an internally redundant data structure and is deployed on industry-standard servers rather than expensive specialized data storage systems, you can afford to store data that was not previously viable to keep. And we all know that once data is on tape, it is essentially the same as if it had been deleted: accessible only in extreme circumstances.
Make Big Data the lifeblood of your enterprise.
With data growing so rapidly, and with unstructured data accounting for 90% of data today, the time has come for enterprises to re-evaluate their approach to data storage, management, and analytics. Legacy systems will remain necessary for specific high-value, low-volume workloads and will complement the use of Hadoop, optimizing the data management structure in your organization by putting the right Big Data workloads on the right systems. The cost-effectiveness, scalability, and streamlined architecture of Hadoop will make the technology more and more attractive. In fact, the need for Hadoop is no longer in question.
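The map/shuffle/reduce model that underlies Hadoop's processing can be illustrated with a tiny single-process sketch. This is plain Python, not Hadoop's actual API; the function names and the word-count task are our own illustration:

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) pairs, as a word-count mapper would.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Group values by key; Hadoop performs this step between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Sum the counts emitted for each word.
    return key, sum(values)

def map_reduce(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())

counts = map_reduce(["big data is big", "hadoop processes big data"])
```

In Hadoop, the mapper and reducer run in parallel across many servers and the shuffle moves data over the network; the logical contract is the same.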
Big Data refers to the bulk amount of data, while Hadoop is a framework to process this data.
Big Data spans various technologies and fields, and finds applications in areas such as healthcare and the military.
http://www.techsparks.co.in/thesis-topics-in-big-data-and-hadoop/
A brief intro to what Big Data is and its potential. This is primarily a basic study; the sources of infographics, stats, and text are quoted at the end. If I have missed any reference due to human error and you recognize another source, please mention it.
Permanently high-quality master data with SmartMDM (Bilot)
Presentation from the breakfast event on 1 September 2016.
What if you harnessed the whole organization to maintain master data? What if you governed by distributing? The renewed SmartMDM brings you that governance, with centralization through Microsoft SQL Server Master Data Services (MDS).
More of our events on our website: http://www.bilot.fi/en/events/
Cloud Analytics: Ability to Design, Build, Secure, and Maintain Analytics Solutions on the Cloud (YogeshIJTSRD)
Cloud Analytics is another area of IT in which different services, such as software, infrastructure, and storage, are offered online as services. Users of cloud services are under constant fear of data loss, security threats, and availability issues. However, the major challenge in these methods is obtaining real-time and unbiased datasets. Many datasets are internal and cannot be shared due to privacy issues, or may lack certain statistical characteristics. As a result, researchers prefer to generate datasets for training and testing purposes in simulated or closed experimental environments, which may lack comprehensiveness. Advances in sensor technology, the Internet of Things (IoT), social networking, wireless communications, and the huge collections of data accumulated over the years have all contributed to a new field of study, Big Data, which is discussed in this paper. Through this analysis and investigation, we provide recommendations for the research community on future directions for providing data-based decisions for cloud-supported Big Data computing and analytics solutions. This paper concentrates on recent trends in Big Data storage and analysis in the cloud, and also points out the security limitations. Rajan Ramvilas Saroj, "Cloud Analytics: Ability to Design, Build, Secure, and Maintain Analytics Solutions on the Cloud", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume 5, Issue 5, August 2021. URL: https://www.ijtsrd.com/papers/ijtsrd43728.pdf Paper URL: https://www.ijtsrd.com/other-scientific-research-area/other/43728/cloud-analytics-ability-to-design-build-secure-and-maintain-analytics-solutions-on-the-cloud/rajan-ramvilas-saroj
Analytics, machine and deep learning, data/event streaming
- Big data streaming: enabling the time machine
- Real-time event streaming and new conceptual paradigms: distributed transactions, eventual consistency, materialized projections
- Real-time event streaming and new architectural paradigms: enterprise service bus, event store, projection database
- Notes on Domain-Driven Design: a strategic view of modeling your own business domain in the Big Data era
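The event-store and materialized-projection ideas listed above can be sketched minimally. This is an assumed in-memory event log with a hypothetical account-balance projection, not any specific product's API:

```python
# Minimal in-memory event store: state changes are appended as immutable
# events, and read models (projections) are materialized by folding the log.
events = []

def append_event(event):
    events.append(event)

def project_balances():
    # Rebuild the "balances" projection from the full event log.
    balances = {}
    for e in events:
        delta = e["amount"] if e["type"] == "deposited" else -e["amount"]
        balances[e["account"]] = balances.get(e["account"], 0) + delta
    return balances

append_event({"type": "deposited", "account": "A", "amount": 100})
append_event({"type": "withdrawn", "account": "A", "amount": 30})
append_event({"type": "deposited", "account": "B", "amount": 50})
balances = project_balances()
```

In a real system the projection would be updated incrementally and stored in a projection database, and consumers reading it would see eventual, not immediate, consistency with the log.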
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric (Cambridge Semantics)
Watch this webinar to learn about the benefits of using semantic and graph database technology to create a Data Catalog of all of an enterprise's data, regardless of source or format, as part of a modern IT or data management stack and an important step toward building an Enterprise Data Fabric.
Watch full webinar here: https://buff.ly/2mHGaLA
Data virtualization, which started to evolve as the most agile and real-time enterprise data fabric, is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
• What data virtualization really is
• How it differs from other enterprise data integration technologies
• Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o... (Daniel Zivkovic)
Two #ModernDataStack talks and one DevOps talk: https://youtu.be/4R--iLnjCmU
1. "From Data-driven Business to Business-driven Data: Hands-on #DataModelling exercise" by Jacob Frackson of Montreal Analytics
2. "Trends in the #DataEngineering Consulting Landscape" by Nadji Bessa of Infostrux Solutions
3. "Building Secure #Serverless Delivery Pipelines on #GCP" by Ugo Udokporo of Google Cloud Canada
We ran out of time for the 4th presenter, so the event will CONTINUE in March... stay tuned! Compliments of #ServerlessTO.
GraphSummit - Process Tempo - Build Graph Applications.pdf (Neo4j)
Neo4j offers a powerful platform for developing digital twins and advanced graph data science use cases. Process Tempo accelerates these efforts with a native Neo4j, no-code development environment that combines data visualization with advanced workflow. Learn how the combination of these features can open new value streams for your Neo4j graph investment.
Building the Architecture for Analytic Competition (William McKnight)
Lost amid the conversation on big data and the accelerating advancement of just about every aspect of enterprise software that manages information are the things that hold it all together. Yet this is critical: information-management components must come together in a meaningful fashion or there will be unneeded redundancy and waste and opportunities missed. Considering that optimizing the information asset goes directly to the organization’s bottom line, it behooves us to play an exceptional game— not a haphazard one—with our technology building blocks.
A BUSINESS INTELLIGENCE PLATFORM IMPLEMENTED IN A BIG DATA SYSTEM EMBEDDING D... (IJDKP)
This work discusses a case study of a business intelligence (BI) platform developed within the framework of an industry project following the 'Frascati' research and development (R&D) guidelines. The proposed results are part of the output of several joint projects enabling BI for the industry partner ACI Global, which works mainly in roadside assistance services. The main project goal is to upgrade the information system, the knowledge base (KB), and industry processes by activating data mining algorithms and big data systems able to provide gains in knowledge. The work concerns the development of a highly performing Cassandra big data system collecting data from two industry locations. The data are processed by data mining algorithms to build a decision-making system oriented toward call center human resource optimization and customer service improvement. Correlation Matrix, Decision Tree, and Random Forest Decision Tree algorithms were applied in testing the prototype system, yielding good accuracy in the output solutions. The RapidMiner tool was adopted for the data processing. The work describes all the system architectures adopted for the design and testing phases, providing information about Cassandra performance and showing some results of data mining processes matching the industry's BI strategies.
Learn SQL from Basic Queries to Advanced Queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
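As a taste of the basic-to-advanced progression described above, here is a sketch using Python's built-in sqlite3 module with an invented `sales` table; the window-function query assumes SQLite 3.25 or newer (bundled with modern Python):

```python
import sqlite3

# Hypothetical sales table used to contrast a basic aggregation
# with a more advanced window-function query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 100), ("north", 200), ("south", 50)])

# Basic: filter rows and aggregate.
total_north = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'north'").fetchone()[0]

# Advanced: a running total per region via a window function.
rows = conn.execute(
    "SELECT region, amount, "
    "SUM(amount) OVER (PARTITION BY region ORDER BY amount) AS running "
    "FROM sales ORDER BY region, amount").fetchall()
```

The basic query collapses matching rows into one number, while the window function keeps every row and attaches a cumulative aggregate to each, which is the kind of pattern the "advanced queries" material builds on.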
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
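For context, the monolithic PageRank baseline referred to above can be sketched as a plain power iteration. The graph and names are illustrative, and the example graph has no dead ends, matching the precondition noted in the abstract; the levelwise variant would additionally decompose the graph into strongly connected components and process them in topological order:

```python
# Monolithic PageRank by power iteration on an adjacency-list graph.
def pagerank(graph, damping=0.85, iterations=50):
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # Teleport term, identical for every vertex.
        new = {v: (1 - damping) / n for v in graph}
        # Each vertex distributes its rank evenly over its out-edges.
        for v, outs in graph.items():
            share = damping * ranks[v] / len(outs)
            for u in outs:
                new[u] += share
        ranks = new
    return ranks

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Because every vertex here has out-edges, all rank mass is redistributed each iteration and the ranks stay normalized to 1.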
Global Situational Awareness of A.I. and Where It's Headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Analysis insights about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.
Unleashing the Power of Data: Choosing a Trusted Analytics Platform (Enterprise Wired)
In this guide, we explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments were conducted implementing PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads); the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable: while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
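The idea of auto-generating compliance-enforcing views from declarative annotations can be sketched as follows. The annotation format, the `HASH()` SQL function, and the view-naming scheme are our own illustration, not ViewShift's actual design:

```python
# Generate a compliance-enforcing SQL view from per-column policy
# annotations: unannotated columns pass through, annotated columns
# are redacted or hashed in the view definition.
def build_view(table, columns, annotations):
    exprs = []
    for col in columns:
        policy = annotations.get(col)
        if policy == "redact":
            exprs.append(f"NULL AS {col}")      # column fully suppressed
        elif policy == "hash":
            exprs.append(f"HASH({col}) AS {col}")  # pseudonymized column
        else:
            exprs.append(col)                    # no policy: pass through
    return (f"CREATE VIEW {table}_compliant AS "
            f"SELECT {', '.join(exprs)} FROM {table}")

sql = build_view("members", ["id", "email", "country"],
                 {"id": "redact", "email": "hash"})
```

Queries are then routed to `members_compliant` instead of the base table, so enforcement happens in the view layer rather than in every consuming query.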
12. Simplified Zachman Framework
• What (the inventory column): Entities used to build the architecture
• How (the process column): Activities performed
• Where (the distribution column): Business location and technology location
• Who (the responsibility column): Roles and organizations
• When (the timing column): Intervals, events, cycles, and schedules
• Why (the motivation column): Goals, strategies, and means
13. Simplified Zachman Framework
• The executive perspective (business context): Lists of business elements defining scope, in identification models.
• The business management perspective (business concepts): Clarification of the relationships between business concepts defined by Executive Leaders as Owners, in definition models.
• The architect perspective (business logic): Logical system models detailing system requirements and unconstrained design, represented by Architects as Designers in representation models.
• The engineer perspective (business physics): Physical models optimizing the design for implementation for specific use under the constraints of specific technology, people, costs, and timeframes, specified by Engineers as Builders in specification models.
• The technician perspective (component assemblies): A technology-specific, out-of-context view of how components are assembled and operate, configured by Technicians as Implementers in configuration models.
• The user perspective (operations classes): Actual functioning instances used by Workers as Participants. There are no models in this perspective.
14. Data Architecture
Architecture refers to the art and science of building things (especially habitable structures) and to the results of the process of building: the buildings themselves. In a more general sense, architecture refers to an organized arrangement of component elements intended to optimize the function, performance, feasibility, cost, and aesthetics of an overall structure or system.
Data Architecture is fundamental to data management. Because most organizations have more data than individual people can comprehend, it is necessary to represent organizational data at different levels of abstraction so that it can be understood and management can make decisions about it.
15. Data Architecture Definition
Identifying the data needs of the enterprise (regardless of structure), and designing and maintaining the master blueprints to meet those needs. Using master blueprints to guide data integration, control data assets, and align data investments with business strategy.
20. Data extraction
Data extracted from data sources may be stored temporarily in a temporary data store, or transferred directly and loaded into a Raw data store. Streaming data may also be extracted and stored temporarily.
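A toy version of this extract-and-land step might look as follows; the paths, record shape, and function names are illustrative, not part of the architecture being described:

```python
import json
import pathlib
import tempfile

# Extract records from a source, stage them in a temporary store,
# then land them unmodified in the "raw" store as JSON lines.
def extract(source_records, raw_dir):
    staged = tempfile.NamedTemporaryFile("w", delete=False, suffix=".jsonl")
    with staged as f:
        for rec in source_records:      # temporary data store
            f.write(json.dumps(rec) + "\n")
    raw_path = pathlib.Path(raw_dir) / "batch-0001.jsonl"
    # Land the staged batch in the raw store without transformation.
    raw_path.write_text(pathlib.Path(staged.name).read_text())
    return raw_path

records = [{"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": 19.0}]
raw_file = extract(records, tempfile.mkdtemp())
```

Keeping the raw store untransformed is what lets later processing steps be rerun or revised without re-extracting from the sources.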
24. Data processing
Data from the Raw data store may be cleaned or
combined, and saved into a new Preparation data
store, which temporarily holds processed data.
Cleaning and combining refer to quality
improvement of the raw unprocessed data. Raw
and prepared data may be replicated between data
stores. Also, new information may be extracted
from the Raw data store for Deep Analytics.
Information extraction refers to storing raw
data in a structured format. The Enterprise data
store holds cleaned and processed data, and the
Sandbox store holds data for experimental data
analysis.
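The cleaning step can be pictured with a small sketch: rows from a hypothetical Raw store are quality-improved (incomplete records dropped, fields normalized) and saved into a Preparation store. The store names mirror the text; the records and rules are invented for illustration.

```python
# Cleaning: drop incomplete records and normalize fields, then save
# the result into a separate Preparation store, leaving the Raw
# store untouched.

raw_store = [
    {"id": 1, "city": "  Prague ", "temp": 21.5},
    {"id": 2, "city": None, "temp": 19.8},   # missing value: discard
    {"id": 3, "city": "Brno", "temp": 18.1},
]

def clean(rows):
    """Quality improvement of the raw, unprocessed data."""
    for row in rows:
        if row["city"] is None:
            continue                          # discard incomplete records
        yield {**row, "city": row["city"].strip()}

preparation_store = list(clean(raw_store))
# preparation_store holds the two cleaned rows
```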
26. Data analysis
Deep Analytics refers to the execution of batch-
processing jobs on data in situ. Results of the
analysis may be stored back into the original data
stores, into a separate Analysis results store, or
into a Publish & subscribe store. The Publish &
subscribe store enables storage and retrieval of
analysis results indirectly between publishers
and subscribers in the system. Stream processing
refers to processing of extracted streaming data,
which may be saved temporarily before analysis.
Stream analysis refers to analysis of streaming
data, the results of which are saved into the
Stream analysis results store.
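One way to picture the Publish & subscribe store is as a topic-keyed result buffer: an analytics job publishes under a topic, and consumers retrieve results later without the two sides knowing each other. The class, topic name, and sample batch job below are purely illustrative.

```python
# Illustrative Publish & subscribe store: publishers and subscribers
# exchange analysis results indirectly, decoupled in time.

from collections import defaultdict

class PubSubStore:
    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, result):
        self._topics[topic].append(result)

    def subscribe(self, topic):
        return list(self._topics[topic])   # retrieved later, indirectly

store = PubSubStore()

# A batch "Deep Analytics" job publishes its result under a topic...
batch_result = sum([21.5, 19.8, 18.1]) / 3
store.publish("avg_temp", round(batch_result, 2))

# ...and a downstream consumer picks it up without knowing the publisher.
results = store.subscribe("avg_temp")
```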
28. Data loading and transformation
Results of the data analysis may also be
transformed into a Serving data store, which
serves interfacing and visualization applications.
A typical use of the transformation step and the
Serving data store is servicing Online Analytical
Processing (OLAP) queries.
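The point of the Serving data store can be sketched as pre-aggregation: analysis results are rolled up along a dimension during transformation so an OLAP-style query becomes a cheap lookup. The region/year sales rows and the aggregation choice are assumptions made for the example.

```python
# Transformation into a Serving store: pre-aggregate by dimension so
# an OLAP-style query is answered from precomputed data.

from collections import defaultdict

analysis_results = [
    {"region": "EU", "year": 2020, "sales": 120},
    {"region": "EU", "year": 2021, "sales": 150},
    {"region": "US", "year": 2020, "sales": 200},
]

# Transformation step: roll sales up by region into the serving store.
serving_store = defaultdict(int)
for row in analysis_results:
    serving_store[row["region"]] += row["sales"]

def olap_query(region):
    """Answer an aggregate query from the precomputed serving store."""
    return serving_store[region]
```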
30. Interfacing and visualization
Analyzed data may be visualized in several
ways. Dashboarding application refers to a
simple UI, where typically key information is
visualized without user control. Visualization
application provides detailed visualization and
control functions, and is realized with a Business
Intelligence tool in the enterprise domain. End
user application has a limited set of control
functions, and could be realized as a mobile
application for end users.
32. Job and model specification
Batch-processing jobs may be
specified in the user interface.
The jobs may be saved and
scheduled with job scheduling
tools. Models/algorithms may
also be specified in the user
interface (Model specification).
Machine learning tools may be
utilized to train the models on
newly extracted data.
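The flow above, specifying a job, saving it with a scheduler, and retraining a model on newly extracted data, can be sketched with toy stand-ins. The scheduler class and the trivial mean "model" are illustrative assumptions, not real scheduling or machine-learning tools.

```python
# Toy sketch: a batch job is specified, saved with a scheduler, and
# retrains a trivial model on newly extracted data when run.

class Scheduler:
    def __init__(self):
        self.jobs = []

    def schedule(self, name, func):
        self.jobs.append((name, func))     # save the job specification

    def run_all(self):
        return {name: func() for name, func in self.jobs}

def train_mean_model(data):
    """Stand-in for model training: fit a mean predictor to the data."""
    return sum(data) / len(data)

new_extracted_data = [2.0, 4.0, 6.0]
scheduler = Scheduler()
scheduler.schedule("retrain", lambda: train_mean_model(new_extracted_data))
results = scheduler.run_all()
```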
44. Data Governance
Data Governance (DG) is defined as the exercise of
authority and control (planning, monitoring, and
enforcement) over the management of data assets. All
organizations make decisions about data, regardless of
whether they have a formal Data Governance function.
Those that establish a formal Data Governance program
exercise authority and control with greater intentionality
(Seiner, 2014). Such organizations are better able to
increase the value they get from their data assets. The
Data Governance function guides all other data
management functions. The purpose of Data
Governance is to ensure that data is managed properly,
according to policies and best practices.
45. Data Governance Definition
The exercise of authority, control, and
shared decision-making (planning,
monitoring, and enforcement) over the
management of data assets.
52. Maturity Model
- Stanford’s Maturity Model (https://lnkd.in/gs-Qsp4)
- IBM’s Maturity Model (https://lnkd.in/gPArsvH)
- Kalido Maturity Model (https://lnkd.in/gg3J7aJ)
- DataFlux’s Maturity Model (https://lnkd.in/gSBeRzx)
- Gartner’s Maturity Model (https://lnkd.in/gc9gckZ)
- Oracle’s Maturity Model (https://lnkd.in/gmJ7tBF)
- Open Universiteit Nederland Maturity Model (https://lnkd.in/gDd2Hd8)
61. Modeling & Design
Data modeling is the process of discovering, analyzing,
and scoping data requirements, and then representing
and communicating these data requirements in a precise
form called the data model. Data modeling is a critical
component of data management. The modeling process
requires that organizations discover and document how
their data fits together. The modeling process itself
designs how data fits together (Simsion, 2013). Data
models depict and enable an organization to understand
its data assets.
62. Data Modeling Definition
Data modeling is the process of
discovering, analyzing, and scoping data
requirements, and then representing and
communicating these data requirements in
a precise form called the data model. This
process is iterative and may include a
conceptual, logical, and physical model.
64. Different schemes
There are a number of different schemes
used to represent data. The six most
commonly used schemes are: Relational,
Dimensional, Object-Oriented, Fact-
Based, Time-Based, and NoSQL. Models
of these schemes exist at three levels of
detail: conceptual, logical, and physical.
Each model contains a set of components.
Examples of components are entities,
relationships, facts, keys, and attributes.
Once a model is built, it needs to be
reviewed and once approved, maintained.
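The components named above, entities, relationships, keys, and attributes, can be sketched for the relational scheme. The Customer and Order entities and their fields are invented for illustration.

```python
# Relational-scheme sketch: two entities, their keys and attributes,
# and a relationship navigated via a foreign key.

# Entity "Customer": a key mapping to descriptive attributes.
customers = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
}

# Entity "Order": its own key plus a foreign key into Customer.
orders = [
    {"order_id": 10, "customer_id": 1, "total": 99.0},
    {"order_id": 11, "customer_id": 1, "total": 15.5},
]

# The relationship lets us navigate from an order to its customer.
def customer_for(order):
    return customers[order["customer_id"]]["name"]
```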
65. Entity
Outside of data modeling, the definition of
entity is a thing that exists separate from
other things. Within data modeling, an
entity is a thing about which an
organization collects information.
68. CDM, LDM, PDM
Conceptual Data Model
The Conceptual Data Model (CDM) helps you analyze the conceptual structure of an
information system, identifying the major entities that need to be described, the
attributes of those entities, and the relationships between them. Conceptual data
models are more abstract than logical or physical data models.
Logical Data Model
The Logical Data Model (LDM) helps you analyze the structure of the information system,
independently of any specific physical database implementation. An LDM already includes
entity identifiers, so it is less abstract than a CDM, but it does not yet let you design
physical elements such as views and indexes.
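The contrast between the conceptual and logical levels can be sketched in data form: the CDM names only entities and relationships, while the LDM adds entity identifiers. All entity, attribute, and identifier names below are illustrative assumptions.

```python
# CDM vs LDM sketch: the conceptual model names entities and
# relationships only; the logical model adds entity identifiers.

conceptual_model = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}

logical_model = {
    "Customer": {"identifier": "customer_id",
                 "attributes": ["name", "email"]},
    "Order":    {"identifier": "order_id",
                 "attributes": ["order_date", "customer_id"]},
}

# The LDM is less abstract: every entity now carries an identifier,
# but physical elements such as views and indexes are still absent.
identifiers = [e["identifier"] for e in logical_model.values()]
```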
Physical Data Model
The Physical Data Model (PDM) helps you analyze tables, views, and other database objects,
including the multidimensional objects required by a data warehouse. The PDM is more specific
than the CDM and LDM. You can model, reverse-engineer, and generate for all the most popular