Data development involves analyzing, designing, implementing, deploying, and maintaining data solutions to maximize the value of enterprise data. It includes defining data requirements, designing data components such as databases and reports, and implementing these components. Effective data development requires collaboration between business experts, data architects, analysts, developers, and other roles. The activities of data development follow the system development lifecycle and include data modeling, analysis, design, implementation, and maintenance.
5.1 Introduction
• Data development is the analysis, design, implementation, deployment, and
maintenance of data solutions to maximize the value of the data resource to the
enterprise.
• It is the subset of project activities within the system development lifecycle (SDLC).
• It focuses on defining data requirements, designing the data solution
components, and implementing these components.
• The primary data solution components are databases and other data structures,
information products (screens and reports), and data access interfaces.
5.1 Introduction (continued)
• Project team members must collaborate with each other for effective solution
design:
• Business data stewards and subject matter experts (SMEs) provide
business requirements for data and information, including business rules
and data quality expectations, and then validate that these requirements are
met.
• Data architects, analysts, and DBAs have primary responsibility for
database design. DBAs collaborate with software developers to define data
access services in layered service-oriented architecture (SOA)
implementations.
• Software architects and developers (both application and data integration
specialists) have primary responsibility for data capture and usage design
within programs, as well as user interface design for information products
(screens and printed reports).
5.2 Concepts and Activities
• The activities necessary to carry out the data development function are:
• 5.2.1 System Development Lifecycle (SDLC)
• 5.2.2 Styles of Data Modeling
• 5.2.3 Data Modeling, Analysis, and Solution Design, including:
• 5.2.3.1 Analyze Information Requirements
• 5.2.3.2 Develop and Maintain Conceptual Data Models “Entities, Relationships”
• 5.2.3.3 Develop and Maintain Logical Data Models “Attributes, Domains, Keys”
• 5.2.3.4 Develop and Maintain Physical Data Models
• 5.2.4 Detailed Data Design, including:
• 5.2.4.1 Design Physical Databases “Physical Database Design, Performance Modifications, Documentation”
• 5.2.4.2 Design Information Products
• 5.2.4.3 Design Data Access Services
• 5.2.4.4 Design Data Integration Services
• 5.2.5 Data Model and Design Quality Management
• 5.2.5.1 Develop Data Modeling and Design Standards
• 5.2.5.2 Review Data Model and Database Design Quality “Conceptual and Logical Data model Reviews,
Physical Database Design Review, Data Model Validation”
• 5.2.5.3 Manage Data Model Versioning and Integration
• 5.2.6 Data Implementation
• 5.2.6.1 Implement Development/Test Database Changes
• 5.2.6.2 Create and Maintain Test Data
• 5.2.6.3 Migrate and Convert Data
• 5.2.6.4 Build and Test Information Products
• 5.2.6.5 Build and Test Data Access Services
• 5.2.6.6 Build and Test Data Integration Services
• 5.2.6.7 Validate Information Requirements
• 5.2.6.8 Prepare for Data Deployment
5.2.1 System Development Lifecycle (SDLC)
• The SDLC includes the following activities:
• Project Planning, including scope definition and business case justification
• Requirements Analysis
• Solution Design
• Detailed Design
• Component Building
• Testing, including unit, integration, system, performance, and acceptance
testing
• Deployment Preparation, including documentation development and
training
• Installation and Deployment, including piloting and rollout.
• These tasks create a series of data models leading to the implemented system.
• Waterfall and spiral are methodologies used with the SDLC.
• Information systems capture and deliver information (data in context, with
relevance and a time frame) to support business functions.
5.2.2 Styles of Data Modeling
• Several data modeling methods exist, with different diagrams, styles, symbols,
and box contents used to communicate detailed specifications:
• IE (Information Engineering): the most popular data modeling diagramming
style for representing data structure; its symbols for depicting cardinality
are known as “crow's feet”.
• IDEF1X: an alternate data modeling syntax, using circles (some darkened,
some empty) and lines (some solid, some dotted).
• ORM (Object Role Modeling): enables very detailed specification of business
data relationships and rules. (Not to be confused with object-relational
mapping, the techniques for storing, retrieving, updating, and deleting
objects from an object-oriented program in a relational database.)
• UML (Unified Modeling Language): an integrated set of diagramming
conventions for several different forms of modeling, developed to
standardize object-oriented analysis and design.
• UML defines several different types of models and diagrams.
• Modeling data/databases in models separate from software models, maintained
by data professionals, is recommended.
• Using different data modeling styles helps differentiate and communicate the
purpose of each model.
• Data stewards do not need to become data modelers. However, they do need
to be well-informed reviewers of data models.
5.2.3 Data Modeling, Analysis, and Solution Design
• Data modeling is an analysis and design method used to:
• Define and analyze data requirements.
• Design data structures that support these requirements.
• A data model is a set of specifications and related diagrams reflecting data
requirements and designs.
• Think of a data model as a diagram that uses text and symbols to represent
data elements and the relationships between them.
• Two formulas guide the modeling approach:
• Purpose + audience = deliverables
• Deliverables + resources + time = approach
5.2.3 Data Modeling, Analysis, and Solution Design (continued)
• The purpose of a data model is to facilitate:
• Communication: helping us understand a business area, an existing
application, or the impact of modifying an existing structure; also useful
for training new business and technical staff.
• Formalization: a data model documents a single, precise definition of data
requirements and data-related business rules.
• Scope: a data model can help explain the data context and scope of
purchased application packages.
• Data models that include the same data may differ by:
• Scope: expressing a perspective about data in terms of function (business or
application view), realm (process, department, division, enterprise, or
industry view), and time (current state, short-term future, long-term future).
• Focus: basic and critical concepts (conceptual view), detailed but
independent of context (logical view), or optimized for a specific technology
and use (physical view).
• Several analysis and design activities are required to use data models to meet
information needs (see the following slides).
Data Modeling, Analysis, and Solution Design
5.2.3.1 Analyze Information Requirements
• To identify information requirements, first identify business information
needs, in the context of one or more business processes.
• Every project charter should include data-specific objectives and identify the
data within scope.
• Requirements analysis includes the elicitation, organization, documentation,
review, refinement, approval, and change control of business requirements.
• Logical data modeling is an important means of expressing business data
requirements. Many organizations also have formal requirements
management disciplines, supported by the reports and tables created by data
modeling tools.
• Business systems planning (BSP) and information systems planning
techniques and activities are used to define the enterprise data model
(see Chapter 4).
Data Modeling, Analysis, and Solution Design
5.2.3.2 Develop and Maintain Conceptual Data Models
• A conceptual data model is a visual, high-level perspective on a subject area of
importance to the business.
• It includes entities and the relationships between them, as in Figure 5.3.
Data Modeling, Analysis, and Solution Design
5.2.3.2 Develop and Maintain Conceptual Data Models (continued)
• Conceptual data models include business terms, relationship terms, entity
synonyms, and security classifications.
• To create a conceptual data model, start with one subject area and determine
the objects included in that subject area.
• For example, the Customer subject area contains objects such as Account
Owner, Sub Account, and Contact Preferences.
• An intermediate conceptual model is used when changes are proposed as part
of a project.
• Copy model changes to the production version of the model as part of the
release process, to ensure that the model stays in sync with current reality.
Conceptual Data Models
5.2.3.2.1 Entities
• A data entity is a collection of data about something the business sees as
important and worthy of capture.
• An entity is a noun; identifying entities focuses on who, what, when, where,
why, and how.
• Entities appear in both conceptual and logical data models:
• Conceptual business entities describe the things about which the business
collects data (e.g., Customer, Product, etc.).
• Logical entities follow the rules of normalization and abstraction, so one
business entity may become numerous logical components.
• Entities are either independent (kernel entities, which do not depend on any
other entity for their existence) or dependent entities.
• Dependent entity types:
• Attributive/characteristic: depends on only one other parent entity (e.g.,
an employee's Beneficiary depends on the Employee).
• Associative/mapping: depends on two or more entities (e.g., Registration
depends on the Student and Course entities).
• Category/sub-type or super-type: an example of generalization and
inheritance; the sub-type inherits from the super-type.
Conceptual Data Models
5.2.3.2.2 Relationships
• Business rules define constraints on what can and cannot be done. They are
divided into:
• Data rules, which constrain how data relates to other data (e.g., a freshman
student can register for at most 18 credits a semester).
• Action rules, which are instructions on what to do when data elements
contain certain values.
• Data models express two primary types of data rules:
• Cardinality rules define how many instances of each entity participate in a
relationship between two entities (e.g., each company can employ many
persons).
• Referential integrity rules ensure valid values (e.g., a person can exist
without working for a company, but a company cannot exist unless at least
one person is employed).
• These two kinds of rules combine to describe the relationship between
Company and Person as:
• Each person can work for zero to many companies.
• Each company must employ one or many persons.
• “Many” expresses cardinality, while “zero” vs. “one” (optional vs. mandatory
participation) expresses referential integrity.
Conceptual Data Models
5.2.3.2.2 Relationships (continued)
• A relationship between two entities may be one of three types:
• A one-to-one relationship says that a parent entity may have one and only
one child entity.
• A one-to-many relationship says that a parent entity may have one or more
child entities (the most common type of relationship).
• A many-to-many relationship says that an instance of each entity may be
associated with zero to many instances of the other entity, and vice versa.
• A recursive relationship relates instances of an entity to other instances of the
same entity (the entity has a relationship with itself).
• Recursive relationships may be one-to-one, one-to-many, or many-to-many.
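The relationship types above can be sketched as foreign keys in a relational schema. Below is a minimal sketch using Python's built-in sqlite3 module; the Company/Person/Employee tables and all names are illustrative, not from the source:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

# One-to-many: each company may employ many persons.
con.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
con.execute("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        company_id INTEGER REFERENCES company(company_id)  -- NULL allowed: a person
                                                           -- can exist with no company
    )""")

# Recursive (one-to-many with itself): an employee may manage other employees.
con.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES employee(emp_id)  -- self-reference
    )""")

con.execute("INSERT INTO company VALUES (1, 'Acme')")
con.execute("INSERT INTO person VALUES (1, 'Ada', 1), (2, 'Bob', NULL)")
con.execute("INSERT INTO employee VALUES (1, 'Carol', NULL), (2, 'Dan', 1)")

# Referential integrity in action: a person cannot reference a non-existent company.
try:
    con.execute("INSERT INTO person VALUES (3, 'Eve', 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note how optionality is expressed: the nullable `company_id` makes the person's participation optional, while the `REFERENCES` clause rejects invalid values.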
5.2.3.3 Develop and Maintain Logical Data Models
• A logical data model is a detailed representation of data requirements and of
the business rules that govern data quality, usually in support of a specific
usage context (application requirements).
• It is independent of any technology or the technical constraints of any specific
implementation.
• It often begins as an extension of a conceptual data model, adding data
attributes to each entity.
• Organizations should have naming standards to guide the naming of logical
data objects.
• Transform conceptual data model structures by applying two techniques:
• Normalization
• Abstraction
5.2.3.3 Develop and Maintain Logical Data Models (continued)
• Normalization is the process of applying rules to organize business complexity into stable
data structures.
• The goal of normalization is to keep each data element in only one place.
• Normalization rules sort data elements according to primary and foreign keys.
• Normalization rules are organized into levels:
• First normal form (1NF): ensures each entity has a valid primary key and every data
element depends on the primary key; removes repeating groups and ensures each
data element is atomic (not multi-valued).
• Second normal form (2NF): ensures each entity has the minimal primary key and that
every data element depends on the complete primary key.
• Third normal form (3NF): ensures each entity has no hidden primary keys. A model in
3NF is commonly called a “normalized” model.
• Boyce-Codd normal form (BCNF): resolves overlapping composite candidate keys
(occurs rarely).
• Fourth normal form (4NF): resolves all many-to-many relationships in pairs into the
smallest possible pieces (occurs rarely).
• Fifth normal form (5NF): resolves inter-entity dependencies into basic pairs, where all
join dependencies use parts of primary keys (occurs rarely).
• Sixth normal form (6NF): adds temporal objects to primary keys, to allow for
historical reporting and analysis over time frames (occurs rarely).
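The lower normal forms above can be illustrated with a small schema. A minimal sketch in Python's sqlite3, using a hypothetical order-entry example (all table names are illustrative): a multi-valued `items` column would violate 1NF, so the design below decomposes it into entities where each fact lives in exactly one place.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Unnormalized (violates 1NF: "items" is a repeating, multi-valued group):
#   order_id | customer | items
#      1     |   Ada    | "widget,gadget"
#
# Normalized to 3NF: every non-key column depends on the key,
# the whole key, and nothing but the key.
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE "order"  (order_id INTEGER PRIMARY KEY,
                           customer_id INTEGER NOT NULL REFERENCES customer);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Associative entity resolving the many-to-many order/product relationship.
    CREATE TABLE order_line (order_id   INTEGER REFERENCES "order",
                             product_id INTEGER REFERENCES product,
                             quantity   INTEGER NOT NULL,
                             PRIMARY KEY (order_id, product_id));
""")
con.execute("INSERT INTO customer VALUES (1, 'Ada')")
con.execute('INSERT INTO "order" VALUES (1, 1)')
con.executemany("INSERT INTO product VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
con.executemany("INSERT INTO order_line VALUES (?, ?, ?)", [(1, 1, 2), (1, 2, 1)])

# Each product name is now stored exactly once; the order's contents are
# reconstructed by joining, not by parsing a comma-separated string.
names = con.execute("""
    SELECT p.name FROM order_line ol
    JOIN product p ON p.product_id = ol.product_id
    ORDER BY p.product_id""").fetchall()
print(names)
```

The composite primary key on `order_line` also shows 2NF's concern: `quantity` depends on the complete key (order *and* product), not on part of it.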
5.2.3.3 Develop and Maintain Logical Data Models (continued)
• Abstraction is the redefinition of data entities, elements, and relationships by
removing details, to broaden the applicability of data structures to a wider
class of situations, often by implementing super-types rather than sub-types.
• Using a generic Party Role super-type to represent the Customer, Employee,
and Supplier sub-types is an example of applying abstraction.
• Use normalization to show the known details of entities.
• Use abstraction when some details of entities are missing or not yet
discovered, or when the generic version of an entity is more important or
useful than the sub-types.
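The Party Role abstraction mentioned above can be sketched as one physical table. A minimal sqlite3 sketch (column names and the CHECK list are assumptions for illustration): the Customer, Employee, and Supplier sub-types collapse into rows of a single super-type table, distinguished by a role-type discriminator.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Super-type table representing the generic Party Role.
con.execute("""
    CREATE TABLE party_role (
        party_role_id INTEGER PRIMARY KEY,
        party_name    TEXT NOT NULL,
        role_type     TEXT NOT NULL
                      CHECK (role_type IN ('CUSTOMER', 'EMPLOYEE', 'SUPPLIER'))
    )""")

# The three former entities become rows distinguished by role_type, so adding a
# new role (say, 'PARTNER') needs no structural change beyond the CHECK list.
con.executemany("INSERT INTO party_role (party_name, role_type) VALUES (?, ?)",
                [("Ada", "CUSTOMER"), ("Bob", "EMPLOYEE"), ("Acme", "SUPPLIER")])

customers = con.execute(
    "SELECT party_name FROM party_role WHERE role_type = 'CUSTOMER'").fetchall()
print(customers)
```

The trade-off matches the slide's guidance: the generic structure is broadly applicable, but sub-type-specific attributes are lost or must be made nullable.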
Logical Data Models
5.2.3.3.1 Attributes
• An attribute is a property of an entity: a type of fact important to the business,
whose values help identify or describe an entity instance.
• For example, the attribute Student Last Name describes the last name of each
student.
• An attribute translates in a physical data model to a field in a file or a column
in a database table.
• Attributes use business names, while fields and columns use technical names.
• In a logical data model, entities are like nouns and attributes are like adjectives.
• Attributes in a logical data model should be atomic: each should contain only
one piece of data (one fact). For example, a phone number may divide into
several distinct attributes (home, office, fax).
• An instance of an attribute is the value of the attribute for a particular entity
instance.
• Clarity, accuracy, and completeness are the characteristics of the high-quality
data definitions needed for intelligent business decisions and intelligent
application design.
Logical Data Models
5.2.3.3.2 Domains
• A domain is the set of all possible values for an attribute.
• An attribute can never contain values outside of its assigned domain.
• Some domains have a limited number of specific defined values, or minimum
or maximum limits for numbers.
• Business rules can also restrict domains.
• Attributes often share the same domain. For example, an employee hire date
and a purchase date must both be:
• A valid calendar date (for example, not February 31st).
• A date that falls on a weekday.
• A date that does not fall on a holiday.
• A data dictionary contains, among other things, a collection of domains and
the attributes that relate to each domain.
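A domain like the hire-date example above can be enforced physically as a column constraint. A minimal sqlite3 sketch, assuming ISO-format date strings (the table name and the weekday rule's encoding are illustrative; the holiday rule is omitted since it would need a reference table):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Domain enforced as a CHECK constraint: hire_date must be a parseable date
# that falls on a weekday. SQLite's strftime('%w', ...) yields the day of week
# as '0' (Sunday) through '6' (Saturday).
con.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        hire_date TEXT NOT NULL
                  CHECK (date(hire_date) IS NOT NULL
                         AND strftime('%w', hire_date) NOT IN ('0', '6'))
    )""")

con.execute("INSERT INTO employee VALUES (1, '2024-03-04')")  # a Monday: accepted
try:
    con.execute("INSERT INTO employee VALUES (2, '2024-03-02')")  # a Saturday
except sqlite3.IntegrityError:
    print("rejected: value outside the hire_date domain")
```

Because the constraint lives in the database rather than the application, every program that writes to the table gets the same domain enforcement, echoing the "enforce integrity at the database level" best practice later in this chapter.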
Logical Data Models
5.2.3.3.3 Keys
• Attributes assigned to an entity are either key or non-key attributes:
• A key data element helps identify one unique entity instance from all
others, either fully (by itself) or partially (in combination with other key
elements).
• Non-key data elements describe the entity instance but do not help
uniquely identify it.
• An entity may have more than one candidate key presenting a unique value;
one becomes the primary key and all others become alternate keys. A key
made up of two or more attributes is called a composite key.
• A surrogate key (or anonymous key) contains a randomly generated value
uniquely assigned to an entity instance, often used to avoid composite
primary keys. (True surrogate keys are random, not sequential.)
• A foreign key is an attribute that provides a link to another entity. It appears
in both entities in a relationship and enables navigation between data
structures.
• An identifying relationship occurs when the foreign key attributes of a parent
entity appear as part of the composite primary key of a child entity.
• A non-identifying relationship occurs when the foreign key of a parent entity
is a non-key attribute describing the child entity.
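The key concepts above fit together in one small schema. A minimal sqlite3 sketch using hypothetical Student/Course/Registration tables: a surrogate primary key, an alternate key, foreign keys, and an identifying relationship whose composite primary key is built from the parents' keys.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

con.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,   -- surrogate primary key
        natl_id    TEXT NOT NULL UNIQUE   -- alternate (candidate) key
    )""")
con.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# Identifying relationship: the parents' foreign keys form the child's
# composite primary key, so Registration cannot exist without both parents.
con.execute("""
    CREATE TABLE registration (
        student_id INTEGER NOT NULL REFERENCES student,
        course_id  INTEGER NOT NULL REFERENCES course,
        grade      TEXT,                         -- non-key attribute
        PRIMARY KEY (student_id, course_id)      -- composite key
    )""")

con.execute("INSERT INTO student (natl_id) VALUES ('A-100')")
con.execute("INSERT INTO course (title) VALUES ('Databases')")
con.execute("INSERT INTO registration VALUES (1, 1, NULL)")

# The composite primary key rejects a duplicate registration of the same pair.
try:
    con.execute("INSERT INTO registration VALUES (1, 1, 'B+')")
except sqlite3.IntegrityError:
    print("duplicate registration rejected")
```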
5.2.3.4 Develop and Maintain Physical Data Models
• A physical data model optimizes the implementation of detailed data
requirements and business rules in light of technology constraints, application
usage, performance requirements, and modeling standards.
• Design relational databases with the specific capabilities of the target DBMS
in mind (IBM DB2, Oracle, MS SQL Server, or Access).
• An example of a physical data model is shown in Figure 5.5.
5.2.3.4 Develop and Maintain Physical Data Models (continued)
• Physical data model design includes making decisions about:
• The technical name of each table and column (relational databases), file
and field (non-relational databases), or schema and element (XML
databases).
• The logical domain, physical data type, length, and nullability of each
column or field.
• Any default values for columns or fields, especially for NOT NULL
constraints.
• Primary and alternate unique keys and indexes, including how to assign
keys.
• How to implement small reference data value sets from the logical model:
(a) as separate code tables, (b) as a master shared code table, or (c) simply
as rules or constraints.
• How to implement minor super-type/sub-type logical model entities in the
physical database design: either merging the sub-type entities' attributes as
nullable columns into a table representing the super-type entity, or
collapsing the super-type entity's attributes into a separate table for each
sub-type.
5.2.3.4 Develop and Maintain Physical Data Models (continued)
• Techniques used to transform a logical data model into a physical data model
include:
• Denormalization: selectively and justifiably violating normalization rules,
re-introducing redundancy into the data model to reduce retrieval time, at
some potential cost to data quality.
• Surrogate keys: substitute keys not visible to the business.
• Indexing: creating additional index files to optimize specific types of queries.
• Partitioning: breaking a table or file vertically (separating groups of
columns) or horizontally (separating groups of rows).
• Views: virtual tables used to simplify queries, control data access, and
rename columns, without the redundancy and loss of referential integrity
caused by denormalization.
• Dimensionality: creating fact tables with associated dimension tables,
structured as star schemas or snowflake schemas, for business intelligence.
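The dimensionality technique above can be sketched concretely. A minimal star-schema sketch in sqlite3 (the fact and dimension tables, columns, and sample values are all illustrative): the fact table holds one row per measured event with foreign keys to each dimension, and analysis queries aggregate facts grouped by dimension attributes.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Dimension tables hold descriptive attributes; the fact table holds measures.
con.executescript("""
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, iso_date TEXT, year INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales  (date_id    INTEGER REFERENCES dim_date,
                              product_id INTEGER REFERENCES dim_product,
                              units INTEGER, revenue REAL);
""")
con.execute("INSERT INTO dim_date VALUES (1, '2024-01-15', 2024)")
con.execute("INSERT INTO dim_product VALUES (1, 'widget', 'hardware')")
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 1, 3, 29.97), (1, 1, 2, 19.98)])

# Typical star-schema query: aggregate the facts, grouped by dimension attributes.
total = con.execute("""
    SELECT d.year, p.category, SUM(f.units)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_id = f.date_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY d.year, p.category
""").fetchone()
print(total)  # (2024, 'hardware', 5)
```

A snowflake schema would further normalize the dimensions (e.g., splitting `category` into its own table); the star form trades that redundancy for simpler joins.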
5.2.4 Detailed Data Design
• Detailed data design activities include:
• Detailed physical database design, including views, functions, triggers, and
stored procedures.
• Other supporting data structures, such as XML schemas and object classes.
• Information Products, such as the use of data in screens and reports.
• Data access solutions, including data access objects, integration services,
and reporting and analysis services.
• Database administrators (DBAs) take the lead role in:
• Database design.
• Designing information products (XML schemas, messages, screens, and
reports), in a collaborative role.
• Data analysts take the lead role in:
• Designing information products and related data services (data access
services, data integration services, BI services).
• Database design, in a collaborative role.
5.2.4.1 Design Physical Databases
• Detailed database design is required to include database implementation
specifications.
• The design may take advantage of unique functions and capabilities of a
specific DBMS, which may or may not be included in the data model itself.
• For relational databases, the design deliverables are DDL specifications.
• For XML databases, the design deliverable is the namespace.
• DBAs have primary responsibility for detailed database design, including:
• Ensuring the design meets data integrity requirements.
• Determining the appropriate physical structure to house and organize the
data (files, OLAP cubes).
• Determining the resource requirements for the database (servers, network,
CPU, etc.).
• Creating detailed design specifications for data structures (tables, indexes,
views).
• Ensuring performance requirements are met (batch windows, response
times for CRUD operations).
• Designing for backup, recovery, archiving, and purge processing.
• Designing the data security implementation (authentication, encryption
needs, etc.).
• Determining partitioning and hashing schemes, where appropriate.
• Requiring SQL code review to ensure that the code meets coding standards
and will run efficiently.
Physical Database Design
5.2.4.1.1 Physical Database Design
• Database design rests on two choices:
• Architecture: e.g., relational, hierarchical, network, object, star schema, cube.
• Considerations: where and how the data is updated, the natural
organization of the data, and how the data is viewed and used.
• Technology: e.g., relational, XML, OLAP, or object technology.
• Considerations: how long the data needs to be kept, whether it must be
integrated with other data or passed across system boundaries, and the
requirements for data security, integrity, recoverability, accessibility, and
reusability.
• Other factors (organizational or political):
• Purchase and licensing requirements, including the DBMS, DB server, and
client-side data access and reporting tools.
• Auditing and privacy requirements (Sarbanes-Oxley, PCI, HIPAA, etc.).
• Application requirements (the database may need to support a web
application or service, or a particular analysis tool).
• Database service level agreements (SLAs).
Physical Database Design
5.2.4.1.1 Physical Database Design (continued)
• Database designers must find answers to several questions:
• What are the performance requirements (e.g., the maximum permissible
time for a query to return results)?
• What are the availability requirements for the database (windows for
performing data operations and DB backups)?
• What is the expected size of the database and the growth rate of the data?
What should happen to old and unused data (archived or deleted)? How
many users are anticipated?
• What sorts of data virtualization are needed to support application
requirements?
• Will other applications need the data? If so, what data, and how?
• Will users expect to be able to do ad-hoc querying and reporting of the data?
If so, how and with which tools?
• What, if any, business or application processes does the database need to
implement?
• Are there application or developer concerns regarding the database, or the
database development process, that need to be addressed?
• Is the application code efficient? Can a code change relieve a performance
issue?
Physical Database Design
5.2.4.1.1 Physical Database Design (continued)
• DBAs should keep the following design principles in mind when designing a
database:
• Performance and ease of use: ensure quick and easy access to data in
business-relevant forms, maximizing the business value of both applications
and data.
• Reusability: the database structure should allow multiple applications to
use the data and should serve multiple business purposes (business
analysis, quality improvement, strategic planning, CRM, process
improvement).
• Integrity: the data should always have a valid business meaning and value,
regardless of context.
• Security: true and accurate data should always be immediately available,
but only to authorized users.
• Maintainability: perform all data work at a cost that yields value, ensuring
that the cost of creating, storing, maintaining, using, and disposing of data
does not exceed its value to the organization; ensure the fastest possible
response to changes in business processes and to new business requirements.
Physical Database Design
5.2.4.1.1 Physical Database Design (continued)
• Recommended best practices for physical database design:
1. For relational databases supporting transaction processing (OLTP)
applications, use a normalized design to promote data integrity,
reusability, good update performance, and data extensibility.
2. At the same time, use views, functions, and stored procedures to create
non-normalized, application-specific, object-friendly, conceptual (virtual)
views of the data.
3. Use standard naming conventions and meaningful, descriptive names
across all databases and DB objects, for ease of maintenance.
4. Enforce data security and integrity at the database level, not in the
application.
5. Keep database processing on the DB server as much as possible, to
maximize performance, security, and scalability and to reduce network
traffic.
6. Grant permissions to roles or application groups, not to individuals,
improving both security and ease of maintenance.
7. Do all updates in a controlled manner; do not permit any direct, ad-hoc
updating of the database.
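Practice 2 above, keeping base tables normalized while exposing application-friendly virtual views, can be sketched in a few lines of sqlite3 (the tables, view, and sample data are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Normalized base tables (practice 1).
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "order"  (order_id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer,
                           total REAL);
    -- Application-specific virtual view (practice 2): pre-joined and renamed
    -- for the consuming application, with no redundant stored data.
    CREATE VIEW v_customer_orders AS
        SELECT c.name AS customer_name, o.order_id, o.total
        FROM customer c JOIN "order" o ON o.customer_id = c.customer_id;
""")
con.execute("INSERT INTO customer VALUES (1, 'Ada')")
con.execute('INSERT INTO "order" VALUES (10, 1, 99.5)')

# The application queries the view, not the base tables.
rows = con.execute("SELECT * FROM v_customer_orders").fetchall()
print(rows)  # [('Ada', 10, 99.5)]
```

In a full DBMS, granting an application role SELECT on the view but nothing on the base tables would also implement practices 4 and 6; SQLite has no GRANT statement, so that part is left as a comment on the design.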
Physical Database Design
5.2.4.1.2 Performance Modifications
• Several techniques are used to optimize database performance:
• Indexing: define appropriate indexes for DB tables. An index is an alternate
path for accessing data in the DB, used to optimize query performance.
Without indexes, the DBMS reverts to reading every row in the table
(a table scan).
• Denormalization: the deliberate transformation of a normalized logical data
model into tables with redundant data.
• Denormalization techniques include:
• Collapse hierarchies (roll up)
• Divide hierarchies (push down)
• Split tables vertically
• Split tables horizontally
• Combine and pre-join tables
• Repeat columns in one row
• Derive data from stored data
• Create report copies
• Create duplicates (mirrors)
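The indexing technique above can be observed directly through the query planner. A minimal sqlite3 sketch (table and index names are illustrative; the exact plan wording varies by SQLite version): the same query does a full table scan before the index exists and an index search afterward.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
con.executemany("INSERT INTO sale (region, amount) VALUES (?, ?)",
                [("north" if i % 2 else "south", float(i)) for i in range(1000)])

query = "SELECT SUM(amount) FROM sale WHERE region = 'north'"

# Without an index, SQLite must read every row (a table scan).
plan_before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# The index gives the optimizer an alternate access path to the filtered column.
con.execute("CREATE INDEX idx_sale_region ON sale(region)")
plan_after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before[0][-1])  # e.g. "SCAN sale"
print(plan_after[0][-1])   # e.g. "SEARCH sale USING INDEX idx_sale_region (region=?)"
```

The same inspection habit applies to any DBMS (e.g., EXPLAIN in other engines): confirm that an index is actually chosen before counting on it.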
Physical Database Design
5.2.4.1.3 Physical Database Design Documentation
• The DB design document guides implementation and maintenance. It should be:
• Reviewable, to catch and correct errors in the design before creating and updating the database.
• Modifiable, for ease of implementing future iterations of the design.
• Physical DB design document components:
• An introductory description of the business function of the DB design.
• A graphical model of the design.
• Database-language specification statements: DDL specifications for all DB objects (tablespaces,
tables, indexes, indexspaces, views, etc.).
• Documentation of the technical meta-data, including data type, length, domain, etc.
• Use cases or sample data.
• Short descriptions, as needed, to explain:
• The DB architecture and technology chosen, and why they were chosen.
• Constraints that affected the selection of the DBMS, including cost, policy, performance,
reliability or scalability, security, application constraints, expected data volumes, etc.
• The database design process, including the methods and tools used.
• The differences between the physical database design and the logical data model, and the
reasons for these differences.
• The update mechanism chosen for the database, and its implementation.
• Security requirements for the database, and their implementation.
• The SLA for the database, and its implementation.
• User and/or application requirements for the DB, and their implementation.
5.2.4.2 Design Information Products
• Information products are the screens and reports created by data analysts,
assisting software designers and developers, to meet business requirements.
• New technologies, used with DBA assistance in the development of
applications, make data more readily available in more usable forms:
• Reporting services: generating reports of many types (canned and ad-hoc).
• Analysis services: dividing data into specific categories for analysis (e.g.,
data to analyze sales trends).
• Dashboards: user interfaces that display charts and graphs, helping users
act on the data.
• Scorecards: a specialized type of analytics display that indicates scores and
calculated evaluations.
• Portals: web interfaces that present links to multiple applications and
sources of information on a single web page.
• XML delivery: schema definitions that enable the use of XML within
databases and applications.
• Business process automation: using data integrated from multiple DBs as
input to business process automation software that coordinates multiple
business processes across disparate platforms.
• Application integration: enabling data to be easily passed from application
to application across different platforms.
5.2.4.3 Design Data Access Services
• Design the best way to access data in remote databases and to combine that data with data in
the local database.
• Several methods exist; each has strengths and weaknesses the DBA should be familiar with:
• Linked-server-type connections through an ODBC or OLE/DB connection. Strength: easy to
implement in the DB. Weaknesses: limited functionality; security concerns; does not scale
well; synchronous calls make the caller wait; dependent on the quality of vendor-supplied
ODBC or OLE/DB drivers (sometimes abysmal).
• SOA web services: encapsulate remote data access in the form of web services and call
them from applications. Strengths: increases the reusability of data; can perform and scale
well. Weaknesses: harder and more costly to write, test, and deploy; risk of a poorly
governed “SOA nightmare”; resource-consuming.
• Message brokers (e.g., in MS SQL Server 2005): implement messaging services in the DB.
Strengths: easy to implement, reliable, performs well. Weakness: only works between
instances of the same DBMS.
• Data access classes: Dataset objects acting as an in-memory database. Strengths: ease of
access, better performance. Weakness: tied to particular connection types (ODBC or
OLE/DB) and platforms (e.g., .NET versus Unix/Linux Java applications).
• ETL: a tool to extract data from the source, transform it as necessary (reformatting and
cleansing), and load it into a read-only table in the DB. Strength: can be wrapped in a
stored procedure and scheduled to execute at intervals. Weaknesses: may not scale or
perform well for large numbers of records; expensive to maintain over time.
• Replication (mirroring and log shipping). Strength: gets data from one DB to another.
Weaknesses: timing the replication against the need for the data; more points of failure.
• Co-location: co-locate the source and target databases on the same database server.
Strength: supports a high frequency of access. Weakness: not an ideal solution.
• The goal is to enable the easy and inexpensive reuse of data across the enterprise, avoid the
cost of replicated schemas, and prevent redundant and inconsistent data, as far as possible.
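The extract-transform-load pattern in the list above can be sketched end to end. A minimal sketch in Python's sqlite3, using hypothetical source and target databases (both in-memory here; the table names and cleansing rules are illustrative, not from the source):

```python
import sqlite3

# Hypothetical source and target databases.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")

src.execute("CREATE TABLE raw_customer (id INTEGER, name TEXT, email TEXT)")
src.executemany("INSERT INTO raw_customer VALUES (?, ?, ?)",
                [(1, "  Ada ", "ADA@EXAMPLE.COM"), (2, "Bob", None)])

tgt.execute("""CREATE TABLE customer (id INTEGER PRIMARY KEY,
                                      name TEXT NOT NULL, email TEXT)""")

# Extract from the source...
rows = src.execute("SELECT id, name, email FROM raw_customer").fetchall()

# ...transform (trim names, lower-case e-mail addresses, preserve NULLs)...
clean = [(i, n.strip(), e.lower() if e else None) for i, n, e in rows]

# ...and load into the read-only reporting table in the target database.
with tgt:  # one transaction: all rows load, or none do
    tgt.executemany("INSERT INTO customer VALUES (?, ?, ?)", clean)

print(tgt.execute("SELECT name, email FROM customer ORDER BY id").fetchall())
# [('Ada', 'ada@example.com'), ('Bob', None)]
```

The slide's warning still applies: row-at-a-time transformation like this is fine for small batches but does not scale to large record volumes, which is why production ETL tools push set-based work into the database engine.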
5.2.4.4 Design Data Integration Services
• A COMMIT makes a group of database changes a single atomic transaction.
Developers define DB transactions by determining when to COMMIT changes.
• Wherever multiple users can concurrently update tables, update and
concurrency control mechanisms are required (usually involving a timestamp
or datetime column).
• Use locks to ensure the integrity of data, permitting only one user to change a
DB row at a time (lock granularity).
• Define source-to-target mappings and data transformation designs for ETL
programs for on-going data movement, cleansing, and integration.
• Design programs and utilities for data migration and conversion from old data
structures to new data structures.
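Transactional atomicity as described above can be demonstrated in a few lines. A minimal sqlite3 sketch of a hypothetical account transfer (table, function, and amounts are illustrative): both updates COMMIT together, and on any error the partial update rolls back automatically.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE account (acct_id INTEGER PRIMARY KEY,
                                     balance REAL NOT NULL CHECK (balance >= 0))""")
con.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
con.commit()

def transfer(con, src, dst, amount):
    """One unit of work: debit and credit COMMIT together or not at all."""
    try:
        with con:  # BEGIN ... COMMIT on success, ROLLBACK on exception
            con.execute("UPDATE account SET balance = balance - ? WHERE acct_id = ?",
                        (amount, src))
            con.execute("UPDATE account SET balance = balance + ? WHERE acct_id = ?",
                        (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # the partial update was rolled back automatically

assert transfer(con, 1, 2, 30.0) is True     # 100/50 -> 70/80
assert transfer(con, 2, 1, 999.0) is False   # would violate CHECK: rolled back

print(con.execute("SELECT balance FROM account ORDER BY acct_id").fetchall())
# [(70.0,), (80.0,)]
```

The CHECK constraint plays the role of a business rule here; the key point is that the failed transfer leaves no half-applied debit behind, which is exactly the transactional-integrity criterion listed on the next slide.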
5.2.4.4 Design Data Integration Services (continued)
• Criteria for designing data integration:
1. Do all updates in a controlled manner. Do not allow direct, ad-hoc
updating of the DB.
2. Manage all updates for a specific business process as a single unit of work
(transactional integrity). Do not allow partial updates of the DB to occur.
3. Concurrency control: do not allow multiple users to update the same
record at the same time.
4. On an update error, immediately abort the current transaction, roll back
the changes, and report the error to the calling process or application.
5. Restrict the ability to update to specific authorized users, preferably via
user roles.
6. Restrict updates to a small number of records at a time, to prevent
excessive locking of tables and “hanging” of applications when rolling back
a large update.
• Consider the following possible update mechanisms:
• FSPs (fundamental stored procedures)
• Application data layer
• Dataset updating
• Updateable views
5.2.5 Data Model and Design Quality Management
• Data analysts and designers act as an intermediary between executives and the DBAs
who capture the data in usable form (application class models, SLAs, and data
requirements).
• Data professionals must also balance short-term versus long-term business
interests.
• There should be a reasonable balance between the short-term needs and the long-term
needs of the enterprise.
• These data model design and quality management actions can be performed through
several procedures. “See next slides”
41. 5.2.5.1 Develop Data Modeling and Design Standards
• Standards serve as the guiding principles to effectively meet business data needs,
conform to data architecture, and ensure data quality; they must complement, not
conflict with, related IT standards.
• Naming standards are important for entities, tables, attributes, keys, views, and
indexes. “Names should be unique and as descriptive as possible”.
• Data Modeling and DB design standards should include:
• A list and description of standard data modeling and DB design deliverables.
• A list of standard names, acceptable abbreviations, and abbreviation rules
for uncommon words, that apply to all data model objects.
• A list of standard naming formats for all data model objects, including
attribute and column class words.
• A list and description of standard methods for creating and maintaining
these deliverables.
• A list and description of data modeling and DB design roles and
responsibilities.
• A list and description of all meta-data properties (business and technical).
• Guidelines for how to use data modeling tools.
• Guidelines for preparing for and leading design reviews.
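Naming standards like those above are often enforced with a small check script. The following sketch assumes an invented, illustrative standard (singular UPPER_SNAKE_CASE names, with a class-word suffix on every column); real standards vary by organization.

```python
import re

# Hypothetical naming standard: UPPER_SNAKE_CASE object names, and a
# class-word suffix on columns (_ID, _NM, _DT, _AMT are sample suffixes).
CLASS_WORDS = ("_ID", "_NM", "_DT", "_AMT")

def valid_table_name(name):
    """Table names: start with a letter, UPPER_SNAKE_CASE throughout."""
    return re.fullmatch(r"[A-Z][A-Z0-9_]*", name) is not None

def valid_column_name(name):
    """Column names: same rule, plus a standard class-word suffix."""
    return valid_table_name(name) and name.endswith(CLASS_WORDS)

print(valid_table_name("CUSTOMER"))       # True
print(valid_column_name("CUSTOMER_NM"))   # True
print(valid_column_name("customerName"))  # False
```

Running such a check inside the data modeling tool or a CI step keeps model objects compliant before a design review.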
42. 5.2.5.2 Review Data Model and Database Design Quality
• Project teams conduct requirements reviews and design reviews as appropriate
(including conceptual data model, logical data model, and physical database design
reviews).
• Conducting design reviews requires a group of participants: subject matter
experts with different backgrounds, skills, expectations, and opinions.
• Participants must be able to discuss different viewpoints and try to avoid conflict.
• Chair each design review with one leader who facilitates the meeting.
• The leader creates and follows an agenda, ensures all required documentation is
available and distributed, solicits input from all participants, maintains order, keeps
the meeting moving, and summarizes the group’s consensus findings.
• Many design reviews also use a scribe to capture points of discussion.
43. Data Model and Database Design Quality
5.2.5.2.1 Conceptual and Logical Data Model Reviews
5.2.5.2.2 Physical Database Design Review
• Conceptual data model and Logical data model design reviews should ensure:
1. Business data requirements are captured and expressed in the model,
include business rules governing entity relationships.
2. Business (logical) names and business definitions for entities and attributes
(business semantics) are clear, practical, consistent, and complementary.
3. Data modeling standards, including naming standards, have been followed.
4. The conceptual and logical data models have been validated.
• Physical database design reviews should ensure that:
1. The design meets business, technology, usage and performance
requirements.
2. Database design standards, including naming and abbreviation standards,
have been followed.
3. Availability, recovery, archiving, and purging procedures are defined
according to standards.
4. Meta-data Quality expectations and requirements are met in order to
properly update any meta-data repository.
5. The physical data model has been validated.
44. Data Model and Database Design Quality
5.2.5.2.3 Data Model Validation
• Several questions are used to validate data models:
• Does the model match applicable modeling standards? Does the model use
standard data dictionary terms? Does the model use standard domains? Does
the model use class word suffixes on all applicable columns? Does the model
include descriptions of all objects and relationships? Does the model use
abbreviation standards where applicable?
• Does the model match the business requirements? Does the model contain all
the relevant data items? Can you execute the required transactions against the
database? Can you retrieve the transaction contents correctly? Can you
execute any required queries against the model?
• Does the model match the database requirements? Are there no objects
named the same as database-reserved words? Do all objects have unique
names? Does the model assign owners to all objects?
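The database-requirements questions above (no reserved-word names, no duplicate names) can be checked mechanically. In this sketch the reserved-word list is a small sample for illustration, not a complete list for any particular DBMS.

```python
# Illustrative model-validation checks: reserved words and duplicate names.
RESERVED = {"SELECT", "TABLE", "ORDER", "GROUP", "USER"}  # sample only

def validate_object_names(names):
    """Return a list of naming problems found in the model's object names."""
    problems = []
    seen = set()
    for name in names:
        upper = name.upper()
        if upper in RESERVED:
            problems.append(f"{name}: reserved word")
        if upper in seen:
            problems.append(f"{name}: duplicate name")
        seen.add(upper)
    return problems

print(validate_object_names(["CUSTOMER", "ORDER", "customer"]))
# ['ORDER: reserved word', 'customer: duplicate name']
```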
45. 5.2.5.3 Manage Data Model Versioning and Integration
• Data models and design specifications require careful change control. Note each change
to a data model to preserve the lineage of changes over time.
• Each change should note:
• Why the project or situation required the change
• What and how the object(s) changed, including which tables had columns
added, modified, or removed, etc.
• When the change was approved and when the change was made to the
model. “not necessarily when the change was implemented in a system”
• Who made the change.
• Where the change was made; in which models.
• Changes may be made to multiple parts of enterprise models simultaneously;
integrating them as part of the normal process prevents errors in data and databases
during future development.
• Some data modeling tools include repositories that provide data model versioning
and integration functionality.
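The why/what/when/who/where checklist above maps naturally onto a change-log record. This is a hypothetical structure for illustration; modeling tools with repositories capture the same fields in their own schemas, and every name below is invented.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical change record mirroring the checklist: why, what, when
# (approved vs. made, not necessarily implemented), who, and where.
@dataclass
class ModelChange:
    why: str            # project or situation that required the change
    what: str           # which objects changed, and how
    approved_on: date   # when the change was approved
    changed_on: date    # when the change was made to the model
    who: str            # who made the change
    where: str          # in which model(s) the change was made

change = ModelChange(
    why="New loyalty programme requires a customer tier",
    what="Added CUSTOMER.TIER_CD column",
    approved_on=date(2024, 3, 1),
    changed_on=date(2024, 3, 4),
    who="data.modeler@example.com",
    where="Enterprise logical model v2.3",
)
print(change.what)  # Added CUSTOMER.TIER_CD column
```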
46. 5.2.6 Data Implementation
• Consists of data management activities that support system building, testing
and deployment, including:
• Database implementation and change management in the development
and test environments.
• Test data creation, including any security procedures, such as obfuscation.
• Development of data migration and conversion programs, both for project
development through the SDLC and for business situations like
consolidations or divestitures.
• Validation of data quality requirements.
• Creation and delivery of user training.
• Contribution to the development of effective documentation.
• After design, the DBA is:
• responsible for implementing the designed data structures in the
development and test environments.
• Responsible for change control of the development database
environment and its configuration.
47. 5.2.6.1 Implement Development / Test Database Changes
• During the course of application development, changes to the database are required;
how implementation happens depends on roles and responsibilities:
• Developers create and update database objects directly (views, functions,
and stored procedures), and then notify the DBAs and data modelers for
review and update of the data model.
• The development team may have its own “developer DBA” who is given
permission to make schema changes, with the proviso that these changes are
reviewed with the DBA and data modeler.
• Developers and data modelers collaborate to generate ‘change DDL’
for DBAs to review and implement.
• Developers and data modelers collaborate, interactively ‘pushing’
changes to the development environment using functionality in the data-
modeling tool, after review and approval by DBAs.
• If using Agile, changes and updates to the logical and physical models are done
asynchronously.
• DBAs should carefully monitor all database code to ensure that it is written to
the same standards as application code.
48. 5.2.6.2 Create and Maintain Test Data
• DBAs, software developers, and testers may collaborate to populate development
databases with test data. “Generate test data or extract a representative
subset of production data”.
• Strictly observe privacy and confidentiality requirements for test data.
• The DBA may also assist the developers with the creation of SQL scripts and data
integration ‘packages’, such as DTS or SSIS packages, used to create and maintain
test data.
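Observing privacy requirements when extracting a production subset usually means obfuscating sensitive columns. The salted-hash scheme below is one possible approach, sketched for illustration; the salt, column names, and sample rows are all invented.

```python
import hashlib

# Hypothetical obfuscation: replace a sensitive value with a stable,
# non-reversible token so joins still work but identities are hidden.
SALT = "test-env-salt"  # illustrative; keep a real salt secret

def obfuscate(value):
    """Map a sensitive value to a deterministic 12-character token."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:12]

production_rows = [
    {"id": 1, "name": "Alice Jones", "balance": 120},
    {"id": 2, "name": "Bob Smith", "balance": 85},
]
# Build the test subset with names obfuscated but other columns intact.
test_rows = [dict(r, name=obfuscate(r["name"])) for r in production_rows]
print(all(r["name"] not in ("Alice Jones", "Bob Smith") for r in test_rows))  # True
```

Because the token is deterministic, the same source value always obfuscates to the same token, which preserves referential relationships across extracted tables.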
49. 5.2.6.3 Migrate and Convert Data
• The migration of legacy data to a new database environment is considered a key
component of projects.
• It includes any necessary data cleansing and reformatting.
• The time and cost required should not be under-estimated.
• It requires a collaborative effort of the data architects/analysts familiar with the
legacy data models and the target data model, plus the DBAs, business users, and
developers familiar with the legacy applications.
• Depending on where the legacy data is stored, this effort may involve the use of
many different technologies, including:
• SQL , COBOL, Unix scripting, DBMS integration packages such as DTS or SSIS,
non-relational DBMSs, third-party ETL applications, data integration web
services, FTP, RPC, ODBC, OLE/DB
• Data migration efforts can easily consume thousands of hours of effort.
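A single migrate-and-convert step, including cleansing and reformatting, can be sketched as follows. The legacy and target structures are invented for the example, and SQLite stands in for whatever source and target DBMSs are actually involved.

```python
import sqlite3

# Hypothetical legacy source: one free-text name column, untrimmed values.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE cust_old (cid INTEGER, full_name TEXT)")
legacy.executemany("INSERT INTO cust_old VALUES (?, ?)",
                   [(1, "  jones, alice "), (2, "smith, bob")])

# Hypothetical target: normalized name columns in the new structure.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, "
               "last_name TEXT, first_name TEXT)")

# Migrate: cleanse (trim, fix case) and convert (split one column into two).
for cid, full_name in legacy.execute("SELECT cid, full_name FROM cust_old"):
    last, first = [p.strip().title() for p in full_name.split(",")]
    target.execute("INSERT INTO customer VALUES (?, ?, ?)", (cid, last, first))
target.commit()

rows = list(target.execute("SELECT * FROM customer ORDER BY id"))
print(rows)  # [(1, 'Jones', 'Alice'), (2, 'Smith', 'Bob')]
```

Real migrations add the technologies listed above (ETL tools, DTS/SSIS packages, COBOL extracts) around this same source-to-target mapping idea.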
50. 5.2.6.4 Build and Test Information Products
• Data professionals and software developers use the following mechanisms in the
development and testing of information products created by the systems:
• Implementing mechanisms for integrating data from multiple sources,
along with the appropriate meta-data to ensure meaningful integration of
the data.
• Implementing mechanisms for reporting and analyzing the data, including
online and web-based reporting, ad-hoc querying, BI scorecards, OLAP,
Portals, and the like.
• Implementing mechanisms for replication of the data, if network latency or
other concerns make it impractical to service all users from a single data
source.
• Software developers are responsible for:
• coding and testing programs, including DB access calls.
• Creating, testing and maintaining information products, including screens
and reports.
• Testing includes unit, integration, and performance testing.
51. 5.2.6.5 Build and Test Data Access Services
• DBAs are responsible for developing data access services.
• DBAs and software developers collaborate in developing and executing data access
services, “first for the development and test environments, and later for production
deployment”.
• Data requirements should include business rules for data access, to guide the
implementation of data access services with software developers.
• Business data stewards and other (SMEs) should validate the correct
implementation of data access requirements and performance through user
acceptance testing.
52. 5.2.6.6 Build and Test Data Integration Services
• Data integration specialists are responsible for developing ETL programs and
technology for data integration.
• DBAs and software developers work together in developing, testing, and executing
data migration and conversion programs and procedures. “First for development
and test data, and later for production deployment”.
• Data requirements should include business rules for data quality to guide the
implementation of application edits and database referential integrity constraints.
• Business data stewards and other (SMEs) should validate the correct
implementation of data requirements through user acceptance testing.
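A database referential integrity constraint of the kind mentioned above can be demonstrated in a few lines. The customer/orders tables are invented for illustration; SQLite needs foreign-key enforcement switched on explicitly, which other DBMSs do by default.

```python
import sqlite3

# Illustrative referential integrity rule: every order must reference
# an existing customer; the DBMS rejects violations at write time.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific enforcement switch
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER REFERENCES customer(id))")
conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO orders VALUES (10, 1)")       # valid reference
try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # no such customer
    rejected = False
except sqlite3.IntegrityError:
    rejected = True   # the constraint enforced the data quality rule
print(rejected)  # True
```

Putting the rule in the database, rather than only in application edits, means every program that touches the data is held to it.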
53. 5.2.6.7 Validate Information Requirements
• Data professionals’ responsibilities do not end with design.
• Data professionals test and validate whether the proposed solution meets the
requirements, and also participate in planning deployment, developing training, and
documentation.
• In Agile application development projects, data requirements may change suddenly
or abruptly.
• New or changed requirements may invalidate assumptions regarding the data, or
re-prioritize existing requirements.
• The data modeler may serve as an intermediary between developers and data
analysts/architects:
• reviewing any additions or changes to business data requirements.
• properly reflecting them in the logical and physical data models.
• The DBA implements any changes in the database in the most effective manner.
• The DBA then works with the developers to test the implementation of data
requirements and make sure the application requirements are satisfied.
54. 5.2.6.8 Prepare for Data Deployment
• Data analysts can leverage the business knowledge captured in data modeling to
define clear and consistent language in user training and documentation.
• Data stewards and Data analysts should participate in deployment preparation,
including development and review of training materials and system
documentation
• Help desk support staff also requires orientation and training in how system users
appropriately access, manipulate, and interpret data.
• In data deployment, the DBA is:
• responsible for implementing new and changed database objects in the
production environment.
• expected to carefully control the installation of new DBs and changes to existing
DBs in the production environment.
• Once installed, business data stewards and data analysts should monitor early
use of the system to see that business data requirements are indeed met.