The document discusses data retention policies and handling of confidential and sensitive data. It defines data retention policies and their purpose, which is to maintain important records for future use while disposing of unneeded records. It outlines categories of document types that must be protected by retention policies, such as legal, financial, and employee records. The document also defines sensitive data and types, including personal information, business information, and classified data. It discusses how to properly handle sensitive data through access policies, encryption, and aggregate disclosure of information rather than individual records.
1. IT6701 – Information Management
Unit V – Information Lifecycle Management
By
Kaviya.P, AP/IT
Kamaraj College of Engineering & Technology
2. Unit V – Information Lifecycle Management
Data retention policies; Confidential and
Sensitive data handling, lifecycle management
costs. Archive data using Hadoop; Testing and
delivering big data applications for performance
and functionality; Challenges with data
administration
3. Data Retention Policies
What is a Data Retention Policy?
• A document retention policy provides for the systematic review, retention and
destruction of documents received or created in the course of business.
• A document retention policy will identify documents that need to be maintained
and contain guidelines for how long certain documents should be kept and how
they should be destroyed.
Purpose of Data Retention Policies
• To maintain important records and documents for future use or reference.
• To dispose of records or documents that are no longer needed.
• To organize records so that they can be searched and accessed easily at a later
date.
4. Data Retention Policies
Categories of Requirements
• Legal or Legitimate requirements: The compliance or legal aspect, where a legal
case is filed and some piece of information needs to be produced in a court of law.
• Business or Commercial requirements: To make information available from the
operation’s perspective.
• Personal or Private requirements: To make information available from the
personal perspective.
5. Data Retention Policies
Scope : Categories of Document (What documents must be protected?)
• Legal Records: These include all legal records, contracts, trademarks, powers of
attorney, press releases, etc. These are the first set of documents that should be
considered for retention.
• Final Records: Documents not requiring ad hoc modification or alteration. They can
also specify records of completed activities.
• Permanent Records: Include all the business documents that describe the
organization’s details. They can also comprise contracts, financial registers,
copyrights, patents, and proposals.
• Accounting and Corporate Tax Records: Consists of financial statements,
investments, audits, tax returns, purchase, sales records, etc.
6. Data Retention Policies
Scope : Categories of Document (What documents must be protected?)
• Workplace Records: Information about the day-to-day activities of employees,
agreements, minutes of meetings, bylaws, etc.
• Employment, Employee, and Payroll Records: Include job postings, job
advertisements, recruitment procedures, performance reviews, etc.
• Bank Records: Information about bank transactions, deposits, cheque details, stop
payments, and bounced cheques.
• Historic Records: Records that are no longer required by the organization.
• Temporary Records: Documents that are not completed or finalized.
7. Data Retention Policies
Data Retention Policy
• When developing a retention policy, it is important to focus on the reason behind data
retention.
• The decision is based on the creation date and may include other criteria such as last
access time, type of data, the period for which the data is valid, data value, etc.
• The policy document should include details of the data/document that needs to be retained.
• The data should be divided into various categories such as personal employee data, client
data, financial data, legal data, etc.
• This division would help in deciding the duration of retention and destruction procedures.
• When the data retention period is over, the data should be discarded.
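The policy steps above can be sketched as a simple expiry check: each record carries a category and a creation date, and a per-category retention period determines when it should be discarded. This is a minimal illustration; the category names and retention periods are hypothetical, not taken from any specific regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical per-category retention periods (in days).
RETENTION_PERIODS = {
    "financial": timedelta(days=7 * 365),   # e.g. keep financial data ~7 years
    "employee":  timedelta(days=5 * 365),
    "client":    timedelta(days=3 * 365),
}

@dataclass
class Record:
    category: str
    created: date

def is_expired(record: Record, today: date) -> bool:
    """A record should be discarded once its category's retention period has passed."""
    period = RETENTION_PERIODS[record.category]
    return today - record.created > period

# Example: a client record created in 2015 is past its 3-year retention period by 2020.
r = Record(category="client", created=date(2015, 1, 1))
print(is_expired(r, date(2020, 1, 1)))  # True
```

In practice the lookup table would be derived from the policy document's category/duration matrix described above.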
8. Data Retention Policies
Why to have Data Retention Policies?
The policy is also helpful to:
• Provide a system for complying with document retention laws
• Ensure that valuable documents are available when needed
• Save money, space and time
• Protect against allegations of selective document destruction, and
• Provide for the routine destruction of non-business, superfluous and outdated
documents
9. Data Retention Policies
Why to have Data Retention Policies?
The six most important reasons why an organization should implement a document
retention policy are:
1. To comply with legal duties and requirements, either statutory or regulatory
2. To avoid liability through “spoliation”, the improper destruction or alteration of
documents in a litigation situation
3. To support or oppose a position in an investigation or litigation
4. To protect from unnecessary expense and time during discovery
5. To maintain control over discovery and e-discovery, and
6. To keep documents confidential and avoid leakage to attackers or competitors
10. Data Retention Policies
Laws Related to Data Retention Policy - India
• In India there is no central Act that lays down provisions related to data retention.
• Instead, various agencies have incorporated their own policies, which they maintain
and follow.
• Eg 1: The Government of India’s Central Vigilance Commission, vide notification
No.17/09/2006-Admn., gives the provisions related to the retention period/destruction
schedule of recorded files.
• Eg 2: The Ministry of Finance’s Financial Intelligence Unit has its own policy;
Notification No. 9/2005 gives the rules for record keeping and reporting.
11. Data Retention Policies
Laws Related to Data Retention Policy - India
• Rule 6. Retention of records - The records referred to in rule 3 shall be
maintained for a period of ten years from the date of cessation of the
transactions between the client and the banking company, financial institution or
intermediary, as the case may be.
• Thus, it may be noted that each organization has its own data retention policies and
certain rules for the retention of such records.
• However, there is no established law that binds organizations to prepare such
policies.
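The ten-year requirement in Rule 6 can be expressed as a small date calculation: the earliest permissible destruction date is ten years after the date of cessation of transactions. A minimal sketch (the helper name is our own, not from the rule):

```python
from datetime import date

def earliest_destruction_date(cessation: date) -> date:
    """Records must be kept for ten years from the date of cessation of
    transactions (Rule 6), so destruction is permitted only after this date."""
    try:
        return cessation.replace(year=cessation.year + 10)
    except ValueError:
        # cessation fell on 29 February and the target year is not a leap year;
        # fall back to 28 February of the target year.
        return cessation.replace(year=cessation.year + 10, day=28)

print(earliest_destruction_date(date(2015, 3, 31)))  # 2025-03-31
```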
12. Confidential and Sensitive Data Handling
Definition of Sensitive Data
• Data collected may be personal, confidential or sensitive in nature.
• Personal data provides information about an individual, and through which an
individual can be easily and uniquely identified, either directly or indirectly.
• Confidential data is the personal data that is private and should not be disclosed
to others.
13. Confidential and Sensitive Data Handling
Types of Sensitive Data
• Personal Information
– Sensitive personally identifiable information is data that can be traced back
to an individual, thus revealing one’s identity.
– Such information includes biometric data, medical information and history,
bank and credit card information, and Passport or Aadhaar numbers.
– Threats include not only crimes such as identity theft, but also the disclosure
of personal information that the individual would prefer to keep private.
– Sensitive data should be encrypted both in transit and at rest.
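Alongside encryption in transit and at rest, a common complementary safeguard at display time is masking: showing only a non-identifying fragment of an identifier. The sketch below is an illustration of the idea; the function name and masking scheme are our own, not a prescribed standard.

```python
def mask_identifier(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*', so a
    displayed identifier (card, passport, Aadhaar number) does not
    reveal the full value."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

print(mask_identifier("123456789012"))  # ********9012
```

Note that masking only limits what is displayed; it is not a substitute for the encryption the slide prescribes for data in transit and at rest.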
14. Confidential and Sensitive Data Handling
Types of Sensitive Data
• Business Information
– Sensitive business information includes everything that poses a risk to the
company in question if discovered by a competitor or the general public.
– Such information includes trade secrets, contract details, acquisition plans,
financial data, supplier details, customer information.
– Methods of protecting corporate information from unauthorized access are
becoming integral to corporate security.
– These methods include defining security policies, metadata management, and
document sanitization.
15. Confidential and Sensitive Data Handling
Types of Sensitive Data
• Classified Information
– It pertains to a government body and is restricted according to its level of
sensitivity (e.g. restricted, confidential, secret, and top secret).
– Information is generally classified to protect security.
– Once the risk of harm has passed or decreased, classified information may
be declassified and, possibly, made public.
16. Confidential and Sensitive Data Handling
Handling of Sensitive Data
• Sensitive data needs to be handled with utmost care with highest possible security
measures.
• Given a dataset, one or more attribute values in a tuple/record can be sensitive and
hence need to be protected, while at the same time other attributes of the same
tuple/record can be made available.
• Thus, the access policy needs to be defined at different granularity levels so that access
of these values for the attributes can be made available.
• E.g.: if a query is triggered seeking information on all the patients having certain health
conditions, it should not reveal the identity of the individuals. Instead, some aggregate
function can be applied, such as returning the total count of patients suffering from
the health condition.
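The aggregate-disclosure idea above can be sketched as follows: the query interface returns only a count, never the matching records. The record layout and condition names are hypothetical.

```python
# Hypothetical patient records; names are sensitive, conditions are queryable.
patients = [
    {"name": "A. Rao",   "condition": "diabetes"},
    {"name": "B. Singh", "condition": "diabetes"},
    {"name": "C. Das",   "condition": "asthma"},
]

def count_with_condition(records, condition):
    """Aggregate disclosure: reveal only how many patients match, never who."""
    return sum(1 for r in records if r["condition"] == condition)

print(count_with_condition(patients, "diabetes"))  # 2
```

Callers learn how many patients have a condition, but the identities in the underlying tuples are never exposed.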
17. Confidential and Sensitive Data Handling
Access Decision
• The database administrator decides what data should be in the database and who
should have access to it.
• These decisions are based on access policies that are defined in the
organization.
• Multiple factors are considered in making these policies, such as availability of
data, acceptability of the access, authenticity of the user, etc.
18. Confidential and Sensitive Data Handling
Types of Disclosures
Sensitive data can also be characterized based on what values are being disclosed.
• Displaying exact data: This is the most serious disclosure where the user will directly
get the sensitive data on request or sometimes without request; the latter being a serious
security concern.
• Displaying bounds: Bounds are a convenient way of presenting sensitive data,
indicating that the sensitive value lies between a high and a low value. E.g.: an organization
can reveal the range of salaries given to its managers, so that any person willing to
join the organization can take a decision based on it.
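A minimal sketch of bounds disclosure: only the low and high ends of the range are revealed, never an individual's salary. The figures are hypothetical.

```python
def salary_bounds(salaries):
    """Disclose only the low and high ends of the range, not individual salaries."""
    return (min(salaries), max(salaries))

manager_salaries = [90_000, 120_000, 105_000]  # hypothetical figures
low, high = salary_bounds(manager_salaries)
print(f"Manager salaries fall between {low} and {high}")
```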
19. Confidential and Sensitive Data Handling
Types of Disclosures
Sensitive data can also be characterized based on what values are being disclosed.
• Displaying negative results: Sometimes a query could display a negative result,
specifying that a particular value is not present. This is of particular importance if the
data is of binary type and is represented as 0 or 1. Thus disclosing a value 0 is of
significant importance. However, in certain cases, displaying information such as whether a
student appears in the top-10 list would not reveal significant information.
• Displaying probable values: Sometimes it may be possible to determine the
probability that a certain attribute holds a particular value.
• Sensitive data can be secured by keeping it in an encrypted format so that the
information is not accidentally revealed. However, this can be tedious if different
attributes need different levels of confidentiality.
20. Confidential and Sensitive Data Handling
Handling Data
1. Create a risk aware culture that includes an information security risk management
program. Define security and risk mitigation and handling policies at the enterprise
level.
2. Define the data types used in the organization and classify them as confidential or sensitive.
3. Clarify responsibilities and accountability for the protection of confidential/sensitive
data.
4. Limit access to confidential/sensitive data to only those for whom it is absolutely
essential to institutional processes.
5. Provide awareness and training to properly use the resources and follow the guidelines
and rules specified.
6. Verify compliance with your policies and procedures regularly.
21. Confidential and Sensitive Data Handling
Legal Provisions in India Defining Sensitive Data and its Handling
Right to Information Act, 2005 gave a stimulus to transparency in government dealings and
concurrently provided some protection against the unwarranted disclosure of confidential
information under the law.
• A new civil provision prescribes damages for an entity that is negligent in using
“reasonable security practices and procedures” while handling “sensitive personal data
or information”, resulting in wrongful loss or wrongful gain to any person.
• Criminal punishment applies to a person who (a) discloses sensitive personal information;
(b) does so without the consent of the person or in breach of the relevant contract; and (c)
acts with an intention of, or knowing that, the disclosure would cause wrongful loss or gain.
• The IT Rules introduced in 2011 define “sensitive personal data” for the first time in
India.
22. Confidential and Sensitive Data Handling
Legal Provisions in India Defining Sensitive Data and its Handling
The salient features of the new rules are as follows:
• Sensitive personal information: The rules relate to dealing with information generally,
personal information and “sensitive personal data or information” (SPD). SPD is defined to
cover the following: (a) passwords; (b) financial and credit information such as bank account or
credit card or debit card or other payment instrument details; (c) physical, physiological and
mental conditions; (d) sexual orientation; (e) medical records and history; and (f) biometric
and deoxyribonucleic acid (DNA) information. It may be noted that SPD deals with the
information of individuals and not the information of businesses.
• Privacy policy: Every business needs to have a privacy policy that must be published on its
website. Even if the business is not handling SPD, it is required to have a privacy policy. It
must describe what information is collected, what is the purpose of using the information, to
whom or how the information might be disclosed and the sound security practices followed to
safeguard the information.
23. Confidential and Sensitive Data Handling
Legal Provisions in India Defining Sensitive Data and its Handling
The salient features of the new rules are as follows:
• Consent for collection: A business cannot collect SPD unless it obtains the prior
consent of the Information provider. The consent has to be provided by letter, fax or
email.
• Notification: The business should ensure that the information provider is aware
of the information being collected, the purpose of using the information, the
recipients of the information and the name and address of the agency collecting
the information.
• Use and Retention: The usage of personal information has to be restricted to
the purpose for which it was collected. The data retention rules have to be
followed in terms of maintaining the data for the specified period as well as
destroying the data after that. The business should not maintain the SPD for
longer than specified.
24. Confidential and Sensitive Data Handling
Legal Provisions in India Defining Sensitive Data and its Handling
The salient features of the new rules are as follows:
• Rights of access, correction and withdrawal: The business should permit the
information provider the right to review the information, and should ensure that
any information found to be inaccurate or deficient be corrected. The
information provider also has the right to withdraw its consent to the collection
and use of the information.
• Transnational transfer: A business can only transfer the SPD or information to
a party overseas if the overseas party ensures the same level of protection
provided for under the Indian rules.
• Security procedures: The IT Act requires reasonable security procedures to be
maintained to escape liability. The security procedure has to be audited on a
regular basis by an independent auditor, approved by the Government of India.
25. Lifecycle Management Costs
• Data Lifecycle Management is the process of handling the flow of business
information throughout its lifespan, from requirements through maintenance.
• Information Lifecycle Management (ILM) is the consistent management of
information from creation to final disposition.
• It comprises strategy, process, and technology which, when combined, effectively
manage information and drive improved control over information in
the enterprise.
• It aims at automating the processes involved in organizing data into separate tiers
according to the specified policies, and automating data migration from one tier to
another tier.
• As a rule, newer data, and data that must be accessed more frequently, is stored on
faster, but more expensive storage media, while less critical data is stored on
cheaper, but slower media.
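The tiering rule above can be sketched as a simple policy function. The age thresholds, access-frequency cutoff, and tier names are illustrative assumptions, not part of the original text.

```python
def choose_tier(age_days: int, accesses_per_month: int) -> str:
    """Assign data to a storage tier by age and access frequency (thresholds are illustrative)."""
    if age_days < 30 or accesses_per_month > 100:
        return "fast-primary"       # newer / frequently accessed data on fast, expensive media
    if age_days < 365:
        return "backup-replica"     # warm data replicated to cheaper storage
    return "archive-tape-or-cloud"  # cold data on slow, low-cost media

print(choose_tier(5, 300))   # fast-primary
print(choose_tier(400, 1))   # archive-tape-or-cloud
```

In a real ILM system such a policy would be evaluated periodically and drive automated migration between tiers.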
26. Lifecycle Management Costs
Benefits of Information Management Lifecycle
• Reduced Risk: Reduce unneeded and expired information, and make your information
easier to manage and discover.
• Cost Saving: eDiscovery, storage, and legal hold costs can be reduced with better
management of information.
• Improved Service: Archiving, eDiscovery, and Records Management may become less
of a distraction and drain on IT and Legal.
• Effective Governance: ILM can introduce management rigor and controls that benefit
the enterprise. ILM can bring the added bonus of improved management of information
for the entire business.
27. Lifecycle Management Costs
Five Stages of Data Lifecycle
• Data Creation
– When an employee or client creates and saves a file, that data becomes a part of the
organization’s daily operation.
– Enterprises often store this active data locally and on a network server while backing it
up on local storage appliances or cloud storage.
– This setup provides for fast recovery in case of data loss.
• Backup storage against data loss
– As the system’s efficiency increases, the enterprise can replicate the data from primary
storage into less costly off-site tape vaults or to the cloud.
– In case of a major outage or disaster, the data can be restored completely.
– The backup of the data and the amount of replication depends on the type and value of
the data.
28. Lifecycle Management Costs
Five Stages of Data Lifecycle
• Archiving helps contain storage costs
– Older inactive data that is not frequently handled can be retained in case of a legal, regulatory
or audit event.
– Various data storage networks can be used to archive the data, or data can be retained using
cloud or Hadoop.
– Offsite tapes offer high security and lower storage costs for such long-term data
storage demands.
– This kind of low-cost tape is particularly well suited to unstructured data such as email.
• Ensuring secure data destruction
– The final stage of data lifecycle requires secure data destruction, which is typically governed
by a schedule that defines when and how you must destroy unwanted data.
– Once data reaches its expiration date, secure media destruction can ensure its environmentally
friendly disposal.
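The retention-schedule idea above can be sketched with simple date arithmetic: a record becomes due for secure destruction once its retention period expires. The seven-year period is a hypothetical example, not a statement of any actual rule.

```python
from datetime import date, timedelta

def destruction_due(created: date, retention_days: int, today: date) -> bool:
    """A record is due for secure destruction once its retention period has expired."""
    return today >= created + timedelta(days=retention_days)

# Hypothetical 7-year retention for a financial record.
print(destruction_due(date(2015, 1, 1), 7 * 365, date(2023, 1, 1)))  # True
```

A destruction job would run such a check on a schedule and route expired records to secure media destruction.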
29. Lifecycle Management Costs
Five Stages of Data Lifecycle
• Put secure IT asset disposition to work
– The data storage lifecycle does not end until the last traces of data are destroyed, and this
includes information remaining within any obsolete hardware or peripherals.
– As with media destruction, maintain the chain of custody when eliminating any old computers
and office equipment.
Efficient Information Lifecycle Management
• For handling large amounts of data, the storage needs to be scalable to accommodate it. Hence,
a flexible architecture should be considered for storage.
• Analytics applications in some cases require access to archived and unstructured data. To
leverage analytics for informed decision making, data can be archived into frameworks like
Hadoop.
• Storage can be optimized for maintenance and licensing costs by migrating rarely used
data into frameworks like Hadoop.
30. Lifecycle Management Costs
To proficiently manage data throughout its entire lifecycle, organizations must keep three
objectives in mind:
• Data veracity (trustworthiness) is critical for both analytics and regulatory compliance.
• Both structured and unstructured data must be managed effectively.
• Data privacy and security must be protected at all times.
31. Archive Data Using Hadoop
• Hadoop offers inexpensive storage, supports any type of data (structured,
semi-structured or unstructured), and provides the ability to query
Hadoop data using SQL commands.
• Hadoop utilizes commodity hardware and can be easily scaled up to
accommodate new data.
• Thus, the Hadoop environment can be used to archive and process the data.
• The Hadoop tool used to perform archiving is Sqoop, which can move the data to be
archived from the data warehouse into Hadoop.
• You will need to consider what form you want the data to take in your Hadoop
cluster. In general, compressed Hive files are a good option.
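As a sketch of how a Sqoop import might be assembled, the snippet below only builds the command line (it does not execute anything). The JDBC URL, table name, and target directory are hypothetical placeholders.

```python
def build_sqoop_import(jdbc_url: str, table: str, target_dir: str) -> list:
    """Assemble a Sqoop import command for archiving a warehouse table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,       # hypothetical JDBC connection string
        "--table", table,            # warehouse table to archive
        "--target-dir", target_dir,  # HDFS directory for the archived data
        "--compress",                # store compressed files, as suggested above
    ]

cmd = build_sqoop_import("jdbc:mysql://warehouse/sales",
                         "orders_2015", "/archive/orders_2015")
print(" ".join(cmd))
```

In practice this argument list would be passed to the shell (e.g. via `subprocess.run`) on a node with Sqoop installed.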
32. Archive Data Using Hadoop
• Archiving everything has an advantage of providing a single interface across the entire
dataset for issuing queries.
• Partial availability of data would require queries to be executed on both the archived data
and the active data, and a merged result of the two queries to be provided.
• An enterprise data warehouse archiving solution for Hadoop must provide three key features:
– Schema preservation: The archive must precisely duplicate the schema of the source
warehouse. It is essential to confirm that data values will be archived without loss of
precision. Changes to the source schema, for example, adding new columns or changing data
types, should also be captured by the archive.
– Control and security: The archive must provide access to data on a “need to know” basis; it
must guarantee that sensitive data is encrypted or masked, and that access is audited.
– Querying support: Support for SQL access to the archived data is essential. Applications
need to make use of the archived data to generate reports or to perform
analysis.
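One common way to satisfy the "encrypted or masked" requirement above is field masking before data is archived or displayed. A minimal sketch; the card-number format is illustrative:

```python
def mask_card_number(card: str, visible: int = 4) -> str:
    """Mask all but the last few digits of a card number before archiving or display."""
    digits = card.replace("-", "").replace(" ", "")
    return "*" * (len(digits) - visible) + digits[-visible:]

print(mask_card_number("4111-1111-1111-1234"))  # ************1234
```

Unlike encryption, masking is irreversible, so it suits archives where the full value is never needed again.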
33. Testing and Delivering Big Data Applications for
Performance and Functionality
• Testing a big data application is more a verification of its data processing than a test of
the individual features of the software product.
• When it comes to big data testing, performance and functional testing are the key
components to evaluate.
• The testing of Hadoop big data application can be performed as a two-step process.
– Checking the functionality: The business logic encoded using MapReduce programs
is tested in this phase. For this, unit testing can be performed and executed in the
pseudo-distributed mode.
– Checking on the cluster: Once the business logic is validated, it can be tested on the
cluster for the performance and failover. Performance testing includes testing of job
completion and the time taken, utilization of the memory and other resources, data
throughput, etc. Failover testing includes the failure of one or more daemons running in
Hadoop, namely, NameNode, DataNode, Resource Manager, Node Manager or failure of
the device through which the distributed environment is made available.
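The "checking the functionality" step above can be sketched as a plain-Python unit test of MapReduce business logic, here word count, with no cluster involved; the mapper/reducer signatures are illustrative, not a specific framework's API.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for every word in a line of input."""
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    """Sum the counts for a single word."""
    return word, sum(counts)

def run_local(lines):
    """Simulate map, shuffle, and reduce in-process for unit testing."""
    shuffled = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            shuffled[key].append(value)
    return dict(reducer(k, v) for k, v in shuffled.items())

result = run_local(["big data big cluster", "data"])
print(result)  # {'big': 2, 'data': 2, 'cluster': 1}
```

Once the logic passes such local tests, the same mapper and reducer can be validated on the cluster for performance and failover.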
34. Testing and Delivering Big Data Applications for
Performance and Functionality
Testing big data applications poses several challenges, which include the following:
• Automation: Mature automation tool support for big data testing is not available. Thus,
automation in big data testing requires someone with technical expertise. Also, automated
tools are not equipped to handle unexpected problems that arise during testing.
• Virtualization: Testing, especially unit testing, is usually performed in a virtual environment;
it is one of the fundamental phases of testing. Virtual machine latency creates timing
problems in real-time big data testing. Also, managing virtual machine images is a hassle.
• Large dataset: The amount of data is huge and can have many variations. Further, the data can
originate from different sources, so integrating it is a major challenge. Thus, more data
needs to be verified, and this needs to be done at a faster rate.
• Testing across platforms: Hadoop is a collection of various tools, and applications can be
written using any of them. Thus, there is a need for tools that enable testing across
different platforms.
• Monitoring and diagnostic solution: There are limited solutions that can monitor the entire
execution environment and detect bottlenecks or failures.
35. Challenges with Data Administration
• The data administrator is responsible for designing and maintaining data stores.
• Data administration is the method by which data is monitored, managed and
maintained by a person or an organisation.
• Data administration allows an organisation to check its data resources, along with their
processing and communications with different applications and business processes.
• The data administrator needs to integrate data from multiple sources and provide it to
various applications.
• The data administrator deals with designing the logical and conceptual models, treating
data at an organisational level, whereas the database administrator deals with the
implementation of the databases required and in use.
36. Challenges with Data Administration
Responsibility of Data Administrator
1. Data Policies, Procedures, Standards
• The data administrator should set the data creation and handling policies, which include details of
which application can interact with which data, how that data can be changed, and what the effect
of the change is.
• Data Procedures are documented plan of actions to be taken to perform a certain activity like
backup and recovery procedures. Data administrator’s role is to ensure that these procedures are
defined and communicated to all concerned employees.
• Data Standards are unambiguous conventions and behaviours that need to be followed so that
maintenance becomes easy. They can also be used to evaluate database quality.
2. Planning
• Effective administration of data requires an understanding of the organisation's needs and the
ability to lead the development of an information architecture that will meet the diverse needs of
the organisation.
• Thus, a data administrator needs to plan for effective administration of data and also provide
support for future needs.
37. Challenges with Data Administration
Responsibility of Data Administrator
3. Data Conflict (Ownership) Resolution
• Data stores are planned to be shared and usually involve data from several different departments of
the organisation.
• Ownership of data is a sensitive issue in every organisation.
• Data administrator should establish procedures for resolving any conflicts in ownership.
4. Managing the Data Repository
• Data repositories contain metadata that holds descriptions of the data stored in data stores.
• They describe an organisation's data and data processing resources.
• As the data stores are increasing in size and incorporating unstructured data, data repositories need
to be enhanced to incorporate new and unseen data.
5. Internal Marketing of DA Concepts
• For data administration to be effective, established policies and procedures must be made known
to the internal staff. This may reduce resistance to change and ownership problems.
38. Challenges with Data Administration
Responsibility of Database Administrator
1. Designing the Database
• The administrator is responsible for defining and creating the logical data model, physical
database model and prototyping.
2. Security and Authorization
• The database administrator ensures that there is no unauthorized access to data. In general,
the data should not be accessible to everyone.
• In a database system, users may be granted permission to access only certain views and
relations.
• The administrator can enforce various authentication and authorization techniques through
which access can be granted only to specific entities.
• Authentication techniques ensure that the person is the individual who is supposed to
access the data, while authorization techniques decide which data that person may access.
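The authentication/authorization split described above can be sketched as two separate checks. The user table, password handling, and view grants are hypothetical toys; a real system would use salted password hashing and a proper access-control store.

```python
import hashlib

# Hypothetical credential store: username -> SHA-256 of the password (toy only;
# real systems should use salted, slow password hashing).
USERS = {"alice": hashlib.sha256(b"s3cret").hexdigest()}
# Hypothetical authorization table: username -> views the user may read.
VIEW_GRANTS = {"alice": {"patient_counts"}}

def authenticate(username: str, password: str) -> bool:
    """Authentication: is this person who they claim to be?"""
    return USERS.get(username) == hashlib.sha256(password.encode()).hexdigest()

def authorize(username: str, view: str) -> bool:
    """Authorization: which data may this authenticated user access?"""
    return view in VIEW_GRANTS.get(username, set())

assert authenticate("alice", "s3cret")
assert authorize("alice", "patient_counts")
assert not authorize("alice", "patient_records")
```

The two functions answer different questions, which is exactly the distinction the slide draws.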
39. Challenges with Data Administration
Responsibility of Database Administrator
3. Data Availability and Recovery from Failures
• The administrator makes sure that the data is available at all times.
• In case of database failure, the administrator should ensure that the data is made
available to its users in such a way that they are unaware of the failure.
• The administrator also ensures that the data remains in a consistent state and
appropriate techniques to achieve these are implemented.
4. Database Tuning
• The database needs to evolve over time as user needs change.
• The administrator should modify the structure or design of the database to incorporate
these changes.
• The DBA is responsible for modifying the database, in particular the conceptual and
logical design.
40. Challenges with Data Administration
Challenges of Data Administrator
• Creating the Data Repository
– With huge amounts of data flowing in from various sources, integrating it to create
a common data repository is challenging.
– This is further complicated when the data is in an unstructured format.
– Pre-processing is an important step in preparing the data for processing and
efficient techniques need to be developed.
• Evolving Nature of Data Consideration in Analysis
– A modern administrator is required to have an understanding of vast domains,
as organizations are now dealing with new types of data.
– E.g.: machine data is centrally logged and stored. To track a machine's
performance, administrators need to understand its data well enough to gain insight from it,
even if they do not possess the relevant technical background.
41. Challenges with Data Administration
Challenges of Data Administrator
• Emphasize the capability to build a database quickly, tune it for maximum
performance and restore it to production quickly when problems develop.
• Enforcing data policies and standards, especially those related to security.
• As the organization's needs change, efficient support should be provided to
incorporate the changes and make provision for future scope.
• Ownership of data is not restricted to the internal staff. With social
media, it is tricky to define the ownership of data.
• The administrator is always expected to keep abreast with new technologies and is
usually involved in mission critical applications.
• Another challenging aspect is that data administrators are required to have a
comprehensive understanding of a wide variety of topics to understand and improve
business processes in their organization.