3. ARMA NS. Louise Spiteri
Big data is high-volume, high-velocity and
high-variety information assets that demand
cost-effective, innovative forms of information
processing for enhanced insight and decision
making. http://www.gartner.com/it-glossary/big-data/
Big data is a term that describes large volumes
of high velocity, complex, and variable data
that require advanced techniques and
technologies to enable the capture, storage,
distribution, management, and analysis of the
information.
http://www.techamerica.org/Docs/fileManager.cfm?f=techamerica-bigdatareport-final.pdf
Defining Big Data
4. ARMA NS. Louise Spiteri
“Insights from Big Data can enable you to
make better decisions. They can help you
facilitate growth and organizational
transformation, reduce costs and manage
volatility and risk. This enables you to
capitalize on new sources of revenue and
generate more value for your organization.”
Financial Accounting Advisory Services (n.d.). Big data strategy to support the CFO and
governance agenda
The value of Big Data
6. ARMA NS. Louise Spiteri
How much data does your organization generate?
7. ARMA NS. Louise Spiteri
Big Data tends to be measured in terabytes
and petabytes (1 petabyte = 1,024 terabytes).
Definitions of “big” are relative, and fluctuate,
especially as storage capacities increase over
time.
Data is generated by every computerized
system in the organization, including human
resources solutions, supply-chain management
software, and social media tools for marketing.
Volume
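The unit arithmetic behind these volumes can be checked with a short sketch. It assumes binary units (1 petabyte = 1,024 terabytes, as on this slide); the function name `human_readable` is illustrative only.

```python
# Binary storage units: each step up is a factor of 1024.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest binary unit that keeps the value below 1024."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit we know.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

TB = 1024 ** 4  # bytes in one terabyte (binary)

print(human_readable(1024 * TB))  # 1.00 PB
print(human_readable(500 * TB))   # 500.00 TB
```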
8. ARMA NS. Louise Spiteri
Google indexes 20 billion pages per day.
Twitter has more than 500 million users and 400
million tweets per day.
Each day, Facebook generates 2.7 million
‘Likes’, processes 500 TB of data, and receives
300 million photo uploads.
http://bit.ly/1SVxPwp; http://bit.ly/1SVy76j; http://bloom.bg/1SVyldK
Examples of volume
9. ARMA NS. Louise Spiteri
What types of data do you collect & manage?
10. ARMA NS. Louise Spiteri
Organizations generate various types of structured,
semi-structured, and unstructured data.
Structured data is the tabular type found in
spreadsheets or relational databases (about 10% of
most data).
Text, images, audio, and video are examples of
unstructured data, which sometimes lacks the
structural organization required by machines for
analysis.
Variety
11. ARMA NS. Louise Spiteri
How quickly does your data grow & change?
12. ARMA NS. Louise Spiteri
Velocity refers to the rate at which data is
generated and the speed at which it should be
analyzed and acted upon.
The proliferation of digital devices such as
smartphones has led to an unprecedented rate
of data creation and is driving a growing need
for real-time analytics and evidence-based
planning.
Velocity
13. ARMA NS. Louise Spiteri
How accurate & reliable is your data?
14. ARMA NS. Louise Spiteri
Some data is inherently unreliable: customer
comments in social media, for example,
entail judgment.
We need to deal with imprecise and uncertain
data. Is the data that is being stored and
mined meaningful to the problem being
analyzed?
Veracity
15. ARMA NS. Louise Spiteri
Big Data is often characterized by relatively
“low value density”. That is, the data received
in its original form usually has a low value
relative to its volume. However, a high value
can be obtained by analyzing large volumes of
such data.
Value
16. ARMA NS. Louise Spiteri
Value is any application of big data
that:
• Drives revenue increases (e.g., customer
loyalty analytics)
• Identifies new revenue opportunities and
improves quality and customer satisfaction
(e.g., predictive maintenance)
• Saves costs (e.g., fraud analytics)
• Drives better outcomes (e.g., patient care)
Value
18. ARMA NS. Louise Spiteri
Blogs, tweets, social networking sites (such as
LinkedIn and Facebook), news feeds,
discussion boards, and video sites are all
sources of Big Data.
Social media
19. ARMA NS. Louise Spiteri
Machine-generated data comes from a wide variety
of devices, from RFID tags to optical, acoustic,
seismic, thermal, chemical, scientific, and medical
sensors, and even includes weather data.
Machine-generated data
20. ARMA NS. Louise Spiteri
From the GPS systems in our cars, in planes, and ships, to
GPS apps on smartphones, we use GPS to guide our
movements.
GPS is also used to track our movements, for example
through emergency beacons. Retailers likewise use in-store
WiFi networks to access shoppers’ smartphones and track
their shopping habits.
Location Based Services (LBS) allow us to deliver services
based on the location of moving objects such as cars or
people with mobile phones.
GPS and spatial data
22. ARMA NS. Louise Spiteri
It is generally thought that the true value of Big Data is
seen only when it is used to drive decision making.
You need efficient processes to turn high volumes of
fast-moving and varied data into meaningful insights.
As information managers, you might not be doing the
analysis, but you have a crucial role to play in
managing this data to enable this analysis.
Big Data analytics: How do we mine our data?
23. ARMA NS. Louise Spiteri
Text analytics extract information from textual
data.
• Social network feeds, emails, blogs, online forums, survey
responses, corporate documents, news, and call centre
logs are examples of textual data held by organizations.
Text analytics enable organizations to convert
large volumes of human generated text into
meaningful summaries, which support
evidence-based decision-making.
Text analytics
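One very simple form of text summary is a count of the most frequent terms across a set of documents. The sketch below assumes a tiny, made-up stop-word list and invented sample comments; real text analytics tools go well beyond word counts.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "is", "was", "and", "to", "my", "i", "it", "for", "but"}

def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across the documents."""
    counts = Counter()
    for doc in documents:
        # Lowercase the text and split it into alphabetic words.
        words = re.findall(r"[a-z']+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Invented customer comments standing in for call-centre logs or survey responses.
comments = [
    "The delivery was late and the box was damaged.",
    "Late delivery again, but support was helpful.",
    "Support resolved my delivery issue quickly.",
]

print(top_terms(comments, n=3))
```

Here the most frequent term is "delivery", which hints at where customer complaints cluster.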
24. ARMA NS. Louise Spiteri
Audio analytics analyze and extract information
from unstructured audio data. Customer call
centres and healthcare are the primary
application areas of audio analytics.
• Call centres use audio analytics for efficient analysis
of recorded calls to improve customer experience,
evaluate agent performance, and so forth.
• In healthcare, audio analytics support diagnosis and
treatment of certain medical conditions that affect the
patient’s communication patterns
(e.g., schizophrenia), or analyze an infant’s cries to
learn about the infant’s health and emotional status.
Audio analytics
25. ARMA NS. Louise Spiteri
Video analytics involves a variety of techniques to
monitor, analyze, and extract meaningful information
from video streams.
The growing number of closed-circuit television
(CCTV) cameras and the rise of video-sharing websites
are the two leading contributors to the growth of
computerized video analysis. A key challenge,
however, is the sheer size of video data.
Video analytics
26. ARMA NS. Louise Spiteri
Social media analytics refer to the analysis of
structured and unstructured data from social
media channels.
• Social networks (e.g., Facebook and LinkedIn)
• Blogs (e.g., Blogger and WordPress)
• Microblogs (e.g., Twitter and Tumblr)
• Social news (e.g., Digg and Reddit)
• Social bookmarking (e.g., Delicious and StumbleUpon)
• Media sharing (e.g., Instagram and YouTube)
• Wikis (e.g., Wikipedia and wikiHow)
• Question-and-answer sites (e.g., Yahoo! Answers and Ask.com)
• Review sites (e.g., Yelp, TripAdvisor)
Social media analytics
27. ARMA NS. Louise Spiteri
Predictive analytics comprise a variety of
techniques that predict future outcomes based
on historical and current data, e.g., predicting
customers’ travel plans based on what they
buy, when they buy, and even what they say on
social media.
Predictive analytics
29. ARMA NS. Louise Spiteri
• More data = higher risk of exposure in the event of a
breach.
• More experimental usage = the organization's governance and
security protocols are less likely to be in place.
• New types of data are uncovering new privacy implications, with
few privacy laws or guidelines to protect that information (e.g.,
cell phone beacons that broadcast physical location, & health
devices such as medical, fitness and lifestyle trackers).
• Data linkage and combined sensitive data. The act of combining
multiple data sources can create unanticipated sensitive data
exposure.
Considerations for Big Data
30. ARMA NS. Louise Spiteri
“The protection of information and
information systems from unauthorized
access, use, disclosure, disruption,
modification, or destruction in order to
provide confidentiality, integrity, and
availability.” National Institute of Standards and Technology
http://nvlpubs.nist.gov/nistpubs/ir/2013/NIST.IR.7298r2.pdf
Information security: Definition
31. ARMA NS. Louise Spiteri
“The claim of individuals, groups
or institutions to determine for
themselves when, how and to
what extent, information about
them is communicated to
others.” International Association of Privacy Professionals
https://iapp.org/resources/privacy-glossary
Data privacy: Definition
32. ARMA NS. Louise Spiteri
Under the federal Personal Information Protection and
Electronic Documents Act (PIPEDA), “personal
information” is “information about an identifiable
individual, but does not include the name, title or
business address or telephone number of an
employee of an organization.”
Regulatory framework for big data
33. ARMA NS. Louise Spiteri
The protection of personal information in Canada rests
on three fundamental goals:
• Transparency – providing people with a basic understanding of how
their personal information will be used in order to gain informed
consent
• Limiting use plus consent – the use of that information only for the
declared purpose for which it was initially collected, or purposes
consistent with that use; and,
• Minimization – limiting the personal information collected to what is
directly relevant and necessary to accomplish the declared purpose
and the discarding of the data once the original purpose has been
served.
PIPEDA and big data
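The minimization goal can be illustrated with a small sketch: keep only the fields needed for the declared purpose. The purposes, field names, and the `minimize` helper below are hypothetical and are not drawn from PIPEDA itself.

```python
# Hypothetical mapping of declared purposes to the fields each one needs.
FIELDS_BY_PURPOSE = {
    "shipping": {"name", "address"},
    "billing": {"name", "card_token"},
}

def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields that the declared purpose requires."""
    allowed = FIELDS_BY_PURPOSE[purpose]
    return {field: value for field, value in record.items() if field in allowed}

# Invented customer record for illustration.
customer = {
    "name": "A. Example",
    "address": "1 Main St",
    "card_token": "tok_123",
    "birth_date": "1980-01-01",
}

print(minimize(customer, "shipping"))  # {'name': 'A. Example', 'address': '1 Main St'}
```

An undeclared purpose raises a `KeyError` in this sketch, which mirrors the idea that data should not be used for purposes that were never declared.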
34. ARMA NS. Louise Spiteri
Organizations that attempt to implement Big Data
initiatives without a strong governance regime in place
risk placing themselves in ethical dilemmas without set
processes or guidelines to follow.
A strong ethical code, along with process, training,
people, and metrics, is imperative to govern what
organizations can do within a Big Data program.
Big Data governance
35. ARMA NS. Louise Spiteri
Data used for Big Data analytics can be gathered and
combined from different sources to create new data
sets.
Organizations must make sure that all security and
privacy requirements that are applied to their original
data sets are tracked and maintained across Big Data
processes throughout the information life cycle, from
data collection to disclosure or retention/destruction.
Respecting the original intent of the information gathered
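One way to keep the original requirements attached to combined data is to carry them as metadata and let derived data sets inherit the strictest settings. The sketch below is an assumption-laden illustration: the sensitivity scale, the `DataSet` class, and the `combine` rule are hypothetical, not a prescribed method.

```python
from dataclasses import dataclass

# Hypothetical sensitivity scale: a higher number means stricter handling.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

@dataclass(frozen=True)
class DataSet:
    name: str
    sensitivity: str      # one of the keys in LEVELS
    purposes: frozenset   # purposes declared when the data was collected

def combine(a: DataSet, b: DataSet) -> DataSet:
    """Derived data inherits the strictest level and only the purposes both sources share."""
    strictest = max(a.sensitivity, b.sensitivity, key=LEVELS.__getitem__)
    return DataSet(
        name=f"{a.name}+{b.name}",
        sensitivity=strictest,
        purposes=a.purposes & b.purposes,
    )

crm = DataSet("crm", "confidential", frozenset({"billing", "support"}))
weblogs = DataSet("weblogs", "internal", frozenset({"support", "marketing"}))

merged = combine(crm, weblogs)
print(merged.sensitivity, sorted(merged.purposes))  # confidential ['support']
```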
36. ARMA NS. Louise Spiteri
Data that has been processed, enhanced, or changed
by Big Data processes should be anonymized to protect
the privacy of the original data source, such as
customers or vendors.
Data that is not properly anonymized prior to external
release (or in some cases, internal as well) may result
in the compromise of data privacy when the data is
combined with previously collected, complex data
sets.
Re-identification
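The re-identification risk described above can be shown with a toy linkage: a release with names removed is matched against a separate data set on shared attributes. All records, field names, and the `link` helper are invented for illustration.

```python
# "Anonymized" release: names removed, but indirectly identifying fields kept.
released = [
    {"postal_code": "B3H", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postal_code": "B3H", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]

# Separate, previously collected data set that still carries names.
public = [
    {"name": "A. Example", "postal_code": "B3H", "birth_year": 1980, "sex": "F"},
    {"name": "B. Sample", "postal_code": "B3K", "birth_year": 1975, "sex": "M"},
]

QUASI_IDENTIFIERS = ("postal_code", "birth_year", "sex")

def link(released_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, released_row) pairs where exactly one public row matches."""
    matches = []
    for row in released_rows:
        candidates = [p for p in public_rows if all(p[k] == row[k] for k in keys)]
        if len(candidates) == 1:  # a unique match re-identifies the person
            matches.append((candidates[0]["name"], row))
    return matches

for name, row in link(released, public):
    print(name, "->", row["diagnosis"])  # A. Example -> asthma
```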
37. ARMA NS. Louise Spiteri
Matching data sets from third parties may provide
valuable insights that could not be obtained with
your data alone.
You need to consider and evaluate the adequacy of
the security and privacy protections in place at
the third-party organizations.
Third-party use
38. ARMA NS. Louise Spiteri
Big data’s potential for predictive analysis raises
particular concerns for data security and privacy.
• Think of the famous case of Target, which sent
a teenage girl coupons based on shopping
patterns that suggested she was pregnant and
indicated her due date (Target was accurate).
The girl’s family found out about her
pregnancy through these coupons.
• Did the girl know that her shopping information
would be used for this purpose?
• Was she informed of Target’s privacy policy?
The risks of predictive analytics
39. ARMA NS. Louise Spiteri
There are growing concerns that Big Data is
straining the privacy principles of identifying
purposes and limited use.
Consumers are called upon to agree to privacy
policies and consent forms that no one has the
time to read. The burden is increasingly placed
on the consumers, as these policies take the
form of disclaimers for the organizations.
Increasing burden on the consumer
40. ARMA NS. Louise Spiteri
“Just because commercial organizations
can collect personal information and run
it through the revealing algorithms of
predictive analytics, doesn’t mean that
they should.” Jennifer Stoddart
https://www.priv.gc.ca/media/sp-d/2013/sp-d_20131017_e.asp
Can we vs. should we?
41. ARMA NS. Louise Spiteri
A useful tool is the Privacy Maturity Model
designed by the American Institute of Certified
Public Accountants (AICPA) and the Canadian
Institute of Chartered Accountants (CICA).
These sections are particularly relevant:
• 1.2.3: Personal Information Identification and Classification
• 1.2.4: Risk Assessment
• 1.2.6: Infrastructure and Systems Management
• 3.2.2: Consent for New Purposes and Uses
• 4.2.4: Information Developed About Individuals
• 8.2.1: Information Security Program
• http://bit.ly/1SrCcih
Privacy assessment
42. ARMA NS. Louise Spiteri
Privacy Life cycle (from Maturity Model)
44. ARMA NS. Louise Spiteri
Strong data governance policies
and procedures are important:
• Who owns the data?
• Who is responsible for protecting the
data?
• How is data collected?
• What data is collected?
• How is the data retained?
Handling & retaining data
45. ARMA NS. Louise Spiteri
What security & privacy regulations apply to your
data?
What are the compliance provisions of your
agreements with any third parties or service providers?
What are their privacy and security policies?
Develop a solid compliance framework with a
risk-based map for implementation and maintenance.
Compliance
46. ARMA NS. Louise Spiteri
Develop case scenarios where you would use Big
Data.
Identify what data will be used and how.
Identify possible risks.
In this way, you are prepared for when you actually
use the Big Data, rather than being left to react
if something goes wrong.
Data use cases
47. ARMA NS. Louise Spiteri
Tell your customers what personal data you
collect and how you use it.
Provide consistent consent mechanisms
across all products.
Ensure that customers have the means to
withdraw their consent at the individual device
level.
Manage consent
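Consent that can be withdrawn at the individual device level implies recording consent per customer, device, and purpose. The sketch below is a minimal in-memory illustration; the `ConsentRegistry` class and its method names are hypothetical.

```python
class ConsentRegistry:
    """Tracks consent per (customer, device, purpose); absence means no consent."""

    def __init__(self):
        self._granted = set()

    def grant(self, customer: str, device: str, purpose: str) -> None:
        self._granted.add((customer, device, purpose))

    def withdraw_device(self, customer: str, device: str) -> None:
        """Withdraw every consent the customer gave on one device."""
        self._granted = {g for g in self._granted if g[:2] != (customer, device)}

    def allowed(self, customer: str, device: str, purpose: str) -> bool:
        return (customer, device, purpose) in self._granted

registry = ConsentRegistry()
registry.grant("c42", "phone", "marketing")
registry.grant("c42", "tablet", "marketing")
registry.withdraw_device("c42", "phone")

print(registry.allowed("c42", "phone", "marketing"))   # False
print(registry.allowed("c42", "tablet", "marketing"))  # True
```

Using the same registry for every product is one way to keep the consent mechanism consistent.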
48. ARMA NS. Louise Spiteri
Maintain rigorous controls over who has access to
the data.
Periodically review who has access rights,
and ensure that rights are removed
immediately, as and when required.
Access management
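A periodic access review can be sketched as a check that flags grants held by people who have left, or grants that have not been reviewed recently. The 90-day window, the record layout, and the `grants_to_revoke` helper are assumptions made for illustration.

```python
from datetime import date, timedelta

def grants_to_revoke(grants, active_staff, today, max_age_days=90):
    """Flag grants held by departed staff or not reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        g for g in grants
        if g["user"] not in active_staff or g["last_reviewed"] < cutoff
    ]

# Invented access grants for one data set.
grants = [
    {"user": "alice", "last_reviewed": date(2024, 5, 1)},
    {"user": "bob", "last_reviewed": date(2024, 1, 15)},   # review overdue
    {"user": "carol", "last_reviewed": date(2024, 5, 20)}, # no longer on staff
]
active_staff = {"alice", "bob"}

flagged = grants_to_revoke(grants, active_staff, today=date(2024, 6, 1))
print([g["user"] for g in flagged])  # ['bob', 'carol']
```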
49. ARMA NS. Louise Spiteri
Remove all Personally Identifiable Information (PII)
from a data set and turn it into non-identifying data.
Monitor anonymization requirements and analyze
the risks of re-identification.
Anonymization
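Both steps can be sketched in a few lines: drop the direct identifiers, then measure how small the smallest group of look-alike records is. The group-size check is borrowed from k-anonymity, which this slide does not name; the field names and helper functions are likewise assumptions.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "email"}             # hypothetical PII fields
QUASI_IDENTIFIERS = ("postal_code", "birth_year")  # fields that identify in combination

def strip_identifiers(rows):
    """Drop directly identifying fields from every record."""
    return [{k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS} for row in rows]

def smallest_group(rows, keys=QUASI_IDENTIFIERS):
    """Size of the smallest group sharing the same values for keys (0 if no rows)."""
    groups = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(groups.values(), default=0)

# Invented customer records for illustration.
customers = [
    {"name": "A", "email": "a@example.com", "postal_code": "B3H", "birth_year": 1980},
    {"name": "B", "email": "b@example.com", "postal_code": "B3H", "birth_year": 1980},
    {"name": "C", "email": "c@example.com", "postal_code": "B3K", "birth_year": 1975},
]

anonymized = strip_identifiers(customers)
print(smallest_group(anonymized))  # 1: at least one record is still unique
```

A smallest group of 1 means at least one person remains unique on the remaining fields, so removing names alone does not eliminate the re-identification risk.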
50. ARMA NS. Louise Spiteri
Maintain your responsibility to your customers
when you share data with third parties.
Include specific Big Data provisions within
contractual agreements.
Monitor third parties for compliance with data-
sharing agreements.
Data sharing
55. ARMA NS. Louise Spiteri
Internal Revenue Service (US)
• An unnamed source used an IRS app in an attempt to
download forms on 200,000 people.
• They succeeded in downloading half of these
and used 15,000 of the forms to claim tax
refunds in other people’s names.
Government breach, 1
56. ARMA NS. Louise Spiteri
Australian Immigration Department
• An employee of the department
inadvertently sent passport, visa,
and personal information of all the
world leaders attending the Brisbane
Summit to the organizers of the
Asian Cup football tournament.
Government breach, 2
58. ARMA NS. Louise Spiteri
Louise.Spiteri@dal.ca
• @Cleese6
• LinkedIn: http://bit.ly/1SrCm9g
• AboutMe: https://about.me/louisespiteri
• ResearchGate: http://bit.ly/1SrCqWB
• School of Information Management: www.dal.ca/sim
Contact information
59. ARMA NS. Louise Spiteri
http://www.looiconsulting.com/home/enterprise-big-data/
http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg
http://www.kscpa.org/writable/files/AICPADocuments/10-229_aicpa_cica_privacy_maturity_model_finalebook.pdf
http://blog.templatemonster.com/2013/04/30/thank-you-pages-optimization/
Image sources