So, I’m starting by defining how I think of Big Data.
As an experiment, I’m falsely attributing this definition. As of today it has zero hits on Google; I’ll search over time to see how, or if, Google manages to find the quote and attribute it to Oscar or to me.
With Big Data you sift a larger data set, looking for more specific information than has previously been possible. Sometimes patterns emerge at a macro scale that weren’t previously identified; that happens more often in scientific efforts, while business is typically looking to better exploit an existing market rather than break new ground.
So what are you looking to analyse? What are the data sets, and how have they been compiled? What is their provenance? What about the data quality? Where Big Data projects have delivered meaningful benefits, a trend shows that the companies have three aspects in place:
1. Strong staff who are interested in asking the right questions, not obsessed with ‘big data’ as a buzzword. Big Data doesn’t change the Garbage In, Garbage Out principle.
2. Mature data quality processes are a must.
3. A responsible approach, in several respects. Big data can expose details that are not palatable to the general public, or sometimes to the company; you need to recognise that the analysis may challenge the hypothesis. RBAC is critical: exposing these data sets can result in significant harm to your organisation and to everyone referred to, directly or indirectly. Compliance becomes critical as soon as you have data sets which correlate to identify individuals instead of groups. Whilst personalised healthcare, advertising that predicts what we want just in time for us to purchase it, and automatic identification of criminals are the goal, far too often new technologies are exploited for less laudable ends. Big data under the GDPR will associate with big fines…
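The RBAC point above can be made concrete with a minimal deny-by-default sketch. The role names and data-set sensitivity labels here are invented for illustration, not taken from any particular product:

```python
# Minimal role-based access control sketch for big data sets.
# Roles and sensitivity labels are illustrative assumptions.
ROLE_CLEARANCE = {
    "analyst": {"aggregate"},
    "data_scientist": {"aggregate", "pseudonymised"},
    "dpo": {"aggregate", "pseudonymised", "identifiable"},
}

def can_access(role: str, sensitivity: str) -> bool:
    """Deny by default: unknown roles or labels get no access."""
    return sensitivity in ROLE_CLEARANCE.get(role, set())

assert can_access("dpo", "identifiable")
assert not can_access("analyst", "identifiable")
```

The design choice that matters is the default: an unrecognised role or label yields no access, so a misconfiguration fails closed rather than open.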
Gunter Ollman of NCC Domain Services proposed that these controls form an overlapping set that works together across network, vulnerability, behaviour and (to a degree) stupidity to jointly reduce the likelihood and impact of a breach.
Track all access to and processing of the data. Encrypt sensitive data as soon as possible, ideally at the source, and don’t leave the keys in the same place as the data. Log everything and monitor it, leveraging anomaly detection systems to improve the signal-to-noise ratio until humans can realistically review the volume of data. Use automated scanning to constantly monitor systems for vulnerabilities and malware. Monitor network egress for anomalies in traffic. Create a number of "false flag" records that automatically alert your security team if they are accessed. Configure alerts and blocks to identify and prevent data breaches.
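The "false flag" (canary) record control can be sketched in a few lines: plant record IDs that no legitimate process should ever touch, then alert on any access-log entry that references one. The record IDs and log format below are invented for illustration:

```python
# Canary / "false flag" record monitoring sketch.
# Any access to a planted record ID raises an alert.
FALSE_FLAG_IDS = {"cust-00000-canary", "cust-99999-canary"}

def scan_access_log(entries):
    """Return alert messages for any access touching a canary record.

    Each entry is a (user, record_id) tuple; the format is an
    illustrative assumption, not a real log schema.
    """
    alerts = []
    for user, record_id in entries:
        if record_id in FALSE_FLAG_IDS:
            alerts.append(f"ALERT: {user} accessed canary record {record_id}")
    return alerts

log = [("alice", "cust-12345"), ("mallory", "cust-00000-canary")]
print(scan_access_log(log))
# → ['ALERT: mallory accessed canary record cust-00000-canary']
```

Because the canary IDs correspond to no real customer, every hit is high-signal: there is no legitimate reason for them to appear in an access log.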
We can split companies’ use of big data into what happened and what will happen, and segment that further to provide a maturity model.
Descriptive analytics is where most big data activity in the IT sector remains at the moment: log collation and some analysis. After a breach we move to diagnostic analytics as we analyse the detail, but this takes effort, and because many organisations still do not report breaches the patterns are not always clear enough to support a confident conclusion. This is a reactive position. Predictive analytics is where some of the more advanced, security-focussed organisations are moving; threat modelling efforts sit here. Prescriptive analytics, crystal-ball gazing, is now moving into pre-crime, and this is already happening in several police forces in the US. https://www.theguardian.com/technology/2016/feb/04/us-police-data-analytics-smart-cities-crime-likelihood-fresno-chicago-heat-list
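The step from descriptive to predictive can be illustrated with a minimal sketch: rather than merely collating daily log-event counts, flag the days whose counts deviate sharply from the historical baseline. The counts and threshold below are invented for illustration, and a real system would use a far more robust model:

```python
# Simple z-score anomaly flagging over daily log-event counts.
# Counts and threshold are illustrative; real systems need
# robust baselining (the outlier here inflates the stdev itself).
from statistics import mean, stdev

def anomalous_days(daily_counts, threshold=2.0):
    """Return indices of days whose count deviates from the mean
    by more than `threshold` standard deviations."""
    mu, sigma = mean(daily_counts), stdev(daily_counts)
    return [i for i, c in enumerate(daily_counts)
            if sigma and abs(c - mu) / sigma > threshold]

counts = [100, 98, 102, 101, 99, 103, 480]  # final day spikes
print(anomalous_days(counts))  # → [6]
```

Even this toy version shows the point of the maturity model: the same log data that descriptive analytics merely collates can, with a model over it, start answering “what will happen?” questions.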
On a call, officers respond, Beware checks the address and gets the names of residents; these are checked against public data sources to assign each a Red/Amber/Green (RAG) threat rating.
How this is done is a trade secret, but it could identify a PTSD sufferer who has tweeted about having bad experiences… Your tweets could influence whether the officer approaches the door, and if you are flagged red, say because your account has recently been hacked, then the outcome may be violent.
Traditional IT staff are often the wrong fit for big data: they focus on the T and not the I. Specialist skills are required, and only a few organisations truly work at Big Data exabyte scales, so those skills are in high demand.
Analysis improves when the importance of data quality is embedded in all your systems, so that data sets are filtered as they progress through downstream systems before they reach the Big Data aggregation point.
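Filtering before aggregation can be sketched as a validation gate: records failing basic quality checks are quarantined rather than passed downstream. The field names and rules here are illustrative assumptions, not a real schema:

```python
# Data-quality gate before the aggregation point: reject records
# that fail minimal checks so defects don't propagate downstream.
# Field names ("id", "value") and rules are illustrative.
def validate(record):
    """Return True if the record passes minimal quality checks."""
    return (
        isinstance(record.get("id"), str) and record["id"] != ""
        and isinstance(record.get("value"), (int, float))
        and record["value"] >= 0
    )

def filter_for_aggregation(records):
    """Split records into (clean, rejected) lists."""
    clean = [r for r in records if validate(r)]
    rejected = [r for r in records if not validate(r)]
    return clean, rejected

clean, rejected = filter_for_aggregation([
    {"id": "a1", "value": 10},
    {"id": "", "value": 5},     # missing id -> rejected
    {"id": "a2", "value": -3},  # negative value -> rejected
])
print(len(clean), len(rejected))  # → 1 2
```

Keeping the rejected records, rather than silently dropping them, matters: the quarantine pile is itself a data-quality signal for the upstream systems.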
I’ve configured a Google alert to track this quote and I’m looking forward to seeing who it gets attributed to.
Big Data Analytics
What is Big Data?
“The dynamically linked super set of multiple significant
scale discrete data sets.”
• Large volumes, typically adding terabytes of data daily
• Aggregation of many historically discrete data sets
• Dynamic links between the data sets
• Any analysis is a point in time position
• Better intelligence which can be leveraged in business, healthcare, etc.
• Cost of a DNA analysis has reduced by around 5 orders of magnitude since the process became possible, making personalised medicines a reality in the near future.
• If you are investing in Big Data projects, the risk of data loss doesn’t necessarily change, but the volume of loss is potentially colossal, with impacts that aren’t understood for an extended period.
• Customers hold concerns about companies taking on the role of an Orwellian Big Brother.
There’s no Best Practice…yet
• Snowden showed that Government organisations with specific focus on
security struggle to control Big Data and the associated risks.
• Panama Papers showed that legal firms with an inherently high level of
confidentiality in their practices struggle.
• Harder to define the purpose of data exploration.
• Big Data breaches tend to be….bigger.
• Regulators will expect technology to be used equally to exploit and
control Big Data.
Key Controls for Big Data
1. Track all access that collects, views, and manipulates sensitive data, and ensure that it is
encrypted at each point.
2. Encryption keys for sensitive data must not be stored in the same location as the data.
3. All access and processing of data must be logged. These logs must be subject to human and automated review.
4. Use automated scanning to constantly monitor systems for vulnerabilities and malware.
5. Monitor network egress for anomalies in traffic.
6. Create a number of "false flag" records. Configure alerts and blocks to identify and prevent data breaches.
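Control 3 implies that the audit logs themselves need protection: a log an attacker can quietly edit proves nothing. One standard technique, sketched below under the assumption that the key lives apart from the data (control 2), is an HMAC chain, where each entry’s MAC covers the previous MAC, so altering or deleting any earlier entry invalidates every later one:

```python
# Tamper-evident audit log sketch: each entry's HMAC chains over the
# previous entry's MAC, so no earlier entry can be silently altered.
# In practice the key would live in an HSM or separate key store.
import hmac, hashlib

def append_entry(log, key, message):
    """Append (message, mac) where mac chains over the previous mac."""
    prev_mac = log[-1][1] if log else b"\x00" * 32
    mac = hmac.new(key, prev_mac + message.encode(), hashlib.sha256).digest()
    log.append((message, mac))

def verify_log(log, key):
    """Recompute the chain; any edit or deletion breaks verification."""
    prev_mac = b"\x00" * 32
    for message, mac in log:
        expected = hmac.new(key, prev_mac + message.encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            return False
        prev_mac = mac
    return True

key = b"audit-key-from-separate-store"
log = []
append_entry(log, key, "alice read record 42")
append_entry(log, key, "bob exported data set X")
assert verify_log(log, key)
log[0] = ("alice read record 41", log[0][1])  # tamper with history
assert not verify_log(log, key)
```

This is detection, not prevention: it will not stop an attacker deleting the whole log, but it makes any selective rewriting of history detectable during review.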
How to use Big Data Analytics?
How can we influence the future?
How can we plan for the future?
Why did this happen?
Do we know what happened?
Police use of Predictive Analytics
The California city of Fresno is just one of
the police departments in the US already
using a software program called “Beware”
to generate “threat scores” about an
individual, address or area.
As reported by the Washington Post in January, the software works by processing “billions of data points, including arrest reports, property records, commercial databases, deep web searches and the [person’s] social media postings”.
Photo: Nick Otto/For The Washington Post
How to do it well
• Specialist Skills are in demand;
• Big Data
• Data Management
• Have a plan to recruit and retain them!
• Big Data Leaders show maturity in data quality
Big Data is driven by the desire for better analytics, the desire to better understand. Of itself, it’s just a large data set waiting to breach.
Points of contact
M: +44 (0) 7545 503 311
NCC Group Blogs
TED Talks on Big Data
“The dynamically linked super set of multiple
significant scale discrete data sets.”
Well that’s a lie.
Manchester - Head Office