Put simply, big data is not a new concept. It is a more powerful version of knowledge discovery in databases, or data mining, which has been defined as « the non-trivial extraction of implicit, previously unknown and potentially useful information from data » and which enables firms to discover or infer previously unknown facts and patterns in a database. The term big data describes a new generation of technologies and architectures designed to economically extract value from large volumes of a wide variety of data. As technology changes and improves, the size of a dataset that qualifies as big data will change too.
1.- Volume: the main attraction of BD analytics. It is the most immediate challenge to conventional IT structures, because it requires scalable storage and a distributed approach to querying.
2.- Velocity: it is important to move data quickly from input to decision (so-called streaming data); this concerns both input and output data. The quicker this happens, the greater the competitive advantage. The results might go directly into a product, such as a recommendation feature, or into a dashboard used to drive decision-making.
3.- Variety: data rarely presents itself in a form that is perfectly ordered and ready for processing. It can be data fed directly from a sensor source or social network data. None of these things comes ready for integration into an application. There is a risk of losing information when moving from source data to processed application data. The choice of software depends on how structured the data are (this is where variety comes into play).
The term has been invented by big technology companies eager to sell their software. Some of the big players are IBM, HP, Oracle, …
ANALYTICAL USE to gain competitive advantage. Extracting value: mathematicians are now suddenly sexy. As a lawyer I have always found those with a facility with numbers appealing. I'm happy to see I'm not the only one and that others agree with me. Successfully exploiting the value in BD requires experimentation, and even access to the best data-deciphering tool is no guarantee of great wisdom. Very few companies have people on staff with the training not only to evaluate mountains of data but also to do something with it. Capturing data is one thing; making it useful is a whole other.
What this means is that the amount of data that companies, governments and people are creating is growing exponentially, and even that does not begin to get the point across. 1 yottabyte = 1,000 zettabytes. Generally speaking, experts consider data volumes in the petabyte range as the starting point for BD. Market research firm IDC estimates that 1200 exabytes of data will be generated this year alone, roughly 3 exabytes every day. Projected 2012 sales: 367.2 million PCs, 107 million tablets, 650 million smartphones.
Not only people feed data to the Internet; things can do it too. Low-cost sensors (RFID: your car key, package tracking in the logistics sector). Example: a digital thermostat combining sensors, machine learning and web technology senses not just air temperature but also the movements of people in the house, their comings and goings, and adjusts room temperatures to save energy. These sources generate far more data, and they are entirely new sources of data (sensors), not just more streams of existing data. There are now countless digital sensors worldwide, in industrial equipment, automobiles …, communicating data to computing intelligence and creating the IoT, or the Industrial Internet.
New context: the BD trend is MORE DATA, FASTER COMPUTERS and NEW ANALYTIC TECHNIQUES. Falling hardware and computing costs, scalable distributed data processing models, and open source software such as Hadoop bring BD processing within reach of the less well resourced. Hadoop is open source software for working with BD. It was derived from Google technology and put into practice by Yahoo and others. But BD is too varied and complex for a one-size-fits-all solution. While Hadoop has surely captured the greatest name recognition, it is just one of the 3 classes of technology well suited to storing and managing BD. The other 2 are NoSQL and Massively Parallel Processing (MPP) data stores. Sense-making over data: that is why we have the data to begin with. Big players also providing BD solutions: IBM, Oracle, SAP, Microsoft, HP, Google (BigQuery, software that can scan terabytes of information in seconds).
Uses of big data can be transformative; the potential benefits are vast and still largely unrealised. Smart grid: two-way data flow. Users receive electricity as usual but send back information about how much they consume, to be analysed. Electricity suppliers can then manage this commodity more efficiently and take more rational decisions about energy production (once produced, electricity cannot be stored and must be consumed immediately). Companies: analysts at Forrester Research estimate that enterprises use only 5% of their available data, leaving the field open to those who want to tap the remaining 95% and obtain the hidden value their data holds: illuminating trends, unlocking new sources of economic value, improving business processes and more. Google Flu Trends: a tool using aggregate search queries to identify flu outbreaks by region.
I wouldn't claim to have all the answers.
INCREASE OF DATA SUBJECTS WHOSE DATA WILL BE PROCESSED
INCREASE OF DATABASES CONTAINING THESE TYPES OF DATA
INCREASE OF 'INTELLIGENCE' OF PROCESSING: AGGREGATED DATA
Privacy and data protection mean the same thing in the age of big data as they always have, but the capacity of machines to capture, store, process, synthesise and analyse details about everyone has forced new boundaries. At issue: the digital data now available to organizations and the novel ways in which BD combines these diverse data sets. BD, not surprisingly, intensifies existing privacy concerns over tracking and profiling.
Data is not de-identified simply because you strip off a name or an address. Much of our personal information is now linked to specific devices such as smartphones or laptops through UDIDs, IP addresses, fingerprinting and other means, which are personally identifiable.
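A minimal Python sketch of why stripping names is not enough. All records, field names and `device_id` values below are invented for illustration; the only point is that a persistent device identifier left in a "de-identified" data set can be joined against a second data set that still carries names.

```python
# Hypothetical "de-identified" browsing log: names removed, device ID kept.
deidentified_log = [
    {"device_id": "udid-001", "visited": "clinic-appointments.example"},
    {"device_id": "udid-002", "visited": "news.example"},
    {"device_id": "udid-001", "visited": "pharmacy.example"},
]

# Hypothetical second data set (e.g. an app's customer list) linking devices to people.
customer_list = [
    {"device_id": "udid-001", "name": "Alice Example"},
    {"device_id": "udid-002", "name": "Bob Example"},
]


def reidentify(log, customers):
    """Join the two data sets on device_id and return name -> visited sites."""
    names = {c["device_id"]: c["name"] for c in customers}
    profile = {}
    for row in log:
        name = names.get(row["device_id"])
        if name is not None:
            profile.setdefault(name, []).append(row["visited"])
    return profile


profiles = reidentify(deidentified_log, customer_list)
print(profiles)
```

Neither data set is revealing on its own; the profile only emerges once the two are combined on the shared identifier.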
And once created, would it be regulated as personal data? A regulatory dilemma.
An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more specific factors.
Neither silence nor inactivity can constitute valid consent.
AID gains importance insofar as BD intensifies the use of automated decision-making by substantially improving its accuracy and scope. Right to knowledge of the logic involved in any automatic processing of data concerning the data subject. Limited remedies: the data controller must bring some human judgement to bear by reviewing the factors forming the basis of the automated decision.
This should also include the controller's obligation to inform data subjects about the techniques and procedures used for profiling (algorithms), and to document the results of profiling in case of complaints.
BD's impact on privacy requires some new and hard thinking from all of us. Be clear about what you collect: Compete case (FTC). De-identify, but do not ignore the fact that big data can increase the risk of re-identification. We need to pay attention to these issues so that the promise of BD IS REALIZED and the risks are kept to a minimum. Industry has a strong and justifiable need to continue to innovate, but we need further discussion about collection and use in this ecosystem to instill consumer trust in the online and mobile marketplace.
Privacy in the Age of Big Data
56th UIA Dresden Congress - November 1st, 2012
'Rights of the Digital Person'
Marc Gallardo
email: firstname.lastname@example.org
# Summary
1.- What is 'Big Data'
2.- Big Benefits
3.- Big Privacy Challenges
4.- Final Remarks
# 1 Definition
'Big data usually refers to data sets whose size is beyond the ability of commonly-used technology tools to capture, store, manage, and process the data within a tolerable elapsed time and cost'
Not a new concept: « data mining »
5 exabytes of information were created from the dawn of civilization through 2003
Now 3 exabytes are created every day
1 terabyte (TB) = 1,000 gigabytes (GB)
1 petabyte (PB) = 1,000,000 gigabytes (GB)
1 exabyte (EB) = 1,000,000,000 gigabytes (GB)
1 zettabyte (ZB) = 1,000,000,000,000 gigabytes (GB)
90% of the data that now exists has been created in the last 2 years … and the pace is growing
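The unit ladder above can be checked with a few lines of Python. This is only an arithmetic sketch: it assumes decimal (SI) units as on the slide, takes the 1200 exabytes per year figure from the IDC estimate quoted in the speaker notes, and spreads it evenly over 365 days. The names `GB_PER_UNIT` and `convert` are illustrative.

```python
# Decimal (SI) units expressed in gigabytes, as on the slide.
GB_PER_UNIT = {
    "TB": 10**3,
    "PB": 10**6,
    "EB": 10**9,
    "ZB": 10**12,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity between the units listed in GB_PER_UNIT."""
    return value * GB_PER_UNIT[from_unit] / GB_PER_UNIT[to_unit]


# Assumption: 1200 exabytes generated per year, spread evenly over 365 days.
eb_per_day = 1200 / 365
print(round(eb_per_day, 2))    # exabytes per day
print(convert(1, "ZB", "EB"))  # exabytes in one zettabyte
print(convert(5, "EB", "PB"))  # the "dawn of civilization" total, in petabytes
```

On these assumptions the yearly estimate works out to a little over 3 exabytes per day, which matches the order of magnitude stated on the slide.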
# 3 Privacy Risks
Big Data challenges some of the core privacy principles
Is the information amassed for such analysis TRULY ANONYMOUS? We cannot be sure! It can be relatively easy to take some types of de-identified data and reassociate them with specific individuals
Re-identification of data subjects using Non Personal Data (NPD). Open question: whether or not NPD that forms the basis for extracting new knowledge is covered by our data protection laws
Personal data is any information about an identified or identifiable person
# De Lege Ferenda
Definition of PD and data subject might be expanded to cover technologies (i.e. data mining) that make reverse engineering of forms of « anonymisation » more feasible.
> crux point for the Regulation not to become quickly obsolete.
Consent of Data Subject: freely given, specific, informed & explicit: statement or affirmative action. The problem in a BD scenario is that the data controller (DC) does not know in advance what it may discover after mining the data, so the data subject cannot knowingly consent to the use of his data.
Automated individual decisions (AID), art. 15 DPD
Grants the right not to be subject to a decision that produces legal effects and is based solely on automated processing of data intended to evaluate certain personal aspects.
Art. 12(a) grants the right to discover « the knowledge of the logic ».
Limited scope: human intervention / knowledge and remedy.
Automated individual decisions (AID), art. 20 DPR
Grants the same right to oppose, more broadly: not only to « evaluate » but also to analyse or predict the person's performance at work, economic situation, location, health, personal preferences, reliability or behaviour.
Right to « know the logic » is eliminated.
Right to know the existence and envisaged effect of profiling.
To BD collectors & processors:
1.- Engage in a PIA (privacy impact assessment) to identify and address risks relating to BD analysis
2.- Be clear about what you collect and process
3.- Use de-identification techniques
4.- Secure the data to avoid data breaches
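One possible technique for the de-identification point above is keyed hashing (pseudonymization) of direct identifiers. The following is a hedged Python sketch, not a recommended implementation: the key value, the field names and the record are illustrative assumptions, and replacing identifiers this way reduces but does not eliminate the risk of re-identification.

```python
import hashlib
import hmac

# Assumption: the key is stored separately from the data set; this value is a placeholder.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest (hex string) of the identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical record: the device identifier is replaced before analysis.
record = {"device_id": "udid-001", "visited": "news.example"}
safe_record = {**record, "device_id": pseudonymize(record["device_id"])}
print(safe_record)
```

The same identifier always maps to the same pseudonym, so records can still be linked for analysis, which is also why the remaining attributes may still allow re-identification.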
Good trend and the real challenge for regulators: preserve BD rewards whilst seeking to minimize privacy risks