Big Data Paper

Convergence Partners has released its latest research report on big data and its meaning for Africa. The report argues that big data poses a threat to those it overlooks, namely a large percentage of Africa’s populace, who remain on big data’s periphery.

Published in: Technology
  1. 17 December 2013
  2. ‘Data is the new oil; like oil, it must be refined before it can be used.’

Summary

Of concern to us in the developing world is that the current ecosystem around big data creates a new kind of digital divide: the big data rich (developed world) and the big data poor (developing world). This report argues that big data poses a threat to those it overlooks, namely a large percentage of Africa’s populace, who remain on big data’s periphery. As most Africans use feature phones, and not smartphones, ‘they do not regularly contribute data to be analysed, as they do not routinely engage in activities that big data is designed to capture’ [1]. Additionally, the report discusses the political economy of big data and its implications for policymaking, warns against a scramble for Africa’s data, and outlines opportunities for the Continent to fully exploit the advent of big data. It argues that policymakers, business and civil society must be actively involved to ensure that Africa leverages the benefits of the big data phenomenon and addresses the potential pitfalls it may create.

Background

The phenomenal adoption of mobile phones on the African continent over the last twenty years [2], in tandem with the proliferation of connected devices and a fledgling ‘Internet of things’, has heralded the arrival of the big data era on the Continent. Africans are increasingly emitting and creating digital information with their mobile phones, Internet use and various forms of digital transactions. Globally, ‘computing has become ubiquitous, creating countless new digital puddles and oceans of information’ [3]. Google states that ‘the first five exabytes of information created were between the dawn of civilisation and 2003, whereas that much information is now created every two days, with the pace increasing.’ [4] Every animate and inanimate

[1] Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
[2] ITU estimates 63.5% mobile penetration in 2013
[3] Bollier D. (2010) The Promise and Peril of big Data. (The Aspen Institute: Maryland)
[4] Ibid
  3. object on earth will soon be generating data, and Cisco forecasts that thirty-seven billion intelligent devices will connect to the Internet by 2020 [5]. These devices and sensors drive exponentially growing data traffic, which in 2012 was almost twelve times larger than all global Internet traffic in 2000. This wealth of new data, in turn, accelerates advances in computing, creating a virtuous cycle of big data.

Analytics is now more accessible, owing to the precipitous drop in the price of both storage technologies and processing bandwidth [6]. Cluster computing systems provide the storage capacity, computing power and high-speed local area networks to handle these large data sets. In conjunction with ‘new forms of computation combining statistical analysis, optimisation and artificial intelligence’ [7], researchers are able to construct statistical models from large collections of data to infer how the system should respond to new data. Notwithstanding these developments, the analytics of big data is still in its infancy globally, and more so in emerging economies.

[5] http://www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf
[6] Russom P. (2011) Big Data Analytics. (TDWI research: Washington)
[7] Bollier D. (2010) The Promise and Peril of big Data. (The Aspen Institute: Maryland)
  4. Big data is, in many ways, a poor term. Traditionally, it has been understood using three characteristics, namely: volume, variety and velocity.

[Figure: the three characteristics of big data (volume, variety, velocity). Source: TDWI research]

Big data is thus conceived of as a ‘massive volume of both structured and unstructured data, generated internally by and externally to organisations, that is so large that it's difficult to process with traditional database and software techniques’ [8]. Though there is little doubt that the quantities of data now available are often quite large, this is not the defining characteristic of this new data ecosystem. ‘Big data is less about data that is big, than it is about a capacity to search, aggregate and cross-reference large data sets.’ [9] Karen Levy argues that ‘data is big not because of the number of points that comprise a particular dataset, nor the statistical methods used

[8] Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
[9] Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
  5. to analyse them, nor the computational power on which such analysis relies. Instead, data is big because of the depth to which it has come to pervade our personal connections to one another’ [10]. Big data is thus better defined as a socio-technical phenomenon that rests on the interplay of:

  • ‘Technology: maximising computation power and algorithmic accuracy to gather, analyse, link, and compare large data sets;
  • Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims; and
  • Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy’ [11]

Big Data and its Discontents

Since the turn of the century, and particularly since the advent of social media, consumers have volunteered volumes of personal data. Unstructured data, which constitutes 80% of all data, describes information formatted as natural language rather than numerical figures. It encompasses everything from social media interactions, to recordings, to emails and more. As previously stated, the proliferation of smartphones, tablets and other devices has accelerated data creation to the extent that the rate at which data is generated and captured is now estimated to be doubling every 90 days.

Though the promise of big data lies in the ability to make predictions based on it, a cautionary note must be sounded to data evangelists who take a utopian view of that promise. Though admittedly more useful than traditional statistics, big data is not a panacea, as there are questions around the reliability, accuracy and representativeness of its data sets. ‘Technology is neither good nor bad; nor is it neutral. Technology’s interaction with the social ecology is

[10] Levy K. (2013) Relational Big Data. (Stanford Law Review: Stanford)
[11] Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
  6. such that technical developments frequently have environmental, social and human consequences that go far beyond the immediate purposes of the technical devices and practices themselves’ [12]. Like other socio-technical phenomena, big data triggers both utopian and dystopian rhetoric. ‘On one hand, big data is seen as a powerful tool to address various societal ills, offering the potential of new insights into areas as diverse as medical research and climate change. On the other, big data is seen as a troubling manifestation of big brother enabling invasions of privacy, decreasing civil liberties and increasing state control’ [13].

Of particular concern to us in the developing world is that the current ecosystem around big data creates a new kind of digital divide: the big data rich (developed world) and the big data poor (developing world). This report argues that big data poses a threat to those it overlooks, namely a large percentage of Africa’s populace, who remain on big data’s periphery. As most Africans use feature phones, and not smartphones, ‘they do not regularly contribute data to be analysed, as they do not routinely engage in activities that big data is designed to capture’ [14]. Consequently, their preferences and needs risk being ignored when governments use big data and advanced analytics to shape public policy. The danger is that as we increasingly rely on big data’s numbers to speak for themselves, we risk misunderstanding the results and in turn misallocating important public resources [15]. Thus, with every big data set, we need to ask which people and data sets are excluded. Many African data sets exhibit this ‘signal problem’, where data are assumed to accurately reflect the social world, whereas there are significant gaps with little or no signal coming from particular communities.
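The ‘signal problem’ described above can be made concrete with a small numerical sketch. The Python below is illustrative only: the group names, population shares, capture rates and need rates are all hypothetical values chosen for the example, not figures from this report or its sources. It compares the rate of need that an analyst would read off the captured data with the rate in the population as a whole.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population_share: float  # fraction of the whole population (hypothetical)
    capture_rate: float      # fraction of the group whose activity reaches the data set (hypothetical)
    need_rate: float         # fraction of the group needing a given public service (hypothetical)


def captured_weights(groups):
    """Weight of each group inside the captured data set."""
    return [g.population_share * g.capture_rate for g in groups]


def naive_estimate(groups):
    """Need rate as it appears when only the captured data is analysed."""
    weights = captured_weights(groups)
    return sum(w * g.need_rate for w, g in zip(weights, groups)) / sum(weights)


def true_rate(groups):
    """Need rate across the whole population, captured or not."""
    return sum(g.population_share * g.need_rate for g in groups)


def signal_share(groups):
    """Share of the captured signal that comes from each group."""
    weights = captured_weights(groups)
    total = sum(weights)
    return {g.name: w / total for g, w in zip(groups, weights)}


# Hypothetical population: a smartphone minority that is well captured,
# and a feature phone majority that emits very little signal.
groups = [
    Group("smartphone", population_share=0.2, capture_rate=0.9, need_rate=0.10),
    Group("feature_phone", population_share=0.8, capture_rate=0.05, need_rate=0.40),
]

naive = naive_estimate(groups)
actual = true_rate(groups)
shares = signal_share(groups)

print(f"need rate seen in the data:   {naive:.3f}")
print(f"need rate in the population:  {actual:.3f}")
print(f"feature phone share of signal: {shares['feature_phone']:.3f}")
```

Under these assumed numbers the majority group supplies only a small part of the signal, so the figure read from the data understates need in the population. The sketch does not claim that any real data set behaves this way; it only shows why asking ‘which people are excluded’ changes the result.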
In a future where big data, and the predictions it makes possible, will fundamentally reorder government and the marketplace, ‘the exclusion of poor and otherwise marginalised people from datasets has troubling implications for economic opportunity, social mobility and democratic participation’ [16]. These technologies may create a new kind of voicelessness, where certain groups’ preferences and behaviours receive little or no consideration when political elites decide how to

[12] Ibid
[13] Ibid
[14] Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
[15] http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/
[16] Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
  7. distribute goods and services, and how to reform public and private institutions. Of course, the poor (and most Africans by extension) are in many ways already marginalised, but big data could reinforce and exacerbate existing problems.

Moreover, the use, abuse and misuse of data are a troubling lesson about the limitations of information as the world hurtles toward the big data era. The underlying data in most African countries are of poor quality, unrepresentative and potentially biased, meaning they are more likely to be misanalysed and used misleadingly. Even more damning is that data can fail to capture what it purports to quantify. As big data is largely in the languages of the developed world, it further isolates African language content. However, an opportunity exists for Africans to create African-language data sets, whether by converting the large existing stock of analogue African data (through crowdsourcing the digitisation process), or by uploading the extensive video content that resides in African broadcasters’ archives.

Big data holds substantial potential for the future, and large dataset analysis has important uses. However, the promise of big data is, and will be, best fulfilled when its limitations, biases and features are adequately understood and taken into account when interpreting the data [17]. As is evident, big data is the source of both tremendous promise and disquieting surveillance. In reality, like any complex social phenomenon, big data is both of these: a set of heterogeneous resources and practices deployed in multiple ways toward diverse ends.

The Political Economy of Big Data

As articulated above, big data has the potential to ‘solidify existing inequalities and stratifications, and to create new ones’ [18]. It could restructure societies so that the only people who matter, quite literally the ones who count, are those who regularly contribute to the right data flows. Manovich has argued that there are three classes of people in the realm of big data, namely: ‘those who create data (both consciously and by leaving digital footprints), those who have the means to collect it, and those

[17] United Nations. (2012) Big Data for Development: Challenges and Opportunities. (United Nations Global Pulse: New York)
[18] Ibid
  8. who have the expertise to analyse it. This last group is the smallest, and most privileged as they are the ones that get to determine the rules about how big data will be used and who gets to participate’ [19]. However, in the African context it is necessary to ask questions about what all this data means, who gets to access it, how data analysis is deployed, and to what ends. It is worth noting that there is a scarcity of data analysts on the Continent, which raises the question of who will determine the African agenda, ask the relevant questions and ensure inclusivity in the research undertaken. It is imperative that big data on the Continent do away with the ‘politics of the missing’ to render visible the poor and marginalised in developing countries.

However, Africa’s paucity of reliable communications infrastructure poses a significant challenge for the application of big data, as the network backbone required for big data systems is sorely lacking. Key constraints are that current network deployments do not have sufficient reach into the populace, and are of poor quality, overpriced and of low capacity. It is imperative that these factors be addressed, as they are vital for a thriving big data ecosystem.

The data emanating from mobile phones holds particular promise, in part because for many low-income people it is their only form of interactive technology. Utilising this data created by mobile phones can improve our understanding of vulnerable populations, and quicken governments’ response to the emergence of new trends [20].

Big Data and Development

Though big data and real-time analytics are no modern panacea for age-old development challenges, ‘the diffusion of data science into the realm of development constitutes a genuine opportunity to bring powerful new tools to the fight against poverty, hunger and disease’ [21]. To this end, the United Nations launched Global Pulse in 2009 ‘to leverage innovations in digital data, rapid data collection and analysis to help decision makers gain a real-time understanding of how crises impact

[19] Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
[20] World Economic Forum. (2012) Big Data, Big Impact: New Possibilities for International Development. (World Economic Forum: Geneva)
[21] Ibid
  9. vulnerable populations.’ [22] Big data for development is about turning imperfect, complex, often unstructured data into actionable information. This implies using advanced computational tools, such as machine learning, which have developed in other fields, to reveal trends and correlations within and across large datasets that would otherwise remain undiscovered.

Additionally, the GSMA has developed a ‘Mobile for Development Intelligence’ initiative with the aim of persuading mobile operators to share data with researchers and development organisations. Its mission statement reads that ‘open access to high quality data will improve decision making, increase total investment from the commercial mobile industry and development sector, and accelerate economic, environmental and social impact from mobile solutions.’ [23]

The data philanthropy discussed above, which entails corporations anonymising their data and providing it to development organisations to mine for insights, patterns and trends in (or near) real-time, is still in its infancy. Data philanthropy is a laudable advancement as it seeks to minimise Africa’s information asymmetries through the creation of data commons, which are a critical input of big data for development. This data can be conceived of as a public good, as it is both non-rivalrous and non-excludable, ensuring that one’s use of the data does not restrict its availability to others. As such, the benefit of creating and maintaining a data commons is that the information benefits society as a whole, while protecting individual security. A more concerted effort is required to make open data commons a reality and a success.

[22] Ibid
[23] United Nations. (2012) Big Data for Development: Challenges and Opportunities. (United Nations Global Pulse: New York)
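As an illustration of what ‘anonymising’ data before contributing it to a data commons might involve, the Python sketch below aggregates individual records into counts and withholds any count that is too small to hide an individual. This is one common disclosure-control idea, offered as an assumption for illustration; the report does not describe the method used by any operator, the GSMA or Global Pulse. The record layout, the function name and the threshold `k` are hypothetical.

```python
from collections import defaultdict


def aggregate_with_suppression(records, k=5):
    """Count distinct subscribers per (region, day) cell.

    Cells with fewer than k distinct subscribers are withheld, so that
    no released figure describes a very small group of people.
    Subscriber identifiers never appear in the output.
    """
    cells = defaultdict(set)
    for subscriber_id, region, day in records:
        cells[(region, day)].add(subscriber_id)
    return {cell: len(ids) for cell, ids in cells.items() if len(ids) >= k}


# Hypothetical records: (subscriber_id, region, day)
records = [(f"user{i}", "A", "2013-12-01") for i in range(6)]
records.append(("user0", "A", "2013-12-01"))  # repeat activity, counted once
records += [("user7", "B", "2013-12-01"), ("user8", "B", "2013-12-01")]

released = aggregate_with_suppression(records, k=5)
print(released)
```

Suppressing small cells reduces, but does not by itself eliminate, the risk of re-identification, and it also removes exactly the sparsely covered communities from view. That trade-off is one reason the ethical and policy questions around shared data need attention alongside the technical ones.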
  10. [Figure not reproduced. Source: United Nations Global Pulse]

However, though data may be public (or semi-public), this does not simplistically equate with full permission being given for all uses [24]. Big data researchers rarely acknowledge that there is a ‘considerable difference between being in public (e.g. sitting in a park) and being public (e.g. actively courting attention)’. The ethical and policy implications of big data will be addressed below.

[24] Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
  11. Policy Implications of Big Data

The advent of big data presents significant opportunities and challenges for Africa’s information and communications technologies (ICT) policymaking. Big data is, at its core, a social phenomenon, though the dominant narrative reduces people to mere data points to be acted upon. ‘Big data and its attendant practices aren’t monoliths; rather diverse and socially contingent, a fact which any policy analysis of big data phenomena must consider’ [25].

As lines between the physical and digital world continue to blur, and as big data and advanced analytics increasingly ‘shape governmental decision-making about the allocation of resources, equality and privacy principles will grow increasingly intertwined’ [26]. Moreover, exclusion from or underrepresentation in government datasets could then mean losing out on important government services and public goods. Policymakers thus need to be aware of the possibility that the big data revolution may create new forms of inequality and subordination, which raise broad democracy concerns.

As such, ensuring that the big data revolution is a joint revolution, ‘one whose benefits are broadly and equitably shared, may also require, paradoxically, a right not to be forgotten – a right against exclusion’ [27]. A data antisubordination policy [28] would ensure this. This antisubordination policy would, at a minimum, ‘provide those who live outside or on the margins of data flows some guarantee that their status as persons with ‘light data footprints’ will not subject them to unequal treatment by the state in the allocation of public goods and services’ [29].

This mooted data antisubordination policy would also require public institutions to mitigate the disparate impact that their use of big data may have on persons who live outside or on the margins of government datasets. Similarly, public servants relying on big data for policymaking and other core democratic functions should be compelled to take steps to ensure that big data’s marginalised groups continue to have a voice in democratic processes.

[25] Levy K. (2013) Relational Big Data. (Stanford Law Review: Stanford)
[26] Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
[27] Ibid
[28] Ibid
[29] Ibid
  12. In the field of public policy, ‘it is the predictive power of big data analytics that understandably attracts the most attention as insights on human behaviour can be gleaned from these data’ [30]. The increase in the availability of data has occurred relatively fast, and as such is not yet balanced by the emergence of privacy legislation or ethical frameworks that can mitigate potentially damaging uses of the data. As the big data pools are predominantly in the hands of powerful intermediary institutions, not ordinary people, they may be misused and abused. If policymakers do not insist on ‘building privacy, transparency, autonomy and other protections into big data related activities from the outset, this will diminish big data’s lofty ambitions’ [31].

There is a need for a healthier balance of power between those who generate the data, and those who make inferences and decisions based on it. ‘African countries represent a strong testing ground for data protections as the power imbalance between the producers and users (mainly large multinational corporations) of personal data there, is one of the largest anywhere’ [32]. It is highly likely that individuals, and even governments, may lack the information, resources or access to hold corporations or countries accountable when they breach data protection guidelines.

It is evident that increasingly powerful and secretive algorithms and surveillance programmes, such as PRISM, combined with numerous other massive datasets, pose a significant risk to personal privacy and civil liberties, especially in the African context. In the era of big data, policy should include protections lest these worrisome orthodoxies crystallise.

[30] United Nations. (2012) Big Data for Development: Challenges and Opportunities. (United Nations Global Pulse: New York)
[31] http://blogs.oii.ox.ac.uk/policy/the-scramble-for-africas-data
[32] Ibid
  13. The Scramble for Africa’s Data

After the last decade’s exponential rise in ICT use, Africa is fast becoming a source of big data as Africans increasingly ‘emit digital information with their mobile phone calls, internet use and various forms of digitised transactions’ [33]. ‘The emergence of big data in Africa has the potential to make the continent’s citizens a rich mine of information, with the default mode being for this to happen without their consent or involvement, and without ethical and normative frameworks to ensure data protection or to weigh the risks against the benefits’ [34]. It is increasingly likely that there will be a new scramble for Africa: a digital resource grab. African countries need to be fully cognisant of this, and circumspect in their approach to and monitoring of it.

Opportunities for Africa

Notwithstanding the severe lack of qualified people on the Continent to exploit the attendant benefits of big data, significant opportunities exist. ‘In light of the serious problems with both illiteracy and information access in the developing world, especially Africa, there is a widespread belief that speech technology can play a significant role in improving the quality of life of developing-world citizens’ [35]. It is oft said that African societies rely on oral traditions to transfer knowledge and culture inter-generationally, and the developing fields of phonetic search, machine learning and natural language processing, coupled with big data, bode well for Africa’s ability to fully harness and leverage analytic power. There is a great number of languages on the Continent (over 2000), with the development of ‘voice-search systems being a useful tool in delivering on the original promise’ of big data analytics. Though ‘speech technology has to date played a much smaller role in the developing world, the rapid spread of telephone networks through the developing world’ [36] leads to optimism that this situation will change significantly in years to

[33] Ibid
[34] Ibid
[35] Barnard E., Moreno P., Schalkwyk J., and van Heerden C. (2010) Voice Search for Development. (Human Language Technologies Research Group: Pretoria)
[36] Ibid
  14. come. Recently, ‘a novel application of speech technology, namely the use of speech recognition to perform searches through Web content and personal information’ [37], has become increasingly popular in the developed world, and it is this paper’s contention that this could be duplicated and appropriated for the African continent. However, though ‘voice search lends itself to efficient and low-cost data collection (thereby addressing resource constraints)’ [38], ‘digital content that is relevant to people of the developing world is generally scarce and distributed across numerous sources without any form of integration’ [39]. African universities and the graduates they produce (mainly computer scientists and linguists) will thus have to redouble their efforts to ensure that ‘voice search makes web-based content available regardless of the original source of the data, which will go some way towards solving issues of content availability’ [40].

Africa has already demonstrated its excellent science and engineering skills by designing and starting to build the 64-dish MeerKAT telescope, a pathfinder to the Square Kilometre Array (SKA), in South Africa. ‘The technology being developed is cutting-edge and the project is creating a large group of young scientists and engineers with world-class expertise in technologies that will be crucial for development’ [41].

Though many of today’s data scientists are formally trained in computer science, maths or economics, they can emerge from any field with a strong data and computational focus. Hal Varian argues that ‘the ability to take data, understand it, process it, extract value from it, visualise it, and communicate it, is going to be a hugely important skill in coming decades’ [42]. Ben Fry takes this a step further and argues for an entirely new field that combines the skills from often disjointed areas of expertise in the analytics of big data.

[37] Ibid
[38] Ibid
[39] Ibid
[40] Ibid
[41] Botman H. (2013) The role of universities in the development of Africa. (Paper presented to the Swiss Federal Institute of Technology)
[42] http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
  15. He argues that fields such as statistics, data mining, graphic design, and information visualisation each offer meaning to, and can find patterns in, data, but practitioners of each are often unaware of, or unskilled in, the methods of the adjacent fields required for a solution. As such, to fully exploit opportunities that stem from big data, African universities need to reorientate themselves and their curricula to ensure that all their graduates are ‘data literate’ (meaning competent in finding, manipulating, managing, and interpreting data), as well as being adept at mathematical and hypothetico-deductive reasoning.

Conclusion

The increased analytics and predictive power associated with big data conjure utopian and dystopian scenarios. This paper argues that the advancement of big data and the Internet of things, though a significant milestone in the development of social science and the Internet, is not an end in itself. Having highlighted the gains of big data analytics and its ability to transform society, the paper warns against a ‘dictatorship of data’ wherein data governs us in ways that may do as much harm as good. The cautionary note sounded cited ‘political and social equality considerations where the vulnerable are likely to be further relegated to an inferior status’ [43], as well as the policy implications of big data and a likely scramble for Africa’s data, as reasons to be circumspect about the promise of big data. ‘Technology is neither good nor bad; nor is it neutral’ [44].

[43] Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
[44] Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
  16. Disclosure Section

The information and opinions in this report were prepared by Convergence Partners Management (Proprietary) Limited (“Convergence Partners”). Unless otherwise stated, the individuals listed on the cover page of this report were responsible for drafting and compiling this report.

This report is provided for information purposes only and no reliance should be placed on the contents hereof. This report does not constitute investment advice or any other advice, nor is it an offer to buy or sell or the solicitation of an offer to buy or sell any securities. If any reader wishes to place any reliance on any information furnished in this report, he or she does so entirely at his or her own risk and Convergence Partners hereby disclaims any liability for any such reliance placed. Readers are strongly urged to do their own independent analysis of any facts stated in this report.

Significant reliance has been placed on information that is either publicly available or that was obtained from third party sources in the preparation of this report. Wherever practically possible, Convergence Partners has quoted the sources of such information. Whilst Convergence Partners has sought to utilise third party information obtained from reliable sources, no warranty as to the accuracy or completeness of this information is provided and, accordingly, no warranty as to the accuracy or completeness of this report in its entirety should be inferred. Third party information providers make no warranties or representations of any kind relating to the accuracy, completeness or timeliness of the information they provide and shall not have liability for any damages of any kind relating to such information.

The directors of Convergence Partners have not reviewed this report. Convergence Partners may, from time to time, be invested in, or have other business dealings with, companies that are listed in this report. This report or any portion hereof may not be reprinted, sold or redistributed without the written consent of Convergence Partners.

© 2013 Convergence Partners
  17. REFERENCES

Barnard E., Moreno P., Schalkwyk J., and van Heerden C. (2010) Voice Search for Development. (Human Language Technologies Research Group: Pretoria)
Bollier D. (2010) The Promise and Peril of big Data. (The Aspen Institute: Maryland)
Botman H. (2013) The role of universities in the development of Africa. (Paper presented to the Swiss Federal Institute of Technology)
Boyd D., and Crawford K. (2013) Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. (Information, Communication and Society Journal)
Clarke R., and Wigan M. (2013) Big Data’s unintended consequences. (IEEE: Washington)
Crawford K. (2013) The Hidden Biases in Big Data. (Harvard Business Review Blog Network) Retrieved from http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/
Cukier K., and Mayer-Schonberger V. (2013) The Dictatorship of Data. (MIT Technology Review) Retrieved from http://www.technologyreview.com/news/514591/the-dictatorship-of-data/
Einav L., and Levin J. (2013) The Data Revolution and Economic Analysis. (Working Paper 19035) Retrieved from http://www.nber.org/papers/w19035
Hartzog W., and Selinger E. Big Data in Small Hands. (Stanford Law Review: Stanford)
King J., and Richards N. (2013) Three paradoxes of Big Data. (Stanford Law Review: Stanford)
Lerman J. (2013) Big Data and its exclusions. (Stanford Law Review: Stanford)
Levy K. (2013) Relational Big Data. (Stanford Law Review: Stanford)
Michael K., and Miller K. (2013) Big Data: New Opportunities and New Challenges. (IEEE: Washington)
Pietsch W. (2013) Big Data – The New Science of Complexity. (Munich Center for Technology in Society: Munich)
Polonetsky J., and Tene O. (2013) Privacy and Big Data: making ends meet. (Stanford Law Review: Stanford)
Russom P. (2011) Big Data Analytics. (TDWI research: Washington)
Taylor L. (2013) The Scramble for Africa’s Data: Resource Grab or Developmental opportunity. (Oxford Internet Institute: Oxford)
United Nations. (2012) Big Data for Development: Challenges and Opportunities. (United Nations Global Pulse: New York)
World Economic Forum. (2012) Big Data, Big Impact: New Possibilities for International Development. (World Economic Forum: Geneva)
