Introduction to big data


An introduction to big data, the cloud, the Internet of Things, predictive analytics, data science, and behaviour change

Published in: Business, Technology

  1. 1. Big data: an introduction. Prof Richard Vidgen, Hull University Business School, January 2014
  2. 2. Big data in context (diagram)
       Stage headings: data generation; data storage and management; data analysis and presentation
       Labels: Internet of things; ubiquitous computing; social media; big data; data management; the cloud; data science; data analysis; data visualization; better decisions
       Source: Vidgen, R. (2014). Big data: an introduction. The BigDataScience blog. http://
  3. 3. Big data
       • Big data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates: data that would take too much time and cost too much money to load into a relational database for analysis
       • Although big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data
       Source: http://-data-Big-Data
  4. 4. Data volumes
       • 1 Gigabyte = 1000 megabytes
       • 1 Terabyte = 1000 gigabytes
       • 1 Petabyte = 1000 terabytes
       • 1 Exabyte = 1000 petabytes
       • 1 Zettabyte = 1000 exabytes
       • 1 Yottabyte = 1000 zettabytes
       Big data: the Large Hadron Collider generates 15 petabytes of data per year
       But … big is only big in a context: it is not just about gigabytes; what counts is how data can be used to create value for individuals, organisations and society
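The unit ladder on this slide is plain powers-of-1000 arithmetic, so it can be sketched in a few lines of Python. The function name `convert` and the unit list are illustrative choices, not part of the original slides; the only figure taken from the slide is the 15 petabytes per year quoted for the Large Hadron Collider.

```python
# Decimal storage units from the slide; each is 1000 times the previous one.
UNITS = ["megabyte", "gigabyte", "terabyte", "petabyte",
         "exabyte", "zettabyte", "yottabyte"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between decimal storage units (powers of 1000)."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


# The slide's example: 15 petabytes per year, expressed in gigabytes.
lhc_gigabytes = convert(15, "petabyte", "gigabyte")
print(f"{lhc_gigabytes:,} gigabytes")  # prints "15,000,000 gigabytes"
```

Converting towards a larger unit (for example gigabytes to terabytes) returns a fraction, because the exponent becomes negative.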
  5. 5. Hype?
       “The ‘big’ there is purely marketing,” Mr. Reed said. “This is all fear … This is about you buying big expensive servers and whatnot.”
       “The exciting thing is you can get a lot of this stuff done just in Excel,” he said. “You don’t need these big platforms. You don’t need all this big fancy stuff. If anyone says ‘big’ in front of it, you should look at them very skeptically … You can tell charlatans when they say ‘big’ in front of everything.”
       Source: http://-data-is-bunk-obama-campaigns-tech-guru-tells-university-leaders/47885
  6. 6. Big data is not just a technical problem – it is part of a complex sociotechnical entanglement … with unintended consequences
       Elements shown in the diagram: inter-connectedness; regulatory and legal aspects; technologies; ethical implications; stakeholders; problems and “solutions”; socio-political-economic factors
  7. 7. “I fear that as we move into the big data age … this argument will not hold much currency any more. Then I worry that the predictions will take over, and schools, universities and colleges will not take any risks any more.”
       Professor Mayer-Schönberger, Oxford Internet Institute
       Source: http://-data-could-create-dystopian-future-for-students/2010061.article
  8. 8. Big data – what’s special about it?
       • Zikopoulos et al. (2012), in an IBM publication, describe ‘Big Data’ as consisting of:
         – Volume: larger amounts of data than in traditional settings
         – Velocity: information is generated at a rate that exceeds that of traditional systems
         – Variety: multiple emerging forms of data that are of interest to enterprises, such as social media data
       Source: Zikopoulos P, Eaton C, DeRoos D, Deutsch T, Lapis G. 2012. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill.
  9. 9. A technical challenge
       • “As data is increasingly becoming more varied, more complex and less structured, it has become imperative to process it quickly. Meeting such demanding requirements poses an enormous challenge for traditional databases and scale-up infrastructures. . . . Big Data refers to new scale-out architectures that address these needs. Big Data is fundamentally about massively distributed architectures and massively parallel processing using commodity building blocks to manage and analyze data.”
       Source: EMC. 2012. Big data-as-a-service: a market and technology perspective, http:// white-papers/h10839-big-data-as-a-service-perspt.pdf, July (accessed January 2013).
  10. 10. Solution - the cloud
       • Cloud computing is a general term for anything that involves delivering hosted services over the Internet
       • A cloud service has three distinct characteristics that differentiate it from traditional hosting:
         – It is sold on demand, typically by the minute or the hour
         – It is elastic: a user can have as much or as little of a service as they want at any given time
         – The service is fully managed by the provider (the consumer needs nothing but a personal computer and Internet access)
       • These services are broadly divided into three categories:
         – Infrastructure-as-a-Service (IaaS)
         – Platform-as-a-Service (PaaS)
         – Software-as-a-Service (SaaS)
       • The cloud can be public or private
       Source: http://-computing
  11. 11. “IBM believes the cloud services market could be worth $200bn by 2020. Businesses are increasingly leasing data storage, computing power and web hosting services from a growing number of specialist cloud companies - effectively outsourcing their IT needs to cut costs and improve efficiency.”
       Source: http://-25773266
  12. 12. Internet of Things (IoT)
       • Although the concept wasn't named until 1999, the Internet of Things has been in development for decades
       • The first Internet appliance was a Coke machine at Carnegie Mellon University in the early 1980s. The programmers could connect to the machine over the Internet, check its status and determine whether a cold drink would be waiting for them should they decide to make the trip down to the machine
       Source: http://-of-Things
  13. 13. Internet of Things (IoT)
       • The Internet of Things (IoT) is a scenario in which objects, animals or people are provided with unique identifiers and the ability to automatically transfer data over a network without requiring human-to-human or human-to-computer interaction
       • So far, the Internet of Things has been most closely associated with machine-to-machine (M2M) communication in manufacturing and in power, oil and gas utilities. Products built with M2M communication capabilities are often referred to as being smart (e.g., a smart meter)
       Source: http://-of-Things
  14. 14. Things
       • A thing, in the Internet of Things, can be:
         – a person with a heart monitor implant (physio sensing)
         – a person with a brain scanner (neuro sensing)
         – a farm animal with a biochip transponder
         – an automobile that has built-in sensors to alert the driver when tire pressure is low
         – … or any other natural or man-made object that can be assigned an IP address and provided with the ability to transfer data over a network
       Source: http://-of-Things
  15. 15. http://-things-matter/
  16. 16. Mr Cameron said the UK and Germany could find themselves on the forefront of a new "industrial revolution".
       "I see the internet of things as a huge transformative development - a way of boosting productivity, of keeping us healthier, making transport more efficient, reducing energy needs, tackling climate change," he said.
       Source: BBC News, 9 March 2014
  17. 17. Ubiquitous computing
       • Ubiquitous computing is the growing trend towards embedding microprocessors in everyday objects so they can communicate information
       • Ubiquitous means “existing everywhere”: ubiquitous computing devices are completely connected and constantly available
       • Ubiquitous computing relies on the convergence of wireless technologies, advanced electronics and the Internet
       • The goal of researchers working in ubiquitous computing is to create smart products that communicate unobtrusively (e.g., wearable computers, Google Glass, smart meters)
       Source: http://-computing
  18. 18. http://www.droid--is-how-google-glass-works-infographic/
  19. 19. Analysis and outcomes (diagram)
       Stage heading: data analysis and presentation
       Labels: big data; data science; data analysis; data visualization; better decisions
       Source: Vidgen, R. (2014). Big data: an introduction. The BigDataScience blog. http://
  20. 20. Using big data
       Source: http://-data-sorry-data-science-what-does-a-data-scientist-do
  21. 21. Better decisions - predictive analytics
       • A predictive model that calculates strawberry purchases based on:
         – Weather forecast
         – Store temperature
         – Freezer sensor data
         – Remaining stock per shelf life
         – Sales transaction point of sale feeds
         – Web searches, social mentions
       Source: http://-data-sorry-data-science-what-does-a-data-scientist-do
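To make the idea of a predictive model concrete, here is a minimal sketch that fits a straight line by ordinary least squares, using just one of the inputs listed above (the weather forecast). The slide does not say which modelling technique the retailer used, so simple linear regression is swapped in purely for illustration, and every number below is made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: forecast temperature (deg C) and punnets of strawberries sold.
forecast_temps = [12, 15, 18, 21, 24, 27]
punnets_sold = [40, 52, 61, 75, 88, 97]

slope, intercept = fit_line(forecast_temps, punnets_sold)

# Predict demand for a day with a 20 deg C forecast.
predicted_sales = intercept + slope * 20
print(f"Expected sales at 20 deg C: {predicted_sales:.0f} punnets")
```

A realistic model would combine several of the listed feeds (sensor data, stock levels, point-of-sale transactions, social mentions) rather than a single predictor.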
  22. 22. Predictive analytics
       • For example, what data might help us predict which students will drop out?
         – Assessment grades at University
         – Prior education attainment
         – Social background
         – Distance of home from University
         – Friendship circles and networks (e.g., sports club memberships)
         – Attendance at lectures and tutorials
         – Interaction in lectures and tutorials
         – Time spent on campus
         – Time spent in library
         – Number of accesses to electronic learning resources
         – Textbooks purchased
         – Engagement in subject-related forums
         – Sentiment of social media posts
         – Etc.
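As a rough illustration of how such data could feed a dropout prediction, the sketch below trains a small logistic regression classifier by gradient descent on two of the listed signals: attendance and assessment grades. The slides do not prescribe a technique, so logistic regression is an assumption here, and the eight student records, the function names and the training settings are all hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def dropout_probability(features, weights, bias):
    """Estimated probability that a student with these features drops out."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = dropout_probability(features, weights, bias) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical records: (attendance rate, average grade), both scaled to 0..1.
students = [(0.95, 0.80), (0.90, 0.65), (0.85, 0.70), (0.80, 0.75),
            (0.40, 0.45), (0.30, 0.50), (0.35, 0.30), (0.20, 0.40)]
dropped_out = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = student left the course

weights, bias = train_logistic(students, dropped_out)
at_risk = dropout_probability((0.25, 0.35), weights, bias)
print(f"Estimated dropout risk for low attendance and grades: {at_risk:.2f}")
```

With real student data the same approach would need many more records, held-out data for validation, and careful attention to the ethical concerns raised earlier in the deck.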
  23. 23. Who works with the big data?
       Source: http://-data-sorry-data-science-what-does-a-data-scientist-do
  24. 24. Some of the techniques data scientists use
       • Classification
       • Clustering
       • Association rules
       • Decision trees
       • Regression
       • Genetic algorithms
       • Neural networks and support vector machines
       • Machine learning
       • Natural language processing
       • Sentiment analysis
       • Artificial intelligence
       • Time series analysis
       • Simulations
       • Social network analysis
  25. 25. Technologies for data analysis: usage rates
       • R and Python programming languages come above Excel
       • Enterprise products are at the bottom of the heap
       Source: King, J., & R. Magoulas (2013). Data Science Salary Survey. O’Reilly Media.
  26. 26. Data visualization
       • Correlation matrix based on MPG, horsepower, engine size, number of cylinders, weight, etc. (Maserati is like a Ferrari; Lotus is not like a Cadillac)
       Source: https://-a-correlation-matrix-in-tableau-using-r-or-table-calculations/
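The numbers behind a correlation matrix can be sketched with Pearson's correlation coefficient. Note one deliberate simplification: the chart on the slide appears to compare cars with each other, whereas this sketch correlates the attributes (MPG, horsepower, weight) across cars, which is the more common starting point. The five data rows are invented for illustration and are not taken from the slide's dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict holding the correlation of every column with every other."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical measurements for five cars.
cars = {
    "mpg": [30, 25, 20, 15, 12],
    "horsepower": [90, 110, 150, 220, 300],
    "weight": [1.0, 1.2, 1.5, 1.9, 2.2],  # tonnes
}

matrix = correlation_matrix(cars)
for name, row in matrix.items():
    print(name.ljust(12), "  ".join(f"{value:+.2f}" for value in row.values()))
```

A visualization tool would then colour each cell by its value, so that strongly positive and strongly negative relationships stand out at a glance.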
  27. 27. Challenges of big data
       Gartner On Big Data: Everyone's Doing It, No One Knows Why
       “According to a recent Gartner report, 64% of enterprises surveyed indicate that they're deploying or planning Big Data projects. Yet even more acknowledge that they still don't know what to do with Big Data.”
       Source: http://-on-big-data-everyones-doing-it-no-one-knows-why#awesm=~ost43oe8yXjDzr
  28. 28. Big data: it's about iteration
       • Start small when tackling big data
       • Go open source software
       • Train existing employees who know the business rather than hunt for data talent
       • Iterate on your project as you learn which data sources are valuable, and which questions yield real insights
       • You don't have to know the end from the beginning, but you should have a clearer view of what you hope to achieve with Big Data than the Gartner report seems to indicate most have
       Source: http://-on-big-data-everyones-doing-it-no-one-knows-why#awesm=~ost43oe8yXjDzr
  29. 29. Resources
       • McKinsey (2011). Big data: The next frontier for innovation, competition, and productivity. big_data_the_next_frontier_for_innovation
       • Sogetti. Various reports on data analytics, privacy, legal aspects, predicting behaviour.
       • The Economist (2012). Big data: Lessons from the leaders. EIU_SAS_BigData_4.pdf