Prof Richard Vidgen
Hull University Business School
January 2014
Big data:
an introduction
Big data in context
[Figure: diagram relating the Internet of things, ubiquitous computing, social media, big data, data management, the cloud, data science, data analysis, data visualization and better decisions, grouped under data generation, data storage and management, and data analysis and presentation]
Vidgen, R. (2014). Big data: an introduction. The BigDataScience blog. http://datasciencebusiness.wordpress.com/
  
Big data
•  Big data is a general term used to describe the
voluminous amount of unstructured and semi-structured
data a company creates -- data that would
take too much time and cost too much money to load
into a relational database for analysis
•  Although Big data doesn't refer to any specific
quantity, the term is often used when speaking about
petabytes and exabytes of data
http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data
  
Data volumes
•  1 Gigabyte = 1000 megabytes
•  1 Terabyte = 1000 gigabytes
•  1 Petabyte = 1000 terabytes
•  1 Exabyte = 1000 petabytes
•  1 Zettabyte = 1000 exabytes
•  1 Yottabyte = 1000 zettabytes
Big data: the Large Hadron Collider generates 15 petabytes of data p.a.
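The units in the list are decimal (SI) prefixes, each 1000 times the previous one, so every conversion is a power of 1000. A minimal sketch, assuming those decimal definitions (binary units such as the gibibyte, 1024 mebibytes, are not covered):

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(amount, unit):
    """Convert an amount in `unit` to bytes (1 unit = 1000 ** position bytes)."""
    return amount * 1000 ** UNITS.index(unit)


def convert(amount, from_unit, to_unit):
    """Convert an amount between any two units in UNITS."""
    return to_bytes(amount, from_unit) / 1000 ** UNITS.index(to_unit)


# 15 petabytes per year, the Large Hadron Collider figure on this slide.
print(f"15 PB = {convert(15, 'petabyte', 'gigabyte'):,.0f} GB")  # 15 PB = 15,000,000 GB
print(f"15 PB = {convert(15, 'petabyte', 'exabyte')} EB")
```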
  
Big is only big in a context
It is not just about gigabytes – what counts is how data can be used to create value for individuals, organisations and society
but …
  
Hype?
“The ‘big’ there is purely marketing,” Mr. Reed said. “This is all fear … This is about you buying big expensive servers and whatnot.”
“The exciting thing is you can get a lot of this stuff done just in Excel,” he said. “You don’t need these big platforms. You don’t need all this big fancy stuff. If anyone says ‘big’ in front of it, you should look at them very skeptically … You can tell charlatans when they say ‘big’ in front of everything.”
http://chronicle.com/blogs/wiredcampus/big-data-is-bunk-obama-campaigns-tech-guru-tells-university-leaders/47885
Big data is not just a technical problem – it is part of a complex sociotechnical entanglement …
[Figure: inter-connectedness of regulatory and legal aspects, technologies, ethical implications, stakeholders, problems and “solutions”, and socio-political-economic factors]
… with unintended consequences
“I fear that as we move into the big data age … this argument will not hold much currency any more. Then I worry that the predictions will take over, and schools, universities and colleges will not take any risks any more.”
Professor Mayer-Schönberger, Oxford Internet Institute
http://www.timeshighereducation.co.uk/news/big-data-could-create-dystopian-future-for-students/2010061.article
  
	
  
Big data – what’s special about it?
•  Zikopoulos et al. (2012), in an IBM publication,
describe ‘Big Data’ as consisting of:
–  Volume - increasing amounts of data over
traditional settings.
–  Velocity - information is being generated at a rate
that exceeds that of traditional systems.
–  Variety - multiple emerging forms of data that are
of interest to enterprises, such as social media data
Zikopoulos P, Eaton C, DeRoos D, Deutsch T, Lapis G. 2012. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill.
  
A technical challenge
•  “As data is increasingly becoming more varied, more
complex and less structured, it has become imperative
to process it quickly. Meeting such demanding
requirements poses an enormous challenge for
traditional databases and scale-up infrastructures. . . .
Big Data refers to new scale-out architectures that
address these needs. Big Data is fundamentally about
massively distributed architectures and massively
parallel processing using commodity building blocks to
manage and analyze data.”
EMC. 2012. Big data-as-a-service: a market and technology perspective, http://www.emc.com/collateral/software/white-papers/h10839-big-data-as-a-service-perspt.pdf, July (accessed January 2013).
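The scale-out idea in the EMC quote can be illustrated in miniature with the map/reduce pattern that Hadoop popularised: split the data into chunks, process every chunk independently and in parallel, then merge the partial results. A minimal sketch, assuming a single machine where `multiprocessing.Pool` worker processes stand in for a cluster of commodity nodes, and using made-up documents:

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def map_count(chunk):
    """Map step: count the words in one chunk of documents, independently."""
    counts = Counter()
    for doc in chunk:
        counts.update(doc.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(docs, workers=4):
    """Split docs into chunks, count each chunk in a separate process, merge."""
    chunks = [docs[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(map_count, chunks)
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    docs = ["Big data is big", "data science", "better decisions from data"]
    print(word_count(docs).most_common(3))
```

Because each chunk is processed without reference to the others, adding workers (or, in a real cluster, machines) scales the map step out rather than up.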
  
Solution - the cloud
•  Cloud computing is a general term for anything that involves
delivering hosted services over the Internet
•  A cloud service has three distinct characteristics that differentiate
it from traditional hosting:
–  It is sold on demand, typically by the minute or the hour
–  It is elastic -- a user can have as much or as little of a service as
they want at any given time
–  The service is fully managed by the provider (the consumer
needs nothing but a personal computer and Internet access)
•  These services are broadly divided into three categories:
–  Infrastructure-as-a-Service (IaaS)
–  Platform-as-a-Service (PaaS)
–  Software-as-a-Service (SaaS)
•  The cloud can be public or private
http://searchcloudcomputing.techtarget.com/definition/cloud-computing
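Being sold on demand and being elastic together change what capacity costs: you pay for the servers you run in each interval, instead of keeping enough servers for the busiest hour all day. A minimal sketch comparing the two, assuming hypothetical hourly demand, an hourly billing interval and one hypothetical price for both models (real providers' prices and billing granularity differ):

```python
# Hypothetical number of servers needed in each hour of one day (24 values).
hourly_demand = [2] * 8 + [10] * 4 + [4] * 8 + [1] * 4

# Hypothetical price per server-hour, in pence, assumed equal for both models.
PRICE_PENCE = 10


def elastic_cost(demand, price):
    """On demand and elastic: pay only for the servers running in each hour."""
    return sum(servers * price for servers in demand)


def peak_sized_cost(demand, price):
    """Traditional sizing: run enough servers for the busiest hour, all day."""
    return max(demand) * len(demand) * price


print("elastic:   ", elastic_cost(hourly_demand, PRICE_PENCE), "pence")     # 920
print("peak-sized:", peak_sized_cost(hourly_demand, PRICE_PENCE), "pence")  # 2400
```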
  
“IBM believes the cloud services market could be worth $200bn by 2020. Businesses are increasingly leasing data storage, computing power and web hosting services from a growing number of specialist cloud companies - effectively outsourcing their IT needs to cut costs and improve efficiency.”
http://www.bbc.co.uk/news/business-25773266
  
Internet of Things (IoT)
•  Although the concept wasn't named until 1999, the
Internet of Things has been in development for
decades
•  The first Internet appliance was a Coke machine at
Carnegie Mellon University in the early 1980s. The
programmers could connect to the machine over the
Internet, check the status of the machine and
determine whether or not there would be a cold drink
awaiting them, should they decide to make the trip
down to the machine
http://whatis.techtarget.com/definition/Internet-of-Things
  
Internet of Things (IoT)
•  The Internet of Things (IoT) is a scenario in which
objects, animals or people are provided with unique
identifiers and the ability to automatically transfer
data over a network without requiring human-to-human
or human-to-computer interaction
•  So far, the Internet of Things has been most closely
associated with machine-to-machine (M2M)
communication in manufacturing and power, oil and
gas utilities. Products built with M2M communication
capabilities are often referred to as smart (e.g., a
smart meter)
http://whatis.techtarget.com/definition/Internet-of-Things
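A minimal sketch of the two ingredients in this definition, a unique identifier and a machine-readable message the device can send without human involvement, using a hypothetical smart meter and a hypothetical JSON message format (real M2M deployments typically use dedicated protocols such as MQTT, not shown here):

```python
import json
import uuid
from datetime import datetime, timezone


class SmartMeter:
    """A hypothetical 'thing': an electricity meter with a unique identifier."""

    def __init__(self):
        self.device_id = str(uuid.uuid4())  # unique identifier for this device
        self.total_kwh = 0.0

    def record(self, kwh):
        """Accumulate consumption measured by the meter's sensor."""
        self.total_kwh += kwh

    def to_message(self):
        """Serialise the current reading as a machine-readable JSON message."""
        return json.dumps({
            "device_id": self.device_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "total_kwh": round(self.total_kwh, 3),
        })


meter = SmartMeter()
meter.record(1.25)
meter.record(0.5)
# In a deployed system this message would be sent over the network on a
# schedule, with no person involved; here it is only printed.
print(meter.to_message())
```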
  
Things
•  A thing, in the Internet of Things, can be:
–  a person with a heart monitor implant (physio
sensing)
–  a person with a brain scanner (neuro sensing)
–  a farm animal with a biochip transponder
–  an automobile that has built-in sensors to alert the
driver when tire pressure is low
–  … or any other natural or man-made object that can
be assigned an IP address and provided with the
ability to transfer data over a network
http://whatis.techtarget.com/definition/Internet-of-Things
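The tire-pressure example comes down to a simple rule evaluated on the car's own sensor data: compare each reading with a threshold and alert the driver if it falls below. A minimal sketch, assuming hypothetical sensor readings and a hypothetical 30 psi threshold (real thresholds depend on the vehicle):

```python
# Hypothetical alert threshold; real values depend on the vehicle.
LOW_PRESSURE_PSI = 30.0


def low_tires(readings, threshold=LOW_PRESSURE_PSI):
    """Return, in sorted order, the wheels whose pressure is below threshold."""
    return sorted(wheel for wheel, psi in readings.items() if psi < threshold)


# Hypothetical latest readings from the four wheel sensors (psi).
readings = {"front_left": 32.5, "front_right": 28.9,
            "rear_left": 33.0, "rear_right": 31.2}

for wheel in low_tires(readings):
    print(f"Warning: low tire pressure on {wheel}")
```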
  
http://consumertechnik.wordpress.com/2013/03/20/why-things-matter/
  
Mr Cameron said the UK and Germany could find themselves on the forefront of a new "industrial revolution".
"I see the internet of things as a huge transformative development - a way of boosting productivity, of keeping us healthier, making transport more efficient, reducing energy needs, tackling climate change," he said.
BBC NEWS, 9 March 2014
  
Ubiquitous computing
•  Ubiquitous computing is the growing trend towards
embedding microprocessors in everyday objects so they can
communicate information
•  Ubiquitous means “existing everywhere” - ubiquitous
computing devices are completely connected and constantly
available
•  Ubiquitous computing relies on the convergence of wireless
technologies, advanced electronics and the Internet
•  The goal of researchers working in ubiquitous computing is
to create smart products that communicate unobtrusively
(e.g., wearable computers, Google Glass, smart meters)
http://searchnetworking.techtarget.com/definition/pervasive-computing
  
http://www.droid-life.com/2013/04/09/this-is-how-google-glass-works-infographic/
  
Analysis and outcomes
[Figure: diagram relating big data, data science and better decisions to data analysis and data visualization (data analysis and presentation)]
Vidgen, R. (2014). Big data: an introduction. The BigDataScience blog. http://datasciencebusiness.wordpress.com/
  
Using big data
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
  
Better decisions - predictive analytics
•  A predictive model that calculates strawberry
purchases based on:
–  Weather forecast
–  Store temperature
–  Freezer sensor data
–  Remaining stock per shelf life
–  Sales transaction point of sale feeds
–  Web searches, social mentions
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
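At its simplest, such a model learns from past sales how purchases move with each input and applies that to tomorrow's inputs. A minimal sketch using ordinary least-squares regression on just one of the listed inputs (the weather forecast), with entirely hypothetical data; the slide does not say which modelling technique was used, and a fuller model would add the other inputs as further features:

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares with one input: return (intercept, slope)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical history: forecast maximum temperature (°C) and punnets sold.
forecast_temp = [14, 17, 20, 23, 26]
punnets_sold = [120, 150, 185, 210, 245]

intercept, slope = fit_line(forecast_temp, punnets_sold)
print(f"about {slope:.1f} extra punnets per extra degree")
print(f"prediction for a 25 °C forecast: {intercept + slope * 25:.0f} punnets")
```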
  
Predictive analytics
•  For example, what data might help us predict which students will drop out?
–  Assessment grades at University
–  Prior educational attainment
–  Social background
–  Distance of home from University
–  Friendship circles and networks (e.g., sports club memberships)
–  Attendance at lectures and tutorials
–  Interaction in lectures and tutorials
–  Time spent on campus
–  Time spent in the library
–  Number of accesses to electronic learning resources
–  Textbooks purchased
–  Engagement in subject-related forums
–  Sentiment of social media posts
–  Etc.
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
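Predicting which students will drop out is a binary classification problem. A minimal sketch of logistic regression trained by gradient descent in plain Python, assuming two of the listed inputs (attendance, and accesses to electronic learning resources, both scaled to 0-1) and a handful of hypothetical students; a real model would need far more data, more features and proper validation:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Estimated probability of dropping out for feature vector x."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def train(rows, labels, lr=1.0, epochs=5000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w, b = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = predict(w, b, x) - y  # derivative of the log-loss w.r.t. z
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical students: [attendance rate, e-learning accesses scaled to 0-1];
# label 1 = dropped out, 0 = continued.
rows = [[0.95, 0.9], [0.9, 0.7], [0.85, 0.8], [0.4, 0.2], [0.3, 0.1], [0.5, 0.3]]
labels = [0, 0, 0, 1, 1, 1]

w, b = train(rows, labels)
print(f"risk, low engagement:  {predict(w, b, [0.35, 0.15]):.2f}")
print(f"risk, high engagement: {predict(w, b, [0.9, 0.85]):.2f}")
```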
  
Who works with big data?
Some of the techniques data scientists use
•  Classification
•  Clustering
•  Association rules
•  Decision trees
•  Regression
•  Genetic algorithms
•  Neural networks and support vector machines
•  Machine learning
•  Natural language processing
•  Sentiment analysis
•  Artificial intelligence
•  Time series analysis
•  Simulations
•  Social network analysis
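Clustering, for example, groups similar records without being told the groups in advance. A minimal sketch of k-means (Lloyd's algorithm) in plain Python on hypothetical customer data; libraries such as scikit-learn provide production implementations of this and most of the other techniques listed:

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group 2-D points into k clusters with Lloyd's algorithm (k-means)."""
    centroids = random.Random(seed).sample(points, k)  # k distinct starting points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        updated = [tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend in £10s).
customers = [(1, 2), (1.5, 1.8), (2, 2.2), (8, 9), (8.5, 8.7), (9, 9.5)]
centres, groups = kmeans(customers, k=2)
for centre, members in zip(centres, groups):
    print(centre, members)
```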
Technologies for data analysis: usage rates
King, J., & Magoulas, R. (2013). Data Science Salary Survey. O’Reilly Media.
R and Python programming languages come above Excel
Enterprise products bottom of the heap
  
Data visualization
[Figure: correlation matrix based on MPG, horsepower, engine size, number of cylinders, weight, etc.]
https://boraberan.wordpress.com/2013/12/09/creating-a-correlation-matrix-in-tableau-using-r-or-table-calculations/
(Maserati is like a Ferrari; Lotus is not like a Cadillac)
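Each cell of a correlation matrix like this one holds a Pearson correlation coefficient between two variables, ranging from -1 (perfect negative relationship) to +1 (perfect positive relationship). A minimal plain-Python sketch on a few hypothetical cars; the linked post builds its matrix in Tableau with R or table calculations, which this does not reproduce:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Map every ordered pair of column names to their correlation."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


# A few hypothetical cars: miles per gallon, horsepower, weight (1000 lb).
cars = {
    "mpg":        [33.9, 30.4, 21.0, 18.7, 14.3, 10.4],
    "horsepower": [65, 52, 110, 175, 245, 205],
    "weight":     [1.84, 1.62, 2.62, 3.44, 3.57, 5.25],
}
matrix = correlation_matrix(cars)
names = list(cars)
print(" " * 11 + "".join(f"{name:>11}" for name in names))
for a in names:
    print(f"{a:>11}" + "".join(f"{matrix[a, b]:>11.2f}" for b in names))
```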
  
Challenges of big data
“According to a recent Gartner report, 64% of enterprises surveyed indicate that they're deploying or planning Big Data projects. Yet even more acknowledge that they still don't know what to do with Big Data.”
Gartner On Big Data: Everyone's Doing It, No One Knows Why
http://readwrite.com/2013/09/18/gartner-on-big-data-everyones-doing-it-no-one-knows-why
  
Big data: it's about iteration
•  Start small when tackling big data
•  Go open source
•  Train existing employees who know the business
rather than hunt for data talent
•  Iterate on your project as you learn which data sources
are valuable, and which questions yield real insights
•  You don't have to know the end from the beginning,
but you should have a clearer view of what you hope to
achieve with Big Data than the Gartner report seems to
indicate most have
http://readwrite.com/2013/09/18/gartner-on-big-data-everyones-doing-it-no-one-knows-why
  
Resources
McKinsey (2011). Big data: The next frontier for innovation, competition,
and productivity
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
Sogeti. Various reports on data analytics, privacy, legal aspects, predicting
behaviour http://vint.sogeti.com/download-big-data-reports/
The Economist (2012). Big data: Lessons from the leaders
http://www.economistinsights.com/sites/default/files/downloads/EIU_SAS_BigData_4.pdf
