An Introduction To
           Social Network Data
                      David M Walker
               Data Management & Warehousing
                         May 2012




May 2012
                           S1           © 2012 Data Management & Warehousing
Hi, I’m on Facebook!


   S  I’m one of 900 million people who, as of May 2012, have a
           Facebook account

   S  That’s more than 1 in 8 of every man, woman and child on the
           planet (and the 6 crew of the International Space Station)
           regardless of age, race, religion, location, sexuality, etc.

   S  I’ve also completed my profile – it helps my family & friends find
           and communicate with me
           S  It even reminds people to wish me ‘Happy Birthday’


My Profile Page




But what am I sharing?


   S  Depending on my privacy settings I will be sharing anything
           from ‘some data’ to ‘everything about my life’

   S  You can edit your privacy settings here:
       S  https://www.facebook.com/settings/?tab=privacy

   S  Remember:
       S  Today’s ‘friends’ may not be tomorrow’s friends
       S  Sharing with family, school/work colleagues can have
           unexpected consequences


How is this data used?


   S  Developers use this data to ‘profile’ people

   S  This is both free to use and easy to do

   S  Uses an Application Programming Interface (API) based on
           a URL
           S  Jargon for ‘just connect to the website with the right options’

   S  Try it:
       S  https://developers.facebook.com/tools/explorer


George H Takei


   S      Helmsman Sulu in Star Trek (The Original Series)

   S      Gay Rights and Japanese American Internment Activist

   S      Popular Facebook Page (1,962,290 likes) and secured

   S      Basic Info
           S    https://developers.facebook.com/tools/explorer?method=GET&path=205344452828349
           S    https://graph.facebook.com/205344452828349

   S      Photographs
           S    https://developers.facebook.com/tools/explorer?method=GET&path=205344452828349%2Fphotos
           S    https://graph.facebook.com/205344452828349/photos
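
The addresses on this slide all follow one pattern: a base address, an object ID and an optional connection such as photos. The sketch below only builds those addresses; the function names are illustrative assumptions, not part of any Facebook library. It does not fetch anything, since a real request may need an access token and is subject to the provider's terms.

```python
from urllib.parse import urlencode

# Base addresses copied from the slide.
GRAPH_BASE = "https://graph.facebook.com"
EXPLORER_BASE = "https://developers.facebook.com/tools/explorer"


def graph_url(object_id, connection=None):
    """Return the Graph API address for an object, or for one of its connections."""
    path = str(object_id) if connection is None else f"{object_id}/{connection}"
    return f"{GRAPH_BASE}/{path}"


def explorer_url(object_id, connection=None):
    """Return the matching Explorer address; urlencode escapes the '/' in the path."""
    path = str(object_id) if connection is None else f"{object_id}/{connection}"
    return f"{EXPLORER_BASE}?{urlencode({'method': 'GET', 'path': path})}"


PAGE_ID = "205344452828349"  # the page ID shown on the slide
print(graph_url(PAGE_ID))
print(graph_url(PAGE_ID, "photos"))
print(explorer_url(PAGE_ID, "photos"))
```

This is the "just connect to the website with the right options" idea in practice: changing the ID or the connection name is enough to ask for a different piece of data.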



George H Takei’s photo and its data

   George Takei posted this photograph

   API Output (Snippet):

   {
       "id": "373438362685623_1722672",
       "from": {
           "name": "Trevor Mullins",
           "id": "1024732813"
       },
       "message": "This. So much this.",
       "created_time": "2012-02-09T03:43:56+0000",
       "likes": 3
   }

   It Tells Me:

   S  Trevor Mullins was one of several hundred people who commented on this
           photo
   S  He did so at 03:43:56 GMT on 9th Feb 2012
   S  Which 3 people liked the comment
   S  And from his profile: his username is Ertrov, he describes himself as
           “Agnostic-atheist/Anti-theist”, is male, likes Sci-Fi, and is
           affiliated to Sinclair Community College, Ohio and many, many more
           things
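
Reading the API output programmatically takes only a few lines. The sketch below parses the snippet from this slide with Python's standard library; the function name and the shape of the returned summary are assumptions made for illustration.

```python
import json
from datetime import datetime

# The API output snippet from the slide, as a JSON string.
SNIPPET = """
{
    "id": "373438362685623_1722672",
    "from": {"name": "Trevor Mullins", "id": "1024732813"},
    "message": "This. So much this.",
    "created_time": "2012-02-09T03:43:56+0000",
    "likes": 3
}
"""


def summarise_comment(raw):
    """Pull out who commented, when, and how many likes the comment received."""
    comment = json.loads(raw)
    posted = datetime.strptime(comment["created_time"], "%Y-%m-%dT%H:%M:%S%z")
    return {
        "author": comment["from"]["name"],
        "author_id": comment["from"]["id"],
        "posted": posted,
        "likes": comment["likes"],
    }


summary = summarise_comment(SNIPPET)
print(summary["author"], summary["posted"].isoformat(), summary["likes"])
```

The author ID is the piece that links the comment to a profile, which is where the remaining details on the slide come from.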
Back to me – My profile contains:

   id:                   Facebook's unique reference number for me
   name:                 My full name
   username:             My username
   birthday:             My date of birth
   hometown:             Where I was born
   location:             Where I live now
   employer:             Who I work for
   employer:             Who I used to work for
   projects:             Which projects I worked on for that employer
   sports:               Which sports I like
   favorite_teams:       Who are my favourite teams
   education:            Where I went to school
   year:                 And when I left
   type:                 And what type of school it was
   gender:               My gender
   relationship_status:  Am I married?
   email:                My private email
   website:              My website
   timezone:             My timezone
   locale:               What language I read Facebook in
   languages:            What languages I speak
   verified:             Have I verified my email address
   updated_time:         When did I last update my profile
   type:                 What type of user account do I have

   These are just some of the fields I could populate and developers could access
I like …


   S  If I ‘like’ a product or brand on Facebook then the owner of that
           brand can use the developer interface to get information about
           me and others who ‘like’ their product

   S  For example the developer can get the age, marital status, gender,
           sexual preference (‘interested in’) and location of the ‘likers’

   S  The developer can then look for groups of people who share the
           same characteristics (e.g. 18-25, single, female, straight, Liverpool)
           S  This is called Cluster Analysis – looking for groups of similar
               people


This data is valuable:
                            Very Very Valuable

   S  Once the developer has identified a ‘cluster’ of people they can
           ask Facebook to advertise to others who don’t yet ‘like’ the
           product but share the same characteristics as those that do

   S  For example, based on our previous cluster, a nightclub may want
           to target adverts to similar people in their area

   S  Facebook makes this very easy to do; you just go here:
       S  https://www.facebook.com/ads/manage/adscreator/



Very precise targeting –
           know exactly who is going to see your advert




Target audiences
                        using their
                        stated preferences


Very low cost –
           Know exactly how much you are going to spend
           From an advertiser’s point of view this is very cost effective
           For Facebook, done at scale, it is very, very profitable


Dealing with the data


   S  We can look at individuals manually

   S  We can deal with ‘small’ data sets with a spreadsheet
       S  50,000 rows i.e. 50,000 individuals
       S  250 columns i.e. 250 different characteristics

   S  We can deal with ‘larger’ data sets with statistical tools
       S  There are commercial and open source tools to do the stats
       S  For example: ‘R’ is free and provides direct access to the
           Facebook API and functions to do complex cluster analysis

Advanced Techniques


   S  Exploiting the social network
           S  Which of my ‘likers’ know each other?
           S  Is it possible to identify an individual in the group who is the
               ‘ring-leader’?
           S  Can the ring-leader be influenced towards my offering/product?
           S  Can the ring-leader influence others to follow them?
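
One common first guess at the ‘ring-leader’ is the person with the most connections inside the group, a measure known as degree centrality. The sketch below computes it from a list of friendships; the friendship pairs are made up, and degree centrality is only one of several measures an analyst might use.

```python
from collections import defaultdict


def degree_centrality(friendships):
    """Return, for each person, the share of the other people they are friends with."""
    neighbours = defaultdict(set)
    for a, b in friendships:
        neighbours[a].add(b)
        neighbours[b].add(a)
    others = max(len(neighbours) - 1, 1)  # avoid dividing by zero
    return {person: len(friends) / others for person, friends in neighbours.items()}


# Hypothetical 'likers' who know each other (initials only).
friendships = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

scores = degree_centrality(friendships)
ring_leader = max(scores, key=scores.get)
print(ring_leader, scores[ring_leader])
```

Being well connected does not prove influence, so the last two questions on the slide still need to be answered by observing what people actually do.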




My Social Network


   [Figure: graph of my social network, annotated with the following groups]

   S  Small groups of friends that don’t know each other
   S  A group of friends who I watch rugby with
   S  A tight knit group of friends from where I used to work
   S  Detail – Friends who know each other (initials only for confidentiality):
           this group all worked on a project together
Sentiment Analysis


   S  Analyse people’s comments and use this to change your interaction
           with your customer
   S  Use feedback (positive and negative) to respond to customers –
           remember you are looking for the main effect; you will always
           have people who have a minority opinion

   S  Simple Examples
           S  “Don’t like the new flavour”
           S  “Wish the new website had a help button”

   S  There are plenty of more sophisticated examples
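
A very simple form of sentiment analysis scores each comment against lists of positive and negative words. The sketch below does this for the two simple examples above. The word lists are tiny, invented for illustration, and treating ‘wish’ as a negative signal (an unmet need) is an assumption; the more sophisticated approaches mentioned on the slide go well beyond this.

```python
import re

# Illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"love", "great", "like", "thanks"}
NEGATIVE = {"hate", "awful", "wish", "broken"}
NEGATORS = {"don't", "not", "never"}


def sentiment_score(text):
    """Return +1 per positive word and -1 per negative word.
    A negator flips the sign of the next sentiment word that follows it."""
    words = re.findall(r"[a-z']+", text.lower().replace("’", "'"))
    score = 0
    negate = False
    for word in words:
        if word in NEGATORS:
            negate = True
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)
        if value:
            score += -value if negate else value
            negate = False
    return score


comments = ["Don’t like the new flavour", "Wish the new website had a help button"]
scores = [sentiment_score(c) for c in comments]
print(scores, "overall:", sum(scores))
```

Summing the scores across many comments gives a rough view of the main effect while letting minority opinions average out.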


Applications


   S  Facebook also allows users to develop Applications
       S  Socialcam (54M users), Cityville (35M users)
       S  Texas HoldEm (35M users), DrawSomething (29M users)

   S  Allows users to buy virtual tokens with real money
       S  This in itself is a revenue generating stream

   S  Allows developers to place very targeted adverts
       S  Revenue derived from selling targeted marketing

   S  Allows developers to monitor social interactions for new trends
       S  Who do you ‘Draw Something’ with?


Third Party Vetting


   S  Looking for a new job?
           S  Someone you are friends with may also know someone at your
               new employer – what information will they share?
           S  Your social activities – don’t post that you are out partying and
               then call in sick
           S  Don’t tell the world what you think of your boss, even after you
               leave the organisation – you might need a reference from them, or
               your new employer might not want to expose themselves to the
               same treatment in the future

   S  Journalists looking for background
           S  Those grainy news photos are often found on social websites


Coffee with my son


   S  One day I had coffee with my son. I took this photo and uploaded
           it to Facebook, tagging him and adding the place
   S  Facebook stored the following data:
           S    The exact date, time & GPS location of where I checked in
           S    The details of the person I was with
           S    The application on my iPhone that I used to upload the picture
           S    The people who commented, their comments and their profile
           S    And more

   S  But the photograph told another part of the story …


Photographic Data


   S  Digital Cameras store data too
       S  This is called Metadata (data about data)
       S  What each device stores varies
       S  But you can download a free tool to read the metadata
              S  http://www.sno.phy.queensu.ca/~phil/exiftool/
           S  Data is stored against images, audio and video files by most
               digital recording devices including cameras, phones, scanners.
               The data is known as EXIF data
           S  This data isn’t protected by your Facebook settings


What the photo told me:


   S      File name, size and type

   S      Date and Time created

   S      GPS co-ordinates - longitude, latitude & altitude

   S      Make & Model of the device used to take the photo

   S      Technical details about the photo including focal length, exposure, whether a flash was
           used, etc

   S      Whether the photo has subsequently been edited and if so when and by what application

   S      Copyright information could also have been added to the image
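
EXIF data commonly records GPS positions as degrees, minutes and seconds plus a hemisphere letter (N, S, E or W), so they usually need converting before they can be plotted on a map. The sketch below shows that conversion; the function name is an assumption and the sample values are invented, not taken from the photo in this presentation.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds and a hemisphere letter to decimal degrees.
    South and West are negative by convention."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


# Hypothetical coordinates, as they might appear in a photo's metadata.
latitude = dms_to_decimal(51, 24, 36.0, "N")
longitude = dms_to_decimal(0, 50, 2.4, "W")
print(round(latitude, 6), round(longitude, 6))
```

Once in decimal form, the position can be compared directly with the check-in location stored by the social network.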



What does all this add to the
                data stored by Facebook?

   S  I can validate the date, time and location of the check-in on
           Facebook

   S  I can understand what type of device the user carries around

   S  I can identify a breach of copyright for certain materials




What about other sites?

   S  Facebook 900M users
   S  Qzone (China) 480M users
   S  Twitter 300M users
   S  Sina Weibo (China) 300M users
   S  Habbo (31 countries) 200M users
   S  Google+ 170M users
   S  Renren (China) 160M users
   S  Badoo (Europe & Latin America) 120M users
   S  LinkedIn 120M users

   S  This is not a Facebook specific thing
   S  All sites allow developers to access the data
   S  Developer access is key to how organisations make money from social
           websites
   S  Many people put different data on different social websites
   S  Developers can use common data (e.g. an e-mail address) to piece
           together an even deeper picture of an individual
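
Piecing together a picture of one person from several sites comes down to matching records on a shared value such as an e-mail address. The sketch below merges records from two hypothetical sources in that way; the records, field names and function names are all invented for illustration.

```python
def normalise_email(email):
    """Lower-case and trim an e-mail address so small differences still match."""
    return email.strip().lower()


def merge_profiles(*sources):
    """Combine records from several sites into one profile per e-mail address.
    Each source is a (site_name, list_of_records) pair."""
    merged = {}
    for site, records in sources:
        for record in records:
            profile = merged.setdefault(normalise_email(record["email"]), {})
            for field, value in record.items():
                if field != "email":
                    profile[f"{site}.{field}"] = value
    return merged


# Hypothetical records from two different sites.
site_a = [{"email": "Jo@Example.com ", "hometown": "Reading"}]
site_b = [{"email": "jo@example.com", "employer": "Acme Ltd"}]

profiles = merge_profiles(("site_a", site_a), ("site_b", site_b))
print(profiles)
```

Each site on its own reveals little, but the merged profile holds both the hometown and the employer against one address.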


Non-social
                                    (internal) data

   S  Other organisations are gathering lots of data from internal
           sources rather than social networks
           S  Telematics devices for car insurance
           S  Smart metering devices for energy consumption
           S  Credit card transactions for fraud detection

   S  These are being manipulated and analysed using the same
           techniques
   S  These are the ‘Big Data’ stories you read about in the press


Telematics Insurance


   S  Buy cheap car insurance in exchange for having a ‘black box’ installed in your
           car, known as a Telematics box

   S  This sends data back to a central computer periodically
       S  Typically every couple of minutes/miles
       S  All the data every 100ms over a 2 second interval when there is an impact

   S  Minimum data set
       S  Longitude, Latitude, Altitude, X-Acceleration, Y-Acceleration, Z-Acceleration,
           Speed, Compass Direction Of Travel

   S  More advanced units gather more data
       S  Camera data, Engine data, Service History, etc.
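
The minimum data set maps naturally onto a small record type. The sketch below models one reading and shows how the impact window on this slide (every 100ms over 2 seconds) works out at 20 readings. The field units, the 2 g threshold and all names are assumptions for illustration; real devices and insurers will differ.

```python
import math
from dataclasses import dataclass

IMPACT_WINDOW_MS = 2000   # 2 second interval, from the slide
IMPACT_SAMPLE_MS = 100    # one reading every 100ms, from the slide
IMPACT_SAMPLES = IMPACT_WINDOW_MS // IMPACT_SAMPLE_MS  # readings per impact

IMPACT_THRESHOLD_G = 2.0  # hypothetical threshold, in g


@dataclass
class TelematicsReading:
    """One reading holding the minimum data set (units are assumptions)."""
    longitude: float
    latitude: float
    altitude: float   # metres
    x_accel: float    # g
    y_accel: float    # g
    z_accel: float    # g
    speed: float      # miles per hour
    heading: float    # compass direction of travel, in degrees

    def acceleration_magnitude(self):
        """Combine the three acceleration axes into one overall figure."""
        return math.sqrt(self.x_accel**2 + self.y_accel**2 + self.z_accel**2)


def looks_like_impact(reading, threshold=IMPACT_THRESHOLD_G):
    """Flag a reading whose overall acceleration reaches the threshold."""
    return reading.acceleration_magnitude() >= threshold


reading = TelematicsReading(-0.834, 51.411, 60.0, 3.0, 4.0, 0.0, 30.0, 90.0)
print(IMPACT_SAMPLES, reading.acceleration_magnitude(), looks_like_impact(reading))
```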



Telematics Plot




   S  Trip from Wokingham to Walton-Upon-Thames

   S  Rendered on Google Maps with a KML file (Free to use)
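
A KML file is plain XML, so turning telematics readings into something a mapping tool can draw needs very little code. The sketch below writes a track as a KML LineString, which lists each point as longitude, latitude, altitude. The two points are rough, illustrative positions for the start and end towns rather than data from the trip shown, and the function name is an assumption.

```python
from xml.sax.saxutils import escape


def track_to_kml(name, points):
    """Return a KML document drawing the points as one line.
    Each point is a (longitude, latitude, altitude) tuple."""
    coordinates = " ".join(f"{lon},{lat},{alt}" for lon, lat, alt in points)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Document>\n"
        "    <Placemark>\n"
        f"      <name>{escape(name)}</name>\n"
        "      <LineString>\n"
        f"        <coordinates>{coordinates}</coordinates>\n"
        "      </LineString>\n"
        "    </Placemark>\n"
        "  </Document>\n"
        "</kml>\n"
    )


# Approximate, illustrative positions only.
trip = [(-0.834, 51.411, 0), (-0.413, 51.386, 0)]
kml = track_to_kml("Wokingham to Walton-Upon-Thames", trip)
print(kml)
```

A real trip would contain one point per reading, so the line follows the roads actually driven.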
Using Telematics Data


   S  Assess customer driving pattern
           S  Adjust the car insurance premium accordingly

   S  Assess accidents
           S  Can be used to determine fault in collisions
           S  Can be used to determine if whiplash is likely

   S  Assess other types of car insurance fraud

   S  Allows insurance companies to “optimize” premiums
           S  Charge as much as possible but be cheaper than the competition


Telematics Insurers in the UK




Source: http://comparethebox.com
Integrating Social Data
                       and Non-Social Data

   S  Organisations are starting to combine internal data with
           social network data to create an even deeper understanding
           of the customer

   S  All of the examples given above are from real projects that
           we, as a company, have already been involved in




Integrated Data


   S  A youth buys cheap telematics insurance …
           S  When he gets it he ‘likes’ the product on Facebook
               S  Positive Sentiment Analysis – Opportunity to thank customer
           S  When he gets charged for the top-up miles he ‘dislikes’ the cost
               S  Negative Sentiment Analysis – Opportunity to address concerns
           S  When he has an accident and tells his mates what really happened
               S  Fraud detection – Opportunity to check the veracity of the claim

   S  What you say and do socially now will affect your commercial
           transactions in the future


Can I Opt-Out?


   S  No – you can limit your exposure but you can’t opt out of big data

   S  You don’t have to join social networks but:
           S  Many social activities are based around Twitter/Facebook
           S  Most business people will want to use LinkedIn
           S  Peer pressure to join, especially for younger people, is high

   S  Your data will be analysed by companies involved in
           S  Marketing, Financial (especially underwriting & fraud),
           S  Energy consumption, and many more
           S  They will source the data internally and from social networks


What about crime?


   S      Most uses of social data are positive
           S    Reduce fraud, improve product, more precisely targeted marketing, energy efficiency

   S      But criminals can use this technology too
           S    Most of the technology is either low cost or free
           S    New techniques for exploiting data evolve very quickly

   S      Identity theft is just one possible outcome

   S      It’s an arms race – Can we (the good guys) find ways to protect ourselves and those that
           share their data with us faster than the bad guys develop techniques to exploit this
           information?

   S      Make sure you understand what you are sharing and with whom you are sharing data



Security


   S  Remember
           S  Set your privacy settings on Facebook
           S  Things that help people communicate with you (date of birth, first
               school, first pet, mother’s maiden name, etc.) are also the most
               common security questions for online banking, etc.
           S  Facebook friends are not real friends – beware of ‘friending’
               people you don’t actually know and ‘liking’ dubious groups
           S  Remember your ‘friends’ may not be so in the future or may have
               greater loyalties to others than they do to you
           S  You may get profiled and targeted as a ‘false positive’ i.e. you
               aren’t interested in the product/offering but match the criteria


It’s not just
                                     social websites

   S  Other sites also hold complex social information

           S  Directory Websites: 192.com, company-director-check.co.uk

           S  Family History Websites: ancestry.co.uk, findmypast.com

           S  Large scale online retailers: amazon.com, apple.com, tesco.com




Who does this work?


   S  Data Scientists
       S  ‘Data scientist’ is a job title for an employee or business intelligence (BI)
           consultant who excels at analysing data, particularly large amounts of data, to
           help a business gain a competitive edge
       S  The position is gaining acceptance (and significant salaries) with large
           enterprises who are interested in deriving meaning from big data, the
           voluminous amount of structured, unstructured and semi-structured data that
           a large enterprise produces.
       S  A data scientist possesses a combination of analytic, machine learning, data
           mining and statistical skills as well as experience with algorithms and coding.
           Perhaps the most important skill a data scientist possesses, however, is the
           ability to explain the significance of data in a way that can be easily
           understood by others.
       S  Most often Maths or Computer Studies graduates with Business skills



Notes on this presentation


   S  All trademarks and brand names are the property of their respective owners

   S  This presentation is designed to show capabilities, tools and techniques and is
           in no way condoning or condemning any organisation, product, technology or
           tool

   S  Other tools and products are available

   S  Data access may be restricted by user permissions

   S  Data access may be restricted by law

   S  Data access may be restricted by data provider terms & conditions



Contact Us


   S  Data Management & Warehousing
       S  Website: http://www.datamgmt.com
       S  Telephone: +44 (0) 118 321 5930

   S  David Walker
       S  E-Mail: davidw@datamgmt.com
       S  Telephone: +44 (0) 7990 594 372
       S  Skype: datamgmt
       S  White Papers: http://scribd.com/davidmwalker


About Us


     Data Management & Warehousing is a UK based consultancy that has
     been delivering successful business intelligence and data warehousing
                             solutions since 1995.

    Our consultants have worked with major corporations around the world
            including the US, Europe, Africa and the Middle East.

    We have worked in many industry sectors such as telcos, manufacturing,
      retail, financial and transport. We provide governance and project
         management as well as expertise in the leading technologies.



Thank You
           ©2012 - Data Management & Warehousing
                  http://www.datamgmt.com




May 2012
                           S
                            40            © 2012 Data Management & Warehousing

An introduction to social network data

  • 1.
    An Introduction To Social Network Data David M Walker Data Management & Warehousing May 2012 May 2012 S1 © 2012 Data Management & Warehousing
  • 2.
    Hi, I’m onFacebook! S  I’m one of 900 Million people as of May 2012 that has a Facebook account S  That’s more than 1 in 8 of every man, woman and child on the planet (and the 6 crew of the International Space Station) regardless of age, race, religion, location, sexuality, etc. S  I’ve also completed my profile – it helps my family & friends find and communicate with me S  It even reminds people to wish me ‘Happy Birthday’ May 2012 S 2 © 2012 Data Management & Warehousing
  • 3.
    My Profile Page May2012 S 3 © 2012 Data Management & Warehousing
  • 4.
    But what amI sharing ? S  Depending on my privacy settings I will be sharing anything from ‘some data’ to ‘everything about my life’ S  You can edit your privacy settings here: S  https://www.facebook.com/settings/?tab=privacy S  Remember: S  Todays ‘friends’ may not be tomorrows friends S  Sharing with family, school/work colleagues can have unexpected consequences May 2012 S4 © 2012 Data Management & Warehousing
  • 5.
    How is thisdata used? S  Developers use this data to ‘profile’ people S  This is both free to use and easy to do S  Uses an Application Programming Interface (API) based on a URL S  Jargon for ‘just connect to the website with the right options’ S  Try it: S  https://developers.facebook.com/tools/explorer May 2012 S5 © 2012 Data Management & Warehousing
  • 6.
    George H Takei S  Helmsman Sulu in Star Trek (The Original Series) S  Gay Rights and Japanese American Internment Activist S  Popular Facebook Page (1,962,290 likes) and secured S  Basic Info S  https://developers.facebook.com/tools/explorer?method=GET&path=205344452828349 S  https://graph.facebook.com/205344452828349 S  Photographs S  https://developers.facebook.com/tools/explorer? method=GET&path=205344452828349%2Fphotos S  https://graph.facebook.com/205344452828349/photos May 2012 S6 © 2012 Data Management & Warehousing
  • 7.
    George H Takei’s photo and its data George Takei posted this photograph API Output (Snippet): It Tells Me: S  Trevor Mullins was one of several { hundred people who commented on this "id": "373438362685623_1722672", photo "from": { S  He did so at 03:43:56 GMT on 9th Feb "name": "Trevor Mullins", 2012 "id": "1024732813" }, S  Which 3 people liked the comment "message": "This. So much this.", S  And from his profile: "created_time": "2012-02-09T03:43:56+0000", His username is Ertrov, he describes "likes": 3 himself as “Agnostic-atheist/Anti- theist”, is male, likes SiFi, and is } affiliated to Sinclair Community College, S Ohio and many, many more things May 2012 7 © 2012 Data Management & Warehousing
  • 8.
    Back to me– My profile contains: id: Facebook's unique education: Where I went to school reference number for me year: And when I left name: My Full Name type: And what type of school it was username: My Username gender: My Gender birthday: My Date of Birth relationship_status: hometown: Where I was born Am I married? location: Where I live now email: My private email employer: Who I work for website: My website employer: Who I used to work for timezone: My timezone projects: Which projects I worked locale: What language I read facebook in on for that employer languages: What languages I speak sports: Which sports I like verified: Have I verified my email address favorite_teams: updated_time: Who are my favourite teams When did I last update my profile type: What type of user account do I have These are just some of the fields I could populate and developers could access May 2012 S 8 © 2012 Data Management & Warehousing
  • 9.
    I like … S  If I ‘like’ a product or brand on Facebook then the owner of that brand can use the developers interface to get information about me and others who ‘like’ their product S  For example the developer can get the age, marital status, gender, sexual preference (‘interested in’) and location of the ‘likers’ S  The developer can then look for groups of people who share the same characteristics (e.g. 18-25, single, female, straight, Liverpool) S  This is called Cluster Analysis – looking for groups of similar people May 2012 S 9 © 2012 Data Management & Warehousing
  • 10.
    This data isvaluable: Very Very Valuable S  Once the developer has identified a ‘cluster’ of people they can ask Facebook to advertise to others who don’t yet ‘like’ the product but share the same characteristics as those that do S  For example, based on our previous cluster, a nightclub may want to target adverts to similar people in their area S  Facebook makes this very easy to do, you just go here: S  https://www.facebook.com/ads/manage/adscreator/ May 2012 S 10 © 2012 Data Management & Warehousing
  • 11.
    Very precise targeting– know exactly who is going to see your advert May 2012 S 11 © 2012 Data Management & Warehousing
  • 12.
    Target audiences using their stated preferences May 2012 S 12 © 2012 Data Management & Warehousing
  • 13.
    Very low cost– Know exactly how much you are going to spend From an advertisers point of view this is very cost effective For Facebook – done at scale - it is very very profitable May 2012 S 13 © 2012 Data Management & Warehousing
  • 14.
    Dealing with thedata S  We can look at individuals manually S  We can deal with ‘small’ data sets with a spread sheet S  50,000 rows i.e. 50,000 individuals S  250 columns i.e. 250 different characteristics S  We can deal with ‘larger’ data sets with statistical tools S  There are commercial and open source tool to do the stats S  For example: ‘R’ is free and provide direct access to the Facebook API and functions to do complex cluster analysis May 2012 S 14 © 2012 Data Management & Warehousing
  • 15.
    Advanced Techniques S  Exploiting the social network S  Which of my ‘likers’ know each other? S  Is it possible to identify an individual in the group who is the ‘ring-leader’ S  Can the ring-leader be influenced towards my offering/product S  Can the ring-leader influence others to follow them? May 2012 S15 © 2012 Data Management & Warehousing
  • 16.
    My Social Network
      - Small groups of friends that don’t know each other
      - Detail: friends who know each other (initials only for confidentiality)
        - This group all worked on a project together
        - A group of friends who I watch rugby with
        - A tight knit group of friends from where I used to work
  • 17.
    Sentiment Analysis
      - Analyse people’s comments and use this to change your interaction with the customer
      - Use feedback (positive and negative) to respond to customers. Remember you are looking for the main effect; you will always have people who have a minority opinion
      - Simple Examples
        - “Don’t like the new flavour”
        - “Wish the new website had a help button”
      - There are plenty of more sophisticated examples
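    A deliberately naive sketch of the idea: score each comment against small keyword lists, then report the most common result as the ‘main effect’. The keyword lists are invented for illustration, and real sentiment analysis tools are far more sophisticated than this.

```python
from collections import Counter

# Hypothetical keyword lists; a real tool would use a trained model or a large lexicon.
POSITIVE = ("love", "great", "thanks")
NEGATIVE = ("don't like", "hate", "wish", "too expensive")


def classify(comment):
    """Label a comment as positive, negative or neutral by keyword matching."""
    text = comment.lower().replace("\u2019", "'")  # normalise curly apostrophes
    score = sum(word in text for word in POSITIVE) - sum(word in text for word in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def main_effect(comments):
    """Return the most common label, ignoring minority opinions."""
    return Counter(classify(c) for c in comments).most_common(1)[0][0]


comments = [
    "Don’t like the new flavour",
    "Wish the new website had a help button",
    "Love the new flavour",
]
print([classify(c) for c in comments])
print(main_effect(comments))
```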
  • 18.
    Applications
      - Facebook also allows users to develop Applications
        - Socialcam (54M users), Cityville (35M users)
        - Texas HoldEm (35M users), DrawSomething (29M users)
      - Allows users to buy virtual tokens with real money
        - This in itself is a revenue generating stream
      - Allows developers to place very targeted adverts
        - Revenue derived from selling targeted marketing
      - Allows developers to monitor social interactions for new trends
        - Who do you ‘Draw Something’ with?
  • 19.
    Third Party Vetting
      - Looking for a new job?
        - Someone you are friends with may also know someone at your new employer: what information will they share?
        - Your social activities: don’t post that you are out partying and then call in sick
        - Don’t tell the world what you think of your boss, even after you leave the organisation: you might need a reference from them, or your new employer might not want to expose themselves in the future
      - Journalists looking for background
        - Those grainy news photos are often found on social websites
  • 20.
    Coffee with my son
      - One day I had coffee with my son. I took this photo and uploaded it to Facebook, tagging him and adding the place
      - Facebook stored the following data:
        - The exact date, time & GPS location of where I checked in
        - The details of the person I was with
        - The application on my iPhone that I used to upload the picture
        - The people who commented, their comments and their profiles
        - And more
      - But the photograph told another part of the story …
  • 21.
    Photographic Data
      - Digital Cameras store data too
        - This is called Metadata (data about data)
        - What each device stores varies
        - But you can download a free tool to read the metadata
          - http://www.sno.phy.queensu.ca/~phil/exiftool/
      - Data is stored against image, audio and video files by most digital recording devices, including cameras, phones and scanners. The data is known as EXIF data
      - This data isn’t protected by your Facebook settings
  • 22.
    What the photo told me:
      - File name, size and type
      - Date and Time created
      - GPS co-ordinates: longitude, latitude & altitude
      - Make & Model of the device used to take the photo
      - Technical details about the photo including focal length, exposure, whether a flash was used, etc.
      - Whether the photo has subsequently been edited and if so when and by what application
      - Copyright information could also have been added to the image
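    EXIF stores GPS co-ordinates as degrees, minutes and seconds plus a hemisphere reference (N/S/E/W), so they need converting before they can be plotted on a map. A small sketch of that conversion follows; it assumes the values have already been read from the file by a metadata tool, and the sample position is invented.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds and a hemisphere letter to decimal degrees.

    South and West are negative by convention.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


# Hypothetical values as a metadata tool might report them.
latitude = dms_to_decimal(51, 24, 36, "N")
longitude = dms_to_decimal(0, 50, 24, "W")
print(round(latitude, 4), round(longitude, 4))
```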
  • 23.
    What does all this add to the data stored by Facebook?
      - I can validate the date, time and location of the check-in on Facebook
      - I can understand what type of device the user carries around
      - I can identify a breach of copyright for certain materials
  • 24.
    What about other sites?
      - This is not a Facebook specific thing
      - All sites allow developers to access the data
      - Developer access is key to how organisations make money from social websites
      - Many people put different data on different social websites
      - Developers can use common data (e.g. an e-mail address) to piece together an even deeper picture of an individual
      - Users by site:
        - Facebook 900M users
        - Qzone (China) 480M users
        - Twitter 300M users
        - Sina Weibo (China) 300M users
        - Habbo (31 countries) 200M users
        - Google+ 170M users
        - Renren (China) 160M users
        - Badoo (Europe & Latin America) 120M users
        - LinkedIn 120M users
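    Piecing profiles together on a shared e-mail address is essentially a join. The sketch below assumes each site’s data is already available as a list of records with an "email" field; the site names, field names and values are all invented for illustration.

```python
def merge_profiles(*sources):
    """Combine records from several sites into one profile per e-mail address.

    Each source is a (site_name, records) pair. The first value seen for a
    field is kept; later sites only fill in fields that are still missing.
    """
    merged = {}
    for site_name, records in sources:
        for record in records:
            key = record["email"].strip().lower()
            person = merged.setdefault(key, {"email": key, "sources": []})
            person["sources"].append(site_name)
            for field, value in record.items():
                if field != "email":
                    person.setdefault(field, value)
    return merged


# Hypothetical data from two different sites.
site_a = [{"email": "Jo@example.com", "birthday": "14 March", "city": "Reading"}]
site_b = [{"email": "jo@example.com", "employer": "Acme Ltd", "job_title": "Analyst"}]

profiles = merge_profiles(("site_a", site_a), ("site_b", site_b))
print(profiles["jo@example.com"])
```

    Neither site alone links the birthday to the employer; the combined record does.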
  • 25.
    Non-social (internal) data
      - Other organisations are gathering lots of data from internal sources rather than social networks
        - Telematics devices for car insurance
        - Smart metering devices for energy consumption
        - Credit card transactions for fraud detection
      - These are being manipulated and analysed using the same techniques
      - These are the ‘Big Data’ stories you read about in the press
  • 26.
    Telematics Insurance
      - Buy cheap car insurance in exchange for having a ‘black box’ installed in your car, known as a Telematics box
      - This sends data back to a central computer periodically
        - Typically every couple of minutes/miles
        - All the data every 100ms over a 2 second interval when there is an impact
      - Minimum data set
        - Longitude, Latitude, Altitude, X-Acceleration, Y-Acceleration, Z-Acceleration, Speed, Compass Direction Of Travel
      - More advanced units gather more data
        - Camera data, Engine data, Service History, etc.
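    The minimum data set maps naturally onto a small record type, and the impact figures give a quick piece of arithmetic: one reading every 100ms over 2 seconds is 20 readings. The sketch below is an illustration only; the field names, units and sample values are assumptions, not a real device format.

```python
from dataclasses import dataclass


@dataclass
class TelematicsReading:
    """One reading from the box. Units are assumed; the slides do not state them."""
    longitude: float        # decimal degrees
    latitude: float         # decimal degrees
    altitude: float         # metres
    x_acceleration: float   # g
    y_acceleration: float   # g
    z_acceleration: float   # g
    speed: float            # miles per hour
    heading: float          # compass direction of travel, degrees


def impact_sample_count(interval_ms=100, window_s=2):
    """Number of readings captured around an impact."""
    return (window_s * 1000) // interval_ms


# Invented sample reading.
reading = TelematicsReading(-0.84, 51.41, 60.0, -0.1, 0.0, 1.0, 28.0, 95.0)
print(reading)
print(impact_sample_count())
```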
  • 27.
    Telematics Plot
      - Trip from Wokingham to Walton-Upon-Thames
      - Rendered on Google Maps with a KML file (Free to use)
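    A KML file is plain XML, so a recorded trip can be turned into one with a few lines of code. The sketch below writes a single line through a list of points; KML expects each point as longitude,latitude,altitude. The two points are rough, invented positions for the start and end of the trip, not real telematics data.

```python
from xml.sax.saxutils import escape


def track_to_kml(name, points):
    """Build a KML document containing one line through (lon, lat, alt) points."""
    coordinates = " ".join(f"{lon},{lat},{alt}" for lon, lat, alt in points)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        f"  <Document><Placemark><name>{escape(name)}</name>\n"
        f"    <LineString><coordinates>{coordinates}</coordinates></LineString>\n"
        "  </Placemark></Document>\n"
        "</kml>\n"
    )


# Approximate, illustrative positions only.
trip = [(-0.834, 51.411, 0), (-0.413, 51.387, 0)]
kml = track_to_kml("Wokingham to Walton-Upon-Thames", trip)
print(kml)
```

    Saving the returned text to a file with a .kml extension gives something mapping tools that read KML can display.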
  • 28.
    Using Telematics Data
      - Assess customer driving pattern
        - Adjust the car insurance premium accordingly
      - Assess accidents
        - Can be used to determine fault in collisions
        - Can be used to determine if whiplash is likely
      - Assess other types of car insurance fraud
      - Allows insurance companies to “optimize” premiums
        - Charge as much as possible but be cheaper than the competition
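    As one example of assessing a driving pattern, the acceleration readings can be scanned for harsh braking. The sketch below assumes X-acceleration is measured along the direction of travel in g, with braking as negative values; the threshold is a made-up figure, not one used by any insurer.

```python
# Hypothetical threshold: deceleration stronger than 0.4 g counts as harsh braking.
HARSH_BRAKING_G = -0.4


def harsh_braking_events(x_accelerations, threshold=HARSH_BRAKING_G):
    """Count separate harsh braking events in a sequence of readings.

    Consecutive readings at or below the threshold count as a single event.
    """
    events = 0
    braking = False
    for acceleration in x_accelerations:
        harsh = acceleration <= threshold
        if harsh and not braking:
            events += 1
        braking = harsh
    return events


# Invented readings: two separate braking events.
readings = [0.0, -0.5, -0.6, -0.1, -0.45, 0.1]
print(harsh_braking_events(readings))
```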
  • 29.
    Telematics Insurers in the UK
      Source: http://comparethebox.com
  • 30.
    Integrating Social Data and Non-Social Data
      - Organisations are starting to combine internal data with social network data to create an even deeper understanding of the customer
      - All of the examples given above are from real projects that we, as a company, have already been involved in
  • 31.
    Integrated Data
      - A youth buys cheap telematics insurance …
        - When he gets it he ‘likes’ the product on Facebook
          - Positive Sentiment Analysis: opportunity to thank customer
        - When he gets charged for the top-up miles he ‘dislikes’ the cost
          - Negative Sentiment Analysis: opportunity to address concerns
        - When he has an accident and tells his mates what really happened
          - Fraud detection: opportunity to check the veracity of the claim
      - What you say and do socially now will affect your commercial transactions in the future
  • 32.
    Can I Opt-Out?
      - No. You can limit your exposure but you can’t opt out of big data
      - You don’t have to join social networks but:
        - Many social activities are based around Twitter/Facebook
        - Most business people will want to use LinkedIn
        - Peer pressure to join, especially for younger people, is high
      - Your data will be analysed by companies involved in
        - Marketing, Financial (especially underwriting & fraud), Energy consumption, and many more
      - They will source the data internally and from social networks
  • 33.
    What about crime?
      - Most uses of social data are positive
        - Reduce fraud, improve product, more precisely targeted marketing, energy efficiency
      - But criminals can use this technology too
        - Most of the technology is either low cost or free
        - New techniques for exploiting data evolve very quickly
        - Identity theft is just one possible outcome
      - It’s an arms race: can we (the good guys) find ways to protect ourselves and those that share their data with us faster than the bad guys develop techniques to exploit this information?
      - Make sure you understand what you are sharing and with whom you are sharing data
  • 34.
    Security
      - Remember
        - Set your privacy settings on Facebook
        - Things that help people communicate with you (date of birth, first school, first pet, mother’s maiden name, etc.) are also the most common security questions for online banking, etc.
        - Facebook friends are not real friends: beware of ‘friending’ people you don’t actually know and ‘liking’ dubious groups
        - Remember your ‘friends’ may not be so in the future or may have greater loyalties to others than they do to you
        - You may get profiled and targeted as a ‘false positive’ i.e. you aren’t interested in the product/offering but match the criteria
  • 35.
    It’s not just social websites
      - Other sites also hold complex social information
        - Directory Websites: 192.com, company-director-check.co.uk
        - Family History Websites: ancestry.co.uk, findmypast.com
        - Large scale online retailers: amazon.com, apple.com, tesco.com
  • 36.
    Who does this work?
      - Data Scientists
        - Data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analysing data, particularly large amounts of data, to help a business gain a competitive edge
        - The position is gaining acceptance (and significant salaries) with large enterprises who are interested in deriving meaning from big data, the voluminous amount of structured, unstructured and semi-structured data that a large enterprise produces
        - A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and coding. Perhaps the most important skill a data scientist possesses, however, is the ability to explain the significance of data in a way that can be easily understood by others
        - Most often Maths or Computer Studies graduates with Business skills
  • 37.
    Notes on this presentation
      - All trademarks and brand names are the property of their respective owners
      - This presentation is designed to show capabilities, tools and techniques and is in no way condoning or condemning any organisation, product, technology or tool
      - Other tools and products are available
      - Data access may be restricted by user permissions
      - Data access may be restricted by law
      - Data access may be restricted by data provider terms & conditions
  • 38.
    Contact Us
      - Data Management & Warehousing
        - Website: http://www.datamgmt.com
        - Telephone: +44 (0) 118 321 5930
      - David Walker
        - E-Mail: davidw@datamgmt.com
        - Telephone: +44 (0) 7990 594 372
        - Skype: datamgmt
        - White Papers: http://scribd.com/davidmwalker
  • 39.
    About Us
      Data Management & Warehousing is a UK based consultancy that has been delivering successful business intelligence and data warehousing solutions since 1995. Our consultants have worked with major corporations around the world including the US, Europe, Africa and the Middle East. We have worked in many industry sectors such as telcos, manufacturing, retail, financial and transport. We provide governance and project management as well as expertise in the leading technologies.
  • 40.
    Thank You
      ©2012 - Data Management & Warehousing
      http://www.datamgmt.com