An introduction to social network data


A presentation that looks at the data stored on social network sites and how developers use it to build a better understanding of their users

Published in: Technology

  1. An Introduction To Social Network Data. David M Walker, Data Management & Warehousing, May 2012. © 2012 Data Management & Warehousing
  2. Hi, I’m on Facebook!
     • I’m one of the 900 million people who, as of May 2012, have a Facebook account
     • That’s more than 1 in 8 of every man, woman and child on the planet (and the 6 crew of the International Space Station), regardless of age, race, religion, location, sexuality, etc.
     • I’ve also completed my profile – it helps my family & friends find and communicate with me
     • It even reminds people to wish me ‘Happy Birthday’
  3. My Profile Page
  4. But what am I sharing?
     • Depending on my privacy settings I will be sharing anything from ‘some data’ to ‘everything about my life’
     • You can edit your privacy settings here: [URL not captured in transcript]
     • Remember:
         • Today’s ‘friends’ may not be tomorrow’s friends
         • Sharing with family, school/work colleagues can have unexpected consequences
  5. How is this data used?
     • Developers use this data to ‘profile’ people
     • This is both free to use and easy to do
     • It uses an Application Programming Interface (API) based on a URL
         • Jargon for ‘just connect to the website with the right options’
     • Try it: [URL not captured in transcript]
  6. George H Takei
     • Helmsman Sulu in Star Trek (The Original Series)
     • Gay rights and Japanese American internment activist
     • Popular Facebook Page (1,962,290 likes) and secured
     • Basic Info: [URL not captured in transcript]
     • Photographs: [URL truncated in transcript] method=GET&path=205344452828349%2Fphotos
  7. George H Takei’s photo and its data
     George Takei posted this photograph
     API Output (Snippet):
         {
           "id": "373438362685623_1722672",
           "from": {
             "name": "Trevor Mullins",
             "id": "1024732813"
           },
           "message": "This. So much this.",
           "created_time": "2012-02-09T03:43:56+0000",
           "likes": 3
         }
     It Tells Me:
     • Trevor Mullins was one of several hundred people who commented on this photo
     • He did so at 03:43:56 GMT on 9th Feb 2012
     • Which 3 people liked the comment
     • And from his profile: his username is Ertrov, he describes himself as “Agnostic-atheist/Anti-theist”, is male, likes Sci-Fi, and is affiliated to Sinclair Community College, Ohio, and many, many more things
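
To make the slide above concrete, here is a minimal sketch of how a developer might turn that API output into the facts listed under ‘It Tells Me’. The JSON is copied from the slide; the function name `summarise_comment` and the output field names are illustrative assumptions, and a real response would carry many more fields.

```python
import json
from datetime import datetime

# Snippet copied from the slide; a real API response would be much larger.
snippet = """
{
  "id": "373438362685623_1722672",
  "from": {"name": "Trevor Mullins", "id": "1024732813"},
  "message": "This. So much this.",
  "created_time": "2012-02-09T03:43:56+0000",
  "likes": 3
}
"""


def summarise_comment(raw: str) -> dict:
    """Pull out who commented, when, and how many likes the comment has."""
    comment = json.loads(raw)
    # The timestamp carries an explicit UTC offset (+0000), parsed by %z.
    created = datetime.strptime(comment["created_time"], "%Y-%m-%dT%H:%M:%S%z")
    return {
        "commenter": comment["from"]["name"],
        "commenter_id": comment["from"]["id"],
        "posted_at": created,
        "like_count": comment["likes"],
    }


summary = summarise_comment(snippet)
print(summary["commenter"], summary["posted_at"].isoformat(), summary["like_count"])
```

The commenter’s id is the key that would let a developer go on to request that person’s profile, subject to their privacy settings.
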
  8. Back to me – My profile contains:
     • id: Facebook’s unique reference number for me
     • name: My full name
     • username: My username
     • gender: My gender
     • birthday: My date of birth
     • hometown: Where I was born
     • location: Where I live now
     • email: My private email
     • website: My website
     • timezone: My timezone
     • locale: What language I read Facebook in
     • languages: What languages I speak
     • verified: Have I verified my email address
     • updated_time: When did I last update my profile
     • type: What type of user account do I have
     • education: Where I went to school
         • year: And when I left
         • type: And what type of school it was
     • relationship_status: Am I married?
     • employer: Who I work for
     • employer: Who I used to work for
         • projects: Which projects I worked on for that employer
     • sports: Which sports I like
     • favorite_teams: Who are my favourite teams
     These are just some of the fields I could populate and developers could access
  9. I like …
     • If I ‘like’ a product or brand on Facebook then the owner of that brand can use the developer’s interface to get information about me and others who ‘like’ their product
     • For example, the developer can get the age, marital status, gender, sexual preference (‘interested in’) and location of the ‘likers’
     • The developer can then look for groups of people who share the same characteristics (e.g. 18-25, single, female, straight, Liverpool)
     • This is called Cluster Analysis – looking for groups of similar people
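
As a rough illustration of looking for groups of similar people, the sketch below counts how many ‘likers’ share exactly the same combination of characteristics. This is a deliberately simplified stand-in for statistical cluster analysis (no distance measure, no algorithm such as k-means), and every record, field name and age band in it is a made-up assumption.

```python
from collections import Counter

# Hypothetical 'likers'; all values are invented for illustration.
likers = [
    {"age": 22, "status": "single", "gender": "female", "city": "Liverpool"},
    {"age": 24, "status": "single", "gender": "female", "city": "Liverpool"},
    {"age": 19, "status": "single", "gender": "female", "city": "Liverpool"},
    {"age": 41, "status": "married", "gender": "male", "city": "Leeds"},
    {"age": 23, "status": "single", "gender": "male", "city": "Liverpool"},
]


def age_band(age: int) -> str:
    """Map an exact age onto a coarse band such as '18-25'."""
    if age < 18:
        return "under 18"
    if age <= 25:
        return "18-25"
    if age <= 40:
        return "26-40"
    return "over 40"


def segment_counts(people) -> Counter:
    """Count how many people share each combination of characteristics."""
    return Counter(
        (age_band(p["age"]), p["status"], p["gender"], p["city"]) for p in people
    )


segments = segment_counts(likers)
largest_segment, size = segments.most_common(1)[0]
print(largest_segment, size)
```

The largest segment is the obvious candidate for a targeted advert; real tools would also group people who are similar rather than identical.
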
  10. This data is valuable: Very Very Valuable
     • Once the developer has identified a ‘cluster’ of people they can ask Facebook to advertise to others who don’t yet ‘like’ the product but share the same characteristics as those that do
     • For example, based on our previous cluster, a nightclub may want to target adverts to similar people in their area
     • Facebook makes this very easy to do; you just go here: [URL not captured in transcript]
  11. Very precise targeting – know exactly who is going to see your advert
  12. Target audiences using their stated preferences
  13. Very low cost – know exactly how much you are going to spend
     • From an advertiser’s point of view this is very cost effective
     • For Facebook, done at scale, it is very, very profitable
  14. Dealing with the data
     • We can look at individuals manually
     • We can deal with ‘small’ data sets with a spreadsheet
         • 50,000 rows, i.e. 50,000 individuals
         • 250 columns, i.e. 250 different characteristics
     • We can deal with ‘larger’ data sets with statistical tools
         • There are commercial and open source tools to do the stats
         • For example: ‘R’ is free and provides direct access to the Facebook API and functions to do complex cluster analysis
  15. Advanced Techniques
     • Exploiting the social network
         • Which of my ‘likers’ know each other?
         • Is it possible to identify an individual in the group who is the ‘ring-leader’?
         • Can the ring-leader be influenced towards my offering/product?
         • Can the ring-leader influence others to follow them?
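
One simple way to look for a possible ‘ring-leader’ is to count how many of the other ‘likers’ each person is directly connected to (often called degree centrality). The sketch below does that in plain Python; the friendships are invented, and treating the best-connected person as the ring-leader is an assumption rather than the method used in the projects described here.

```python
from collections import defaultdict

# Hypothetical friendships between 'likers' (initials only).
friendships = [
    ("AB", "CD"),
    ("AB", "EF"),
    ("AB", "GH"),
    ("CD", "EF"),
    ("GH", "IJ"),
]


def degree_centrality(edges) -> dict:
    """Fraction of the other people that each person is directly connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    others = len(neighbours) - 1
    if others < 1:
        # A single person (or nobody) has no one else to be connected to.
        return {person: 0.0 for person in neighbours}
    return {person: len(friends) / others for person, friends in neighbours.items()}


scores = degree_centrality(friendships)
ring_leader = max(scores, key=scores.get)
print(ring_leader, scores[ring_leader])
```

Being well connected is only a first approximation of influence; it says who knows the most people, not who the others actually follow.
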
  16. My Social Network [network diagram; annotations:]
     • Small groups of friends that don’t know each other
     • Detail – Friends who know each other (initials only for confidentiality)
     • This group all worked on a project together
     • A group of friends who I watch rugby with
     • A tight knit group of friends from where I used to work
  17. Sentiment Analysis
     • Analyse people’s comments and use this to change your interaction with the customer
     • Use feedback (positive and negative) to respond to customers – remember you are looking for the main effect; you will always have people who have a minority opinion
     • Simple Examples
         • “Don’t like the new flavour”
         • “Wish the new website had a help button”
     • There are plenty of more sophisticated examples
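
The simple examples above can be scored with a very small word-list approach, sketched below. The word lists, the decision to treat ‘wish’ as a mild complaint, and the third (positive) comment are all assumptions made for illustration; production sentiment analysis is considerably more sophisticated.

```python
import re

# Tiny hand-made word lists; purely illustrative.
POSITIVE = {"like", "love", "great", "good"}
NEGATIVE = {"hate", "bad", "poor", "wish"}  # 'wish' counted as a mild complaint (assumption)
NEGATORS = {"not", "no", "never", "don't", "doesn't"}


def sentiment_score(comment: str) -> int:
    """Return a positive, zero or negative score for one comment."""
    text = comment.lower().replace("\u2019", "'")  # normalise curly apostrophes
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", text):
        if word in NEGATORS:
            negate = True  # flips the next sentiment word that appears
            continue
        polarity = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


comments = [
    "Don\u2019t like the new flavour",
    "Wish the new website had a help button",
    "Love the new flavour",  # hypothetical extra comment
]
scores = [sentiment_score(c) for c in comments]
overall = sum(scores)  # the 'main effect' across all comments
print(scores, overall)
```

Summing the scores gives the main effect; the individual scores show which customers might merit a reply.
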
  18. Applications
     • Facebook also allows users to develop Applications
         • Socialcam (54M users), Cityville (35M users)
         • Texas HoldEm (35M users), DrawSomething (29M users)
     • Allows users to buy virtual tokens with real money
         • This in itself is a revenue generating stream
     • Allows developers to place very targeted adverts
         • Revenue derived from selling targeted marketing
     • Allows developers to monitor social interactions for new trends
         • Who do you ‘Draw Something’ with?
  19. Third Party Vetting
     • Looking for a new job?
         • Someone you are friends with may also know someone at your new employer – what information will they share?
         • Your social activities – don’t post that you are out partying and then call in sick
         • Don’t tell the world what you think of your boss, even after you leave the organisation – you might need a reference from them, and your new employer might not want to expose themselves to the same treatment in the future
     • Journalists looking for background
         • Those grainy news photos are often found on social websites
  20. Coffee with my son
     • One day I had coffee with my son. I took this photo and uploaded it to Facebook, tagging him and adding the place
     • Facebook stored the following data:
         • The exact date, time & GPS location of where I checked in
         • The details of the person I was with
         • The application on my iPhone that I used to upload the picture
         • The people who commented, their comments and their profiles
         • And more
     • But the photograph told another part of the story …
  21. Photographic Data
     • Digital cameras store data too
         • This is called Metadata (data about data)
         • What each device stores varies
         • But you can download a free tool to read the metadata: [URL not captured in transcript]
     • Data is stored against images, audio and video files by most digital recording devices including cameras, phones and scanners. The data is known as EXIF data
     • This data isn’t protected by your Facebook settings
  22. What the photo told me:
     • File name, size and type
     • Date and time created
     • GPS co-ordinates – longitude, latitude & altitude
     • Make & model of the device used to take the photo
     • Technical details about the photo including focal length, exposure, whether a flash was used, etc.
     • Whether the photo has subsequently been edited and, if so, when and by what application
     • Copyright information could also have been added to the image
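
EXIF stores GPS latitude and longitude as degrees, minutes and seconds plus a reference letter (N/S or E/W), so plotting them on a map usually means converting to signed decimal degrees first. The sketch below shows that conversion only; reading the tags out of a file needs an EXIF library, which is not shown, and the sample co-ordinates are invented.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value


# Hypothetical values as they might appear in a photo's GPS tags.
latitude = dms_to_decimal(51, 24, 36, "N")
longitude = dms_to_decimal(0, 50, 24, "W")
print(round(latitude, 4), round(longitude, 4))
```

Once in decimal form, the position can be compared directly with the location of the Facebook check-in.
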
  23. What does all this add to the data stored by Facebook?
     • I can validate the date, time and location of the check-in on Facebook
     • I can understand what type of device the user carries around
     • I can identify a breach of copyright for certain materials
  24. What about other sites?
     • This is not a Facebook specific thing
     • All sites allow developers to access the data
     • Developer access is key to how organisations make money from social websites
     • Many people put different data on different social websites
     • Developers can use common data (e.g. an e-mail address) to piece together an even deeper picture of an individual
     Users by site:
     • Facebook: 900M users
     • Qzone (China): 480M users
     • Twitter: 300M users
     • Sina Weibo (China): 300M users
     • Habbo (31 countries): 200M users
     • Google+: 170M users
     • Renren (China): 160M users
     • Badoo (Europe & Latin America): 120M users
     • LinkedIn: 120M users
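
Piecing together a deeper picture from common data can be as simple as joining records on a normalised e-mail address. The sketch below assumes each site’s data has already been fetched into a list of dictionaries; the site names, people and field names are all hypothetical.

```python
def merge_by_email(*sources) -> dict:
    """Combine records from several sites into one profile per e-mail address.

    Each source is a (site_name, records) pair; the first value seen for a
    field is kept if two sites disagree.
    """
    merged = {}
    for site, records in sources:
        for record in records:
            key = record["email"].strip().lower()  # normalise before matching
            profile = merged.setdefault(key, {"email": key, "sites": []})
            profile["sites"].append(site)
            for field, value in record.items():
                if field != "email":
                    profile.setdefault(field, value)
    return merged


# Hypothetical data from two different sites.
site_a = ("SiteA", [{"email": "Jo@Example.com", "name": "Jo Bloggs", "hometown": "Reading"}])
site_b = ("SiteB", [
    {"email": "jo@example.com ", "employer": "Acme Ltd"},
    {"email": "sam@example.com", "name": "Sam"},
])

profiles = merge_by_email(site_a, site_b)
print(profiles["jo@example.com"])
```

Neither site alone knows both the hometown and the employer; the merged profile does.
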
  25. Non-social (internal) data
     • Other organisations are gathering lots of data from internal sources rather than social networks
         • Telematics devices for car insurance
         • Smart metering devices for energy consumption
         • Credit card transactions for fraud detection
     • These are being manipulated and analysed using the same techniques
     • These are the ‘Big Data’ stories you read about in the press
  26. Telematics Insurance
     • Buy cheap car insurance in exchange for having a ‘black box’ installed in your car, known as a Telematics box
     • This sends data back to a central computer periodically
         • Typically every couple of minutes/miles
         • All the data every 100ms over a 2 second interval when there is an impact
     • Minimum data set
         • Longitude, Latitude, Altitude, X-Acceleration, Y-Acceleration, Z-Acceleration, Speed, Compass Direction Of Travel
     • More advanced units gather more data
         • Camera data, engine data, service history, etc.
  27. Telematics Plot
     • Trip from Wokingham to Walton-Upon-Thames
     • Rendered on Google Maps with a KML file (free to use)
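
A KML file for a trip like this is just a small XML document holding the track as a list of `longitude,latitude,altitude` triples. The sketch below builds one from a handful of points; the function name `track_to_kml` is an assumption, and the co-ordinates are rough, invented values rather than the data behind the plot.

```python
from xml.sax.saxutils import escape


def track_to_kml(name: str, points) -> str:
    """Build a minimal KML document with one LineString for the given track.

    Each point is (longitude, latitude, altitude); KML lists longitude first.
    """
    coordinates = " ".join(f"{lon},{lat},{alt}" for lon, lat, alt in points)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<kml xmlns="http://www.opengis.net/kml/2.2">',
        "  <Document>",
        "    <Placemark>",
        f"      <name>{escape(name)}</name>",
        "      <LineString>",
        f"        <coordinates>{coordinates}</coordinates>",
        "      </LineString>",
        "    </Placemark>",
        "  </Document>",
        "</kml>",
    ]
    return "\n".join(lines)


# Hypothetical, approximate points along the route.
trip = [(-0.84, 51.41, 60.0), (-0.62, 51.40, 45.0), (-0.41, 51.39, 15.0)]
kml = track_to_kml("Wokingham to Walton-Upon-Thames", trip)
print(kml)
```

Saved with a `.kml` extension, output like this can be opened by mapping tools that support the format.
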
  28. Using Telematics Data
     • Assess customer driving pattern
         • Adjust the car insurance premium accordingly
     • Assess accidents
         • Can be used to determine fault in collisions
         • Can be used to determine if whiplash is likely
         • Assess other types of car insurance fraud
     • Allows insurance companies to “optimize” premiums
         • Charge as much as possible but be cheaper than the competition
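
As a sketch of how a driving pattern might feed into a premium, the code below counts samples with strong horizontal acceleration (harsh braking, acceleration or cornering) and adds a small loading per event. The record layout follows the minimum data set listed earlier, but the units, the threshold, the loading and the cap are all hypothetical; real insurers use their own, far richer models.

```python
import math
from dataclasses import dataclass


@dataclass
class TelematicsSample:
    """One reading from the box; units (m/s^2, km/h, degrees) are assumed."""
    longitude: float
    latitude: float
    altitude: float
    x_accel: float
    y_accel: float
    z_accel: float
    speed: float
    heading: float


HARSH_THRESHOLD = 4.0  # m/s^2, hypothetical cut-off for a 'harsh' event


def harsh_event_count(samples, threshold: float = HARSH_THRESHOLD) -> int:
    """Count samples whose horizontal acceleration exceeds the threshold."""
    return sum(1 for s in samples if math.hypot(s.x_accel, s.y_accel) > threshold)


def adjusted_premium(base: float, samples, loading_per_event: float = 0.02,
                     cap: float = 0.5) -> float:
    """Add a percentage loading per harsh event, up to a maximum loading."""
    loading = min(harsh_event_count(samples) * loading_per_event, cap)
    return round(base * (1 + loading), 2)


# Hypothetical readings: two harsh events and one gentle one.
samples = [
    TelematicsSample(-0.84, 51.41, 60.0, 0.5, 0.2, 9.8, 48.0, 90.0),
    TelematicsSample(-0.80, 51.41, 58.0, -5.0, 0.0, 9.8, 20.0, 92.0),
    TelematicsSample(-0.78, 51.40, 57.0, 3.0, 4.0, 9.8, 35.0, 120.0),
]
print(harsh_event_count(samples), adjusted_premium(1000.0, samples))
```

The same per-sample acceleration values, captured every 100ms around an impact, are what would be examined when assessing fault or the likelihood of whiplash.
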
  29. Telematics Insurers in the UK. Source: http://comparethebox.com
  30. Integrating Social Data and Non-Social Data
     • Organisations are starting to combine internal data with social network data to create an even deeper understanding of the customer
     • All of the examples given above are from real projects that we, as a company, have already been involved in
  31. Integrated Data
     • A youth buys cheap telematics insurance …
     • When he gets it he ‘likes’ the product on Facebook
         • Positive Sentiment Analysis – Opportunity to thank customer
     • When he gets charged for the top-up miles he ‘dislikes’ the cost
         • Negative Sentiment Analysis – Opportunity to address concerns
     • When he has an accident and tells his mates what really happened
         • Fraud detection – Opportunity to check the veracity of the claim
     • What you say and do socially now will affect your commercial transactions in the future
  32. Can I Opt-Out?
     • No – you can limit your exposure but you can’t opt out of big data
     • You don’t have to join social networks but:
         • Many social activities are based around Twitter/Facebook
         • Most business people will want to use LinkedIn
         • Peer pressure to join, especially for younger people, is high
     • Your data will be analysed by companies involved in
         • Marketing, Financial (especially underwriting & fraud),
         • Energy consumption, and many more
     • They will source the data internally and from social networks
  33. What about crime?
     • Most uses of social data are positive
         • Reduce fraud, improve product, more precisely targeted marketing, energy efficiency
     • But criminals can use this technology too
         • Most of the technology is either low cost or free
         • New techniques for exploiting data evolve very quickly
         • Identity theft is just one possible outcome
     • It’s an arms race – Can we (the good guys) find ways to protect ourselves and those that share their data with us faster than the bad guys develop techniques to exploit this information?
     • Make sure you understand what you are sharing and with whom you are sharing data
  34. Security
     • Remember
         • Set your privacy settings on Facebook
         • Things that help people communicate with you (date of birth, first school, first pet, mother’s maiden name, etc.) are also the most common security questions for online banking, etc.
         • Facebook friends are not real friends – beware of ‘friending’ people you don’t actually know and ‘liking’ dubious groups
         • Remember your ‘friends’ may not be so in the future or may have greater loyalties to others than they do to you
         • You may get profiled and targeted as a ‘false positive’, i.e. you aren’t interested in the product/offering but match the criteria
  35. It’s not just social websites
     • Other sites also hold complex social information
         • Directory websites: [names not captured in transcript]
         • Family history websites: [names not captured in transcript]
         • Large scale online retailers: [other names not captured in transcript], tesco.com
  36. Who does this work?
     • Data Scientists
         • A data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analysing data, particularly large amounts of data, to help a business gain a competitive edge
         • The position is gaining acceptance (and significant salaries) with large enterprises who are interested in deriving meaning from big data, the voluminous amount of structured, unstructured and semi-structured data that a large enterprise produces
         • A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and coding. Perhaps the most important skill a data scientist possesses, however, is the ability to explain the significance of data in a way that can be easily understood by others
         • Most often Maths or Computer Studies graduates with business skills
  37. Notes on this presentation
     • All trademarks and brand names are the property of their respective owners
     • This presentation is designed to show capabilities, tools and techniques and is in no way condoning or condemning any organisation, product, technology or tool
     • Other tools and products are available
     • Data access may be restricted by user permissions
     • Data access may be restricted by law
     • Data access may be restricted by data provider terms & conditions
  38. Contact Us
     • Data Management & Warehousing
         • Website: [URL not captured in transcript]
         • Telephone: +44 (0) 118 321 5930
     • David Walker
         • E-Mail: [address not captured in transcript]
         • Telephone: +44 (0) 7990 594 372
         • Skype: datamgmt
         • White Papers: [URL not captured in transcript]
  39. About Us
     Data Management & Warehousing is a UK based consultancy that has been delivering successful business intelligence and data warehousing solutions since 1995. Our consultants have worked with major corporations around the world including the US, Europe, Africa and the Middle East. We have worked in many industry sectors such as telcos, manufacturing, retail, financial and transport. We provide governance and project management as well as expertise in the leading technologies.
  40. Thank You. ©2012 - Data Management & Warehousing, http://www.datamgmt.com