Data Mining And Internet Profiling:<br />Approaches to Successful Online Social Media Investigations<br />Shellee Hale<br />
Shellee Hale - President of Camandago, Inc. <br />WA Licensed Private Investigator<br />CAS – Certified Anti-Terrorism Specialist<br />CEMS – Certified Emergency Management Specialist<br />Specializes in: <br />Cyber Tracing<br />Dataveillance <br />Cyber Warfare Threat Profiling <br />Constituent for the Overseas Security Advisory Council (OSAC)<br />Federal Advisory Committee with a U.S. Government charter to promote security cooperation between US private sector interests worldwide and U.S. Dept. of State<br />Infragard Member<br />Seattle FBI Citizens Academy Alumni Association<br />
Dataveillance<br /><ul><li>Dataveillance is the systematic use of digital personal data in the investigation or monitoring of the actions or communications of one or more persons
Social Media</li></li></ul><li>Search Engines<br />Search engines are algorithmic information retrieval systems that search massive web-based databases. A web search engine is designed to search for information on the World Wide Web and FTP servers. Results are generally presented as a list and are often called hits. They may consist of web pages, images and other types of files.<br /><ul><li>Google,
Allinbody, [ allinbody:keyword] (Allintitle, Allinurl) </li></ul> Boolean logic<br /><ul><li>Enclose OR statements in parentheses.
Always use CAPS: most engines require that the operators (AND, OR, AND NOT/NOT) be capitalized.
http://www.internettutorials.net/boolean.asp</li></li></ul><li>Always copy URLs, because sometimes you can’t backtrack. Google updates its results constantly, and with more than 20 billion websites out there, you may never find the same info again. <br />Take screenshots of content, or consider using Camtasia, a screen recording and editing program.<br />Camtasia Studio<br />Archive.org<br />History Of A Website<br />Does not include adult content<br />Not a complete archive <br />You can remove yourself from the Wayback Machine with robots.txt <br />
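The robots.txt point above can be checked before relying on the archive for a site's history. The following is a minimal sketch using Python's standard-library robots.txt parser. The crawler name "ia_archiver" and both sample rule sets are assumptions for illustration; confirm the crawler name against the archive's current documentation.<br />

```python
from urllib.robotparser import RobotFileParser

# Assumption: "ia_archiver" is the crawler name the archive honors.
ARCHIVE_AGENT = "ia_archiver"


def archive_allowed(robots_txt: str, url: str, agent: str = ARCHIVE_AGENT) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that blocks only the archive crawler.
BLOCKING_RULES = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical robots.txt with no restrictions.
OPEN_RULES = """\
User-agent: *
Disallow:
"""

if __name__ == "__main__":
    for label, rules in (("blocking", BLOCKING_RULES), ("open", OPEN_RULES)):
        allowed = archive_allowed(rules, "http://example.com/page")
        print(label, "rules allow archiving:", allowed)
```

If a site blocks the crawler, a missing archive history says nothing about whether the site existed, so document that gap rather than treating it as a finding.<br />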
Meta Search Engines<br />Search engines that search other search engines and directories. They extract the best of the searches from various popular search engines and directories and include the information in their own search results. <br /><ul><li>Dog pile,
Ixquick</li></li></ul><li>Directories<br />Collections of websites that are compiled and organized into subject categories by human editors rather than gathered by automated crawlers. <br /><ul><li>Yahoo Directory
Wikipedia</li></li></ul><li>Gateways<br />Collections of databases and informational sites, assembled, reviewed and recommended by specialists, that serve as access points to this material<br />Invisible Web<br /><ul><li>Large portion of the Web that search engine spiders cannot index - 60-80% of web material
Information that isn't static but is assembled dynamically in response to specific queries</li></ul>Subject-Specific Databases (Vortals) are devoted to a single subject, e.g. WebMD<br />
Verifying Sources<br /><ul><li>Unlike scholarly books and journal articles, web sites are seldom reviewed or refereed. It's up to you to check for bias and to determine objectivity. Try to assess the stability of the pages you reference.
Understand the legitimacy of the web address: .edu, .gov and .mil are the most reliable sources. Generic domains include .com, .net and .org. Countries have specific codes (.ca, .uk, etc.).
Look closely at the page sponsor, the date last updated, and the authority of the author(s) if possible.
Research domain ownership at whois.net.
Check web traffic Alexa.com</li></li></ul><li>Information Aggregators<br />These are tools which pull in information from multiple sources, and consolidate that information into a smaller and more easily digested number of streams<br />RSS Feeds (Google Reader, Bloglines) <br /> Pull blogs into a single stream of information<br />Spokeo - Big Brother Of Social Networking http://www.pandia.com/sew/620-spokeo.html<br />123people.com– Gateway to Paid databases. Shows available websites around a specific name.<br />Pipl- The most comprehensive people search on the web<br />yoName – Searches Social Networks<br />Brizzly, <br />SeesmicWeb, <br />HootSuite, <br />Dabr, <br />Slandr, etc. <br />Real time news interceder.net<br />Email alerts<br />Real-time news<br />
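The idea of pulling several blogs into a single stream can be sketched with the standard library alone. This is an illustrative sketch, not how Google Reader or Bloglines work internally. It assumes well-formed RSS 2.0 input with a pubDate on every item, and the two sample feeds are invented for the example.<br />

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_feeds(feed_documents):
    """Merge RSS 2.0 documents (strings) into one list, newest item first."""
    items = []
    for xml_text in feed_documents:
        channel = ET.fromstring(xml_text).find("channel")
        source = channel.findtext("title", default="")
        for item in channel.findall("item"):
            items.append({
                "source": source,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                # RSS dates use the RFC 822 format, which email.utils can parse.
                "published": parsedate_to_datetime(item.findtext("pubDate")),
            })
    return sorted(items, key=lambda entry: entry["published"], reverse=True)


# Hypothetical feeds used only to demonstrate the merge.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>A1</title><link>http://a.example/1</link>
<pubDate>Mon, 03 May 2010 10:00:00 +0000</pubDate></item>
<item><title>A2</title><link>http://a.example/2</link>
<pubDate>Wed, 05 May 2010 09:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>B1</title><link>http://b.example/1</link>
<pubDate>Tue, 04 May 2010 12:00:00 +0000</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in merge_feeds([FEED_A, FEED_B]):
        print(entry["published"].isoformat(), entry["source"], entry["title"])
```

Keeping the source title on every merged item preserves where each post came from, which matters when the stream is later used as documentation.<br />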
Website TOS, Privacy Laws And Proposed Regulations<br />Social media is a key component of profiling a subject of investigation. The pool of information about each individual can form a distinctive “social signature,” but privacy settings and anonymity limit the information you can access on a social network. <br />
Issues With Anonymity<br />We have a right to it, but websites are not allowing it via their TOS. You can be anonymous online, but how can you be anonymous online when they are asking for real info? <br />If you go into Facebook and set up a profile, their TOS say that is you. You have to have a valid email address, but how do you know that they are not using a random email address and name? <br />It is not illegal for internet users to impersonate or create a false identity online. <br />Popularity of a site comes with vulnerability to attack. <br />We are seeing an increase in spoofing, e.g. password-reset emails that give someone else ownership of your account. <br />Be advised that accounts under a person's name can be the result of spoofing and were not necessarily created by that person.<br />In the context of network security, a spoofing attack is a situation in which one person or program successfully masquerades as another by falsifying data and thereby gaining an illegitimate advantage.<br />
The Privacy Debate<br />We want privacy, yet we expose private details of our lives online. <br />Once you post something, you are leaving a digital footprint that is owned by the site. <br />Facebook has been receiving a lot of bad press. Users fear how their data might be used. Privacy policies and TOS are constantly being changed.<br />We are seeing two different agendas among advocates in online privacy:<br />We put pressure on websites to protect our information, and we do reserve that right.<br />But at the same time, because of the vast scope of information on social media, the government wants a backdoor to get info for investigations and terrorism research. <br />This will leave personal info vulnerable to hackers...<br />Consider This…<br />There are different privacy laws in every country.<br />Check the TOS and privacy laws for each website. They may allow backdoors. <br />
Privacy Settings <br />It's important to understand privacy laws and settings for major social networks to understand limitations, and how to potentially work around them...<br />Users can select their own privacy settings, and there are few ways to get around them.<br />Facebook Profiles Offer<br /><ul><li>Phone numbers,
Custom </li></li></ul><li>Public Tweets:<br />Your updates appear in Twitter’s public timeline, a flowing river of every member’s status.<br />Anyone can see your Twitter updates.<br />Your Twitter updates can be indexed by search engines.<br />Protected Tweets:<br />People will have to request to follow you, and each follow request will need approval<br />Your profile and Tweets will only be visible to users you've approved<br />Tweets from protected profiles will not appear in Twitter search<br />@replies sent to people who aren't following you will not be seen<br />You cannot share static page URLs with non-followers<br />
Default Settings:<br /> By default, people on MySpace can see when you’re online. Your profile and photo are also set to be viewable by everyone.<br />Privacy Options:<br />MySpace’s privacy options are very limited, but changing three key settings can provide you with some important privacy protection:<br /><ul><li>Online Now
Photos</li></li></ul><li>Tips & Tricks<br />If you have an email address you want to put a face to, you can find who owns it by searching for the address in the Facebook search window. <br />Anyone can create a fake profile, so use this to your advantage. Some users allow friends of friends to access part, if not all, of a profile. Befriend a friend of someone you are investigating. <br />RESOURCE:<br />How To Protect Your Privacy on<br />Facebook, Myspace, And LinkedIn<br />http://www.mint.com/blog/moneyhack/howto-protect-your-privacy-on-facebook-myspace-and-linkedin/<br />How do you get in and see info if it's been deleted?<br />Tweleted allowed you to recover deleted Twitter messages.<br />If user A quotes user B, who then removes the tweet, it will still show up in user A's quotes. <br />
Properly Documenting Social Media Investigations<br />Always copy URLs, because sometimes you can’t backtrack. Google updates its results constantly, and with more than 20 billion websites out there, you may never find the same info again. <br />Take screenshots of content (e.g. craigslist ads).<br />Consider using Camtasia, a screen recording and editing program.<br /><ul><li>Take screencaps on the fly
Organizational tools - Search for your captures </li></ul> by date, website, or a custom flag that you create and assign.<br />
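The documentation habits above (copy the URL, keep the capture, search by date, website or custom flag) can be sketched as a small capture log. This is an illustrative sketch only and is not how Camtasia stores anything. The Capture class, its field names, and the SHA-256 fingerprint are assumptions added for the example.<br />

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from urllib.parse import urlparse


@dataclass
class Capture:
    url: str
    content: bytes   # raw screenshot or saved page bytes
    flag: str = ""   # custom label assigned by the investigator
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def website(self) -> str:
        """Host part of the captured URL."""
        return urlparse(self.url).netloc

    @property
    def sha256(self) -> str:
        """Fingerprint that shows whether the stored bytes were altered later."""
        return hashlib.sha256(self.content).hexdigest()


def search(captures, *, date=None, website=None, flag=None):
    """Return the captures that match every filter that was supplied."""
    results = []
    for capture in captures:
        if date is not None and capture.captured_at.date() != date:
            continue
        if website is not None and capture.website != website:
            continue
        if flag is not None and capture.flag != flag:
            continue
        results.append(capture)
    return results
```

Recording the timestamp and a hash at capture time makes it easier to show later that a screenshot is the one originally taken.<br />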
Centrifuge Systems<br />Centrifuge has created a powerful approach to analysis called “Interactive Analytics”. Our next generation approach provides groundbreaking visualizations accessible from any browser and any operating system. <br />“Interactive Analytics” (IA) is based on extensive work with the US Intelligence Community and brings together three innovations in analytics today, Interactive Data Visualization, Unified Data Views and Collaborative Analysis.<br />http://www.centrifugesystems.com/<br />
AskSam.com<br />Afree-form database designed for users rather than programmers (Like a CMS)<br /><ul><li>Easy to turn anything into a searchable database:
Self-preservation goal (‘likeable’) increases deception</li></ul> “Electronic mail is a godsend. With e-mail we needn’t worry about so much as a quiver in our voice or a tremor in our pinkie when telling a lie. Email is a first rate deception-enabler.” <br />~Keyes (2004) The Post-Truth Era<br />
True Personality vs. Embellished Identity<br /><ul><li> i,
hers </li></ul>Changing pronouns, as benign as it seems, is the queen mother of linguistic violations and is a very strong indication that deception might be present!<br />For instance, “our house” vs. “my house”.<br /><ul><li>In many cases, if a person does not start with "I", the statement is more likely to lack credibility.
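The pronoun cues above can be counted mechanically as a first pass before a human reads the statement. This is a rough sketch: the word lists are illustrative assumptions, not a validated lexicon, and a flag here only marks a statement for closer review.<br />

```python
import re

# Illustrative pronoun groups (assumed for this sketch).
SINGULAR = {"i", "me", "my", "mine"}
PLURAL = {"we", "us", "our", "ours"}


def pronoun_profile(statement: str) -> dict:
    """Summarize first-person pronoun use in a written statement."""
    words = re.findall(r"[a-z]+", statement.lower())
    singular = sum(word in SINGULAR for word in words)
    plural = sum(word in PLURAL for word in words)
    return {
        # Does the statement open with "I"?
        "starts_with_i": bool(words) and words[0] == "i",
        "singular": singular,
        "plural": plural,
        # Both "my ..." and "our ..." style pronouns appear in one statement.
        "mixed": singular > 0 and plural > 0,
    }


if __name__ == "__main__":
    print(pronoun_profile("I locked our house, then drove to my house."))
```

The counts do not establish deception on their own. They only point to where a shift such as "our house" to "my house" occurs so the passage can be read in context.<br />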
Online Deception<br />The ambiguity of the Internet allows complete anonymity, providing the user with the ability to create false and misleading profiles and identities online, thus hiding their true identity. <br /><ul><li>gender swapping online,
Adults posing as children etc</li></ul>lies or exaggerations of <br />one’s physical appearance, <br />personality or characteristics, <br />or even slight exaggerations of a genuine characteristic such as denying being a smoker, drinker, etc. <br />One can have ‘as many electronic personas as one has time and energy to create’ (Donath, 1999).<br />
CASE STUDY ON DECEPTION ON FACEBOOK<br />STUDY<br />A study from the University of Texas at Austin suggests that users express their true personality – not an embellished identity – over online social networks such as Facebook.<br />The Texas researchers collected 236 profiles of college-aged users of Facebook in the United States and StudiVZ, the equivalent in Germany. The users filled out questionnaires about their personality and also about who they'd like to be. Strangers browsed and rated the online profiles, and the study authors compared the ratings with the users' questionnaires.<br />FINDINGS:<br />Networks such as Facebook are more “genuine mediums for social interactions than vehicles for self-promotion.”<br />But whether honesty on Facebook comes naturally or is necessitated by your audience is up for debate. “You don't have full control over it. Other people can write things on your wall and tag you in unflattering photos, etc.,” stated Professor Hancock.<br />
Detecting Deception<br />Inconsistencies in actions or words do not necessarily indicate a lie, just as consistency is not necessarily a guarantee of the truth.<br />However, a pattern of inconsistencies or unexplainable behavior normally indicates deceit.<br />
Techniques For Identifying Deceit<br />Control Questions<br />Repeat questions<br /><ul><li>Should not be exact repetitions of an earlier question.
The investigator must rephrase or otherwise disguise the previous question.
Repeat questions also need to be separated in time from the original question so the information cannot easily be remembered.</li></ul>Developed from recently confirmed or known information that is not likely to have changed. <br />If the answer to a control question is not given as expected, it may be an indicator of deceit.<br />Example:<br />Q1 – What was the score of the baseball game? <br />A1 – Well, first of all, you wouldn’t believe how much the tickets cost; then I had to get something to eat, which is a total waste of money....<br />Topical Examples:<br /><ul><li> Last day of school, Vacation dates
Video game trivia</li></li></ul><li>Internal Inconsistencies<br />Frequently when someone is lying, an investigator will be able to identify inconsistencies in the timeline, the circumstances surrounding key events, or other areas within the questioning. <br />For example, someone spends a long time explaining something that took a short time to happen, or a short time telling of an event that took a relatively long time to happen. <br />Example:<br />Q1 – What was the score of the baseball game? <br />A1 – Well, first of all, you wouldn’t believe how much the tickets cost; then I had to get something to eat, which is a total waste of money....<br />
“Placement” and “Access”<br />Based on a person’s job, geographical location, age, etc., investigators should have a basic idea of the breadth and depth of information that such a person should know. <br />When answers show that someone does not have the expected level of information (too much or too little or different information than expected), this may be an indicator of deceit.<br />Example:<br />In an extreme case, if someone is interrupted in the middle of a statement on a given topic, they will have to start again at the beginning in order to “get the story straight.”<br />Repeated Information<br /><ul><li>Often if someone plans on lying about a topic, they will memorize or practice exactly what they are going to say.
If they always relate an incident using exactly the same wording, or answer ‘repeat’ questions identically (word for word) to the original question, it may be an indicator of deceit.</li></li></ul><li>Incongruent Appearance and Incongruent Language<br />If someone’s online appearance does not match their story, it may be an indication of deceit.<br />If the type of language, including sentence structure and vocabulary, does not match the story, this may also be an indicator of deceit.<br />Example:<br />If the suspected liar does not use the proper technical vocabulary to match an otherwise familiar story, this may be an indicator of deceit.<br />
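The "Repeated Information" indicator described earlier (answers that match word for word) can be screened for in written exchanges. This is a minimal sketch using the standard-library difflib module. The 0.9 threshold and the sample answers are assumptions chosen for illustration.<br />

```python
import re
from difflib import SequenceMatcher


def normalize(answer: str) -> list:
    """Lowercase the answer and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", answer.lower())


def repetition_ratio(first: str, second: str) -> float:
    """Word-level similarity between two answers, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(first), normalize(second)).ratio()


def looks_rehearsed(first: str, second: str, threshold: float = 0.9) -> bool:
    """Flag answer pairs that are nearly identical word for word."""
    return repetition_ratio(first, second) >= threshold


if __name__ == "__main__":
    original = "I left work at five and drove straight home."
    repeat = "I left work at five, and drove straight home!"
    print(repetition_ratio(original, repeat), looks_rehearsed(original, repeat))
```

A high ratio is only a prompt to look closer, since short factual answers are naturally repeated in the same words.<br />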