Internet Intelligence




Robert Crayford


Copyright © Halliwells LLP 2008. All rights reserved.
The Internet



  “Have you heard of this new thing called
  the internet? It's giving people new
  expectations. It's allowing them to
  become their own expert. Knowledge lies
  anxious at their fingertips”
Roy H. Williams
Internet Intelligence


• Open Source
• Social networking sites
• Internet footprint
• Questions
Open Source searching


• Open source searching refers to searching any site that
  does not need a password or login to enter.
• The most common open source searches are made through
  search engines.
Deep Web Searching


• The term Deep Web refers to information on websites that
  is hidden or generally inaccessible through traditional
  search methods.
Deep Web searching




• Searching social networking sites and newsgroups/forums
  is an example of deep web searching.
• This information would not be found by using search
  engines.
• It is important to remember that a lot of data can only
  be found through deep web searching.
• To search the deep web, you need to locate online
  databases and forums and search them individually.
Search Engines


• When you search the web using a search engine, you are
  always searching a somewhat stale copy of the real web
  page. When you click on links provided in a search engine's
  search results, you retrieve from the server the current
  version of the page.
• Search engine databases are selected and built by
  computer robot programs called spiders. These "crawl" the
  web, finding pages for potential inclusion by following the
  links in the pages they already have in their database (i.e.,
  already "know about").
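The crawling process described above can be sketched as a breadth-first traversal. This is a minimal illustration, assuming a small hypothetical link graph held in memory (the page names, `LINKS` and the `crawl` function are invented for the example); a real spider fetches pages over HTTP, extracts their links and follows many more rules.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["contact"],
    "news": ["home", "archive"],
    "archive": [],
    "contact": [],
}

def crawl(seeds, links):
    """Breadth-first crawl: start from known pages and follow their links."""
    known = list(seeds)        # discovery order, standing in for the engine's database
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:   # only add pages the spider does not already know
                seen.add(target)
                known.append(target)
                queue.append(target)
    return known

print(crawl(["home"], LINKS))  # ['home', 'about', 'news', 'contact', 'archive']
```

Breadth-first order means pages close to the starting point are discovered first; a page is only ever found if some already-known page links to it.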
Search engines


• If a web page is never linked to from any other page, search engine
  spiders cannot find it. The only way a brand new page (one that no
  other page has ever linked to) can get into a search engine is for
  someone to submit its URL to the search engine company with a
  request that the page be included. All search engine companies offer
  ways to do this.
• Many web pages are excluded from most search engines by policy. The
  contents of most searchable databases on the web, such as library
  catalogs and article databases, are excluded because search engine
  spiders cannot access them. All this material is referred to as the
  Invisible Web: what you don't see in search engine results.
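The point about unlinked pages can be illustrated with a small reachability check. In this sketch the site, its page names and the `reachable` function are hypothetical: a spider starting from `home` never finds `new-page` because nothing links to it, but once a person submits that URL it becomes a starting point in its own right.

```python
from collections import deque

# Hypothetical site: "new-page" exists, but no other page links to it.
LINKS = {
    "home": ["news"],
    "news": ["home"],
    "new-page": ["home"],
}

def reachable(seeds, links):
    """Return every page a spider can reach by following links from the seeds."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        for target in links.get(queue.popleft(), []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found

before = reachable(["home"], LINKS)
after = reachable(["home", "new-page"], LINKS)  # URL submitted by a person
print("new-page" in before, "new-page" in after)  # False True
```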
Is One Search Engine Enough?


• Less than half of the searchable Web is fully searchable in Google.
• 88.3 percent of all results were unique to a single search engine.
• 8.9 percent of all results were shared by exactly two search engines.
• 2.2 percent of all results were shared by exactly three search
  engines.
• 0.6 percent of all results were shared by all four of the top
  search engines.
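These four figures add up to 100 percent (88.3 + 8.9 + 2.2 + 0.6 = 100.0), which suggests each result was counted once, according to how many engines returned it. The sketch below shows that kind of calculation, assuming each engine's results are a set of URLs; the engine names, URLs and the `overlap_distribution` helper are hypothetical, not the data behind the slide.

```python
from collections import Counter

# Hypothetical result sets, one per search engine.
ENGINES = {
    "Engine A": {"u1", "u2", "u3", "u4"},
    "Engine B": {"u4", "u5", "u6"},
    "Engine C": {"u4", "u6", "u7"},
    "Engine D": {"u8"},
}

def overlap_distribution(results):
    """Percent of distinct URLs returned by exactly k engines, keyed by k."""
    engines_per_url = Counter()
    for urls in results.values():
        engines_per_url.update(urls)       # +1 for each engine returning the URL
    total = len(engines_per_url)           # number of distinct URLs overall
    urls_per_k = Counter(engines_per_url.values())
    return {k: 100 * n / total for k, n in sorted(urls_per_k.items())}

print(overlap_distribution(ENGINES))  # {1: 75.0, 2: 12.5, 3: 12.5}
```

Because every distinct URL falls into exactly one bucket, the percentages always sum to 100, just as the slide's figures do.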
Is One Search Engine Enough?


• The majority of first page results are unique to one engine:
• On average, 69.6 percent of Google's first page search results
  were unique to Google.
• On average, 79.4 percent of Yahoo!'s first page search results
  were unique to Yahoo!
• On average, 80.1 percent of Live's first page search results
  were unique to Live.
• On average, 75.0 percent of Ask's first page search results
  were unique to Ask.
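A per-engine figure like those above can be computed as the share of an engine's first page that no other engine returned. This is a sketch for a single query using hypothetical result lists and a hypothetical `unique_share` helper; the slide's figures are averages over many queries, so a real study would repeat this per query and average the results.

```python
# Hypothetical first-page results for one query.
FIRST_PAGES = {
    "Google": ["a", "b", "c", "d"],
    "Yahoo!": ["c", "e"],
    "Live": ["d", "f"],
    "Ask": ["g"],
}

def unique_share(engine, first_pages):
    """Percent of an engine's first-page results that no other engine returned."""
    mine = set(first_pages[engine])
    others = set()
    for name, urls in first_pages.items():
        if name != engine:
            others.update(urls)
    return 100 * len(mine - others) / len(mine)

for name in FIRST_PAGES:
    print(name, unique_share(name, FIRST_PAGES))
```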
Social Networking Sites
The Top 9 Social Networking Sites by internet visits

Rank    Name                  Domain                      Market share (%)

1       Facebook              www.facebook.com            37.7
2       Bebo                  www.bebo.com                28
3       MySpace               www.myspace.com             18.97
4       Faceparty             www.faceparty.com           2.01
5       Windows Live Spaces   spaces.live.com             1.99
6       BBC h2g2              www.bbc.co.uk/dna           1.25
7       StumbleUpon           www.stumbleupon.com         1.19
8       Club Penguin          www.clubpenguin.com         1.05
9       Friends Reunited      www.friendsreunited.co.uk   0.88
Investigator footprint
IP Addresses


• Every computer connected to the internet is reached
  through a unique identifier called an IP address. IP
  addresses work like street addresses so that other
  computers can find them. An IP address could look
  something like this: 87.242.211.23.
• Websites can log the IP address of every visitor to
  their site.
• A logged IP address can then be traced back to the
  network, and often the organisation, it belongs to.
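As an illustration of what a website logs, a web server writing the Common Log Format records the visitor's IP address as the first field of each request line. The log entry below is hypothetical (it reuses the example address from the slide), and `visitor_ip` is a hypothetical helper built on Python's standard `ipaddress` module.

```python
import ipaddress

# Hypothetical access-log entry in Common Log Format.
LOG_LINE = ('87.242.211.23 - - [01/Dec/2008:10:15:32 +0000] '
            '"GET /index.html HTTP/1.1" 200 2326')

def visitor_ip(line):
    """Return the client IP address, the first field of a CLF log line."""
    return ipaddress.ip_address(line.split(" ", 1)[0])

ip = visitor_ip(LOG_LINE)
print(ip, ip.version, ip.is_private)  # 87.242.211.23 4 False
```

`is_private` being `False` means this is a public address that can be looked up; private ranges such as 192.168.x.x only identify a machine within a local network.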
IP Address


• Website owners could then search Google or Yahoo! for
  “Halliwells” and “Manchester” to find our address.
• IP Address finder:


• http://www.ip-adress.com/
Search Results


• Webmasters can even see which search terms you used
  to find their website.
• For example, if you searched for fraudulent people in
  Liverpool and then clicked on one of the search results,
  the owner of the site you visited could see that you had
  been searching for fraudulent people in Liverpool.
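Webmasters can see the search term because the browser sends the address of the results page in the HTTP `Referer` header, and that address contained the query. The sketch below assumes a Google-style results URL with the query in a `q` parameter (other engines used different parameter names, and many search engines no longer pass the query this way); `search_terms` is a hypothetical helper using Python's standard `urllib.parse`.

```python
from urllib.parse import urlparse, parse_qs

def search_terms(referer):
    """Extract the search query from a search-engine Referer URL, if any."""
    if not referer:              # no Referer header was sent
        return None
    query = parse_qs(urlparse(referer).query)
    values = query.get("q")      # Google-style query parameter (assumption)
    return values[0] if values else None

print(search_terms("http://www.google.co.uk/search?q=fraudulent+people+liverpool"))
# fraudulent people liverpool
print(search_terms(None))
# None
```

When the header is absent, as when a URL is pasted into a new browser window, there is nothing for the site to extract.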
Search Results


• To avoid this, note that most search results display
  the URL of each result. Copy and paste this URL into a
  new web browser window instead of clicking the link.
Cloaking


• There are many web-based proxies that claim to hide
  your IP address.
• These sites are untested, and this must be considered
  when using them.
• The proxy websites themselves record who used them to
  hide their IP address and what they looked at.
• http://www.the-cloak.com/anonymous-surfing-home.html
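A proxy works by relaying your requests, so the destination site logs the proxy's IP address rather than yours, while the proxy itself sees both. The sketch below shows how Python's standard `urllib.request` routes traffic through a proxy; the proxy address and the `make_proxied_opener` helper are hypothetical (the address comes from a range reserved for documentation), and the code only builds the opener without sending any request.

```python
import urllib.request

# Hypothetical proxy address (203.0.113.0/24 is reserved for documentation).
PROXY = "http://203.0.113.10:8080"

def make_proxied_opener(proxy_url):
    """Build an opener that sends HTTP and HTTPS requests via a proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

opener = make_proxied_opener(PROXY)
# opener.open("http://www.example.com/") would now go through the proxy,
# so the target site would log the proxy's IP address instead of yours.
```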
Tracing Emails


• You can trace the IP address of the server an email was sent
  from.
• For webmail, the trace would reveal the IP address of the
  webmail provider's server, e.g. Hotmail.
• The IP address is contained in the email's internet headers.
• You can either search through the headers to find the IP
  address yourself or paste the headers into an online tool
  that will find it for you.
• http://www.ip2location.com/emailtracer.aspx
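Each mail server that handles a message adds a `Received` header above the existing ones, so reading them from the bottom up follows the message from its origin. The sketch below uses Python's standard `email` module and a simple regular expression to list bracketed IPv4 addresses, oldest hop first. The headers, addresses and the `received_ips` helper are hypothetical; real headers vary in format, and the lower headers can be forged by the sender, so treat the result as a lead rather than proof.

```python
import re
from email import message_from_string

# Hypothetical raw message; IPs are from ranges reserved for documentation.
RAW = """Received: from mail.example.net ([198.51.100.7])
\tby mx.example.com with ESMTP; Mon, 1 Dec 2008 10:00:05 +0000
Received: from sender-pc ([192.0.2.44])
\tby mail.example.net with SMTP; Mon, 1 Dec 2008 10:00:01 +0000
From: someone@example.net
To: you@example.com
Subject: Hello

Body text.
"""

IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_ips(raw):
    """Return bracketed IPs from Received headers, oldest hop first."""
    msg = message_from_string(raw)
    hops = msg.get_all("Received", [])
    ips = []
    for header in reversed(hops):  # the bottom header was added first
        match = IP_IN_BRACKETS.search(header)
        if match:
            ips.append(match.group(1))
    return ips

print(received_ips(RAW))  # ['192.0.2.44', '198.51.100.7']
```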
Tracing Emails
BBC news 6/12/1998
Halliwells Website 27.11.2004
Any Questions


• Robert Crayford
• Robert.crayford@halliwells.com
• 0161 618 4312
