2011 Mining Unique Information Sources & Deep Invisible-Hidden-Opaque Web Recap Final

Published in: Technology, Design

  1. 1. Anna F. Shallenberger
        President & “Chief Archer”
        Shallenberger Intelligence Services
        June 14, 2011
        anna@targetedknowledge.com
        203.258.2383
        917.591.6732 fax
        www.targetedknowledge.com
        http://twitter.com/ClosetLibrarian
        http://www.slideshare.net/ClosetLibrarian
        http://closetlibrarian.blogspot.com
        www.linkedin.com/in/annafayshallenberger
        www.ci2020.com/profile/AnnaFShallenberger
        http://tinyurl.com/AIIP-AFS
        Anna F Shallenberger, All Rights Reserved, for educational use only, not for redistribution or commercial re-use
  2. 2.
      • Definitions vary as to what it is / is not
      • Many names: deep, invisible, hidden, opaque, etc.
      • Surface web is the “visible” portion
      • Baseline research, particularly re size, is dated
      • Term coined by Michael Bergman…
  3. 3.
      • Pages lower ranked due to Search Engine Optimization [SEO]
      • Sites coded to exclude bots
      • Dynamic content generated by page search: stats, etc.
      • Search engine chooses not to cover the whole site due to the volume of content
      • Format: video/image w/o text/tags, or they’re incomplete
      • Site/pages not connected with pages browser(s)
      • Password protected pages
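The “sites coded to exclude bots” point above usually comes down to a robots.txt file. The following is a minimal sketch, assuming a made-up robots.txt and made-up example.com URLs; the `blocked_paths` helper and the `ExampleBot` agent name are hypothetical. It only illustrates how to check which parts of a site a well-behaved crawler is asked to skip, which are often the parts worth browsing manually.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; real sites publish theirs at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /reports/
Disallow: /members/
"""


def blocked_paths(robots_txt, urls, agent="ExampleBot"):
    """Return the URLs that this robots.txt asks the given crawler not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Made-up candidate pages on a made-up site.
candidates = [
    "https://example.com/index.html",
    "https://example.com/reports/2011-market-study.pdf",
    "https://example.com/members/login",
]

hidden = blocked_paths(SAMPLE_ROBOTS_TXT, candidates)
print(hidden)
```

A path listed under Disallow is not secret; it is simply unlikely to show up in a general search engine, so it is a candidate for manual navigation.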
  4. 4. Figure 1. Search Engines: Dragging a Net Across the Web’s Surface
  5. 5. Q4 NPD Search & Portal Site Study, reported by Search Engine Watch
  6. 6.
      • Pearl Grow …
      • ID Leakage Points [factoring in copyright & other IP concerns]
      • Non Central Hosts of Content
          • E.g., content not controlled by “HQ”
      • Surgical Manual Browsing
      • Dark Web Browsers [also use pathfinders]
      • Leverage Your {on & offline} Networks …
  7. 7.
      • Conduct Search
      • Evaluate What You Find…
      • Mine the most on point for more ideas…
      • Refine your search strategy
      • Continue your investigation
      • Repeat process as needed…
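One way to picture the “mine the most on point for more ideas” step in the cycle above is as simple term counting. This is a rough sketch under stated assumptions, not the author's method: the `grow_terms` helper, the stopword list, and the sample hit titles are all invented for illustration. It suggests terms that recur across the on-point hits but are not yet in the query.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; extend as needed).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on", "with"}


def tokens(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def grow_terms(query_terms, on_point_hits, top_n=3):
    """Suggest terms that appear in at least two on-point hits and are new to the query."""
    counts = Counter()
    for hit in on_point_hits:
        counts.update(set(tokens(hit)))  # count each term once per hit
    known = {t.lower() for t in query_terms}
    candidates = [(t, n) for t, n in counts.items() if t not in known and n >= 2]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))  # most frequent first, then A-Z
    return [t for t, _ in candidates[:top_n]]


# Made-up titles standing in for the hits judged most on point.
hits = [
    "Deep web crawling of hidden databases",
    "Hidden databases and the invisible web",
    "Surfacing invisible web databases",
]

suggestions = grow_terms(["deep", "web"], hits)
print(suggestions)
```

The suggestions feed the next pass of the cycle: add them to the search, evaluate the new results, and repeat as needed.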
  8. 8. www.pearltrees.com
  9. 9. Look For…
      • Concepts/Terms/Catch Phrases, etc.
      • Names: Experts/Reports/Publications
      • URL roots, e.g., are the most relevant hits loaded on the same part of the site?
      • Revisit strategy, adding incremental terms and/or re-weighting / editing Boolean linkages
      • Don’t forget the reverse
          • What key terms are repeating in the “false drops”?
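The “URL roots” check above can be approximated by grouping the best hits by host plus leading directory and seeing where they cluster. A minimal sketch, assuming the made-up URLs below and a hypothetical `url_root` helper; it also assumes each URL includes its scheme (http:// or https://), since `urlparse` does not detect the host otherwise.

```python
from collections import Counter
from urllib.parse import urlparse


def url_root(url, depth=1):
    """Return host plus the first `depth` directory segments of the URL path."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    # Treat the last segment as a page name unless the path ends with a slash.
    dirs = segments if parsed.path.endswith("/") else segments[:-1]
    return parsed.netloc + "/" + "/".join(dirs[:depth])


# Made-up URLs standing in for the most relevant hits.
relevant = [
    "https://example.org/research/reports/a.pdf",
    "https://example.org/research/briefs/b.html",
    "https://example.org/news/2011/c.html",
    "https://events.example.org/conf2011/slides/d.ppt",
]

root_counts = Counter(url_root(u) for u in relevant)
print(root_counts.most_common())
```

A root that collects several relevant hits is a reasonable place to start browsing manually, or to restrict the next search to.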
  10. 10. [factoring in copyright & other IP concerns]
      • Analogy: similar to emotional conversations, where the “speaker” may or may not
          • Intend for public [or so much of them] to “hear” [have access to] it
          • Fully comprehend others’ valuation of the information
      • Understand originator’s perspective: “But I only told that to… And they promised not to tell anyone…”
      • Information Leakage Points
          • Reuse by others: clients, ex-employees
          • Conference Presentations
          • Case Studies or other Sales & Marketing Collateral
          • Continuing Education [especially MBA classes]
          • Social Media
  11. 11. Non Central Content Hosts, e.g. content not “HQ” controlled
      • Branch offices of consultants, research firms, usually non-US
      • Biz units migrating tech platforms [generally post-merger]
      • Satellite campuses, larger academic institutions
      • Event-driven sites: conferences, product introductions, etc.
      • Non-merger partnerships / joint initiatives
      • “Relationships” NEC: specialized social networks, non-profits… [sometimes exec bios NOT pasted verbatim from corp site]
  12. 12. Dark Web / Specialized Browsers, etc.
  13. 13. Surgical Browsing
      • Identify Potential High-Value Sites
      • Navigate Manually
      • Create Your Own Site Index Using a Browser
      It can include
      • Downloading {smaller} sites into Adobe to browse offline
      • Looking for cross-linking to site, especially several layers in
      • Locating historical content in caches or archiving sites
      Be Careful
      • While limiting searches by doc type [pdf etc] is effective, searchable layers can mask them behind other file types
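The slide above describes building a site index by hand in a browser. As a scripted stand-in for that manual step (a swapped-in technique, not the approach the slide itself describes), the sketch below pulls the links out of one saved page and splits them into internal and external ones. The `LinkCollector` class, the `index_page` helper, and the sample HTML and URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def index_page(base_url, html):
    """Split a page's links into same-host (internal) and other (external) URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    internal = sorted({u for u in collector.links if urlparse(u).netloc == host})
    external = sorted({u for u in collector.links if urlparse(u).netloc != host})
    return {"internal": internal, "external": external}


# Made-up page content standing in for a saved copy of a high-value page.
SAMPLE_HTML = """
<a href="/reports/2011.pdf">Report</a>
<a href="about/team.html">Team</a>
<a href="http://partner.example.net/joint-study">Joint study</a>
"""

site_index = index_page("http://example.com/index.html", SAMPLE_HTML)
print(site_index)
```

The internal list is the start of a personal site index; the external list hints at cross-linking partners worth a look, in line with the “non central hosts” idea.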
  14. 14. Leveraging Who You Know, Digitally & Offline
      • What do people in that field read, on & offline? What would they consider a waste of time?
      A large part of the challenge is indexing…
      • And you need to ID what they “miss”
      • Sometimes there’s no GPS; you must already know where you’re going, or at least a mid-point…
  15. 15.
      • Needles in haystacks aren’t invisible, but they can be more work to locate
      • Some even hide in plain sight
      • Have a plan, but flex it as needed
      • Take good notes, bookmark good leads, save best hits
          • Might not find them again, or they change
      • ALWAYS Consider the Source
      • Manage time spent, don’t get lost
      • Be Flexible, but still… Plan Ahead
  16. 16. Planning Ahead
      *AFS example based on model designed by KnowledgeInforrm
      • Based On Questions You Are Seeking To Answer
      • ID Potential Sources, & “Pearl Grow” From There
  17. 17. ID/Target Sources {categorization subjective: many fit multiple}
      • Influencers
      • Consultants / Think Tanks
      • Pollsters / Market Researchers
      • Academia
      • Governmental
      • NGOs & Advocacy Groups
      • Trade/Professional Associations
      • Other Niche Organizations
      • Businesses & Publishers, NEC
      • Aggregators/Re-packagers/Peer-Sharing, NEC
      • PEOPLE NEC
  18. 18.
      • Deep Web: Surfacing Hidden Value
        www.brightplanet.com/images/uploads/DeepWebWhitePaper_20091015.pdf
      • Marcus Zillman: Deep Web Research
        www.llrx.com/features/deepweb2011.htm or www.deepwebresearch.info
      • Chris Sherman @ Information Online
        www.docstoc.com/Docs/Document-Detail-14.aspx?doc_id=84592274
      • August Jackson [for SCIP] Getting Most ….
        http://homepage.mac.com/cornfed/internetdeepweb.pdf
      • Using Web Investigative Reporting Tool
        www.slideshare.net/tccj/web-as-investigative-tool or http://campuscoverage.org/sites/default/files/Docs/Presentations/CCPInternet.ppt
      • Model & Analyze Deep Web
        www.scribd.com/doc/59496007/Modeling-and-Analyze-the-Deep-Web-Surfacing-Hidden-Value
      • Accurate & Efficient Crawling Deep Web
        www.scribd.com/doc/57147960/Accurate-And-Efficient-Crawling-The-Deep-Web-Surfacing-Hidden-Value
      • Web & Twitter Archiving @ Library of Congress
        www.slideshare.net/nullhandle/web-and-twitter-archiving-at-the-library-of-congress
      • CRS report to Congress
        www.docstoc.com/docs/84024621/CRS-Report-for-Congress
      • How Much Information [UC Berkeley]
        http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/printable_report.pdf
  19. 19. No member of a crew is praised for the rugged individuality of his rowing. ~Ralph Waldo Emerson
        Thanks & Best Regards,
        Anna F. Shallenberger
        203.258.2383 cell
        917.591.6732 fax
        anna@targetedknowledge.com
        http://twitter.com/ClosetLibrarian
        http://www.slideshare.net/ClosetLibrarian
        http://closetlibrarian.blogspot.com

        An experienced researcher, educator, author, blogger, strategist & consultant, Anna Shallenberger, aka the ClosetLibrarian, was recently recognized in Best of the Business Web & featured on SlideShare’s home page.

        At SLA 2011, Anna was a panelist for “Integrating with Sales & Marketing to Capture & Deliver Intelligence” & led an “Intelligence Café” discussion regarding Unique Information Sources & the Deep Web. She was also a spotlight panelist @ SLA 2010 & served as conference planner for the CI Division.
