Web Archiving Whitepaper



  1. Web Archiving: The Next Phase in the Evolution of Archiving

     An Osterman Research White Paper
     Published November 2010

     SPONSORED BY [sponsor logo]

     Osterman Research, Inc. • P.O. Box 1058 • Black Diamond, Washington 98010-1058
     Tel: +1 253 630 5839 • Fax: +1 253 458 0934 • info@ostermanresearch.com
     www.ostermanresearch.com • Twitter: @mosterman
  2. Executive Summary

     OVERVIEW
     The web has become the primary communication and commerce channel for businesses and government agencies. Digital media (web sites and other web-based content) has all but replaced print media as the primary mode of communication with customers, constituents, prospects, investors and others. The web is also becoming the primary channel for transacting business, managing commerce for everything from online purchases to tax payments.

     However, many businesses and governments do not yet understand that they are liable for everything they publish online. Organizations that do not archive web content run the risk of not preserving a record of their claims, offers and other content posted on their web sites. Retaining this content has become both a legal and regulatory requirement, and so the question is not if web content should be retained, but only how much and for how long it should be preserved.

     Web archiving has been going on for quite some time, but enterprise-class solutions have only recently become available. New, state-of-the-art technology is now available to manage web archiving and it has the power and flexibility to meet existing and emerging web archiving requirements. As a result, any organization that uses the web to communicate or manage commerce should consider developing a web archiving policy and deploying the appropriate technology to support that policy.

     KEY TAKEAWAYS
     The fundamental message of this white paper is:

     • Web archiving is, without question, a best practice for virtually any organization. Organizations that do not archive web content are placing themselves at unnecessary risk from both a legal and regulatory viewpoint, and they are denying themselves the use of capabilities that can provide a distinct competitive advantage.

     • Web archiving is fundamentally identical to what many organizations have already implemented in the context of email archiving, file archiving and long-term retention of other types of important business content. In essence, web archiving is merely a superset of traditional types of archiving that are already well established in business and government.

     • Many current web archiving technologies are not designed with enterprise-class capabilities that will retain web content of evidentiary value.

     • Organizations should consider developing a web archiving policy, particularly as more content migrates to the web and web-based applications.

     ©2010 Osterman Research, Inc.
  3. ABOUT THIS WHITE PAPER
     This white paper discusses the importance and benefits of web archiving and various use cases for it. It also briefly discusses the sponsor of this white paper and their relevant offerings in the space.

     Why the Web Represents the Next Phase of Archiving

     WHAT IS WEB ARCHIVING?
     Web archiving is what its name implies: the capture and archival storage of web-based content. This can include individual web pages, entire web sites, content from web 2.0 applications like social networking sites, and other web-based content that is important to capture and retain, normally for long periods.

     The concept of web archiving is not new. For example, the Wayback Machine – a web archiving service maintained by the non-profit organization Internet Archive based in San Francisco, California – has been archiving web content since 1996 [i]. However, the Wayback Machine has several limitations for use in a business context:

     • Web content is captured only periodically, not on a regular basis. This can prevent the capture of a large proportion of web content, particularly for sites that update content frequently. Further, changes to a web page or web site may not be captured if the change occurs between content “snapshots”, the frequency of which is determined by Internet Archive.

     • There is no guarantee that all web content will be captured.

     • Web content is not necessarily captured in a way that will satisfy evidentiary rules during legal or regulatory proceedings.

     As a result, while the Wayback Machine is a good first step toward archiving web content, more sophisticated – and enterprise-class – web archiving is becoming a necessity for a growing number of applications, as discussed below.

     WHAT DRIVES THE NEED FOR WEB ARCHIVING?
     Many of the drivers for web archiving are fundamentally the same as those for email and other electronic content archiving:

     • Web content can be required for e-discovery and other litigation support requirements in much the same way that emails, word processing files, PDF files and other content are required.

     • Similarly, web content can be required to demonstrate an organization’s compliance (or lack thereof) with regulatory requirements in the context of advertising, forward-looking statements, claims of suitability and other content that must – or must not – be posted to web sites.
  4. • Many organizations have a requirement, often driven by a need to reduce risk or maintain adequate records, to preserve web site content as part of their overall records retention and records management strategy.

     • Unlike more traditional forms of archiving, web archiving can actually be used as a competitive and/or investigative tool to understand content posted on competitors’ web sites.

     WEB ARCHIVING vs. SERVER BACKUPS
     There are some significant differences between web backups and web archives:

     • Although both a backup and an archive of a web site can reproduce content at a later date for forensic, e-discovery or data mining purposes, a web archive will do so more quickly, more affordably and more easily.

     • Because of the ubiquity of database-driven web sites, a backup must retain archives of all of the files, as well as all of the databases that control the web site.

     • Searching through backups of a web site is much more difficult and more time-consuming than searching through an archive.

     WEB ARCHIVING: THE NEXT STEP
     Web archiving can rightly be considered the next logical extension of an organization’s traditional archiving of email, files and other electronic content. While email and other types of electronic content archiving tend to focus on internal content – emails sent to and from employees and business, word processing files and presentations created for internal uses, and so forth – web archiving tends to focus much more on publicly available content. Because the web – including static web sites, web applications, social networking content, etc. – is primarily public-facing in nature, web archiving focuses primarily on content that the public has already seen or has had the opportunity to see. As a result, web archiving is focused to a greater degree than traditional electronic content archiving on issues like brand protection; reputation management; policy enforcement; protection of content based on when it is created, posted and taken down; business continuity and corporate memory.

     Archiving Is Already an Established Best Practice

     THE WEB IS GROWING RAPIDLY
     The amount of content on the web has ballooned exponentially in recent years. For example, as of December 2009, there were 234 million web sites, 47 million of which were added just in 2009 [ii] – an average of nearly 129,000 web sites added every day. Further, even as far back as 2008 there were well in excess of one trillion unique URLs on the web and the number continues to grow at a rapid pace.

     Growth of the web is being driven by a number of factors, including the ubiquity of web access, the ease and low cost with which content can be published and updated, and
  5. greater cultural acceptance of the web as a medium of information-sharing and commerce. For these reasons, both business and government are increasingly reliant on the web as their primary means of communications and process management.

     Consequently, the market for web archiving – as well as archiving of email, files, SharePoint content and other information – is growing at a healthy pace. Web archiving, currently a small segment of the total content archiving market, is poised to become an enormous area of growth, driven by the issues discussed in this white paper.

     GROWTH IN THE MARKET IS DRIVEN BY A VARIETY OF FACTORS
     For just about any company, government agency or educational institution, there are four primary drivers for archiving their electronic content. However, the importance of these drivers will vary by an organization’s size, the industry(ies) in which it participates, the advice of its internal and external legal counsel or compliance officers, and the locales in which it operates:

     • Driver #1: Litigation
       Electronic content stores, including web sites, contain a growing proportion of business records that must be preserved for long periods of time. Further, this content is frequently requested during discovery proceedings because of the Federal Rules of Civil Procedure (FRCP) and state versions of the FRCP. As a result, it is critical that all relevant electronic content be made available for e-discovery purposes.

       Further, when a hold on data is required, it is imperative that an organization immediately be able to begin preserving all relevant data. For example, if a dispute arises because of a claim made on a page of a company’s web site, that content must be preserved for as long as a court, regulator or other authorized entity may deem necessary. An enterprise-class web archiving system allows organizations to immediately place a hold on data when requested by a court or on the advice of legal counsel.

       If an organization is not able to adequately place a hold on data when it is obligated to do so, it can suffer a variety of serious consequences, ranging from embarrassment to major legal sanctions or heavy fines. Litigants that fail to preserve electronic content properly are subject to a wide variety of consequences, including brand damage, additional costs for third parties to review or search for data, court sanctions, directed verdicts or instructions to a jury that it can view a defendant’s failure to produce data as evidence of culpability.

       In addition to the e-discovery and legal hold benefits, an enterprise-class web archiving system allows an organization to perform either formal or informal early case assessment activities. For example, if a customer makes a claim against a company based on a statement made on the company’s web site, senior managers can search the archive for information that will help them determine the potential liability they face. If this assessment of the potential lawsuit results in a determination that the company was indeed wrong in making the claim, they can instruct legal counsel to pursue a quick legal settlement. If, on the other hand, the
  6. assessment results in the discovery of information that supports the company’s position, that information can be used to convince the customer to drop the case or it can help win the case if it goes to trial. In either case, an archiving system can help the organization to understand its position early on, either avoiding unnecessary legal fees or an adverse judgment, or reducing its costs by proving the sufficiency of its case.

     • Driver #2: Regulatory Compliance
       For just about every organization, there are a large and growing number of regulatory obligations to preserve electronic content. Some of the more important requirements are:

       o Sarbanes-Oxley Act of 2002
         The Sarbanes-Oxley Act of 2002 requires all public companies and their auditors to retain such relevant records as audit workpapers, memoranda, correspondence and electronic records for a period of seven years. Further, Section 403 of Sarbanes-Oxley amended Section 16 of the Securities Exchange Act of 1934 to include a requirement for public companies to post certain types of content on their web sites.

         Under Sarbanes-Oxley, company officers are obliged to report internal controls and procedures for financial reporting and auditors are required to test the internal control structures. Businesses have to ensure that information is preserved – whether paper or electronic – that would be relevant to the company’s financial reporting.

       o Health Insurance Portability and Accountability Act of 1996 (HIPAA)
         All organizations operating in the healthcare field need to comply with HIPAA to ensure the safety of Protected Health Information. Organizations are required to protect the data from unauthorized users, as well as to retain for six years a broad range of documentation regarding their compliance.

         As part of the American Recovery and Reinvestment Act of 2009 (ARRA), the provisions of HIPAA have been significantly expanded. A key component of ARRA is the Health Information Technology for Economic and Clinical Health Act (HITECH). Now, business partners of entities already covered by HIPAA, such as pharmacies, healthcare providers and others, are required to comply with HIPAA provisions. This includes attorneys, accounting firms, external billing companies and others that do business with covered entities. While these business associates were accountable to the covered entities with which they did business under the old HIPAA, these associates are now liable for governmental penalties under the new law.

         HIPAA violations have been expanded dramatically. For example, if a covered entity or one of their business associates loses 500 or more patient records, it must notify HHS and a “prominent media outlet” to let them know what has occurred. Section 13402 of HITECH requires that if a “covered entity has insufficient or out-of-date contact information for 10 or more individuals, the
  7. covered entity must provide substitute individual notice by either posting the notice on the home page of its web site or by providing the notice in major print or broadcast media where the affected individuals likely reside.” Fines for HIPAA violations can now reach as high as $1.5 million per calendar year.

       o Securities and Exchange Commission Rules
         Members of national securities exchanges, brokers and dealers are obliged to preserve all records for a minimum of six years, the first two years in an easily accessible place (SEC Rule 17a-4). The affected records are broad and encompass originals of communications generated and received by individuals within financial institutions, including inter-office memoranda and internal audit working papers. Also included are automated messages sent to all customers, which could include email blasts. The records may be “immediately produced or reproduced on micrographic media [microfilm, microfiche or similar] or by means of electronic storage media.” As noted above, the Securities Exchange Act of 1934 has been amended to specifically include the requirement to post certain types of content on the web.

       o Financial Industry Regulatory Authority (FINRA)
         FINRA is a non-governmental regulator formed in 2007 by the merger of various functions of the New York Stock Exchange and the National Association of Securities Dealers. FINRA manages a wide variety of rules that are imposed upon the more than 5,000 brokerage firms and nearly 675,000 registered representatives it oversees. FINRA requires that various types of communications with the public must be filed prior to their use, including content that often would be posted on web sites [iii]. This includes CMO advertisements, sales literature and investment analysis tools.

     Recent FINRA Disciplinary Actions Related to Web Content

     • An individual posted false and misleading information on a Google Finance bulletin board relating to securities recommendations. The posting contained predictions and projections of future prices for the securities that were recommended, but the posting was made without approval. FINRA fined the individual $10,000 and suspended him from associating with any FINRA member for six months.

     • A company made false and misleading statements on its web site related to low cost commission rates and direct access to traders. The company was censured and fined $20,000.

     • An affiliate of a company participated in and won CD auctions without disclosing it was an auction participant. Further, the advertising materials used contained misleading, unwarranted and exaggerated statements, and published misleading market clearing yields on its web site. The company was found to have violated Rule 2210 and fined $225,000.
  8.   o Model Requirements for the Management of Electronic Records (MoReq)
         MoReq is a specification, originally developed in 2001, that defines the functional requirements for the manner in which electronic records are managed in an Electronic Records Management System. MoReq has been used widely in Europe and has been updated with MoReq2.

       o Other requirements
         A small sampling of the many other requirements for data retention are FINRA 3010, the Investment Advisers Act of 1940 (hedge funds), the Gramm-Leach-Bliley Act, IDA 29.7, FDA 21 CFR Part 11, OCC Advisory, the Financial Modernization Act 1999, Medicare Conditions of Participation, the Fair Labor Standards Act, the Americans with Disabilities Act, the Toxic Substances Control Act, the UK Companies Act, the UK Company Law Reform Bill - Electronic Communications, the UK Combined Code on Corporate Governance 2003, the UK Human Rights Act, Basel II, and the Markets in Financial Instruments Directive.

     • Driver #3: Knowledge Management and Data Mining
       There is an enormous amount of useful content that is posted to a company’s own web site or other sites. Mining this content includes identifying and extracting information about companies’ products, their public financial information, their participation in trade shows and a wealth of other types of content. Applications for this information include competitive analysis, determination of compliance with various statutes, performing analytics to determine at what time of year certain events take place, and so on.

     • Driver #4: Maintain Corporate Memory
       Web archiving can be very useful for maintaining a corporate record of what has been posted to a web site, how long this content was maintained or when it was replaced. For example, a company may want a record of its web site for historical purposes, or it may need an archive in order to re-use some of its content at a later date. Maintaining an accurate archive of web content can significantly reduce the costs associated with recreating this content.

     The Consequences of Not Archiving Web Content

     The vast majority of organizations do not adequately archive their web content and they face a number of risks from not doing so:

     • Increased risk in legal disputes
       An inability to produce past content from web sites – as with any electronic content – carries with it increased risk during legal actions. This includes an inability to produce time-stamped copies of web pages that will be admissible in court, an inability to respond to e-discovery requests when specific web content is required, and an inability to place legal holds on data so that existing web content is not overwritten when a legal dispute has been initiated or is anticipated.
  9. • Risk of non-compliance with regulatory obligations
       Many heavily regulated organizations, such as broker-dealers, have specific obligations to make (or not make) statements or claims on their web site. For example, FINRA Rule 2210 requires broker-dealers to archive their institutional communications, retail communications and correspondence. Because advertising and other public-facing communications often appear on regulated entities’ web sites, it is critical that web content is archived.

     • Loss of context for notices, marketing messages, etc.
       An organization that is not able to archive its web content cannot easily provide the context for its various web-based marketing messages and other communications. The use of this otherwise lost historical data can help a company keep track of past marketing campaigns, offers, policy statements, notifications to the public and a wide range of other content.

     • An inability to prove when statements were made or retracted
       Similarly, not archiving web content makes it very difficult to prove exactly when content was posted or removed from a web site or web page. For example, if a press release is embargoed until a certain date and time, a web archiving system can demonstrate exactly when the content was posted, and conversely can prove that the content was not posted before the embargo had been lifted.

       Another example is that of warning letters issued by the US Food and Drug Administration. These letters warn pharmaceutical manufacturers and other regulated companies about misleading statements, missing information and other claims. One of the many examples of such letters is an October 18, 2010 letter [iv] to a pharmaceutical company, in which it was advised that two of its web pages discussing a magnetic resonance imaging contrast medium it produces “omits important information about the approved indication for [the product], and both webpages misleadingly suggest unapproved new uses for the drugs.” Maintaining a web archive is critical to ensuring that an accurate record of content can be preserved and demonstrated when required.

     • Loss of digital heritage/corporate memory
       When web content is not archived, a significant proportion of an organization’s digital heritage – or corporate memory – simply disappears. Preservation of this content is important on a number of levels – legal, regulatory, productivity, etc. – but also because it represents something of the corporate history of the firm in the form of announcements to the public and other content that constitutes an organization’s digital record.

     • An inability to gauge the effectiveness of web campaigns
       Some organizations use their web site extensively to present marketing campaigns, post notifications of sales or special offers, and announce promotions of various types. If an organization cannot accurately archive its web content, it is at a disadvantage when attempting to correlate customer activity like sales calls or web inquiries with the specific timing of announcements and other web content.
  10. • Productivity and monetary loss from recreating unarchived content
       If web content is not archived and must be recreated, there can be significant time and money lost by those who created the original content, those who must code the content anew, etc. A web archive can, therefore, make various types of employees more efficient and save the organization money by allowing web content to be easily discovered and reused.

     There Are Many Use Cases for Web Archiving

     There is a large and diverse set of use cases for web archiving, some examples of which are discussed below:

     • Facilitating regulatory compliance
       There is a wide range of applications for web archiving in the context of regulatory compliance. For example, state consumer protection agencies, the Federal Trade Commission, various watchdog groups and similar organizations worldwide have an interest in monitoring the claims, advertising, marketing messages and other content posted by companies on their web sites. Archiving web content from these companies is crucial to monitoring their compliance with various regulations and statutes.

       One example of the myriad such compliance obligations that exist is the aforementioned FINRA Rule 2210, a set of compliance obligations imposed on broker-dealers and certain others in the financial services industry to advertise their services accurately.

       Similarly, government agencies have obligations with regard to state sunshine and freedom-of-information laws to provide content to citizens upon demand. Archiving of web content posted on government-operated web sites is key to helping government agencies fulfill their obligations under these requirements.

     • Checking web content for copyright violations
       Web archiving can be extremely useful in capturing content from various sources on the web and then searching that content for potential violations of copyright. For example, a major US-based men’s magazine uses the Wayback Machine roughly every month to search for content on the web that might be using its trademarked logo or other content, particularly its published images.

       As noted above, while the Wayback Machine offers some utility for this type of application, an enterprise-class web archiving capability can provide timelier and more complete information, not to mention the ability to accurately determine when content was posted and deleted from web pages. This can be particularly important in cases where a violator takes down content after receiving notice of a legal action by a copyright holder – an inability to prove exactly when the content was taken down can undermine a legal case.

       An important case in this regard was Innervision Web Solutions’ use of the domain name “DellComputersSuck.com”. Because Dell contended that Innervision had used the domain name to redirect visitors to the Innervision web site for commercial gain, and because they were able to prove this based on archived web content, Dell was
  11. able to have this domain transferred to its ownership because Innervision was found to have registered the domain in bad faith.

     • Proving the bona fides of expert witnesses
       The Federal Rules of Civil Procedure, Rule 26 requires that expert witnesses whose testimony is introduced during legal proceedings offer “the witness’s qualifications, including a list of all publications authored in the previous 10 years.” Because a growing proportion of many such experts’ publications are electronic in nature, such as blog posts or other web-based content, it is increasingly important for this content to be available to all parties during a legal proceeding.

       From the perspective of the litigating party that has not hired an expert witness, it is particularly important to be able to access web archives of all of the content offered by that witness. For example, if a litigant can access content older than 10 years, or if they can uncover an obscure blog post that might be contrary to the testimony offered in court, this may prove to be extremely valuable.

     • Demonstrating the veracity of electronic content
       In Vinhnee v. American Express, the defendant owed American Express in excess of $40,000 and the company sued to recover. Although American Express presented records of the defendant’s monthly statements, the company could not demonstrate the authenticity of these records and so lost the case, even after an appeal.

       In another case, Janssen-Ortho Inc. v. Novopharm Limited, an affidavit was presented that contained the link to a home page, but it did not include a copy of the page contents. The Federal Court in Canada that heard the case did not accept this affidavit, finding it to represent insufficient evidence.

       In both cases, a web archiving capability that could demonstrate the veracity of the information presented, along with verifiable time and date stamping, would likely have enabled the losing party to win its case.

     • Performing marketing analysis
       A web archiving capability can be very useful when researching various types of marketing messages as part of a promotional campaign, even when this research is about a competitor. For example, a hotel chain may wish to archive the web content of its three leading competitors to determine when specific messages were posted to the web and when they were taken down. This information can then be correlated with sales data, marketing reports and other information to determine which messages were most or least effective.

     • Conducting research
       A web archiving capability can be extraordinarily useful in a wide range of research applications, such as a journalist exploring the positions of a political candidate prior to conducting an interview, a customer researching exactly when a company’s stated policy was first posted to its web site or when it was withdrawn, a human resources staffer investigating the statements posted to a blog or Facebook wall by a prospective employee, or when and where information about a trade secret was first
  12. posted to the web, to name but a few of the tens of thousands of potential use cases for web archiving focused on research.

     THE BOTTOM LINE
     While there are a variety of applications for web archiving technology, the bottom line is that web content must be preserved for the same reasons that email and other electronic content must be archived. This was summarized in a landmark court decision [v] in which the presiding judge wrote, “This Court sees no reason to treat websites differently than other electronic files.”

     Key Issues in Selecting a Web Archiving Vendor

     There are a number of features, functions and capabilities that decision makers should consider as they evaluate web archiving solutions. Among these are the following:

     BREADTH OF WEB CONTENT ARCHIVING
     A web archiving solution should be able to archive a wide variety of content, from individual web pages to entire web sites. This should also include social media pages, RSS feeds, blogs and any other content that might be required for e-discovery, research or other uses.

     SUPPORT FOR A WIDE RANGE OF TECHNOLOGIES
     A wide and growing variety of technologies are used on the web, including Adobe Flash, AJAX, JavaScript, PHP, various image formats (JPG, PNG, GIF, etc.), video content and other formats. Any web archiving technology must be able to archive all of these technologies. Further, it must accommodate new technologies as they become available.

     FLEXIBILITY OF ARCHIVING
     A web archiving platform must also provide flexibility in the timing of archiving. Unlike email or file archiving that is driven by the creation of discrete emails or files, web archiving is based on specific timing requirements. For example, a web archive should be able to archive all necessary web content at regular intervals, on a one-off basis, automatically, manually, etc. In short, a web archiving platform must be able to archive web content whenever it is required.

     ANALYSIS AND REPORTING TOOLS
     Web archiving capabilities should also provide robust analysis and reporting tools so that content can be analyzed for purposes of e-discovery, litigation support, regulatory compliance, marketing analysis or other purposes; or for purposes of reporting high-level results to senior managers. For example, senior counsel may want to analyze an entire web site’s contents over a particular date range for a set of keywords that may be required as part of an early case assessment exercise. Or, a marketing manager may want to search a competitor’s blog over the past year for instances of business partners being mentioned. Analysis tools will ideally support the creation of charts to aid in the analysis of trends, such as comparisons of web content over time.
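     As an illustration of the "comparisons of web content over time" mentioned above, the following is a minimal sketch, not a description of any vendor's product. It assumes the text of two captures of the same page is already available as plain strings; the function name changed_lines and the sample page text are hypothetical. It uses only Python's standard difflib module.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (removed, added) lines between two captures of one page.

    Hypothetical helper for illustration: a real archive would compare
    rendered pages or HTML, not just plain text.
    """
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):      # present only in the older capture
            removed.append(line[2:])
        elif line.startswith("+ "):    # present only in the newer capture
            added.append(line[2:])
    return removed, added


# Made-up sample captures of the same page on two dates.
october = "Commission: $5 per trade\nDirect access to traders"
november = "Commission: $7 per trade\nDirect access to traders"

removed, added = changed_lines(october, november)
print("Removed:", removed)
print("Added:", added)
```

     A line-level comparison like this only shows what changed between two captures; showing when a change happened depends on how often, and how reliably, the captures were taken.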
  13. 13. Web Archiving: The Next Phase in the Evolution of Archiving

INTEGRATION WITH EXISTING SYSTEMS
Web archiving capabilities should integrate with other systems in place in the organization, including analysis tools, existing archiving systems for email, etc. The ability to integrate with these systems will make searching and analyzing web content easier and more efficient, and will allow organizations to respond more quickly to time-sensitive requests. Further, integration with existing systems will allow data to be analyzed without users learning a new tool, interface, etc.

DELIVERY MODELS
A web archiving platform should support a flexible delivery model. While many organizations prefer an on-premises solution that can be managed completely behind the corporate firewall, a growing number of organizations are opting for cloud-based solutions that are completely managed by a third-party service provider.

FISMA COMPLIANCE FOR FEDERAL GOVERNMENT CUSTOMERS
The Federal Information Security Management Act of 2002 (FISMA) requires US federal agencies to create, implement and document an information security program to support their information management goals. A key goal of FISMA is the archiving of information assets, including web sites. Consequently, a best practice focused on FISMA compliance will include regular capture of all relevant web site content, including secure, long-term storage of this content.

ABILITY TO PERFORM FULL-TEXT/CONTENT SEARCHING
Another important feature of any web archiving solution is the ability to search for content using full-text search capabilities.
This is particularly important when searching for specific keywords or phrases during an e-discovery or similar exercise, in much the same way that this type of search is critical for any other type of archived content, such as email or files.

USE OF ORGANIZATIONAL TOOLS
Organizational tools are also a very useful feature for a web archiving system because they allow reviewers to organize content for subsequent searches. For example, the ability to organize content into folders, tag specific sections or pages for later review, or add notes to pages or sections is very helpful for paralegals who are scouring archived web content for later and more thorough review by senior counsel.

ABILITY TO COLLABORATE USING ARCHIVED CONTENT
Finally, it is important that any web archiving capability allow users to collaborate based on this archived content. Just as with email or other types of content archiving, teams of individuals will normally work on large cases involving archived web content and their ability to collaborate is essential.
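As a rough illustration of the organizational tools described above (folders, tags and notes shared among reviewers), the following sketch models a small review workspace. The `ReviewItem` and `ReviewWorkspace` classes, their methods and the sample URLs are hypothetical and are not taken from any particular archiving product.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ReviewItem:
    """Review metadata attached to one archived page (hypothetical structure)."""
    url: str
    folder: str = "Unsorted"
    tags: set = field(default_factory=set)
    notes: list = field(default_factory=list)  # (author, text) pairs


class ReviewWorkspace:
    """Shared workspace in which several reviewers organize archived pages."""

    def __init__(self):
        self._items = {}

    def item(self, url):
        # Create the review record on first use.
        return self._items.setdefault(url, ReviewItem(url))

    def file_under(self, url, folder):
        self.item(url).folder = folder

    def tag(self, url, *tags):
        self.item(url).tags.update(tags)

    def add_note(self, url, author, text):
        # Recording the author lets a later reviewer see who flagged what.
        self.item(url).notes.append((author, text))

    def by_tag(self, tag):
        return sorted(i.url for i in self._items.values() if tag in i.tags)

    def by_folder(self):
        folders = defaultdict(list)
        for i in self._items.values():
            folders[i.folder].append(i.url)
        return {name: sorted(urls) for name, urls in sorted(folders.items())}


# Assumed usage: a paralegal flags pages for later review by senior counsel.
workspace = ReviewWorkspace()
workspace.file_under("https://example.com/offers", "Pricing claims")
workspace.tag("https://example.com/offers", "needs-review")
workspace.add_note("https://example.com/offers", "paralegal",
                   "Compare this offer with the later capture.")
print(workspace.by_tag("needs-review"))
```

Keeping the author with each note is one simple way to support the collaboration requirement, since a team working on the same case can see which reviewer raised each point.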
  14. 14. Web Archiving: The Next Phase in the Evolution of Archiving

Conclusion: Consider Web Archiving
Because the web continues to grow in importance for both business and government as a medium for communication and commerce, archiving of web content should become an essential element of any organization’s risk mitigation and compliance strategy. As a result, organizations should seriously consider developing a web archiving policy and deploying technology that can support this policy.

About the Sponsor of This White Paper

ABOUT REED TECHNOLOGY
Reed Technology & Information Services (RTIS) offers the Reed Tech Web Archiving Service for corporate enterprises, government, and professional services companies. Reed Tech has been providing clients with information capture, conversion, management, distribution and transformation services for almost 50 years. Reed Tech’s clients include large government agencies like the U.S. Patent & Trademark Office, a wide range of pharmaceutical and other life sciences companies, and law firms of all sizes.

Reed Tech is a wholly-owned subsidiary of Reed Elsevier, an $8b global provider of professional information and online workflow solutions in the Science, Medical, Legal, and Risk and Business sectors. With almost 1,000 full-time employees, Reed Tech reports through LexisNexis, a leading global provider of content-enabled workflow solutions to professionals in law firms, corporations, government, law enforcement, tax, accounting, academic institutions and risk and compliance assessment.

ABOUT ITERASI
Iterasi Inc. creates enterprise-class web archiving technology applications specifically for regulatory compliance, litigation protection, and e-discovery. Pete Grillo, CEO, founded the company in 2007.
  15. 15. Web Archiving: The Next Phase in the Evolution of Archiving

© 2010 Osterman Research, Inc. All rights reserved.

No part of this document may be reproduced in any form by any means, nor may it be distributed without the permission of Osterman Research, Inc., nor may it be resold or distributed by any entity other than Osterman Research, Inc., without prior written authorization of Osterman Research, Inc.

Osterman Research, Inc. does not provide legal advice. Nothing in this document constitutes legal advice, nor shall this document or any software product or other offering referenced herein serve as a substitute for the reader’s compliance with any laws (including but not limited to any act, statute, regulation, rule, directive, administrative order, executive order, etc. (collectively, “Laws”)) referenced in this document. If necessary, the reader should consult with competent legal counsel regarding any Laws referenced herein. Osterman Research, Inc. makes no representation or warranty regarding the completeness or accuracy of the information contained in this document.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. ALL EXPRESS OR IMPLIED REPRESENTATIONS, CONDITIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE DETERMINED TO BE ILLEGAL.

[i] http://www.archive.org/about/faqs.php#The_Wayback_Machine
[ii] http://royal.pingdom.com/2010/01/22/internet-2009-in-numbers/
[iii] Filing Communications for FINRA Review Webcast
[iv] http://www.fda.gov/ICECI/EnforcementActions/WarningLetters/ucm230796.htm
[v] Arteria Prop. Pty Ltd. v. Universal Funding V.T.O., Inc., 2008 WL 4513696 (D.N.J. Oct. 1, 2008)