Web Archiving Whitepaper Aleph Archives


Whitepaper on the CAMA web archiving platform by Aleph Archives.
For more information, visit our website.

Published in: Technology, Education


  1. Web Archiving for Compliance & eDiscovery

     ALEPH ARCHIVES Ltd.
     ✉ 600 Blv de Maisonneuve suite 1700 - Montréal, Québec (Canada) / chemin des Croix-Rouges 16 - 1007 Lausanne (Switzerland)
     ✎ info@aleph-archives.com
     ☞ aleph-archives.com
  2. Copyright © 2012 Aleph Archives. All Rights Reserved.

     WEB ARCHIVING

     INTRODUCTION

     Quick access to digital data and electronic information stored online is a «must have» when it comes to devising strategy in litigation or in statutory compliance disputes.

     There are, however, many obstacles to providing and managing such access efficiently, given both the frequent complexity of these disputes and the legal context that must be dealt with. It is often impossible, or too late, to obtain the relevant information when it is needed, such as during eDiscovery processes.

     Aleph Archives is an IT service provider dedicated to companies with specific needs regarding Web content preservation. Aleph Archives offers turnkey tools to easily and efficiently retrieve relevant data stored online.

     According to recent research, the average life expectancy of a website is less than 75 days, and disputes over the content of websites are on the increase. In a number of countries, regulatory and archiving compliance rules (e.g. Sarbanes-Oxley Act - US, Health Insurance Portability and Accountability Act - US, Gramm-Leach-Bliley Act - US, Federal Rules of Civil Procedure - US, etc.) govern the different industry sectors, and authorities (e.g. SEC and FINRA - US, Financial Services Authority - UK) supervise those sectors on that basis.

     Through a unique cloud-based Web archiving platform named CAMA®, Aleph Archives provides «Web Preservation» services for regulatory compliance, litigation support and eDiscovery to help corporate entities and legal and governmental authorities collect, manage and archive their large and growing Web content. CAMA® is the only platform that archives and keeps records of your websites, webpages, and web presence at large. CAMA® clearly evidences which website content was shown to a particular end user during a visit and, equally important, which content (and hence which data) was not.

     Web archiving for eDiscovery is a recent "technological niche", as opposed to legacy eDiscovery, which has been used for years to preserve electronic data (e.g. email, files, etc.). The Web archiving eDiscovery process is based on three main features, as outlined by the Electronic Discovery Reference Model: thorough gathering of electronically stored information from websites, full access to and playback of any archived web content, and conversion to a form that allows full-text search.
  3. PRODUCTS & INNOVATION

     CAMA® Web Archiving Platform

     Aleph Archives is a pioneer in the domain of Web archiving. We offer high-quality archive accessibility and rendering. With CAMA®, Aleph Archives raises the web archiving process and the related quality assurance (QA) to a higher level by working with crawl engineering experts, dedicated QA teams and a powerful, yet easy to use, archive access technology [1].

     [Figure: CAMA® in action: archived (07/04/2011) version of Toyota’s corporate website and videos. Callouts: "Load the archived version with a click", "Testimonials".]

     Aleph Archives targets companies that need strict, reliable archiving processes to ensure compliance with SEC and FINRA regulations. The CAMA® Web archiving platform is more efficient and more reliable than the solutions of its main competitors. Aleph Archives offers an open (WARC - ISO 28500:2009 [2]), adaptive (cloud-based computing) and innovative (scheduled crawls, export of Web archives as PDF/PNG, antivirus check, CAMA® Appliance, real-time deduplication of results, multilingual search and translation, etc.) solution.

     [1] Products demo at: http://www.youtube.com/user/alepharchives/
     [2] WARC ISO file format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717
  4. CAMA® belongs to the category of « client-side & web-served » archiving solutions (refer to Appendices A and B for more details), which allow creating and maintaining stable, time-structured, verifiably authentic and independent versions of a corporate web presence, « social media » included.

     [Figure: CAMA® in action: archived (05/10/2011) version of the AerzteZeitung online German newspaper. Callout: "Play all embedded videos as usual".]

     Aleph Archives’s strategy aims to satisfy all of its clients: CAMA® offers high-quality archived websites (which can be filed as evidence in case of litigation), easy-to-use browsing and access tools, and a fully Web-based service to reduce costs (refer to Appendix C).
  5. [Figure: live version (08/02/2011) of the NY Daily newspaper, shown next to CAMA® in action: archived (10/05/2011) version of the NY Daily newspaper. Callouts: "Timeline", "QR code, Digital Signing, and Timestamping", "Options Pane".]
  6. MARKET SECTORS: who is CAMA® suitable for?

     Corporates

     a. E-Discovery

     Litigation Protection: Websites contain a growing proportion of business records that must be preserved for long periods of time. This content is frequently requested during discovery proceedings because of the Federal Rules of Civil Procedure (FRCP) and state versions of the FRCP. As a result, it is critical that all relevant electronic content be made available for e-discovery purposes.

     Legal Hold: When a hold on data is required, it is imperative that an organization immediately begin preserving all relevant data. Our web archiving platform CAMA® allows organizations to immediately place a hold on data when requested by a court or on the advice of legal counsel. An organization that cannot adequately place a hold on data when it is obligated to do so can suffer a variety of serious consequences, ranging from embarrassment to major legal sanctions or heavy fines.

     b. Regulatory Compliance

     Just about every organization faces a large and growing number of regulatory obligations to preserve electronic content. Some of the more important requirements are:

       • Sarbanes-Oxley Act of 2002
       • Health Insurance Portability and Accountability Act of 1996 (HIPAA)
       • Securities and Exchange Commission (SEC) Rules
       • Financial Industry Regulatory Authority (FINRA)
       • Model Requirements for the Management of Electronic Records (MoReq)

     c. Maintain Corporate Memory & Knowledge Management

     Web archiving can be very useful for maintaining a corporate record of what has been posted to a Web site, how long this content was maintained and when it was replaced. For example, a company may want a record of its Web site for historical purposes, or it may need an archive in order to re-use some of its content at a later date. Maintaining an accurate archive of Web content can significantly reduce the costs associated with recreating this content.
  7. Government

     Virtually all government agencies have regulatory obligations to preserve electronic content. Because your agency’s online content is increasing in both complexity and volume, and because governments are held accountable for the information they publish on the web, you need to employ a records retention policy.

     The 2006 changes to the Federal Rules of Civil Procedure indicate that all organizations (including governments) must be able to find, capture, and produce electronically stored information that might be relevant to a judicial or regulatory request. This can’t be done with server backups, CMS revision control, or other outdated methods. You need a solution that can provide indisputable proof of the integrity and authenticity of your online records (as required by the Federal Rules of Evidence).

     For example, in 2010 the Executive Office of the President (EOP) issued a solicitation to:

     « Provide the necessary services to capture, store, extract to approved formats, and transfer content published by EOP on publicly-accessible web sites, along with information posted by non-EOP persons on publicly-accessible web sites where the EOP offices under PRA maintains a presence, throughout the term of the contract. »

     Other requirements come from:

       • Presidential Records Act (PRA)
       • National Archives and Records Administration (NARA)
       • E-GOV - electronic records management initiatives
       • Guidance on Managing Records in Web 2.0/Social Media Platforms, October 20th, 2010
       • Library of Congress
       • Federal Rules of Civil Procedure (FRCP)
       • Department of Commerce
       • Department of Energy
       • Department of Justice
       • Environmental Protection Agency
       • Office of Management & Budget
       • Securities and Exchange Commission (SEC) Rules
       • Library & Archives Canada
  8. Website and « social media archiving » [3] is a good solution for e-discovery preparedness. Aleph Archives technology uses web bots (i.e. crawlers) that capture all web pages (including social media). The web pages are stored exactly as they are captured (including links, rich media, video, and Flash), which satisfies regulatory requirements for digital records. Aleph Archives also provides a digital timestamp and signature for each archived page, ensuring data integrity and authenticity. With this SaaS solution (no tedious software installation), governments can sign up and begin archiving in less than an hour.

     Adopting a web archiving policy is essential, and not just for big cities or the federal government. Aleph Archives’s pricing is competitive so that even small towns can stay prepared.

     The Internet will only continue to grow in scale and complexity, and governments are increasingly interested in how it can be used for civic growth and development. The issue of records retention must be addressed from the start, so that agencies can move forward confidently online.

     « Government websites are public records and must be archived to comply with Public Records Laws. Start archiving now. »

     Finance

     Online marketing and communications can present a challenge for securities traders, investment advisors, banks, and others in the financial services industry. The benefits of advancing technologies must be weighed against the risks associated with non-compliance in the area of books and records retention. Failure to meet the demands of industry standards can result in hefty fines and bad publicity.

     Multiple sets of guidelines for the financial industry (from the SEC, FDIC, FSA, SOX, FINRA, and others) demand the preservation of business records (both paper AND electronic) in such a way that the data can be reproduced for a regulator in a timely and complete manner. These requirements are now being extended to include newer tools such as social media platforms, and FINRA has advised that no compliance grace period will be in effect for these new technologies.

     It’s critical that firms implement a robust records retention policy for their websites and « social media pages ». Should your corporate web presence be investigated or questioned, a perfect representation of your company’s online activity is a necessity, and that is exactly what CAMA® provides.

     « Website archiving is vital to fulfilling many key FINRA and SEC regulations. Start complying today. »

     [3] Twitter and Government Transparency
  9. Food and Drug Companies

     In archiving their electronic data, publicly traded companies need to comply with the records management regulations of the Sarbanes-Oxley (SOX) Act.

     The past year has seen a dramatic increase in the FDA’s enforcement of regulations that deal with product claims and labeling. In an effort to be more proactive, the agency has been investigating companies for compliance with the FD&C Act, particularly section 403 A, which deals specifically with product descriptions and claims. As a result, a number of companies have received warning letters (which are viewable online, damaging brand reputation) addressing the product claims made on their labels or websites.

     Since most marketing now happens via websites, social media, and other Internet tools, it is of utmost importance for your company to have a reliable, accurate archive of all online activity. Should your claims be investigated or questioned, defensible evidence of your website’s precise content is a necessity, and that is exactly what CAMA® provides.

     Using crawling technology, we take automated snapshots of your website. Only new or changed pages are archived, saving storage space. The whole process is automatic: you don’t have to remember to do anything.

     « Have a reliable, accurate and defensible archive of all online activity. »

     Law firms

     Law firms and companies creating content online can use CAMA® to provide legal proof of intellectual property. CAMA® provides each page with a digital timestamp and a digital signature that cannot be altered without detection and, hence, creates legal proof of copyright. This trusted, non-refutable evidence stands up in a court of law if copyright ownership is ever questioned.

     « Use websites as legal evidence in court. Have CAMA® create integral and authentic evidence with support for e-Discovery. »

     “ This Court sees no reason to treat web sites differently than other electronic files. ” Arteria Prop. Pty Ltd. vs. Universal Funding V.T.O., Inc
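The idea that an archived page carries a timestamp and a signature that "cannot be altered without detection" can be illustrated with a small sketch. This is not CAMA®'s implementation, which the whitepaper does not describe. It uses an HMAC from the Python standard library as a stand-in for a real digital signature, which would normally rely on an asymmetric key pair and a trusted timestamping authority. The names `seal_page`, `verify_page` and `SECRET_KEY` are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical shared secret; a production system would use an asymmetric
# key pair and a trusted timestamping authority instead.
SECRET_KEY = b"demo-key-not-for-production"


def seal_page(url: str, content: bytes, captured_at: str) -> dict:
    """Return a tamper-evident seal for one archived page."""
    digest = hashlib.sha256(content).hexdigest()
    # The seal covers the URL, the capture time and the content digest.
    message = json.dumps(
        {"url": url, "captured_at": captured_at, "sha256": digest},
        sort_keys=True,
    ).encode("utf-8")
    tag = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return {"url": url, "captured_at": captured_at, "sha256": digest, "tag": tag}


def verify_page(seal: dict, content: bytes) -> bool:
    """Recompute the seal from the content and compare it with the stored one."""
    expected = seal_page(seal["url"], content, seal["captured_at"])
    return hmac.compare_digest(expected["tag"], seal["tag"])


if __name__ == "__main__":
    now = datetime.now(timezone.utc).isoformat()
    page = b"<html><body>Product claim v1</body></html>"
    seal = seal_page("http://example.com/", page, now)
    print(verify_page(seal, page))                  # unchanged content
    print(verify_page(seal, page + b"<!-- x -->"))  # altered content
```

Any change to the stored bytes, the URL or the capture time changes the recomputed tag, so verification fails for altered content.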
  10. CAMA® for Social Media e-Discovery

      Organizations and their employees are leveraging social media tools at unprecedented levels. With over 150 million blogs, an average of 140 million tweets every day, and more than 800 million users of social media sites worldwide (Facebook, LinkedIn, MySpace...), organizations are challenged to define usage policies and implement solutions to appropriately govern, discover and preserve relevant information from these complex and malleable data sources. Complicating the challenge of performing discovery on social media sites is the fact that these sites also include rich media such as audio and video, adding to an already complex environment. Legacy tools and manual processes cannot effectively manage the risk associated with social media sites and interactive content.

      To successfully manage discovery of social media and protect themselves from potential risk, organizations must embrace new technologies to harness and understand the meaning of social media content. Since social media content can be subject to legal hold if it contains relevant information, legal teams must be prepared to search, identify, preserve and collect this information. Social media sites must be managed like other enterprise data sources, as part of a comprehensive Social Media eDiscovery and information governance program. Given the complexity and volume of social media content, legal teams must be prepared with an automated solution that can understand meaning and cull through voluminous data sources to find relevant information.

      According to a report issued by Gartner, Inc., a leading technology research and advisory firm, half of all companies will have been asked to produce material from social media sites for e-Discovery by the end of 2013. Debra Logan, vice president and distinguished analyst at Gartner, wrote:

      « In e-Discovery, there is no difference between social media and electronic or even paper artifacts. The phrase to remember is if it exists, it is discoverable. Unique aspects of social media present additional challenges, but as with an overall information governance strategy, the key to avoiding or mitigating potential legal issues in the use of social media for business purposes is to have a governance framework, policy and user education. »

      In addition to the challenge of meeting the legal hold and preservation obligation, organizations, including those in the Financial Services, Healthcare, and Pharmaceutical industries, must ensure that employees are not violating regulations by creating or posting non-compliant content. As regulators recognize the influence and risks associated with social media channels, they are beginning to require organizations to actively monitor and govern employees’ social media interactions.

      For instance, FINRA (Financial Industry Regulatory Authority) Regulatory Notice 10-06 requires member firms to supervise and archive content posted to social media sites. The Food and Drug Administration (FDA), Federal Trade Commission (FTC), and the National Futures Association (NFA) are also developing rules associated with the use of social media, and the Federal Courts have issued guidelines for monitoring and managing the use of social media sites (see Resources & Links section).
  11. For example, if you don’t have an archiving system, you could be in trouble trying to find something you posted.

      [Figure: CAMA® in action: archived (05/17/2011) version of the NYTimes newspaper on Facebook. Callouts: "Loading archived version", "All media types (Flash, photos, videos, posts...) are preserved in their native format", "All links are clickable. Browse the archived pages, play videos, load images...".]

      According to Facebook [4]:

      « Currently, you can only search for content that has been posted in the last 30 days. The range of the search history may be expanded in the future. »

      [4] The same applies to Twitter and LinkedIn; see "Archiving Social Media prepares you for e-Discovery".
  12. Aleph Archives’s advanced web archiving platform for e-Discovery enables organizations to proactively manage, search for, identify and preserve any social media content. CAMA® enables organizations to take advantage of the power and business value of social networks while ensuring FRCP and regulatory compliance.

      Unique Selling Proposition

      The main competitive advantages of the CAMA® platform are:

        • superior technology to capture multiple web formats in dynamic websites,
        • more comprehensive web archiving process with crawl engineering experts,
        • high-quality archive accessibility and rendering,
        • Universal Archives View (UAW) independent from OSes and browser types or versions,
        • optimized full-text search engine tailored to very large web archive collections (billions of documents),
        • deduplicated full-text search results in real time,
        • daily archiving capabilities,
        • support of WARC ISO file format,
        • dedicated quality assurance teams and processes,
        • ability to be deployed over commodity machines,
        • fault-tolerant software design,
        • high availability [5].

      CAMA® is the only solution on the market that can run without an Internet connection while accessing the archives and that can also be fully deployed « In-House » (i.e. inside the customer’s infrastructure). The « In-House » solution gives you the freedom to exploit the full potential of CAMA® (training required).

      DISASTER & DATA RECOVERY

      « Your data safe and secure »

      Aleph Archives’s “retention service” includes shadow copies of your archived data in geographically distinct locations (USA, Canada, Switzerland, France). This means that two copies of your web archives exist at any given time to provide high data availability and avoid data loss.

      [5] See our Service Level Agreement (SLA).
  13. Pricing Model

      Cloud-based solution

      This section describes the implementation process for Aleph’s enterprise web archiving service and the pricing for the Set Up phases and for the provision of archive services thereafter.

      Aleph may calculate the fees using one of two methods of estimation.

      1. Where requirements are not fully defined, a simple overall price can be provided, based on the size and scope of the archive policy in broad terms. A breakdown of these fees may be provided for transparency.

      2. Where requirements are more fully defined, a more rigorous approach to estimating fees may be used. This provides a price per URL (i.e. archived resource), which is more accurate than the simple overall price because it is based on the specifics of an archive strategy defined by the more detailed requirements. Three parameters are involved here: the scope, the frequency, and the price per URL.

        • The scope defines which URLs are "in" a particular crawl: the list of URLs the customer would like to archive.
        • The archiving frequency for each scope can be daily, weekly, monthly, quarterly or annual. Aleph Archives is the only web archiving company offering a daily archiving service.
        • The price per URL is composed of:
            ‣ System administration charges;
            ‣ Archiving services fees;
            ‣ Infrastructure and storage costs (retention, data integrity, data security, etc.).

      InHouse solution

      All customers interested in the InHouse version of CAMA® are welcome to contact us for a quote.
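The per-URL method combines three parameters: scope, frequency and price per URL. The whitepaper does not give a formula or any actual prices, so the sketch below assumes the simplest possible model, in which the yearly fee is the number of URLs in each scope multiplied by the number of captures per year and by the price per URL. The function name `annual_estimate`, the scope names and the price value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

# Captures per year for each archiving frequency named in the text.
CAPTURES_PER_YEAR = {
    "daily": 365,
    "weekly": 52,
    "monthly": 12,
    "quarterly": 4,
    "annually": 1,
}


@dataclass
class Scope:
    name: str
    url_count: int   # number of URLs the customer wants archived
    frequency: str   # one of the keys of CAPTURES_PER_YEAR


def annual_estimate(scopes: Iterable[Scope], price_per_url: float) -> float:
    """Assumed model: URLs x captures per year x price per URL, summed over scopes."""
    return sum(
        s.url_count * CAPTURES_PER_YEAR[s.frequency] * price_per_url
        for s in scopes
    )


if __name__ == "__main__":
    scopes = [
        Scope("press releases", url_count=200, frequency="daily"),
        Scope("corporate site", url_count=5000, frequency="monthly"),
    ]
    # 0.01 is a made-up price per archived URL, for illustration only.
    print(annual_estimate(scopes, price_per_url=0.01))
```

Under this assumed model, a small scope archived daily can cost more than a much larger scope archived monthly, which is why the frequency is set per scope.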
  14. APPENDIX A.

      Web Archiving Policy

      A web archiving policy is the only means of creating and maintaining a stable, time-structured, verifiably authentic and independent version of the corporate web presence. « Independent » means that access to the content must be possible without requiring the original CMS version to be installed, configured and running. Having a web archiving policy is the only way the corporate Web-publishing infrastructure can evolve without threatening accessibility to legacy content. It is also the only way to avoid the continuous licensing and maintenance costs of legacy CMSs.

      A substantial and enduring web archive can be achieved by generating a flat, stable and time-structured version of the published content, capturing authentic snapshots according to the corporate archiving policy. These snapshots must be taken as user-centric views of the content, i.e. accurately reflecting the user’s experience of that particular content. In addition, they must be stored and made accessible in precisely the same form, thereby meeting legal and compliance requirements as authentic copies. And they must enable discovery using familiar web paradigms such as full-text search, as well as more sophisticated e-discovery techniques including metadata, tagging, filters and complex search.

      A1. How to choose your web archiving solution?

      Web archiving has made significant progress during the last five to seven years. It now offers a choice of approach to both policy and supporting technology. These choices should be considered carefully against business objectives before the decision is made. The main differences lie in the capture and access methods used.

      Three different methods exist to capture and archive web content:

        a. client-side archiving
        b. transaction archiving
        c. server-side archiving
  15. A2. Client-side Archiving

      « Client-side archiving » uses an archival crawler, derived from search engine crawler technologies, with significant enhancements to ensure that complex and hard-to-reach content can be found and captured, as well as stored without change. Starting from seed pages or entry points, these tools automatically capture pages and parse them to extract all links. The process repeats and continues as long as newly discovered pages remain within the scope defined for the crawl. The captured web content and embedded files are stored unchanged (original and authentic copies, an exact equivalent of what the generic user would have received in their browser at the time) and preserved in a flat, standards-based and self-contained file format that can be confidently considered future-proof. This is especially important within a legal context.

      To be effective, this method requires a crawler with excellent link extraction and path-finding algorithms that can work in a wide range of circumstances and site/page designs. In addition to client-side archiving, there are two alternative methods to capture web content. Both methods need to be operated from the server side, require prior authorisation to services, and need access to both front-end and back-end servers.

      A3. Transaction Archiving

      The first of these alternative methods, called « transaction archiving », consists of the systematic capture and archiving of all browser/server exchanges (request/response pairs) resulting from the interaction of users with sites, regardless of their content type and how they are produced.

      Transaction archiving enables tracking and recording of every actual instantiation of content in an authentic flat HTML form that is easy to maintain and preserve over time. Moreover, it can be used to archive hidden web content, provided this content is requested, i.e. read, by the websites’ users during the capture time.

      However, transaction archiving generates unnecessary duplicates of frequently visited pages and raises serious privacy concerns, as the method implicitly relies on usage tracking.
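The client-side crawl loop described in section A2 (start from seeds, capture each page, extract its links, continue while new pages stay in scope) can be sketched as follows. This is a minimal illustration, not the CAMA® crawler: it only follows plain `<a href>` links, and the `fetch` callback, the `in_scope` predicate and the in-memory `site` dictionary are assumptions made so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags in one page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds: List[str],
          in_scope: Callable[[str], bool],
          fetch: Callable[[str], Optional[bytes]]) -> Dict[str, bytes]:
    """Breadth-first capture starting from seeds, limited to in-scope URLs."""
    archive: Dict[str, bytes] = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if body is None:
            continue
        archive[url] = body  # stored unchanged, exactly as received
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            target, _fragment = urldefrag(urljoin(url, href))
            if target not in seen and in_scope(target):
                seen.add(target)
                queue.append(target)
    return archive


if __name__ == "__main__":
    # Hypothetical in-memory "site" standing in for real HTTP fetches.
    site = {
        "http://example.com/": b'<a href="/a">A</a> <a href="http://other.org/">X</a>',
        "http://example.com/a": b'<a href="/">home</a> <a href="/b#top">B</a>',
        "http://example.com/b": b"<p>leaf page</p>",
    }
    captured = crawl(
        ["http://example.com/"],
        in_scope=lambda u: u.startswith("http://example.com/"),
        fetch=site.get,
    )
    print(sorted(captured))
```

A production archival crawler would also need to find links hidden in scripts, stylesheets and embedded media, which is the "link extraction and path-finding" capability the text singles out.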
  16. A4. Server-side Archiving

      The second, and more obvious, alternative to client-side archiving is « server-side archiving ». This consists of directly copying files in the document folders to back-up servers. Although it might appear to be the simplest approach, it is in fact seriously flawed, from both the preservation and archive access points of view.

      To make certain that any web content archived using this method can be properly restored, server-side archiving requires that all original CMSs, databases and other software are archived alongside the content or are actively maintained in an operational state, or that the content is migrated to newer CMSs, databases, etc. In any case, these activities will be required for the whole period of archive retention. Interestingly, IT backups essentially rely on this method in almost all cases, systematically failing to provide the long-term preservation and access capabilities that are essential for legal and compliance requirements. However, for some types of hidden-web content, this method can prove to be useful, mainly in situations where it is required to archive parts of websites that a client-side crawler cannot reach.

      A5. Comparison of Content Capture Methods

      The following table summarises the main content capture methods, where ✔ = fully supported and ● = possible/custom development.

      [Table; columns, in order: Server-side, Transaction, Client-side. Each row lists its marks in their original order; the column position of each mark is not preserved in this transcript.]

        • Content captured as user sees it, unchanged, and authentic: ✔ ✔
        • Archive access independent of original publishing technology: ✔ ✔
        • Able to capture interactive or query-based content: ✔ ✔ ●
        • Retains web URL space (not dependent on server link mapping): ✔ ✔
        • De-duplication possible: ● ✔
        • Easily directed and scheduled capture: ✔ ✔
        • Flexible archival scope, for a wide range of needs: ✔ ✔
        • Able to capture browser/server exchanges (request/response pairs): ✔
        • Web server technology independence: ✔
        • Archiving services can be centralized in one place: ✔
        • Cost-effective and efficient operations over time: ✔

      In most cases client-side archiving is the best approach for capturing content. The quality of the resulting archive will depend mainly on the capabilities of the crawler, particularly with respect to link extraction, even when links are encoded in scripts and executables. This is one of the key determinants for capture of all files in a consistent and timely manner.
  17. APPENDIX B.

      Accessing your Web Archives

      Two different methods exist to provide access to archives:

        a. website-copier approach
        b. Web-served approach

      The choice is largely determined by how the files are stored. This is critically important, because web URLs use different naming conventions from file systems, with different permissible and reserved characters, escaping rules, case sensitivity, etc.

      B1. Website-copier Approach

      Website copiers write all captured files directly to disk, and therefore need to modify names and links as they are stored in order to make the archive accessible. This results in an archive that is not an authentic version of the original server’s response stream.

      B2. Web-served Approach

      Archive web servers, on the other hand, store responses from the original server unchanged in container files. This ensures the content and server response stream are kept in an authentic form.

      The emerging standard for web archive container files is WARC [6] (the Web ARChive file format), ISO standard ISO/DIS 28500. It is already being adopted as the foundation for web archive storage and preservation. A WARC file records the sequence of harvested web files captured by the crawler, each page preceded by a header containing metadata that briefly describes the harvested content, its length and checksum.

      WARC ensures the preservation of the original naming scheme and linking, thereby providing archive storage of content in an authentic form, as well as providing the means for additional integrity checks during the entire period of custodianship.

      [6] WARC file ISO format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717
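The record layout described in section B2 (a header carrying metadata, length and checksum, followed by the unchanged server response) can be sketched as follows. This is a simplified illustration and not a validated WARC writer; a real archive should use an established WARC library. The header field names follow the WARC format, while the function name `build_warc_record` and the sample response are hypothetical.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_response: bytes,
                      captured_at: datetime) -> bytes:
    """Wrap one unchanged HTTP response in a simplified WARC 'response' record."""
    # Labelled digest of the record block, here SHA-1 encoded in Base32.
    digest = base64.b32encode(hashlib.sha1(http_response).digest()).decode("ascii")
    # captured_at must be timezone-aware; it is normalised to UTC.
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {stamp}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Block-Digest: sha1:{digest}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines end with CRLF; a blank line separates header and block,
    # and two CRLFs close the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    response = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
                b"<html>hello</html>")
    record = build_warc_record(
        "http://example.com/", response,
        datetime(2011, 5, 17, tzinfo=timezone.utc),
    )
    print(record.decode("utf-8"))
```

Because the response bytes are stored as received and the original URL is kept in the header, no file renaming or link rewriting is needed, which is the authenticity argument the section makes against website copiers.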
  18. B3. Comparison of Access Methods

      The following table summarises the main archive access methods, where ✔ = fully supported and ● = possible/custom development.

      Feature | Website Copy | Web-served Archive
      Searchable | ✔ | ✔
      Browsable | ✔ | ✔
      Content directly navigable from disk | ✔ | ●
      Content stored and accessed unchanged, and authentic | | ✔
      Links independent of naming conventions | | ✔
      Storage and preservation of metadata | ✔ | ✔
      Access independent of file location | | ✔
      Standards-based archives | | ✔

      There is a consensus today that the website-copier approach has serious limitations concerning authenticity of the archive, whereas the Web-served approach can ensure authenticity by design. In professional use, therefore, especially where legal and regulatory obligations are business priorities, the Web-served approach is a necessity.
  19. APPENDIX C.

      Web Archiving as a Best Practice

      The web has matured into a central communication channel for businesses and government agencies, with digital media (websites and other web-based content) all but replacing print media as the primary mode of communication with customers, constituents, prospects, investors, and others.

      Organizations using the web must keep accurate records of web content: online communication is just as much of a liability as any other form of communication. As a recent case ruled: « This Court sees no reason to treat websites differently than other electronic files. »

      Web archiving has become a best practice for any organization using the web to communicate. Organizations that neglect to retain accurate records of their web presence are placing themselves at unnecessary risk, from both a compliance and a litigation standpoint.

      Protect your organization by regularly archiving web content with the Aleph Archives Web Archiving Platform CAMA®. We provide all the technology and services you need to archive your websites and web presence from any domain.
  20. APPENDIX D.

      ALEPH ARCHIVES’s CAMA® PLATFORM ARCHITECTURE OVERVIEW

      More details about the architecture internals are available upon request.
  21. APPENDIX E.

      Elements of a Web Archiving Plan

      Setup
      Aleph Archives runs, tests, and calibrates the CAMA® robots to find the best rules for capturing your website(s) with the highest quality.

      Capture
      The cost of crawling and crawl engineering for the target URLs at a specified frequency.

      Retention
      The annual cost of storing and retaining archives of the target websites. The standard plan calls for 7 years of retention.

      Operation
      Includes keeping the designated servers and machines up and running for CAMA®, archives access, retention, and quality assurance.

      Quality Assurance (QA)
        - QA Level 1: we check and verify one level deep (depth 1) from the website root (i.e. home page).
        - QA Level 2: we check and verify two levels deep from the root, and so on for QA Level 3 and QA Level 4. QA can go as far down in website depth as the client needs. In industry practice, QA Level 4 is sufficient for most enterprises for regulatory compliance, legal and operations purposes.
        - Exhaustive QA: we check and verify all designated websites and levels, verifying every page to the website’s full depth. Exhaustive QA may be cost-prohibitive, depending on the customer’s requirements. Upon request, Aleph Archives will provide a price quotation for Exhaustive QA.
        - Mixed QA: we combine a sampled QA per website level with an exhaustive QA to a certain level.
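The QA levels above are defined by link depth from the home page. The sketch below shows one way to compute which pages a given level would cover, using a breadth-first search over a link graph. It assumes that the root counts as depth 0 and that QA Level n covers every page at depth n or less. The whitepaper does not state this precisely, and the function names and the sample graph are hypothetical.

```python
from collections import deque
from typing import Dict, List


def page_depths(links: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Shortest click distance of every reachable page from the root."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_for_qa_level(links: Dict[str, List[str]], root: str,
                       level: int) -> List[str]:
    """Pages whose depth is at most `level` (the root is depth 0)."""
    depths = page_depths(links, root)
    return sorted(p for p, d in depths.items() if d <= level)


if __name__ == "__main__":
    # Hypothetical link graph: page -> pages it links to.
    links = {
        "/": ["/a", "/b"],
        "/a": ["/a/1"],
        "/a/1": ["/a/1/x"],
        "/b": ["/"],
    }
    print(pages_for_qa_level(links, "/", 1))
    print(pages_for_qa_level(links, "/", 2))
```

Exhaustive QA corresponds to taking every page returned by `page_depths`, regardless of depth.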
  22. 22. APPENDIX F. Aleph Archives provides the following CAMA® Plans:

      FEATURE                                                           PROFESSIONAL  ENTERPRISE  PREMIUM
      Crawl engineering team                                            ✔             ✔           ✔
      WARC format (ISO 28500:2009) compliance                           ✔             ✔           ✔
      Scheduled crawls                                                  ✔             ✔           ✔
      Archives summary pane                                             ✔             ✔           ✔
      Document format handling (HTML, Word, PowerPoint, PDF, Flash …)   ✔             ✔           ✔
      Full text search                                                  standard      advanced    advanced
      Full text search history                                          ✔             ✔           ✔
      Full text search queries import & export                          ✔             ✔           ✔
      Automatic language detection                                      ✔             ✔           ✔
      Documents metadata extraction and indexing                        ✔             ✔           ✔
      Infinite archives retention                                       ✔             ✔           ✔
      ARC to WARC batch migration                                       ✔             ✔           ✔
      WARC to WARC batch conversion                                     ✔             ✔           ✔
      Archives verification and repair tools                            ✔             ✔           ✔
      Text summarizer                                                   ✔             ✔           ✔
      Audit trails identification and traceability                      -             ✔           ✔
      Deduplicated full text search                                     -             ✔           ✔
      Archived resources export (PDF, PNG)                              -             ✔           ✔
      Multi-core aware archives servers                                 -             ✔           ✔
      Archives redundancy                                               -             ✔           ✔
      Load balancing for archives access                                -             ✔           ✔
      Antivirus checker                                                 -             ✔           ✔
      Trusted archives (digital signatures)                             -             ✔           ✔
      SEC 17a-4 and FINRA compliance                                    -             ✔           ✔
      Secured archives access (SSL Encryption)                          -             ✔           ✔
      Multilanguage instant translator                                  -             ✔           ✔
      Custom Branding                                                   -             ✔           ✔
      Archives compression                                              -             ✔           ✔
      Archived data processing and management                           -             -           ✔
  23. 23. FEATURE                           PROFESSIONAL  ENTERPRISE  PREMIUM
      CAMA® Appliance                   -             -           ✔
      CAMA® Appliance on USB pen drive  -             -           ✔
      CAMA® Kit (Access API)            -             -           ✔
      CAMA® 64bits                      -             -           ✔
      Quality Assurance team (level)    basic         medium      high
      Custom metadata limit             30            unlimited   unlimited
      Collections limit                 100           unlimited   unlimited
      Accounts limit                    10            unlimited   unlimited
      Crawled resources per month       up to 500K    up to 5M    unlimited
      Archived resources per month      up to 500GB   up to 1TB   up to 2TB

      A « Custom Plan » is also available via an online form, which allows customers to choose the product features that best suit their needs.
  24. 24. RESOURCES & LINKS

      ☞ Aleph Archives
      - Website
      - Products demo

      ☞ Records Management

      Finance
      - FINRA Regulation Notices
      - FINRA Guidance
      - FINRA Regulatory Notice 10-06 on Social Media
      - Summary of NASD Rule 3110: Books and Records
      - Federal Rules of Evidence 901: Data Integrity & Authenticity
      - SEC: Division of Trading and Markets
      - SEC: Division of Investment Management
      - SEC Rule 17a-4: Books and Records
      - Sarbanes-Oxley Act (SOX)
      - Financial Services Authority (FSA) Handbook (Europe)
      - FSA Handbook Section 3.2: see Records Requirements, Sec 3.2.20 (Europe)
      - Model Requirements for the Management of Electronic Records (MoReq) (Europe)

      Food and Drug Administration
      - Federal Rules of Evidence 901: Data Integrity & Authenticity
      - FDA Guidance Documents: Food
      - FDA Compliance & Enforcement: Food
      - FDA Guidance Documents: Drugs
      - Code of Federal Regulations (CFR) Title 21
      - Model Requirements for the Management of Electronic Records (MoReq) (Europe)
      - Pharma Social Media Wiki
      - FDASM (Everything About the FDA, Internet, Social Media)