HCLT Whitepaper: Cyber Scan


More on Engineering R&D services: http://www.hcltech.com/engineering-rd-services/overview

Trying to stop online piracy and illegal distribution of content on the internet is nothing new. Like hiring security guards for a storefront, combating online theft is costly and brings its own challenges. Further, infringing sites respond to business attempts to find and remove illegal content with increasing technical sophistication. Not only must the sites hosting pirated material be identified, but also the sites that link to it. HCL CyberScan can help any business protect its key intellectual property. As online piracy continues to grow, companies must remain vigilant and use technology to minimize copyright infringement and the resulting profit loss. Our solution is a new and effective way to combat online piracy, IP theft, and illegal distribution by using automation and the latest internet and cloud technologies.

Let CyberScan stop piracy and secure your profits.


  1. 1. CyberScan (Online IP Infringement Detection Service) July 2011
  2. 2. TABLE OF CONTENTS
     Abstract .................................. 3
     Abbreviations ............................. 4
     The Problem ............................... 5
     Business and Technical Challenges ......... 7
     CyberScan Solution ........................ 9
     Key Features .............................. 11
     Key Capabilities .......................... 12
     How CyberScan Works ....................... 13
     Business Impact Examples .................. 15
     Conclusion ................................ 16
     Author Info ............................... 16
     © 2011, HCL Technologies. Reproduction Prohibited. This document is protected under Copyright by the Author, all rights reserved.
  3. 3. Abstract
     Media and content providers are among the most profitable businesses on the internet today, and the biggest threat to their profits is piracy of their copyrighted products. Companies, especially in the entertainment, software, and publishing industries, continue to lose profits as pirated content proliferates across a vast number of sites. HCL, a leading global IT services company, has tapped its proprietary skills and tools to develop a software solution that seeks out illegal hosting of, or linking to, commercial material and helps protect against it.
     CyberScan turns online copyright infringement from a source of revenue loss into automatically collected evidence for direct action, and it uses the same stealth methods as the infringers to identify illegal postings of, or links to, "hacked" material. CyberScan provides an online copyright infringement detection service to help businesses reduce profit loss. Its software uses web crawling, tracking and indexing, distributed agent "sniffers", and IP masking to identify, monitor, and report infringing content rapidly and without being detected. It also bypasses the typical methods piracy sites use to hide from or fight off such detection.
     CyberScan applies these methods automatically, reducing the cost of manually finding and tracking infringement or compliance. Infringement of any IP or copyrighted content that can be sold and shared digitally (movies, pictures, audio files, eBooks, TV shows, documents, and software) can be identified and addressed with CyberScan. Your business can fight online piracy of your content effectively. CyberScan from HCL is a powerful tool for any business combating copyright infringement.
  4. 4. Abbreviations
     Sl. No. | Acronym (Page No.) | Full form
     1       | IP (1)             | Intellectual property
     2       | URL (5)            | Uniform Resource Locator
     3       | AWS (9)            | Amazon Web Services
     4       | SaaS (11)          | Software as a Service
  5. 5. The Problem
     Online piracy and infringement of copyrighted material continue to grow as a problem, and a source of profit loss, for businesses, more so than retail theft in stores. The problems businesses face today in trying to combat online piracy include the following:
     • Identifying infringing content on a vast and rising number of hosting and linking sites, both known and unknown.
     • Searching, finding, and filtering through content in a timely manner even though it propagates quickly across the internet.
     • Staffing manual operators to search for and detect infringing material, or hiring developers skilled in the particular logic and algorithms it requires.
     • Evading site administrators' countermeasures, such as blocking IP addresses based on the number of hits, which are aimed at manual or scripted detection systems.
     • Getting past the authentication techniques piracy site administrators use to protect and firewall their illegal content.
     • Issuing Cease and Desist or Takedown Notices to an ever-growing number of dynamic hosting sites that change their URLs and addresses.
     • Fingerprinting infringement as evidence and enforcing removal after discovery or after serving notice.
     • Reporting meaningful data from the huge volumes of content found, in order to derive infringement patterns, assess perpetrators, and make business decisions.
     Meeting The Challenge
     Businesses trying to solve these problems of content piracy and distribution face major challenges and roadblocks. CyberScan addresses each of them:
     Challenge | Short Description | CyberScan Solution
     URL Obfuscation | Sites hide pirate links with format or layout tricks | Intelligent search expressions see through formats
     Website Authentication | Sites require login or user credentials to access content | Sites are categorized and login credentials used for automation
     Infringement Detection | Rapid changes and posts make timely finding and monitoring difficult | Special search methods seek, tag, and monitor based on site
     Web Crawler Obstruction | Site admins limit, watch, and block detecting programs/users | Jobs spawn across globe from new addresses to remain in stealth
  6. 6. The next two sections describe these business and technical challenges in more detail and explain how CyberScan addresses each of them.
  7. 7. Business and Technical Challenges
     Trying to stop online piracy and illegal distribution of content on the internet is nothing new. Like hiring security guards for a storefront, combating online theft is costly and brings its own challenges. Further, websites hosting infringing content respond to business attempts to find and remove illegal content with increasing technical sophistication. Not only must the sites hosting pirated material be identified, but also the sites that link to it. Each of the challenges listed is described in more detail below, and the next section discusses how CyberScan solves them.
     URL Obfuscation
     Websites that contain links to infringing content, from forums and blogs to search engines, commonly use obfuscation techniques to prevent automated systems from detecting infringement. Tactics of these linking sites include posting plain-text URLs instead of hyperlinked ones, replacing characters inside URLs in a way a human can read but a computer cannot, using third-party URL shortening services, and requiring registration to view content or posts. All of these methods are a challenge for a company trying to find the sites that serve as a link or entry point to pirated material.
     Website Authentication
     Some infringing or linking websites require registration before content can be browsed. This closed-door tactic is particularly difficult to address because there is no standard method of authentication across the web. Many sites use customized form-based authentication, which any manual or programmed web crawler or sniffer must handle. To complicate matters further, some linking websites allow anonymous access to only small portions of the site, or require their own authentication before users can view links to infringing content and downloads.
     Infringing Content Detection
     Identifying infringement hidden inside unstructured content is a serious challenge, especially given the dynamic linking nature of the internet and the frequency of new or updated posts. The ability to detect infringing content within hours of its being posted is a desirable capability of any detection system. A specialized approach to content detection and crawling that not only seeks and finds, but also continually monitors any type of website, is a great challenge that must be met to prevent pirated content from spreading.
  8. 8. Webcrawler Obstruction
     Some linking sites take proactive and even reactive measures to hinder automated systems and manual sniffing for pirated content. Tactics include blocking IP addresses or user agent strings according to the site's own criteria, and enforcing page view limits and quotas. These obstructions may be applied programmatically or through manual intervention by the website administrators. A great challenge in crawling the web, manually or automatically, is to remain in stealth mode so that you can continue to detect and monitor IP infringement while remaining undetected yourself.
     Developed with these problems in mind, CyberScan provides key logic and proven components that allow automated solutions to these challenges and more.
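     The URL obfuscation tactics described in this section (plain-text links and character replacement) can be illustrated with a short sketch. It is not taken from CyberScan; the substitution patterns (`hxxp`, `h**p`, `[.]`, `(dot)`) and the function names `deobfuscate` and `extract_urls` are assumptions chosen for illustration, and a real system would need many more rules.

```python
import re

# Hypothetical disguises a poster might apply to a link (assumed, not CyberScan's rules).
_SCHEME = re.compile(r"\bh(?:xx|\*\*)p(s?)://", re.IGNORECASE)
_DOT = re.compile(r"\s*[\[\(]\s*(?:dot|\.)\s*[\]\)]\s*", re.IGNORECASE)

# After normalization, plain-text URLs can be found with an ordinary pattern.
_URL = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def deobfuscate(text):
    """Undo a few common human-readable URL disguises."""
    text = _SCHEME.sub(r"http\1://", text)   # hxxp:// or h**p:// -> http://
    return _DOT.sub(".", text)               # [.] or (dot) -> .


def extract_urls(text):
    """Return URLs found in unstructured text, hyperlinked or not."""
    found = _URL.findall(deobfuscate(text))
    # Trim punctuation that usually belongs to the sentence, not the URL.
    return [url.rstrip(".,;:!?)") for url in found]


if __name__ == "__main__":
    post = "Grab it at hxxp://files[.]example[dot]com/movie.avi, enjoy"
    print(extract_urls(post))
```

     Normalizing the text first and matching afterwards keeps each disguise as a separate, testable rule. URL shortening services are a different problem, since resolving them requires following redirects.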
  9. 9. CyberScan Solution
     CyberScan offers a comprehensive solution to online infringement by leveraging best-of-breed open source technologies and the power of cloud computing. It directly addresses the business and technical challenges of piracy prevention outlined above in the following specific ways:
     URL Obfuscation
     Infringing sites use cover methods like changing or masking their URLs, clouding the links to their site, or requiring registration to continue. CyberScan's custom web crawler logic applies regular expressions to detect and process host site URLs inside unstructured web content. CyberScan also supports custom website authentication mechanisms, which enable it to crawl entire domains as a registered user. This combination deals effectively with most forms of URL obfuscation.
     Website Authentication
     Sites hosting pirated content often require user authentication, keeping their illegal wares behind a closed and locked door. Linking sites are classified by analysis algorithms on a wide range of criteria so that the most applicable approach is selected. The Nutch web crawler's authentication modules have been extended to support form-based authentication. Credentials gathered from manual site registration are supplied through the CyberScan Web Application and are used while crawl jobs are underway. This allows CyberScan web crawlers to access and analyze areas of suspect websites that typical search engines are unable to reach.
     Infringing Content Detection
     Infringing content can be hidden and changed by new posts, updates, propagation, and the fast, dynamic nature of the internet. CyberScan uses a combination of weighted regular expressions to detect infringing content. Within a website known to serve content suspected of infringing, the program is stricter in determining infringement possibilities. Within lesser-known or new sites, search logic can also be applied. If, for example, the body of a post matches a regular expression designed to detect a customer's content title, and the URL also contains a particular flagged string, the code can accurately determine whether the post is infringing.
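     The weighted regular expression approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CyberScan's implementation: the `Rule` structure, the weights, the thresholds, and the reading of "stricter" as "a lower score is enough to flag a post on a known site" are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: re.Pattern   # compiled regular expression
    field: str            # which part of the post it applies to: "body" or "url"
    weight: float         # contribution to the score when the pattern matches


def score_post(body, url, rules):
    """Sum the weights of all rules whose pattern matches its field."""
    fields = {"body": body, "url": url}
    return sum(r.weight for r in rules if r.pattern.search(fields[r.field]))


def is_suspect(body, url, rules, known_site,
               known_site_threshold=1.0, default_threshold=1.5):
    """Flag a post; sites already known for infringement need less evidence (assumed policy)."""
    threshold = known_site_threshold if known_site else default_threshold
    return score_post(body, url, rules) >= threshold


# Hypothetical rules: a content title in the body and a flagged string in the URL.
RULES = [
    Rule(re.compile(r"\bexample\s+movie\b", re.IGNORECASE), "body", 1.0),
    Rule(re.compile(r"download|torrent", re.IGNORECASE), "url", 0.5),
]

if __name__ == "__main__":
    print(is_suspect("Example Movie full HD",
                     "http://forum.example.net/download/123",
                     RULES, known_site=False))
```

     With these example weights, a title match alone flags a post only on a site already known for infringement, while a title match plus a flagged URL string flags it anywhere.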
  10. 10. Different crawling methodologies have been implemented depending on the layout of sites. For example, on forum-style websites CyberScan attempts to use the site's own search functionality to sniff and crawl links that have a high probability of infringement. On other styles of site, where search is not available, CyberScan crawls the index pages and analyzes each post according to its logic. CyberScan will find infringement when it is there.
      Webcrawler Obstruction
      Pirate site administrators react and try to block access or views by legitimate enforcers, either manually or with programs that detect who is trying to detect them. CyberScan conducts its crawls inside Amazon Elastic MapReduce, an AWS service. Each crawl job runs in a newly provisioned cluster, each with a different IP address and geographical location. The user agent string is set to match the most common browsers and platforms on the web. To stay within server-side page view limits or quotas, the client crawl jobs are configured to be low impact and "polite" to the web servers. Because the targeted sites are crawled in large but distributed jobs, the load is spread across the entire World Wide Web while the system actively searches for infringement under varying aliases. This combination helps keep the web crawlers under the radar of website administrators and makes CyberScan very difficult to identify and block.
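      The "polite" crawling idea above, spacing out requests to each server and presenting common browser user agent strings, can be sketched as a small scheduling function. This is an illustration only and makes no network requests; the function name `plan_fetches`, the five-second delay, and the placeholder user agent strings are assumptions, and provisioning clusters with different IP addresses is outside its scope.

```python
from itertools import cycle
from urllib.parse import urlsplit

# Placeholder strings; a real crawler would use actual common browser user agents.
USER_AGENTS = (
    "Mozilla/5.0 (illustrative desktop browser string)",
    "Mozilla/5.0 (illustrative mobile browser string)",
)


def plan_fetches(urls, per_host_delay=5.0, user_agents=USER_AGENTS):
    """Return (start_time_seconds, url, user_agent) tuples.

    Requests to the same host are spaced at least per_host_delay apart,
    while different hosts can be fetched in parallel.
    """
    next_free = {}                 # host -> earliest allowed start time
    agents = cycle(user_agents)    # rotate through the user agent strings
    plan = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        start = next_free.get(host, 0.0)
        plan.append((start, url, next(agents)))
        next_free[host] = start + per_host_delay
    # Stable sort: keeps the original order among equal start times.
    return sorted(plan, key=lambda item: item[0])


if __name__ == "__main__":
    for start, url, agent in plan_fetches(
        ["http://a.example/1", "http://a.example/2", "http://b.example/1"]
    ):
        print(f"{start:5.1f}s  {url}  [{agent}]")
```

      Keeping the delay per host rather than global means a large job still finishes quickly across many sites while each individual server sees only light traffic.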
  11. 11. Key Features
      CyberScan's rich feature set and capabilities are built around four parts: Identification, Evidence Collection, Reporting Infringement, and Re-verification. Together they provide 360° protection for your content.
      CyberScan finds pirated content and the sites that provide paths to it effectively, secretly, and automatically. CyberScan's key features and benefits include the following:
      • Automatic identification of suspected infringement of intellectual property
      • Dynamic detection through multiple geographies to remain in "stealth mode"
      • Savvy "crawl/sniff" logic that remains undetected by pirate administrators
      • Full evidence capture and archival for legal establishment of infringement
      • Thorough and fully automated domain traversal, parsing, and indexing
      • Powerful multi-faceted search for drilling into indexed content
      • Adaptive tracking of detected sites to ensure removal and compliance
      • Live feeds detailing newly discovered infringement
      • Cross-category crawls of sites and specific sniffing while posing as a legitimate user
      • Interactive web interface for system monitoring and control
      • Prevalence analysis and reporting of pirated content and its service providers
      • Highly scalable and reliable cloud architecture using proven open source modules
  12. 12. Key Capabilities
      CyberScan was developed by HCL experts to include key capabilities and use proven components that specifically address the content piracy concerns of provider businesses and their technical staff. The following are some highlights:
      • Customized proprietary version of the Nutch open source web crawler
      • Proven cloud infrastructure using Amazon Web Services (AWS) to deploy and run
      • Advanced AWS services like elastic clusters for a highly scalable and reliable system
      • Dynamic resource allocation across multiple domains, locations, and user agent strings, enabling CyberScan to work in an undetected stealth mode
      • Fully indexed suspicious domain lists and multi-faceted search results through a custom search engine UI, useful for research, analysis, and reporting
      • Adaptive revisits to suspicious content download and link pages, to detect when they are removed and ensure compliance
      • Coding logic that ensures "politeness" to the servers being sniffed, so that web crawlers resemble normal users and go unnoticed by reactive pirate site administrators
      • Cloud-based architecture that allows for global efficiency and a Software as a Service (SaaS) pay-per-use billing model
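      The "adaptive revisits" capability listed above could be approached in many ways; the whitepaper does not say how the revisit schedule is chosen. The sketch below shows one hypothetical policy: check often while infringing content is still present, and back off gradually once it has been removed, without ever stopping, in case it is posted again. The `TrackedPage` structure, the interval bounds, and the doubling rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrackedPage:
    url: str
    interval_hours: float = 6.0    # time until the next revisit
    consecutive_absent: int = 0    # visits in a row where the content was gone


def update_after_visit(page, content_present, min_hours=6.0, max_hours=168.0):
    """Adjust the revisit interval based on what the latest visit found."""
    if content_present:
        # Still infringing: keep checking at the shortest interval.
        page.consecutive_absent = 0
        page.interval_hours = min_hours
    else:
        # Removed: re-verify less often, but never less than once per max_hours.
        page.consecutive_absent += 1
        page.interval_hours = min(page.interval_hours * 2, max_hours)
    return page.interval_hours


if __name__ == "__main__":
    page = TrackedPage("http://forum.example.net/post/42")
    for present in (False, False, True):
        print(update_after_visit(page, present), page.consecutive_absent)
```

      The `consecutive_absent` counter gives a simple compliance signal: a page that has stayed clean for several visits can be reported as removed.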
  13. 13. The CyberScan Difference
      CyberScan stands out from any mix of separate software tools for its features and capabilities. The following are some of the key benefits HCL adds for customers who partner with it to use the CyberScan solution and services:
      How CyberScan Works
      The secret to CyberScan's profit-saving features and benefits lies in HCL's selection and customization of technologies that together can quietly and efficiently detect copyright infringement and its propagation. HCL found niche open source computing platforms and customized them, added an intelligent architecture geared for IP detection tasks, and tapped the power of the cloud. The result is the differentiating feature set outlined above. The following are some of the technologies and components used in this HCL assembly and coding:
      • Java - programming language and computing platform
      • Nutch - a multi-threaded web crawler capable of full web-scale indexing, serving as the core sniffer/crawling technology
      • Solr - an enterprise-grade search platform that constructs a full-text searchable index of the content crawled by Nutch
      • Lucene - text search engine library, used by Nutch and Solr
  14. 14. • Hadoop - a powerful distributed computing framework that breaks large computational jobs into manageable fragments to be run in parallel on many servers
      The following diagram further depicts some of the back-end components (the Hadoop layer) of CyberScan's architecture.
  15. 15. Business Impact Examples
      The innovation behind this tool has already led to the following business impact for beta-testing and initial customers. These are listed only as examples of the kind of results your business could see:
      • A customer has realized more than 95% infringement detection accuracy.
      • A customer realized a 30% cost saving compared to its existing mix of service providers, with the added benefit of a single source, HCL, for the new provisions.
      • Customers express eagerness about a pay-per-use scheme, which allows them to focus on their business while HCL takes care of the engineering, technology innovation, maintenance, support, and research.
      • A customer division, based on its resounding success with CyberScan, is now introducing the solution to all Business Units and select partners of its company.
  16. 16. Conclusion
      CyberScan is an effective and cost-efficient solution for combating online piracy and reducing your profit loss.
      HCL CyberScan can help any business protect its key intellectual property. As online piracy continues to grow, companies must remain vigilant and use technology to minimize copyright infringement and the resulting profit loss. Our solution is a new and effective way to combat online piracy, IP theft, and illegal distribution by using automation and the latest internet and cloud technologies. Let CyberScan stop piracy and secure your profits.
      For more on how HCL CyberScan can benefit your organization, contact us at cyberscan@hcl.com
      Author Info
      Michael Grucz, Technical Lead Research, Internet Security Business Unit
      Kiran Kumar Reddy. V, Product Manager, Internet Security Business Unit
  17. 17. Hello, I'm from HCL's Engineering and R&D Services. We enable technology-led organizations to go to market with innovative products & solutions. We partner with our customers in building world-class products & creating the associated solution delivery ecosystem to help build market leadership. Right now, 14500+ of us are developing engineering products, solutions and platforms across Aerospace and Defense, Automotive, Consumer Electronics, Industrial Manufacturing, Medical Devices, Networking & Telecom, Office Automation, Semiconductor, Servers & Storage for our customers.
      For more details contact eootb@hcl.com
      Follow us on twitter http://twitter.com/hclers and our blog http://ers.hclblogs.com/
      Visit our website http://www.hcltech.com/engineering-services/
      About HCL Technologies
      HCL Technologies is a leading global IT services company, working with clients in the areas that impact and redefine the core of their businesses. Since its inception into the global landscape after its IPO in 1999, HCL has focused on 'transformational outsourcing', underlined by innovation and value creation, and offers an integrated portfolio of services including software-led IT solutions, remote infrastructure management, engineering and R&D services and BPO. HCL leverages its extensive global offshore infrastructure and network of offices in 26 countries to provide holistic, multi-service delivery in key industry verticals including Financial Services, Manufacturing, Consumer Services, Public Services and Healthcare. HCL takes pride in its philosophy of 'Employee First', which empowers our 72,267 transformers to create real value for the customers. HCL Technologies, along with its subsidiaries, had consolidated revenues of US$ 3.1 billion (Rs. 14,101 crores), as on 31st December 2010 (on LTM basis). For more information, please visit www.hcltech.com
      About HCL Enterprise
      HCL is a $5.9 billion leading global technology and IT enterprise comprising two companies listed in India - HCL Technologies and HCL Infosystems. Founded in 1976, HCL is one of India's original IT garage start-ups. A pioneer of modern computing, HCL is a global transformational enterprise today. Its range of offerings includes product engineering, custom & package applications, BPO, IT infrastructure services, IT hardware, systems integration, and distribution of information and communications technology (ICT) products across a wide range of focused industry verticals. The HCL team consists of over 80,000 professionals of diverse nationalities, who operate from 31 countries including over 500 points of presence in India. HCL has partnerships with several leading Global 1000 firms, including leading IT and technology firms. For more information, please visit www.hcl.com
