The Enterprise Use of Hadoop

The enterprise path to cloud computing is intrinsically complex because of the need to bring forward existing applications and evolve organization structure and skill set. Hadoop, an Apache Foundation Open Source project, represents a way for enterprise IT to take advantage of Cloud and Internet capabilities sooner when it comes to the storage and processing of huge (by enterprise IT standards) amounts of data.

The Enterprise Use of Hadoop (v1)
Internet Research Group, November 2011

About The Internet Research Group (www.irg-intl.com)

The Internet Research Group (IRG) provides market research and market strategy services to product and service vendors. IRG services combine the formidable and unique experience and perspective of the two principals, John Katsaros and Peter Christy, each an experienced industry veteran. The overarching mission of IRG is to help clients make faster and better decisions about product strategy, market entry, and market development. Katsaros and Christy published a book on high tech business strategy, Getting It Right the First Time (Praeger, 2005; www.gettingitrightthefirsttime.com).

© 2011 Internet Research Group – all rights reserved
Table of Contents

1. Overview
2. Background
3. What Is Hadoop?
4. Why Is Embedded Processing So Important?
5. MapReduce Analytics
6. What is "Big" Data?
7. The Major Components of Hadoop
8. The Hadoop Application Ecology
9. Cloud Economics
10. Why Is Hadoop So Interesting?
11. What Are the Interesting Sources of Big Data?
12. How Important Is Big Data Analytics?
13. Things You Don't Want to Do with Hadoop
14. Horizontal Hadoop Applications
15. Summary
1. Overview

The last decade has seen amazing continuing progress in computer technology, systems and implementations, as evidenced by some of the remarkable Web and Internet systems that have been constructed, such as Google and Facebook. Although most enterprise CIOs yearn to take advantage of the performance and cost efficiencies that these pioneering Web systems deliver, the enterprise path to Cloud computing is intrinsically complex because of the need to bring forward existing applications and evolve organization structure and skill set, so achieving those economies will take some time. Hadoop, an Apache Foundation Open Source project, represents a way for enterprise IT to take advantage of Cloud and Internet capabilities sooner when it comes to the storage and processing of huge (by enterprise IT standards) amounts of data.

Hadoop provides a means of implementing storage systems with Internet economics and doing large-scale processing on that data. It is not a general replacement for existing enterprise data management and analysis systems, but for many companies it is an attractive complement to those systems, as well as a way of making use of the large-volume data sets that are increasingly available. The Yahoo! Hadoop team argues that in five years, 50% of enterprise data will be stored in Hadoop – they might well be right.

2. Background

The last decade has been remarkable for the advances in computer technology and systems:

- There has been continuing, relentless "Moore's Law" progress in semiconductor technology (CPUs, DRAM and now SSD).
- There has been even faster progress in disk price/performance improvement.
- Google demonstrated the remarkable performance and cost-effectiveness that could be achieved using mega-scale systems built from commodity technology, as well as pioneering the application and operational adaptations needed to take advantage of such systems.
The compounded impact of these improvements is seen most dramatically in various Cloud offerings (starting with Google or Amazon Web Services), where the cost of storage or computation is dramatically (orders of magnitude) cheaper than in typical enterprise computing.
Hadoop presents an opportunity for enterprises to take advantage of Cloud economics immediately, especially in terms of storage, as we will sketch below.

3. What Is Hadoop?

Hadoop builds on a massive file system (Google File System, or GFS) and a parallel application model (MapReduce) originally developed at Google. Google has an unbelievable number of servers compared to typical large enterprises (in all likelihood more than a million). Search is a relatively easy task to parallelize: many search requests can be run in parallel because they only have to be loosely synchronized (the same search done at the same time doesn't have to get exactly the same response). GFS was developed as a file system for applications running at this scale. MapReduce was developed as a means of performing data analysis using these resources.

Hadoop is an OpenSource reimplementation of GFS and MapReduce. Google's systems run a unique and proprietary software "stack," so no one else could run Google's MapReduce even if Google permitted it. Hadoop is designed to run on a conventional Linux stack. Google has encouraged the development of Hadoop, recognizing the value in a broader population of people trained in the methodology and tools.

Much of the development of Hadoop has been driven by Yahoo!. Yahoo! is also a large Hadoop user, internally running a total of more than 40,000 servers. Operationally we talk about a Hadoop "cluster": a set of servers dedicated to a particular instance of Hadoop, ranging from just a few servers to the clusters of more than 4,000 servers in use at Yahoo!.

Today a typical Hadoop server might have two sockets with a total of 8 cores (two 4-core processors), 48 GB of DRAM, and 8-16 directly attached disks, typically cost-per-byte optimized (e.g., 2 or 3 TB 3.5" SATA drives).
When implemented with high-volume commodity technology, the majority of the server cost is the disk drive complement, and each server will have 20-50 TB of storage.
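Hadoop exposes the MapReduce model through a Java API; the standalone Python sketch below is only a conceptual illustration of the two phases and the shuffle/sort step between them (the function names and the word-count task are our own illustrative choices, not Hadoop's API):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Map step: emit a (key, value) pair for every word seen."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle/sort the pairs by key, then reduce each key's values.
    In a real cluster this grouping happens across many nodes."""
    counts = {}
    for key, group in groupby(sorted(pairs, key=itemgetter(0)),
                              key=itemgetter(0)):
        counts[key] = sum(v for _, v in group)
    return counts

docs = ["big data big cluster", "big data analytics"]
print(reduce_phase(map_phase(docs)))
# {'analytics': 1, 'big': 3, 'cluster': 1, 'data': 2}
```

The value of the framework is that the map and reduce functions stay this simple while the infrastructure handles distributing them across thousands of machines.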
4. Why Is Embedded Processing So Important?

A useful way of thinking about a Hadoop cluster is as a very high-capacity storage system built with "Cloud" economics (using inexpensive, high-capacity drives), with substantial, general-purpose, embedded processing power. The importance of having local processing capability becomes clear as soon as you realize that even when using the fastest LAN links (10 Gbits/sec), it takes 40 minutes to transfer the contents of a single 3 TB disk drive. Big data sets may be remarkably inexpensive to store, but they aren't easy to move around, even within a data center using high-speed network connections.¹

In the past we brought the data to the program: we ran a program on a server, opened a file on a network-based storage system, brought the file to the server, processed the data, and then probably wrote new data back out to the storage system.² With Hadoop, this is reversed, reflecting the fact that it's much easier to move the program to the data than the data to the program. Modern servers and large-capacity disks enable affordable storage systems of enormous capacity, but you have to process the data in place when possible; you can't move it.

Some "Cloud" storage applications require only infrequent access to the stored data. Almost all the activity in a Cloud-based backup service is writing the protected data to the disks. Reading the stored data is done only infrequently (albeit being able to read a backup file when needed is the key value proposition). The same is true to an only slightly lesser degree when pictures, videos or music are stored in the Cloud. Only a small percentage of that data is ever accessed, and that small fraction can be (and is) cached on higher-performance, more expensive storage.

Analysis is very different; data will be processed repeatedly as it is used to answer diverse questions. PC backup or picture storage are write-once/read-never applications.
Analysis is write-once/read-many.

¹ A modern SATA drive can transfer data between the disk and server at a sustained rate of about 1 Gbit/second. On a 12-disk node, the aggregate read rate could be up to about 10 Gbits/second. On a 50-node cluster, the total aggregate read rate could approach 500 Gbits/second.

² A 10 MB file (80 Mbits) can be transmitted in under 0.1 second over a Gbit/second link.
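The transfer-time claim above is simple arithmetic, and worth checking, since it is the whole argument for moving programs to data:

```python
def transfer_minutes(num_bytes, bits_per_sec):
    """Time to push num_bytes over a link of the given bit rate."""
    return num_bytes * 8 / bits_per_sec / 60

# One 3 TB drive (decimal TB) over a 10 Gbit/s LAN link
print(transfer_minutes(3e12, 10e9))   # 40.0 minutes

# Aggregate local read rates from footnote 1 (12 disks x ~1 Gbit/s
# per node, 50 nodes per cluster) dwarf any practical network path
node_read_rate = 12 * 1e9             # ~12 Gbit/s per node
cluster_read_rate = 50 * node_read_rate
print(cluster_read_rate / 1e9)        # 600.0 Gbit/s theoretical peak
```

The footnote's "approach 500 Gbits/second" figure is the same calculation with a conservative ~10 Gbit/s of usable bandwidth per node.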
5. MapReduce Analytics

The use of Hadoop has created a lot of interest in large-scale analytics (the MapReduce part of Hadoop). This kind of "divide and conquer" algorithmic methodology has been used for numerical analysis for many years as a way of dealing with problems that were known to be bigger than the biggest machine available. MapReduce is an elegant way of structuring this kind of algorithm that isolates the analyst/programmer from the specific details of managing the pieces of work that get distributed to the available machines, as well as an application architecture that doesn't depend on any specific structuring of the data.

As Hadoop evolves, the basic ideas will be adapted to more computer system architectures than just the commodity scale-out systems used by the mega Web properties like Google and Yahoo!. A MapReduce computation cluster could also be used with data stored in a high-performance, high-bandwidth storage subsystem, which would make a lot of sense if the data was already stored there for other reasons. We expect many such variants of the original architecture to emerge over time.

6. What is "Big" Data?

Google and Yahoo! use MapReduce for purposes that are unique to extremely large-scale systems (e.g., search optimization, ad delivery optimization). That fact notwithstanding, almost all companies have important sources of big data. For example:

- World-wide markets: The Internet enables any company, large or small, to interact with the billions of people world-wide who are connected. Modern logistics services such as UPS, FedEx and USPS let any company sell to global markets. A successful company has to think of millions of people and build business systems capable of running at that scale. That's big data.
- Machine-generated data: IT infrastructure (the stuff that all modern companies run on) comprises thousands of devices (PCs and mobile devices, servers, storage, network and security devices), all of which are capable of generating a stream of log data summarizing normal and abnormal activity. In aggregate this stream is a rich source of business process, operational, security and regulatory compliance analysis. That's big data.

We'll talk more later about how big data will impact enterprises over time.
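A back-of-the-envelope estimate shows how quickly machine-generated data reaches "big data" scale. All of the figures below are illustrative assumptions, not numbers from this report:

```python
# Hypothetical mid-size enterprise IT estate (assumed figures)
devices = 5_000            # servers, network and security devices
events_per_sec = 10        # average log events per device
bytes_per_event = 200      # average log line size

seconds_per_day = 86_400
daily_bytes = devices * events_per_sec * bytes_per_event * seconds_per_day
print(daily_bytes / 1e12)  # ~0.86 TB of raw log data per day
```

At that (modest) rate a year of retained logs is on the order of 300 TB, comfortably beyond what most conventional database deployments would store.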
7. The Major Components of Hadoop

The core of the Hadoop OpenSource projects is HDFS (the Hadoop Distributed File System) and MapReduce, the reimplementation of the Google File System and of MapReduce as defined by the public documents Google has published. HDFS is the basic file storage, capable of storing a large number of large files. MapReduce is the programming model by which data is analyzed using the processing resources within the cluster.

HDFS has these goals:

- Build very large data management systems from commodity parts, where component failure has to be assumed and dealt with as part of the basic design of the data system (in contrast to most enterprise storage, where great attention is paid to making the components reliable).
- A file system capable of storing files that are huge by historical standards (many files larger than 1 GB).
- A file system optimized on the assumption that files typically change by data being appended (e.g., additions to a log file) rather than by the modification of internal pieces of the file.
- A system where the file system APIs reflect the needs of these new applications.

The motivation for MapReduce is more complicated. Today's world of commodity servers and inexpensive disk drives is completely different from yesterday's world of enterprise IT. Historically, analytics ran on expensive, high-end servers and used expensive, enterprise-class disk drives. Buying a new database server is a big decision and comes with software licensing costs, as well as incremental operational needs (e.g., a database administrator). In the Hadoop world, adding more nodes isn't a major capital expense (< $10K per server) and doesn't trigger new software licenses or additional administrators. MapReduce was designed for an environment where adding more hardware is a perfectly reasonable approach to problem solving.
In such environments, progress is often more easily made by adding hardware than by carefully crafting an optimized solution, and MapReduce allows the scale of the solution to grow with minimal need for the analyst or programmer to adapt the program. The MapReduce infrastructure distributes the work among the available processors (the application programmer shouldn't have to worry about how big the actual cluster is), monitors progress, restarts work that stalls or fails, and balances the work among the available nodes.

Using MapReduce is by no means simple, nor something that many business analysts would ever want to do directly (or be able to do, for that matter). Google has required all summer college interns to develop a MapReduce application: all
being excellent programmers, and having the benefit of experienced colleagues, the interns still found it difficult to do. Google has supported the Hadoop effort in part so that it could be used in education to train more knowledgeable individuals. This isn't a reason why the impact of MapReduce will be limited, however; it's the motivation for a software ecology built on top of HDFS and MapReduce that makes the capability usable by a broader population.

8. The Hadoop Application Ecology

It is useful to think of Hadoop as a platform, like Windows or Linux. Although Hadoop was developed based on the specific Google application model, the interest in Hadoop has spawned the creation of a set of related programs. The Apache OpenSource project includes these:

- HBase – the Hadoop database
- Pig – a high-level language for writing data analysis programs
- Hive – a data warehouse system
- Mahout – a set of machine learning tools

There is other software that can be licensed for use with Hadoop, including:

- MapR – an alternative storage system
- Cloudera – management tools

Various database and BI vendors offer software for use with Hadoop:

- Database and BI vendors offer connectors that make it easy to control an attached Hadoop system and import the output of Hadoop processing.
- Similarly, the "ETL" vendors offer connectors so that Hadoop can be a source (or sink) of data in that process.

9. Cloud Economics

Now that we have introduced Hadoop and HDFS, we can explain in more detail what we mean by "Cloud economics." If you walked into any modern large-scale data center (Google, Yahoo!, Facebook, Microsoft), you would see something that looks very different from an enterprise data center.
The enterprise data center would be filled with top-of-the-line ("enterprise class") systems; the Web data center would be filled with something looking more like what you would find in a thrift shop: inexpensive "white box" servers and storage. As the cost of the hardware continues to decline, lots of other aspects of
IT have to evolve as well (e.g., software licensing fees, operational costs) if the value of the hardware is to be exploited. The basic system and application design have to evolve as well.

Perhaps most importantly, Google recognized that in large-scale computing, failure and reliability had to be reconsidered. In large-scale systems, failure is the rule rather than the exception (with millions of disk drives, disk drive failure is ongoing). In large-scale systems, it makes more sense to achieve reliability and availability in the higher-level system (e.g., HDFS) and application (e.g., MapReduce) layers, not by using "enterprise-class" subsystems (e.g., RAID disk systems). HDFS is a very reliable data storage subsystem because the file data is replicated and distributed. MapReduce anticipates that individual tasks will fail on an ongoing basis (because of some combination of software and hardware failure) and manages the redistribution of work so that the overall job is completed in a timely manner.

Consider how this plays out with storage. In the enterprise data center, the data would likely be stored on a shared SAN (storage area network) system. Because this SAN system holds key data for multiple important applications, the performance, reliability and availability of the SAN system are critical:

- Redundant disks would be included and the data spread among multiple disks, so that the loss of one or more disks wouldn't result in the loss or unavailability of the data.
- Critical elements (the controller, SAN switches and links, power supplies, host adaptors) would all be replicated for availability.
- Because the SAN system supports multiple applications concurrently, performance is critical, so the fastest (and most expensive) disks would be used, with the fastest (and most expensive) connection to the controller.
- The controller would include substantial RAM for caching.

In contrast, a Hadoop cluster of 50 nodes has 500-1000 high-capacity, low-cost disk drives:

- The disks are selected to be cost optimized: lowest cost per byte stored, with the least expensive attachment, directly to a server (no storage network, no Fibre Channel attachment).
- The design has no redundancy at the disk level (no RAID configurations, for example). The HDFS file system assumes that disk failures are an ongoing issue and achieves high-availability data storage despite that.

Cloud economics of storage means cost-effective drives directly connected to a commodity server with the least expensive connection. In a typical Hadoop node, 70% of the cost of the node is the cost of the disk drives, and the disk drives are the most cost-effective possible. It can't get any cheaper than that! A Hadoop cluster is a large data store built in the most cost-effective way possible.
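Even though HDFS triples the raw capacity consumed (it typically keeps three copies of each block), commodity drives are so much cheaper that the economics still favor it. The comparison below uses purely illustrative prices, not figures from this report:

```python
def cost_per_usable_tb(price_per_raw_tb, overhead_factor):
    """overhead_factor = raw TB consumed per usable TB:
    3.0 for HDFS-style triple replication,
    ~1.25 for RAID-style parity on an enterprise array."""
    return price_per_raw_tb * overhead_factor

# Assumed prices for illustration only
hdfs = cost_per_usable_tb(50, 3.0)      # commodity SATA, 3x replication
san = cost_per_usable_tb(1000, 1.25)    # enterprise SAN drives + parity
print(hdfs, san)  # 150.0 1250.0 (dollars per usable TB)
```

With these assumed prices the replicated commodity design is still nearly an order of magnitude cheaper per usable terabyte, which is the point of "Cloud economics."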
10. Why Is Hadoop So Interesting?

As we noted earlier, big data is relevant to essentially all businesses, because of Internet markets and machine-generated log data if for no other reason. For dealing with big data, Hadoop is unquestionably a game changer:

- It enables the purchase and operation of very large-scale data systems at a much lower cost, because it uses cost-optimized, commodity components. Adding 500 TB of Hadoop storage is clearly affordable; adding 500 TB to a conventional database system is often not.
- Hadoop is designed to move programs to data rather than the inverse. This basic paradigm change is required to deal with modern, high-volume disk drives.
- Because of the OpenSource community, Hadoop software is available for free rather than at current database and data warehouse licensing fees. The use of Hadoop isn't free, but the elimination of traditional license fees makes it much easier to experiment (for example).
- Because Hadoop is designed to deal with unstructured data and unconstrained analysis (in contrast to a data warehouse, which is carefully schematized and optimized), it doesn't require database-trained individuals (e.g., a DBA), although it clearly requires specialized expertise.
- The MapReduce model minimizes the parallel programming experience and expertise required. To program MapReduce directly requires significant programming skills (Java and functional programming), but the basic Hadoop model is designed to use scaling (adding more nodes, especially as they get cheaper) as an alternative to parallel programming optimization of resources.

Hadoop represents a quite dramatic rethinking of "data processing," driven by the increasing volumes of data being processed and by the opportunity to follow the pioneering work of Google and others and use commodity system technology at a much lower price.
The downside of taking a new approach is twofold:

- There is a lot of learning to do. Conventional data management and analysis is a large and well-established business. There are many analysts trained to use today's tools, and a lot of technical people trained in the installation, operation and maintenance of those tools.
- The "whole" product still needs some fleshing out. A modern data storage and analysis product is complicated: tools to import data, tools to transform data, job and work management systems, data management and migration tools, and interfaces to and integration with popular analysis tools, for a beginning. By this standard Hadoop is still pretty young.

From a product perspective, the biggest deficiencies are probably the adaptation of Hadoop for operation in an IT shop rather than a large Web property, and the
development of tools that let users with more diverse skill sets (e.g., business analysts) make productive use of Hadoop-stored data. All of this is being worked on, either within the OpenSource community or as licensed proprietary software for use in conjunction with Hadoop. Companies providing Hadoop support and training services have discovered a vibrant and growing market. The usability of Hadoop (both operationally and as a data tool) is improving all the time, but it does have some distance to go.

11. What Are the Interesting Sources of Big Data?

There is no single answer; different companies will have different data sets of interest. Some of the common ones are these:

- Integration of data from multiple data warehouses: Most big companies have multiple data warehouses, in part because each may have a particular divisional or departmental focus, and in part to keep each at an affordable and manageable level, since traditional data warehouses all tend to increase in cost rapidly beyond some capacity. Hadoop provides a tool by which multiple sources of data can be brought together and analyzed, and by which a bigger "virtual" data warehouse can be built at a more affordable price.
- Clickstream data: A Web server can record (in a log file) every interaction with a browser/user that it sees. This detailed record of use provides a wealth of information on the optimality of the Web site design, the Web system performance and, in many cases, the underlying business. For the large Web properties, clickstream analysis is the source of fundamental business analysis and optimization. For other businesses, the value depends on the importance of Web systems to the business.
- Log file data: Modern systems, subsystems, applications and devices can all be configured to log "interesting" events.
This information is potentially the source of a wealth of insight, ranging from security/attack analysis to design correctness and system utilization.

- Information scraped from the Web: Every year more information, and more valuable information, is captured on the Web. Much of it is free to use for the cost of finding and recording it. Specific sources such as Twitter produce high-volume data streams of potential value.

Where is all this information coming from? There are multiple sources, but to begin with, consider:

- The remarkable and continuing growth of the World Wide Web. The Web has become a remarkable repository of data to analyze, in terms of all the contents of the Web and, for a Web site owner, the ability to analyze in complete detail the use of the Web site.
- The remarkable and growing use of mobile devices. The iPhone has existed for only the last five years (and the iPad for less), but this kind of mobile device has transformed how we deal with information. Specifically, more and more of what we do is in text form (not written notes, nor faxes, nor phone calls) and available for analysis one way or another. Mobile devices also provide valuable (albeit frightening) information on where and when the data was created or read.
- The rise of "social sites." There has been rapid growth in Facebook and LinkedIn, as well as in customer feedback on specific products (both at shared sites like Amazon and on vendor sites). Twitter provides remarkable volumes of data with possible value.
- The rise of customer self-service. Increasingly, companies look for ways for the community of their customers to help one another through shared Web sites. This not only is cost-effective, but generally leads to the earlier identification and solution of problems, as well as providing a rich source of data by which to assess customer sentiment.
- Machine-generated data. Almost all "devices" are now implemented in software and capable of providing log data (see above) if it can be used productively.

12. How Important Is Big Data Analytics?

The only reasonable answer is "it depends." Big data evangelists note that analytics can be worth 5% on the bottom line, meaning that intelligent analysis of business data can have a significant impact on the financial performance of a company. Even if that is true, for most companies most of the value will come from the analysis of "small data," not from the incremental analysis of data that is infeasible to store or analyze today.

At the same time, there are unquestionably companies for which the ability to do big data analytics is essential (Google and Facebook, for example).
These companies depend on the analysis of huge data sets (clickstream data from large on-line user communities) that cannot practically be processed by conventional database and analytics solutions.

For most companies, big data analytics can provide incremental value, but the larger value will come from small data analytics. Over time, the value will clearly shift toward big data as more and more interesting data becomes available. There will almost always be value in the analysis of some very large data set. The more important question, from a business optimization perspective, is whether the highest-priority requirement is based on big data, or whether there is still untapped, higher-value "small" data.
13. Things You Don't Want to Do with Hadoop

The Hadoop source distribution is "free," and a bright Java programmer can often "find" enough "underutilized" servers with which to stand up a small Hadoop cluster and do experiments. While it is true that almost every large company has real large-data problems of interest, to date much of the experimentation has been on problems that don't really need this class of solution. Here is a partial list of workloads that probably don't justify going to Hadoop:

- Non-huge problems. Keep in mind that a relatively inexpensive server can easily have 10 cores and 200 GB of memory. 200 GB is a lot of data, especially in a compressed format (Microsoft PowerPivot – an Excel plugin – can process 100 M rows of compressed fact table data in 5% of that storage). Having the data resident in DRAM makes a huge difference (PowerPivot can scan 1 trillion rows a minute with fewer than 5 cores). If a compressed version of the data can reside in a large commodity server's memory, that is almost certain to be a better solution (there are various in-memory database tools available).
- Data storage only. Although Hadoop is a good very-large-scale storage system (HDFS), unless you want to do embedded processing there are often better storage solutions around.
- Parallel processing only. If you just want to manage the parallel execution of a distributed Java program, there are simpler and better solutions.
- HPC applications. Although a larger Hadoop cluster (100 nodes) comprises a significant amount of processing power and memory, you wouldn't want to run traditional HPC algorithms (e.g., FEA, CFD, geophysical data analysis) in Hadoop rather than in a more traditional computational grid.

14. Horizontal Hadoop Applications

With some very bright programmers, Hadoop can be applied wherever the functional model can be applied.
One generic class of applications is characterized by data sets that are clearly too large to economically store in traditional enterprise storage systems (SAN and NAS) and clearly too large to analyze with traditional data warehouse systems. Think of Hadoop as a place where you can now store the data economically, and use MapReduce to preprocess the data and extract data that can be fed into an existing data warehouse and analyzed, along with existing structured data, using existing analysis tools.
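The preprocess-then-load pattern can be sketched as follows. This standalone Python sketch only illustrates the idea of boiling bulky raw clickstream records down to a compact summary a warehouse could ingest; the field layout and function name are made-up examples, and in Hadoop this pass would run as a MapReduce job over HDFS files:

```python
from collections import Counter

def preprocess(raw_lines):
    """Reduce raw clickstream lines (timestamp, user, url, status)
    to a compact per-URL page-view summary for warehouse loading.
    The four-field layout is an illustrative assumption."""
    counts = Counter()
    for line in raw_lines:
        ts, user, url, status = line.split()
        if status == "200":          # keep successful page views only
            counts[url] += 1
    return sorted(counts.items())

raw = [
    "1320000000 u1 /home 200",
    "1320000001 u2 /home 200",
    "1320000002 u1 /missing 404",
]
print(preprocess(raw))  # [('/home', 2)]
```

The output is a tiny structured table, which is exactly the kind of data existing warehouse and BI tools already handle well.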
Alternatively, you can think of Hadoop as a way of "extending" the capacity of an existing storage and analysis system when the cost of the solution starts to grow faster than linearly as more capacity is required. As introduced above, Hadoop can also be used as a means of integrating data from multiple existing warehouse and analysis systems.

15. Summary

Technology progress and the increased use of the Internet are creating very large new data sets with increasing value to businesses, and making the processing power to analyze them affordable. The size of these data sets suggests that exploiting them may well require a new category of data storage and analysis systems, with different system architectures (parallel processing capability integrated with high-volume storage) and different use of components (more exploitation of the same high-volume, commodity components that are used within today's very large Web properties). Hadoop is a strong candidate for such a new processing tier. In addition to its initial design by Google, the fact that it is today a vibrant OpenSource effort suggests additional disruptive impact on product pricing and the economics of use is possible.