White Paper

Business Intelligence Solutions on Windows® Azure™
Sidharth Subhash Ghag

Abstract

Enterprise Business Intelligence (BI) solutions today are analyzing growing amounts of data. More often than not, the data is historical in nature, coming from within the enterprise as well as from external channels such as the Web, mobile, and devices. This has led to the growth of data volumes to alarming levels. In traditional BI implementations, this information explosion, along with increasing demands on computational power to process high volumes of data, has been managed through expensive hardware and software upgrades. This is a highly inefficient approach to meeting the demands of a growing business, and one that enterprises consider economically unfavorable. With the global scale of operation of large enterprises, the need of the hour is to make information available to partners, remotely-located analysts, and managers on the move, which in turn places additional demands on infrastructure and IT. This white paper discusses how Cloud Computing might help address these challenges with its round-the-clock availability and its dynamic and scalable nature. Cloud infrastructure would be beneficial for offloading BI storage, long-running processes, and erratic load behaviors. The proposed solution discussed in this paper is an alternative BI architecture that extends the existing BI infrastructure.

www.infosys.com
BI Process

Overview

Primarily, a BI solution has two parts: data storage and analysis. The stored raw data is an asset that needs to be cleansed and processed to derive information for making decisions. The information has to be presented to decision makers in an intuitive and highly interactive manner, so that key strategic decisions can be made in the least possible time. BI relies on data warehousing (a data repository designed to support an organization's decision making). Ineffectively managed data warehouses make it difficult for organizations to quickly extract the data needed for analysis and practical decision-making.

The BI process can be represented using the following diagram:

Figure 1: BI Process

Online Transaction Data

Online transactional data (operational data) from multiple systems (finance, sales, and CRM) is extracted and processed to eliminate data redundancy, or is optimized to be stored in a data warehouse. The purpose of creating a data warehouse is to bring information from heterogeneous systems onto a common data storage platform.

Data Warehouse

A data warehouse is an independent master store of all the historical transactional data of an enterprise. Extracting transactional data from multiple systems and then cleansing it for further analysis is the most important activity in establishing a data warehouse. The process of accumulating data largely depends on the source systems from which the data is retrieved. Mostly, this accumulation process is customized to handle the multiple data sources and data rules, easing the transformation of data from multiple disparate systems into a single platform.

Data Marts

Although a data warehouse is a storehouse for voluminous data, it is difficult to process complex analytical queries or jobs directly off the data warehouse.
Thus, the data warehouse is broken down logically, or sometimes physically, into smaller analysis units called data marts. Data marts can be thought of as units of data storage used for dedicated analysis, generated using specific filters and queries. Data marts contain specialized multi-dimensional data structures called data cubes. Unlike relational database tables, which have only two dimensions (row and column), a data cube has multiple dimensions.

Typical data mart queries include: how did grocery products sell in the last six months, and how did a promotion perform in the last six months in the southern region? Data marts are useful for such focused analysis.

Since the data warehouse is responsible for storing high volumes of historical and ever-growing data, a data warehouse solution should be cost-effective, reliable, and always available to other components for analysis and reporting.

Reports, Dashboards, and Key Performance Metrics

Analysis is the process of slicing and dicing a set of information to interpret a pattern that can be used to explain an impact or to support further planning. The analytics engine works on data marts. Its purpose is to execute complex queries and present data with multiple dimensions and measures. Dimensions and measures are key parameters in BI that help slice and dice information to make it more precise for decision makers.
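The kind of focused data mart query mentioned above (for example, grocery sales in the southern region over recent months) can be sketched as a slice-and-dice over a small in-memory fact set. This is an illustrative sketch only; the dimension names, products, and figures are invented for the example.

```python
from collections import defaultdict

# A tiny in-memory "data mart": fact rows with dimensions (region, month,
# product) and one measure (sales). All values are illustrative.
facts = [
    {"region": "south", "month": "2011-01", "product": "grocery", "sales": 120},
    {"region": "south", "month": "2011-02", "product": "grocery", "sales": 150},
    {"region": "north", "month": "2011-01", "product": "grocery", "sales": 90},
    {"region": "south", "month": "2011-01", "product": "apparel", "sales": 60},
]

def slice_and_dice(rows, group_by, **filters):
    """Filter rows on dimension values (slice), then aggregate the sales
    measure along the requested dimension (dice)."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[dim] == value for dim, value in filters.items()):
            totals[row[group_by]] += row["sales"]
    return dict(totals)

# "How did grocery products sell in the southern region, month by month?"
print(slice_and_dice(facts, "month", region="south", product="grocery"))
# → {'2011-01': 120, '2011-02': 150}
```

The same function answers other focused questions by changing the grouping dimension or filters, which is essentially what an analytics engine does over a data mart, at far larger scale.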
Data presentation is a crucial component of analysis. The richer the presentation of the data to be analyzed, the easier it is for decision makers to examine the information. The presentation layer serves reports, KPI metrics, and dashboards to the end user for slicing and dicing information. These rich reports also support 'what-if' scenario analyses.

A BI system is an aggregation of multiple systems and sub-systems. Data storage, information slicing and dicing tools, and reporting or rich visualization interfaces are some of the sub-systems of any typical BI system. This composite structure and its integration create inherent challenges. Let us look at the typical challenges enterprises face in implementing and using BI solutions.

BI Implementation Challenges

• Intermittent demands for storage
Since the data warehouse is the backbone of the entire BI solution, it becomes important to manage it properly and keep it running at all times. The data warehouse is a storehouse for large datasets, and it is not possible to keep all the data active for on-demand analysis. In certain scenarios, historical data that has been inactive for some time may need to be activated. Activating historical data involves obtaining the backed-up tapes, retrieving the data, and loading and fitting it into the currently active data warehouse or data marts, none of which is simple. Even if such a situation arises only once a month, it still consumes a considerable amount of IT operational resources. Storage demand increases with every such request, because activation of inactive data adds to, rather than replaces, the currently active data.
The need for extra storage capacity adds to hardware investment and to the pressure of managing it.

• Sub-optimal utilization of resources
As BI solutions remain in place for many years, it is highly likely that the number of users, the size of the storage, and the complexity of the systems have all increased. An increase in users adds pressure on the scalability of a solution that might have been provisioned long ago. Alternatively, an organization may have anticipated rapid growth in the number of users and planned storage and other infrastructure capacities upfront. In such cases, it is likely that the system remains underutilized, losing the opportunity to use the same investment elsewhere. The scalability challenge is crucial to both utilization and the smooth running of the system.

• Lacking external dimensions
On-premise BI solutions are mostly oriented around the transactional data of the enterprise. They lack the external dimensions and measures of analysis that are important for strategic analysis. A combination of internal data, such as sales data, and external data, such as government-collected data and industry trends, can be used to gain better insight and plan effective strategies. External environmental data is available through various data marketplaces, which can help enhance the quality of analytics. The increasing demand to factor external entities into analysis adds pressure on the design and flexibility of BI solutions. Often, enterprises end up developing their own components, or smaller independent BI solutions, to factor in these external entities.

• Lacking multi-channel delivery capabilities
Most enterprises operate with a workforce spread all over the world. These geographically distributed stakeholders demand round-the-clock availability and accessibility from any place.
Enterprises that had not factored in this demand have ended up spending huge amounts of money and resources to address it. The need to make data warehouses and BI solutions available over the Internet through multiple delivery channels, such as RIAs, services, mobile, and browsers, is increasing. This quick, easy, always-on accessibility gives enterprises an edge, helping them collaborate better and take decisions quickly. Thus, it becomes essential for enterprises to make their BI platform available over the Internet. This requirement not only demands additional investment in infrastructure, but also adds integration touch points.

Present-day businesses operate in highly dynamic environments influenced by factors such as changing business scenarios, changing compliance and governance processes, new integration requirements that add to the complexity of systems, and increasing pressure on systems to be responsive. These challenges multiply with the increasing demand for dynamism in business, processes, and technologies. It is important for every enterprise to address these challenges and make use of its BI investment to get the best results.
BI Solution Based on Cloud Computing

With more and more devices getting meshed and inter-connected on the information highway, demand for data and everything related to it will grow manifold. This information explosion will lead to the need for systems that can:
• Process large amounts of data efficiently and in near real-time
• Handle storage for data flowing in from various systems and devices, in storage units that can hold large amounts of data

The figure below depicts a typical information flow landscape of a large enterprise in the future. A BI solution has to meet the high-volume requirements of an enterprise that constantly exchanges information with multiple stakeholders, systems, and devices as part of its day-to-day operations.

Figure 2: Typical Azure™ Business Intelligence Eco-System

Cloud computing, a new-generation technology platform for deploying and delivering software services, addresses the growth requirements of an enterprise and the commonly faced BI challenges. The value proposition delivered by cloud computing, which can address the needs of the BI platform of the future, includes:
• The capability to process voluminous and rapidly-growing data over the Internet
• Replication of machines, applications, and data storage across multiple instances to provide high availability
• Dynamic, elastic capability to scale infrastructure up and down within minutes

Improved Cost Efficiency

Cloud storage solutions are relatively more appealing than traditional RDBMS data solutions in terms of managing complexity and Total Cost of Ownership (TCO), especially in a data warehouse scenario that deals with historic or inactive data.
With cloud storage, data can be kept active at all times, without requiring the aid of IT management to activate historical data. Thus, cloud storage addresses the challenge of intermittent data storage access, particularly when there is an urgent need to reload historical data, say to meet compliance-related queries.
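The operational difference described above — historical data that stays queryable instead of needing re-activation from tape — can be sketched as a single store queried with a wider date filter. The rows and field names below are illustrative, not part of the proposed design.

```python
import datetime

# Illustrative sketch: in a cloud data warehouse, current and historical rows
# live side by side in one store, so "re-activating" history for a compliance
# query is just a wider date filter rather than a tape restore. Rows are made up.
warehouse = [
    {"order_id": 1, "date": datetime.date(2008, 5, 1), "amount": 100},
    {"order_id": 2, "date": datetime.date(2010, 7, 9), "amount": 250},
    {"order_id": 3, "date": datetime.date(2011, 1, 3), "amount": 75},
]

def rows_between(store, start, end):
    """Historical and current rows share the same access path."""
    return [r for r in store if start <= r["date"] <= end]

# A compliance query reaching back to 2008 needs no separate activation step.
audit_rows = rows_between(warehouse, datetime.date(2008, 1, 1),
                          datetime.date(2009, 12, 31))
```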
Elastic and Scalable

A cloud-based solution offers users the capability to provision cloud resources such as computing services, storage services, and cache services instantaneously. This infrastructure-level flexibility allows one to handle workload fluctuations, both planned and unplanned, in an elastic manner without having to plan for any investments upfront. The elastic and scalable nature of the cloud, along with the pay-as-you-go model, aligns well with enterprise needs, giving the business a more transparent and assured view of its IT resource consumption.

Interoperable

Since the cloud is available over the Internet and can easily provide interoperable endpoints such as REST and SOAP, the architecture supports easy integration with external services. Relatively easy and quick integration with externally available interface endpoints enables enterprises to add external dimensions to their analysis. These rich sets of external dimensions give the enterprise a platform to factor into its analysis competitor data, national/international growth data, neighborhood safety, climate effects, or new stores or services in the neighborhood.

Available Anytime, Anywhere

The cloud is available ubiquitously and can be accessed through standard HTTP protocols. Enterprises do not have to spend extra money or resources to make the solution available over the Internet. Concerns such as provisioning and hardening are inconsequential with the cloud. The cloud helps enterprises support multiple delivery channels that let information reach stakeholders, including employees, mobile field agents, and external partners, easily.

Even as the cloud computing platform is growing, different vendors are adding to the rich set of building blocks required to develop enterprise applications on the cloud. The basic principle in developing these building blocks is easy and quick integration.
All the vendors are striving for open and interoperable standards of integration, making it easier to use these enterprise application services on any cloud platform. This also delivers the advantage of making the system agile enough to handle changes required to address dynamic business and technical needs.

These characteristics of the cloud computing platform make the implementation of large BI solutions possible in an easy and relatively inexpensive manner. Cloud computing platforms are maturing, and cloud vendors are trying hard to increase the functional and technical richness of their offerings to drive innovation. These innovations would help enterprises with better management, easier decision making, and greater competitiveness.

We will explore Microsoft Azure™, a public cloud platform that offers Platform as a Service (PaaS), for developing the next-generation cloud-based BI solution. PaaS offers hosted, scalable application servers with the necessary supporting services such as storage, security, and integration infrastructure. The PaaS platform also provides development tools and application building blocks to develop custom solutions on the cloud. Though we have selected PaaS for our proposed solution, there are two other cloud delivery models, Software as a Service (SaaS) and Infrastructure as a Service (IaaS), which we discuss briefly later in this paper.
Azure™ Based BI Solution

We will now attempt to explain a high-level design for a custom-built BI solution on Windows® Azure™. Let us first get acquainted with the Azure™ terminology given in the following table:

Windows® Azure™ — A cloud operating system platform that provides computing capability on the cloud
Azure™ Table Storage — Entity/key-value (tuple store) based storage service provided by Microsoft Azure™ to address large, structured, and scalable data storage
Azure™ Blob Storage — Large, scalable data storage made available by Microsoft Azure™ for unstructured data such as documents and media files
Azure™ Queue — Queue service offered by Microsoft Azure™ for message orchestration and asynchronous request processing
SQL Azure™ — Relational database capability, similar to SQL Server, made available by Microsoft Azure™ to address relational database needs on the cloud
Web Role — A web server instance to run web applications, readily available at http/https endpoints for access; a web role is simply a web server provided by Microsoft Azure™
Worker Role — A computing instance for executing long-running processes on Microsoft Azure™
VM Role — A role used to run a virtual hard disk image: store the image in the cloud, then load and run it on demand; highly suited for moving legacy applications to the cloud with minimal effort
AppFabric Service Bus — A service-bus-like messaging platform on the cloud that allows on-premise applications to be made available externally and to seamlessly connect with other systems
AppFabric Access Control Service (ACS) — A claims-based authorization service that supports federated access to enterprise systems and services on the cloud; all authorization rules can be abstracted out of the application and managed from ACS independently, in a standards-oriented way
Windows® Azure™ Data Marketplace — An information marketplace that acts as an external dataset provider, consumed by the BI stack to leverage external dimensioning metrics such as demographics, location, and other publicly available information to enrich analytical reporting capabilities
Windows® Identity Foundation (WIF) — An identity management framework that externalizes identity-related logic from an application; federated single sign-on scenarios involving multiple stakeholders can be built on this framework, and for the enterprise it also helps integrate on-premise Active Directory-based authentication with the Azure™-deployed application

High-Level Design for Custom-Built BI Solution on Azure™

Owing to concerns around data privacy, security, and data ownership, enterprises have been cautious in adopting cloud computing. At the same time, they have shown a keen interest in leveraging the value proposition offered by the cloud and the potential opportunity it presents for growing their businesses. Keeping these key aspects in mind, a hybrid BI solution is proposed to alleviate enterprise challenges. As shown in the figure below, the proposed solution divides the architecture into two distinct facets: an on-premise component and a cloud component.
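Azure™ Table Storage, described above as an entity/key-value (tuple) store, addresses each entity by a (PartitionKey, RowKey) pair rather than a fixed relational schema. The following local, dict-based sketch illustrates that model; the sample partitions and property names are invented for the example, not taken from the proposed design.

```python
# Sketch of the entity/key-value (tuple store) model used by Azure Table
# Storage: each entity is addressed by a (PartitionKey, RowKey) pair, and
# entities in the same table need not share a fixed column set.
store = {}

def upsert(entity):
    """Insert or replace an entity under its (PartitionKey, RowKey) address."""
    store[(entity["PartitionKey"], entity["RowKey"])] = entity

def query_partition(partition_key):
    """Queries confined to a single partition are the efficient access path."""
    return [store[k] for k in sorted(store) if k[0] == partition_key]

# Two entities in the same partition with different property sets,
# something a relational table would not allow without schema changes.
upsert({"PartitionKey": "sales-2011", "RowKey": "0001", "amount": 120})
upsert({"PartitionKey": "sales-2011", "RowKey": "0002", "amount": 90, "channel": "web"})
upsert({"PartitionKey": "sales-2010", "RowKey": "0001", "amount": 40})
```

This schema flexibility is what makes such a store a fit for the de-normalized data warehouse proposed later in this paper.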
Figure 3: High-Level Design for Custom-Built BI Solution on Azure™

On-Premise Components

Data Cleansing and Profiling Agent

This agent would be responsible for collating transactional and unstructured data from on-premise systems, cleansing the data, and uploading it to a data warehouse built on Azure™ table storage. The component can be extended to cover disparate data sources such as Oracle, SQL Server, mainframes, and Excel data. Cleansing and profiling would also be configurable according to business needs, to handle business-specific rules; for example, soft-deleted data and transactional data not yet in the published state should not be uploaded. The data transfer from the agent to the cloud would happen over a secured channel. This agent is usually part of the Extract Transform Load (ETL) component.

Data Integration Layer

Based on the criticality of information, an enterprise may have its structured data categorized into different levels. We will discuss different data integration approaches to cover mission-critical and non-mission-critical data.

Exposing master data on the cloud, without having to upload it to cloud storage, keeps data privacy and ownership in the hands of the enterprise. This avoids the need to physically store confidential data, such as credit card details, customer addresses, and employee salary information, on the cloud. Instead, the data is fetched from the enterprise as and when required.
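The configurable, business-specific cleansing rules mentioned above (soft-deleted data and unpublished transactions are never uploaded) can be sketched as a simple filter stage in the agent. The field names below are illustrative assumptions, not part of the proposed design.

```python
# A minimal sketch of the agent's configurable cleansing rules: drop
# soft-deleted rows and rows not yet in the "published" state before
# anything is uploaded to cloud storage. Field names are illustrative.
def passes_cleansing_rules(record):
    if record.get("is_deleted"):            # rule: soft-deleted data is never uploaded
        return False
    if record.get("state") != "published":  # rule: unpublished transactions stay on-premise
        return False
    return True

def cleanse(records):
    """Return only the records that every configured rule allows."""
    return [r for r in records if passes_cleansing_rules(r)]

batch = [
    {"id": 1, "state": "published", "is_deleted": False},
    {"id": 2, "state": "draft", "is_deleted": False},
    {"id": 3, "state": "published", "is_deleted": True},
]
upload_batch = cleanse(batch)  # only record 1 survives
```

In a real agent, each rule would be loaded from configuration rather than hard-coded, so new business rules can be added without code changes.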
An on-premise component that forms part of the integration layer would help expose the master data to the cloud. Technically, this can be achieved by leveraging the capabilities of the Azure™ AppFabric service bus. The Azure™ AppFabric service bus, with its service virtualization capabilities, allows on-premise components or services to be exposed on the cloud without having to physically move the data outside the enterprise. The AppFabric service bus provides a publicly accessible virtual endpoint on the cloud for any on-premise service endpoint it manages. The channel of communication between the Azure™ AppFabric service bus and the on-premise service can be secured at the transport level, using SSL, and at the message level, using standard encryption techniques.

To avoid latency issues, which could be a cause of concern arising from the extra network hop between on-premise and cloud environments, distributed caching functionality can be implemented on the cloud. The analytical engine deployed on the cloud can embed a caching component, such as Azure™ AppFabric Cache, to cache regularly-used master data and in turn reduce the effects of latency.

Data integration achieved through service virtualization addresses data security concerns, but at the cost of performance. It is, therefore, advisable that non-critical data be transported to reside physically on the cloud, closer to the hosted application. This can be achieved by leveraging existing data integration techniques such as ETL, Change Data Capture (CDC), and Enterprise Information Integration (EII), implemented using a tool such as Microsoft's SQL Server Integration Services (SSIS).

Power Pivot

'Power Pivot for Excel' is a data analysis tool that delivers substantial computational power directly within MS Excel, an application users are already well acquainted with.
Power Pivot is a user-friendly way to perform data analysis using familiar Excel features such as the common MS Office user interface shell, PivotTable and PivotChart views, and slicers. Power Pivot lets users analyze data marts offline, without being connected to the online data marts, enabling focused analysis that on-premise and on-the-move analysts can access at their own convenience.

ADFS 2.0

ADFS 2.0 is an identity provider service that enables an enterprise-level identity federation solution. It is developed on Windows® Identity Foundation (WIF) and makes it easy to integrate web applications with authentication/authorization against on-premise Active Directory user stores. The BI portal solution proposed here would implement claims-based authentication using WIF and ADFS 2.0, allowing enterprise users to log in to the system with their existing Active Directory credentials.

Azure™ Components

Cloud Data Warehouse

All the collated data uploaded by the cleansing and profiling agent would be stored in Azure™ table storage. Azure™ table storage is highly scalable and is an appropriate fit for persisting de-normalized data due to its entity-attribute-value (tuple store) style of storage. No analytical processing or advanced queries would be run on the data warehouse; hence, the cheaper Azure™ table storage is a relatively better option than a relational data store such as SQL Azure™. Azure™ storage, through blobs, can also persist metadata of the data warehouse along with unstructured data such as files, documents, scanned images, and video files.

The inexpensive storage capability delivered by table storage frees data warehouse administrators from having to deactivate historical data, a practice often followed in earlier BI systems due to the storage capacity limitations of on-premise facilities. CAPEX spending, normally involved in expanding storage to meet enterprise growth, is also eliminated.
However, due to the pay-as-you-use pricing model of Windows® Azure™ services, there would be a rise in OPEX spending, but it would tend to align more closely with the demands of the growing business. A detailed assessment of the existing system, along with a year-on-year ROI analysis of the Azure™ platform, can provide a clear picture of the overall savings and business value that can be realized in the future.

Analytical Engine

The analytical engine is the most important component in the BI solution. The analytical engine:
• Prepares data required for focused analysis
• Applies algorithms for processing data based on different facts, measures, and dimensions
• Analyzes structured and unstructured information to provide patterns and predict trends that are usually difficult to spot with the naked eye or through traditional reporting
• Identifies cases or exceptions in the data to isolate or identify anomalies
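One simple way the engine's last responsibility above, identifying exceptions or anomalies, could be realized is a standard-deviation test over a measure. The threshold and sample values here are illustrative assumptions, not part of the proposed design.

```python
import statistics

# Flag measures that sit more than a chosen number of population standard
# deviations from the mean. A real engine would apply richer algorithms,
# but the idea of isolating exceptions is the same.
def find_anomalies(values, threshold=2.0):
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # no spread, nothing to flag
    return [v for v in values if abs(v - mean) / stdev > threshold]

daily_sales = [100, 102, 98, 101, 99, 100, 400]  # one obvious spike
anomalies = find_anomalies(daily_sales)
```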
As of now, SQL Server Analysis Services is not provided as part of the SQL Azure™ services. Hence, it is imperative to build a custom component that provides analysis services, cube formation, and cube-querying functionality on SQL Azure™. In the proposed solution, the analytical engine has the following parts:

• Batch Process (Azure™ worker role): This Azure™ worker role would be responsible for the creation of data marts and offline reports.
  • Data-Mart Processor: Responsible for creating new data marts (SQL Azure™ tables) from the data warehouse (Azure™ table storage) for focused analysis. The multiple requests submitted by analysts from the BI portal to create data marts would be handled asynchronously by batch-processing requests, implemented using Azure™ queues.
  • Offline Report Generator: Responsible for generating standard reports periodically and storing them in Azure™ blobs, ready for the BI portal. This component would generate standard reports as per the configuration stored in Azure™ table storage.
• Real-Time Analytics (Azure™ web role): This Azure™ web role is one of the most important components used for analysis. It would be responsible for fetching data from data marts and presenting it on the BI portal for analysis. The BI portal's presentation of dynamic reports and KPI metrics, and the generation of ad-hoc reports on existing data marts, are achieved through this component. It services analysis requests synchronously on the existing data marts, making real-time analysis possible.

Note: As of the Windows® Azure™ version 1.6 release (November 2011), running SSAS on Azure™ VM roles is not supported by Microsoft.
Hence, until Microsoft recognizes SSAS as a first-class citizen of the cloud, we suggest using the data-mart processor approach.

• Data Marts: Since the proposed data warehouse is created using Azure™ table storage, which is entity-attribute-value based and non-relational, we propose creating data marts in SQL Azure™ tables. This is primarily so that existing analytical engines can leverage the premium RDBMS capabilities offered by SQL Azure™ on the cloud without any changes. SQL Azure™ is a relational database and makes it easy to fetch data using complicated analytical queries. Power Pivot provides a quick and powerful analysis tool to be used with SQL Azure™. Moreover, the BI portal would be able to generate the desired reports and analyses out of SQL Azure™.
• Application Data: Application data comprises the configuration and customization data required as part of the BI solution.
• SQL Azure™ Reporting Services Reports: As part of the BI solution, standard reports can be configured using SQL Azure™ Reporting Services (SARS) and made available from the BI portal.
• Standard Reports: As part of the BI solution, there are standard reports that need to be generated on the data using specific dimensions and measures. These standard reports can be generated in a batch process to reduce latency and can be made available at all times. As explained previously, the batch analytics component running on the Azure™ worker role generates these reports periodically.
• BI Portal: This is the web portal hosted on the Azure™ web role. It interacts with the analytical engine to generate dashboards, ad-hoc reports, and visual analyses of data across multiple dimensions and measures.
The BI portal would be accessible everywhere over the Internet and would be made available over multiple delivery channels, including desktop, mobile, and PDAs.

• Windows® Azure™ Data Marketplace Datasets (External Measures): The analytics engine can be configured to use specific datasets exposed from the Windows® Azure™ Data Marketplace. These datasets would be used as external measures, along with the data mart measures, for analysis. Examples of datasets that could serve as external measures include demographic information of customers, upcoming businesses/stores in nearby locations, and weather conditions impacting sales for specific locations.

Design Considerations

• Geo-location and affinity group: Applications developed on Windows® Azure™ can be deployed across multiple data centers located around the world: South Central US, North Central US, West Europe, East Europe, East Asia, and South East Asia. The Windows® Azure™ global footprint is rapidly growing as Microsoft continues to build new global data centers for Azure™ deployment. Selection of appropriate data centers, and creation of an affinity group for deployment, should be considered for the following reasons:
  • Regional Legislations/Regulations — These address the regulatory requirements of deploying the application and its data within a specific geographical location. Some compliance requirements oblige organizations to keep their data geographically close to the region of business operations. These requirements can be addressed by deploying the Azure™ application in an appropriate data center.
  • Performance — Data center proximity to end users helps reduce network latency and improves overall application performance. Creating an affinity group for application and data instances deploys these components within the same data center, bringing them closer together. Inter-process communication within the same affinity group is faster and helps improve application performance, especially when large amounts of data transfer are involved, as in reporting and data mart creation.
• Caching: Caching frequently used data, such as reference data and infrequently modified data, helps reduce data access calls and the latency of serving requests. Moreover, since multiple roles would be running in the Azure™ load-balanced environment, we need to consider distributed caching systems such as Windows® Azure™ AppFabric Caching or distributed Memcached.
• Partition keys for table storage: Partition keys used for the data warehouse should not create partitions so large that queries cannot run efficiently on Azure™. Partition keys should be used in all queries for better performance.
• Communication security for data in transit: We need to ensure transport-level security using SSL. For highly confidential data, we also need to consider message-level security, such as encryption and signatures.
• Processing Model: We should analyze business use-cases and choose the appropriate processing model, online or batch.
Long-running processes can be effectively scaled using the worker role approach for computation tasks. Message queue based asynchronous processing also provides data and processing reliability.
• SQL Azure™ Partition: Where the data-mart size grows beyond the 150 GB limit of a single SQL Azure™ database instance, consider horizontal partitioning of a few tables. High-growth tables are candidates for partitioning, using range-based keys or a stored hash of keys to identify a specific partition.
Other Cloud-Based BI Implementation Models
According to the US-based National Institute of Standards and Technology, the cloud is composed of three service models, namely SaaS, PaaS, and IaaS. The design of the cloud-hosted BI solution explained in this paper was made within the boundaries of a PaaS cloud service model, realized using Microsoft Azure™. The other cloud models available for implementing BI solutions are as follows:
• SaaS: This is the highest abstraction of the cloud. In this model, a finished application or solution is offered as a service. It is akin to a packaged product, with support for limited customization, offered through the cloud. Since it is a standard packaged solution, enterprises may find it limiting to map their unique customizations and heterogeneous data stores to such a solution. SaaS might be a good offering for smaller organizations to address their limited BI needs.
• IaaS: This is the lowest abstraction of the cloud. In this model, vendors provide basic hardware and software infrastructure as a service. Customers need to deploy their own software, ranging from the operating system to the end application.
Using this model, enterprises have to address software licensing and deployment themselves, which limits the benefits of the cloud computing platform.
Enterprises can select their cloud platform based on the criteria described in the figure below, driven by factors that make business sense in their respective domains.
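The caching guideline under Design Considerations can be sketched as a get-or-compute wrapper. This is a minimal, in-process stand-in for a distributed cache such as AppFabric Caching or Memcached; the class name and TTL value are illustrative assumptions, not part of the paper's design:

```python
import time


class ReferenceDataCache:
    """Minimal get-or-compute cache with a TTL, standing in for a
    distributed cache (AppFabric Caching, Memcached). Illustrative only."""

    def __init__(self, ttl_seconds=300):
        self._ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, value)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        now = time.time()
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: skip the data-access call
        value = loader(key)            # cache miss: hit the backing store
        self._store[key] = (now + self._ttl, value)
        return value
```

A caller would wrap its reference-data lookup, e.g. `cache.get_or_load("region:EU", fetch_reference_data)`, so repeated requests within the TTL never reach the data store.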
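The hash-of-keys routing mentioned for SQL Azure™ horizontal partitioning can be sketched as follows. The shard database names are hypothetical; the point is that a stable digest (rather than Python's process-local `hash()`) keeps the key-to-partition mapping consistent across processes:

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a row key to one of `shard_count` horizontal partitions
    using a stable MD5-based hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical names for four database instances holding the partitioned tables
SHARDS = ["salesdb_0", "salesdb_1", "salesdb_2", "salesdb_3"]


def database_for(customer_id: str) -> str:
    """Resolve which database instance holds a given customer's rows."""
    return SHARDS[shard_for(customer_id, len(SHARDS))]
```

Range-based keys (the other option the paper mentions) would instead compare the key against fixed boundaries, which makes range scans cheaper but risks uneven partition sizes for high-growth tables.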
[Figure 4 rates four delivery options, spanning private to public cloud (On-Premise, IaaS, PaaS, SaaS), against the following selection criteria: Flexibility; Ease of Management (Hardware, Software & Infrastructure); Control; Functional Richness; Application Building Blocks; Security & Compliance; Time to Market; and QoS (Scalability, Availability, Reliability & Performance). The preferred procurement choice per option is: On-Premise: Buy; IaaS: Buy; PaaS: Build; SaaS: Subscribe.]
Figure 4: Business Intelligence Platform Evaluation Model
The above evaluation model summarizes the business value realized in implementing a cloud-based BI solution on different cloud service models. A model of this nature can help guide enterprises in selecting the most appropriate cloud service by mapping the expected outcome of their BI initiatives to the business value realized from the different cloud service options available.
Concerns About BI in Cloud/Azure™
The cloud platform addresses most of the challenges faced by enterprises in implementing and managing a traditional on-premise BI solution. However, there are a few concerns around cloud usage for implementing BI solutions. These concerns are common to any cloud implementation and are not specific to BI. Let us briefly discuss these concerns from a BI cloud adoption point of view. The most talked about concern is around data security and compliance.
Enterprises have concerns about placing their confidential data on the cloud, where it would get replicated onto multiple servers. Technically, cloud technology treats all data in a similar fashion, and that raises concerns around information security. To address this problem practically, there is a need to amend compliance rules to cater to the technology evolution. At the same time, cloud vendors need to provide mechanisms that can meet compliance requirements more effectively.
Until then, a hybrid solution, as proposed in the high-level design in this paper, is an option that can be explored: critical data is stored on-premise but exposed as a service for integration and aggregation purposes, while transactional data is stored in the cloud.
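The hybrid option above (on-premise master data exposed as a service, transactional data resident in the cloud) can be sketched as a simple aggregation step. The function and field names are hypothetical; the point is that confidential master attributes are fetched per request through a service facade rather than persisted in the cloud:

```python
def aggregate_report(transactions, fetch_master_record):
    """Join cloud-held transactional rows with on-premise master data
    obtained via a service call, keeping master data out of cloud storage.

    transactions: iterable of dicts with 'customer_id' and 'amount'
    fetch_master_record: callable wrapping the on-premise master-data service
    """
    report = []
    for txn in transactions:
        master = fetch_master_record(txn["customer_id"])  # on-premise call
        report.append({
            "customer_name": master["name"],  # confidential, stays on-premise
            "amount": txn["amount"],          # transactional, cloud-resident
        })
    return report
```

In practice the service facade would be the integration-layer component the paper describes, and results could be cached to limit cross-boundary calls.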
Conclusion
As cloud computing evolves and grows every day, it would bring several distinct changes. We foresee changes in compliance requirements and a mindset shift toward making optimized use of cloud technology from the decision support system perspective. BI, as elucidated, has a peculiar nature; it needs a customized solution approach. An integrated BI solution formed from a combination of on-premise deployments as well as cloud-based deployments is the most suitable option available, not only to realize the cloud benefits but also to address enterprise concerns around the cloud.
This paper has discussed in detail how Microsoft Azure™ can be a good fit for an enterprise willing to optimize, yet futuristically enrich, its solution. This paper also envisages an integration pattern for hybrid in-cloud and on-premise solutions developed using Windows® Azure™. This pattern is not limited to BI solutions; it can also be used in multiple problem domains such as disaster recovery, data backup, seasonal campaigning, and collaboration solutions. We hope to see a lot of interest generated in developing a greenfield BI solution, migrating an existing BI solution, or using the proposed aggregation design for implementing solutions on Windows® Azure™.
About the Author
Sidharth Subhash Ghag (Sidharth_ghag@infosys.com) is a Senior Technology Architect with the Microsoft Technology Center (MTC) at Infosys. With several years of software industry experience, he currently leads solutions in Microsoft technologies in the area of cloud computing. He has also worked in the areas of SOA and service-enabling mainframe systems, and on domains such as Finance, Utilities, and Transportation. He has been instrumental in helping Infosys clients with service orientation of their legacy systems. Currently, he helps customers adopt cloud computing within their enterprises.
He has authored papers on cloud computing and service-enabling mainframe systems. Sidharth blogs at http://www.infosysblogs.com/cloudcomputing
Acknowledgement
Sachin Kumar Sancheti, Technical Architect, for his immense contribution in preparing the initial draft and for technical input provided during his tenure in the organization.
Yogesh Bhatt, Principal Architect, Infosys Labs, and Sudhanshu Hate, Senior Technology Architect, Infosys Labs, for paper review.
About Infosys
Many of the world's most successful organizations rely on Infosys to deliver measurable business value. Infosys provides business consulting, technology, engineering and outsourcing services to help clients in over 30 countries build tomorrow's enterprise.
For more information, contact askus@infosys.com
www.infosys.com
© 2012 Infosys Limited, Bangalore, India. Infosys believes the information in this publication is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in this document.