
Enabling Cloud Analytics with Data-Level Security



Booz Allen’s data lake approach enables agencies to embed security controls within each individual piece of data to reinforce existing layers of security and dramatically reduce risk. Government agencies – including military and intelligence agencies – are using this proven security approach to secure data and fully capitalize on the promise of big data and the cloud.

Published in: Data & Analytics


Enabling Cloud Analytics with Data-Level Security: Tapping the Full Value of Big Data and the Cloud
By Jason Escaravage and Peter Guerra
Table of Contents
  • Introduction
  • The Cloud Analytics Imperative
  • Embedding Data-Level Security in the Cloud
  • Implementing Data-Level Security
  • JIEDDO Bolsters Cloud Analytics with Data-Level Security
  • Conclusion
  • Appendix: Cloud Analytics Reference Architecture
Introduction

We are entering an era of big data and cloud computing. The combination, termed “cloud analytics,” holds enormous promise for improved productivity, cost savings, and enhanced mission performance. The Big Data Research and Development Initiative, launched by the White House Office of Science and Technology Policy (OSTP) in March 2012, underscores a growing recognition that big data analytics can help solve some of the nation’s most complex problems. Developed by OSTP in concert with several federal departments and agencies, the big data initiative provides funding and guidance aimed at improving our ability to collect, store, preserve, manage, analyze, and share huge quantities of data, with the ultimate goal of harnessing big data technologies to accelerate the pace of discovery in science and engineering, strengthen national security, and transform teaching and learning.[1]

Despite the evident benefits of cloud analytics, many federal leaders hesitate to adopt a cloud-based services model because of worries about both costs and security. How will my organization pay for these new capabilities? And will our data be secure in the cloud? How do we secure data in the cloud while still meeting our information sharing obligations? These are legitimate questions, particularly given today’s constrained fiscal environment and government’s strict privacy and security requirements. Booz Allen Hamilton’s viewpoint, “Developing a Business Case for Cloud-based Services,” shows how agencies can address cost concerns through a combination of cost savings and productivity gains that more than justify their cloud investments.[2] The current viewpoint examines how an innovation in cloud data storage and management known as a “data lake” is opening new avenues for agencies to meet their security and compliance requirements in a cloud environment.
The data lake approach enables agencies to embed security controls within each individual piece of data to reinforce existing layers of security and dramatically reduce risk. Government agencies — including military and intelligence agencies — are using this proven security approach to secure data and fully capitalize on the promise of big data and the cloud.

The Cloud Analytics Imperative

To understand the power of cloud analytics, it helps to see the progression from basic data analytics performed in most organizations today to cloud analytics (Exhibit 1). As a system is built out along the continuum to cloud analytics, the size and scale of data the system can process increases, along with its analytic capabilities. The combination of large datasets and powerful analytics creates a platform — cloud analytics — for enormous leaps forward in problem solving, decisionmaking, and overall performance.

[Exhibit 1 | Progression to Cloud Analytics. Source: Booz Allen Hamilton]

Numerous factors are driving federal agencies to adopt cloud analytics. The Office of Management and Budget (OMB) mandated a rapid move to embrace “infrastructure as a service” in its “Federal Cloud Computing Strategy,” issued in February 2011. The cloud-first strategy called for agencies to begin by moving at least three services to the cloud within 18 months, so they could begin harnessing the anticipated savings and efficiencies. For example, cloud computing facilitates federal efforts to consolidate data centers, improve server utilization, and reduce the energy footprint and management costs associated with data centers. Agencies can also reduce costs and improve IT performance with cloud-based services that enable rapid provisioning, efficient use of resources, and greater agility in adopting new technologies and solutions.

Another key driver is the desire to achieve cost efficiencies by consolidating stovepipes of data — basically assessing legacy systems to identify integration opportunities, consolidating interfaces, and so on. For example, an agency that maintains 15 separate data systems would look to consolidate them down to just one, with an eye to reducing overall IT “cost of ownership.” However, with that consolidation comes a host of security concerns.

Security is also a key component in the White House’s “Digital Government Strategy,” which calls for agencies to make better use of digital technologies, including analytics for data-driven decisionmaking. Finally, the White House’s “Big Data Research and Development Initiative” would exploit the fast-growing volume of federal data using cloud-based services and emerging analytics tools.

[1] “Obama Administration Unveils ‘Big Data’ Initiative: Announces $200 Million in New R&D Investments.” press_release_final_2.pdf.
[2] For more information about Booz Allen’s Cloud Cost Model, see our viewpoint, “Developing a Business Case for Cloud-based Services,” available at insights/insight-detail-spec/concepts-in-the-cloud.
Cloud analytics offers a wealth of potential insights and benefits in medicine and healthcare, military operations, intelligence analysis, fraud detection, border protection, anti-terrorism, and other critical government missions. Together, cloud computing and data analytics provide a foundation for productivity gains and enhanced mission performance too compelling to ignore. The question is: How can agencies realize these benefits while also ensuring security and compliance?

Embedding Data-Level Security in the Cloud

Many organizations today rely on techniques and approaches for storing and accessing data that were created before the advent of the cloud and big data. These legacy approaches typically store data in “siloed” servers that house different types of data based on a variety of characteristics, such as their source, function, and security restrictions, or whether they are batch, streaming, structured, or unstructured. Security approaches for protecting data “at rest” have naturally focused on protecting the individual silos that store the data.

Unfortunately, these approaches for storing and securing data create significant challenges for cloud analytics. The cloud’s value stems from its ability to bring together vast amounts of data from multiple sources and in multiple combinations for analysis — and to do so quickly and efficiently. Rigid, regimented silos make the data difficult to access and nearly impossible to mix and use all at once, reducing the effectiveness of the analytical tools. Organizations can build bridges between silos to enable sharing and analysis, but this approach becomes increasingly cumbersome and costly as more and more bridges are required to facilitate sharing among multiple combinations of databases. In addition, it becomes more difficult to determine who is accessing the data, what they do with it, and why they need it across all their systems, because there is no record of data provenance, data lineage, or data access.
Combining data from databases that have different levels of security is especially problematic, often requiring designation of the mixed data (and resulting analysis) with high levels of security restrictions. Another complicating factor for many organizations is that some of the more effective methods for protecting data — such as using polymorphous techniques, mixing bogus data with real data, changing where the data resides, and disaggregating data — become difficult to implement as the datasets become larger and larger. These techniques do not scale easily with the data. Ultimately, conventional approaches for securing data become impossible to sustain in a growing cloud environment, and the full potential of cloud analytics remains unfulfilled. The new, complex cloud environment requires organizations to re-imagine how they store, manage, and secure data to facilitate the free flow and mixing of different types of data.

An innovative approach called the data lake has proven extremely effective in addressing the challenges of managing and securing large, diverse datasets in the cloud. Rather than storing data in siloed servers, this approach ingests all data — structured, unstructured, streaming, batch, etc. — into a common storage pool: the data lake. As data enters the data lake, each piece is tagged with security information — security metadata — that embeds security within the data. The metadata tags can control (or prescribe) security parameters such as who can access the data; when they can access the data; what networks and devices can access the data; and the regulations, standards, and legal restrictions that apply. Security resides within and moves with the data, whether the data is in motion or at rest. As a result, organizations can confidently mix multiple datasets and provide analysts with fast and efficient access to the data, knowing the security tags will remain permanently attached to the data.

Before examining how security metadata is attached to the data, it is important to understand the types of security controls needed in a cloud environment. Within the cloud, data is typically shared among multiple users, devices, networks, platforms, and applications; consequently, effective cloud security encompasses three essential activities: identity management, configuration management, and compliance. Identity management is critical to ensure that the right people — and only those people — have access to the different types of data.
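To make the tagging idea concrete, the sketch below shows one way ingest-time tagging could look. This is an illustrative assumption, not the architecture's actual implementation: the `TaggedRecord` shape, the `ingest()` helper, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class TaggedRecord:
    """A piece of data wrapped with security metadata as it enters the lake."""
    payload: bytes                 # the raw data, structured or unstructured
    classification: str            # e.g., "UNCLASSIFIED", "SECRET"
    allowed_roles: frozenset       # who may access the data
    allowed_networks: frozenset    # which networks may serve it
    regulations: tuple             # e.g., ("FISMA", "HIPAA")
    ingested_at: str = field(default="")

def ingest(payload, classification, roles, networks, regulations):
    """Attach security metadata to data at ingest; the tags travel with it."""
    return TaggedRecord(
        payload=payload,
        classification=classification,
        allowed_roles=frozenset(roles),
        allowed_networks=frozenset(networks),
        regulations=tuple(regulations),
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )

record = ingest(b"ops summary ...", "SECRET",
                roles={"intel_analyst"}, networks={"secure_intranet"},
                regulations=["FISMA"])
```

Because the metadata is part of the record itself, it persists whether the data is in motion or at rest, which is the property the text emphasizes.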
For most government and commercial organizations, the requirements for multilevel identity management complicate this task because they give some employees access to some but not all types of information, such as top-secret intelligence reports or proprietary financial data. Cloud-based data is also shared across many different types of platforms, applications, and devices, which further complicates the security task, because employees might be authorized to access some data only from specific types of devices (e.g., a secure computer located within a government building) or only on authorized networks (e.g., a secure intranet). Consequently, secure cloud-based systems require effective configuration management to manage data access for many combinations of approved networks, platforms, and devices, while also taking into account user identities and authorizations.

Finally, organizations require security controls to ensure they comply with relevant regulations and standards as data is accessed, shared, and analyzed. For example, federal agencies must comply with a host of security standards and authorizations, such as the Federal Information Security Management Act (FISMA), National Institute of Standards and Technology (NIST) security standards and guidelines, Health Insurance Portability and Accountability Act (HIPAA) privacy requirements, and the Federal Risk and Authorization Management Program (FedRAMP) for accreditation of cloud products and services.

The data lake enables organizations to address these security requirements efficiently and effectively through the security tags attached to the data as it flows into and out of the data lake. In carrying out this security function, the data lake acts as though it were a massive spreadsheet with an infinite number of columns and rows; each cell within the spreadsheet contains a unique piece of data, with a defined set of security conditions or restrictions.
As each piece of data enters the lake and is tagged, it is assigned to its cell, along with its particular security parameters. For example, a piece of data could be tagged with information describing who can use the data, as well as with information describing the types of approved devices, networks, platforms, or locations. The tags could also describe the types of compliance regulations and standards that apply. And the tags could contain the dimension of time, thus helping organizations maintain the integrity of the data and have a record of changes over time. Similarly, the tags could allow certain people access to all historical data while limiting others to just the most recent data; or the tags could embed an expiration date on the data. Many data elements will have multiple security descriptors; there are no limits to the number or combinations assigned. Every piece of data is tagged with security metadata describing the applicable security restrictions and conditions of its use.

Also noteworthy, organizations can code the tags to recognize and work with security controls in the other layers of the architecture — that is, with the infrastructure, platform, application, and software layers. In this way, data-level security complements and reinforces the identity management, configuration management, and compliance controls already in place (or later implemented) while also facilitating the free flow of data that gives cloud computing and analytics their power.[3] For example, the data lake approach uses an identity management system that can handle Attribute-Based Access Control (ABAC), a public key infrastructure (PKI) to protect the communications between the servers and to bind the tags to the data elements, and a process for developing the security controls to apply to each data element. These technology elements are usually combined with an organization’s existing security policies and are then applied as analytics on top of the data once it is ingested. In addition, unlike many conventional security techniques, data tagging can easily scale with an organization’s expanding infrastructure, datasets, devices, and user population.

Implementing Data-Level Security

The data-level security made possible by the data lake approach can be used within a variety of cloud frameworks.
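An ABAC-style access decision of the kind described above can be sketched as a function that grants access only when the requester's attributes satisfy every condition carried in a data element's tags. The tag schema, request shape, and function name below are illustrative assumptions for this sketch, not a specification of any product.

```python
from datetime import datetime, timezone

def can_access(tags: dict, request: dict, now: datetime) -> bool:
    """Grant access only if every tagged condition is satisfied."""
    if request["role"] not in tags["allowed_roles"]:
        return False                           # identity management
    if request["device"] not in tags["allowed_devices"]:
        return False                           # configuration management
    if request["network"] not in tags["allowed_networks"]:
        return False
    expires = tags.get("expires_at")
    if expires is not None and now >= expires:
        return False                           # time-based restriction
    return True

tags = {
    "allowed_roles": {"intel_analyst"},
    "allowed_devices": {"gov_workstation"},
    "allowed_networks": {"secure_intranet"},
    "expires_at": datetime(2030, 1, 1, tzinfo=timezone.utc),
}
request = {"role": "intel_analyst", "device": "gov_workstation",
           "network": "secure_intranet"}
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(can_access(tags, request, now))   # True: all conditions satisfied
```

The same request made from an unapproved device or after the expiration date would be denied, which is how a single tag set can express identity, configuration, and time constraints at once.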
A number of federal agencies have recently implemented it with great success using the Cloud Analytics Reference Architecture, a breakthrough approach for storing, managing, securing, and analyzing data in the cloud.[4] Developed by Booz Allen Hamilton in collaboration with its US government partners, the Cloud Analytics Reference Architecture automatically tags each piece of data with security metadata as the data enters the data lake. Organizations can use a variety of commercial off-the-shelf (COTS) or government off-the-shelf (GOTS) tools, including open-source tools, to tag the data.

The tagging technology — basically a preprocessor with the ability to add metadata to data streams — has not proven difficult to implement. However, resolving the policy and legal issues surrounding the sharing and mixing of data can be problematic. The complex process of deciding which policies and laws apply to which pieces of data requires a determined effort by the relevant stakeholders and decisionmakers. Each organization is different and so will apply the rules, standards, laws, and policies in accordance with its culture and mission. However, once these decisions are made and the appropriate mechanisms are put in place, the security metadata can be attached automatically based on the agreed-upon, preconfigured rules addressing the relevant aspects of security, including identity management, configuration management, and compliance.

JIEDDO Bolsters Cloud Analytics with Data-Level Security

A government organization that is successfully implementing data-level security within the Cloud Analytics Reference Architecture is the Joint Improvised Explosive Device Defeat Organization (JIEDDO). Established in 2006, JIEDDO seeks to improve threat intelligence-gathering, acquire counter-IED technologies and solutions, and develop counter-IED training for US forces.
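The preprocessor described above can be pictured as a set of preconfigured rules, each pairing a predicate on the incoming record with the tags to attach when it matches. The rules, record fields, and helper below are hypothetical examples of how such agreed-upon policy might be encoded, not the architecture's actual rule language.

```python
# Each rule: (predicate on the incoming record, tags to attach on a match).
RULES = [
    (lambda rec: rec.get("source") == "intel_message_traffic",
     {"classification": "SECRET", "regulations": {"FISMA"}}),
    (lambda rec: "patient_id" in rec,
     {"classification": "CONTROLLED", "regulations": {"HIPAA"}}),
]

def tag_record(record: dict) -> dict:
    """Apply every matching rule, merging its tags into the record's metadata."""
    tags = {"classification": "UNCLASSIFIED", "regulations": set()}
    for predicate, extra in RULES:
        if predicate(record):
            tags["classification"] = extra.get("classification",
                                               tags["classification"])
            tags["regulations"] |= extra.get("regulations", set())
    record["_security_tags"] = tags
    return record

msg = tag_record({"source": "intel_message_traffic", "body": "..."})
print(msg["_security_tags"]["classification"])   # SECRET
```

Once the stakeholders have settled which policies apply to which data, rules like these run automatically on every record at ingest, which is why the technical side of tagging has proven easier than the policy side.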
To identify and eliminate threats, JIEDDO analysts constantly comb through hundreds of different data sources, such as message traffic from the intelligence community, operations summaries from on-the-ground deployed units, RSS feeds, news reports, websites, and other open sources. The diverse sets of data enter JIEDDO in every kind of format. Combining all of JIEDDO’s information so that analysts could conduct a single search was difficult and sometimes impossible before JIEDDO adopted the Cloud Analytics Reference Architecture and data-security tagging. Typically, analysts were forced to query separate databases using processes and tools that were specific to each database, which meant the analysts needed to master each database and format. After receiving the results, analysts would then manually combine the results to find the answers they were seeking. The process, although valuable, could be cumbersome and time consuming, even for those with experience and expertise in using the databases.

In contrast, the Cloud Analytics Reference Architecture allows analysts to run a single query of all JIEDDO’s data because the data is stored together in the data lake. When looking for patterns and trends, such as what types of IEDs certain groups are using or where the danger spots are located, analysts can tap every available source. Analysts can also ask any type of question regarding information in the data lake; in contrast, the types of questions that analysts can ask using conventional databases are often limited by how the data is formatted.

In addition, one of the benefits of security tagging is that it creates hierarchies of access control to identify who can and cannot see the data and the analytical results. This is extremely important for JIEDDO, because it supports the US military and international security assistance forces. Security tagging enables analysts and commanding officers to more readily share information with foreign allies because the metadata protects the data.

[3] In addition to applying metadata security tags to their data, organizations can also encrypt selected pieces of data to further control access and risk. As with other security controls that organizations put in place, the decision to encrypt data should be determined by an assessment of the overall benefits relative to the costs and risks of encrypting the information.
[4] For an overview of the Cloud Analytics Reference Architecture, see the Appendix.
Previously, without such tagging, valuable information and analyses often defaulted to the highest level of security, thus limiting their usefulness because the information and analyses could not be widely shared. Data tagging and the Cloud Analytics Reference Architecture are enabling JIEDDO to more effectively carry out its mission responsibilities to analyze intelligence, attack terrorist networks, and protect US and coalition forces from IEDs.

Conclusion

Federal chief information officers and IT managers overwhelmingly cite security as their chief concern when moving to cloud computing. Many fear a loss of control over their data. Data-level security within a data lake addresses their concerns by providing security that is fine-grained and expressive. It is expressive in that organizations can tag their data with a limitless number of security and business rules; and it is fine-grained in that organizations can affix those rules with rigorous, detailed precision, specifying approved user identities, devices, physical locations, networks, and applications, applicable privacy and security regulations, and other security parameters for each piece of data. Data tagging also reinforces existing layers of security embedded at the infrastructure, platform, application, and network levels. And the metadata tags embed each piece of data with security throughout its lifecycle, from data generation to data elimination when the hard drive and data are destroyed.

Together, the data lake and data-level security represent an entirely new approach that gives both government and business organizations a powerful tool to solve their most complex problems. By re-imagining data security in the cloud, organizations can unlock the full value of cloud analytics to address scientific, social, and economic challenges in ways that were unimaginable a decade ago.
Appendix: Cloud Analytics Reference Architecture

The Cloud Analytics Reference Architecture, as shown in Exhibit 2, is built on a cloud computing and network infrastructure that ingests all data — structured, unstructured, streaming, batch, etc. — into a common storage pool called a data lake. Storing data in the data lake has many advantages over conventional techniques. The data is stored on commodity hardware and can scale rapidly in performance and storage. This gives the data lake the flexibility to expand to accommodate the natural growth of an organization’s data, as well as additional data from multiple outside sources. Thus, unlike conventional approaches, it enables organizations to pursue new analytical approaches with few changes, if any, to the underlying infrastructure. It also precludes the need for building bridges between data silos, because all of the information is already stored together.

Perhaps most important, the data lake treats structured and unstructured data equally. There is no “second-class” data based on how easy it is to use. Given that an estimated 80 percent of the data created today is unstructured, organizations must have the ability to use this data. Overall, the data lake makes all of the data easy to access and opens the door to the more efficient and effective use of big data analytical tools.

The Cloud Analytics Reference Architecture also allows computers to take over much of the work, freeing people to focus on analysis and insight. As data flows into the data lake, it is automatically tagged and indexed for analytics and services. Unlike in conventional approaches, the data is not pre-summarized or pre-categorized as structured or unstructured or by its different locations (given that all data is stored in the data lake), but rather is tagged for indexing, sorting, identification, and security across multiple dimensions.
The data lake smoothly accepts all types of data, including unstructured data, through this automated tagging process. When organizations are ready to apply analytic tools to the data, pre-analytics filters help sort the data and prepare it for deeper analysis, using the tags to locate and pull out the relevant information from the data lake. Pre-analytical tools are also used in the conventional approach, but they are typically part of a rigid structure that must be reassembled as inquiries change. In contrast, the pre-analytics in the Cloud Analytics Reference Architecture are designed for use with the data lake, and so are both flexible and reusable.

[Exhibit 2 | Primary Elements of the Cloud Analytics Reference Architecture: data sources flow through metadata tagging into the data lake (data management, the single secure repository), which feeds analytics and services (tools for analysis, modeling, testing, and simulations) and, ultimately, human insights and actions via customizable visualization, reporting, dashboard, and query interfaces, all running on a common infrastructure. Source: Booz Allen Hamilton]

The Cloud Analytics Reference Architecture opens up the enormous potential of big data analytics in multiple ways. For example, it removes the constraints created by data silos. Rather than having to move from database to database to pull out specific information, users can access all of the data at once, including data from outside sources, expanding exponentially the spectrum of analysis. This approach also expands the range of questions that can be asked of data through multiple analytic tools and processes, including:

  • Ad hoc queries. Unlike conventional approaches, where analytics are part of the narrow, custom-built structure, in the Cloud Analytics Reference Architecture analysts are free to pursue ad hoc queries employing any line of inquiry, including improvised follow-up questions that can yield particularly valuable results.

  • Machine learning. Analytics can search for patterns examining all of the available data at once, without needing to hypothesize in advance what patterns might exist.

  • Alerting. An analytic alert notifying an organization that something unexpected has occurred — such as an anomaly in a pattern — can signal important changes and trends in cyber threats, enemy activities, health and disease status, consumer behavior, market activity, and other areas.

The Cloud Analytics Reference Architecture also supports interfaces and visualization dashboards to contextualize and package the insights, patterns, and other results for decisionmakers.
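The pre-analytics filtering step described above can be pictured as selecting, by tag, only the records that are both relevant to the analytic and permitted for the caller, before any deeper tool runs. The in-memory "lake", tag fields, and classification ordering below are illustrative assumptions for this sketch.

```python
# A toy data lake: each record carries the tags attached at ingest.
LAKE = [
    {"body": "convoy report A", "tags": {"topic": "IED", "classification": "SECRET"}},
    {"body": "news item B",     "tags": {"topic": "IED", "classification": "UNCLASSIFIED"}},
    {"body": "ops summary C",   "tags": {"topic": "logistics", "classification": "SECRET"}},
]

def pre_filter(lake, topic, max_classification):
    """Use the tags to pull out records relevant to the analytic's topic
    and within the caller's clearance, before deeper analysis runs."""
    order = ["UNCLASSIFIED", "CONTROLLED", "SECRET"]
    limit = order.index(max_classification)
    return [r for r in lake
            if r["tags"]["topic"] == topic
            and order.index(r["tags"]["classification"]) <= limit]

subset = pre_filter(LAKE, topic="IED", max_classification="UNCLASSIFIED")
print([r["body"] for r in subset])   # ['news item B']
```

Because the filter is driven entirely by tags rather than by a fixed database schema, the same function can serve new lines of inquiry without restructuring the underlying store, which is the flexibility and reusability claimed for the architecture's pre-analytics.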
Although the Cloud Analytics Reference Architecture opens a wide aperture to data, it incorporates visualization and interaction tools that present the analyses in clear formats tailored to the specific issues and decisions at hand, enabling insight and confident action by decisionmakers.

A number of defense, civilian, and intelligence agencies are already using the Cloud Analytics Reference Architecture to generate valuable insights and achieve mission goals previously unattainable in conventional cloud environments. For example, the US military is using the Cloud Analytics Reference Architecture to search for patterns in war zone intelligence data, mapping out convoy routes least likely to encounter IEDs. The Centers for Medicare and Medicaid Services (CMS) are using this approach to combat fraud by analyzing mountains of data, which enables CMS to assess the fraud risk of doctors and others who bill Medicare. And intelligence agencies are using this new cloud architecture to apply aggressive indexing techniques and on-demand analytics across the agencies’ massive and increasing volume of both structured and unstructured data. Booz Allen itself is also adopting the Cloud Analytics Reference Architecture to maximize its cloud analytics capabilities, both for the firm and its clients.

Many organizations today have an urgent need to make sense of data from diverse sources, including those that have previously been inaccessible or extremely difficult to use, such as streams of unstructured data from social networks or remote sensors. The Cloud Analytics Reference Architecture enables analysts and decisionmakers to see new connections within all of this data to uncover previously hidden trends and relationships. Organizations can extract real business and mission value from their data to address pressing challenges and requirements, while improving operational effectiveness and overall performance.
About Booz Allen Hamilton

Booz Allen Hamilton has been at the forefront of strategy and technology consulting for nearly a century. Today, Booz Allen Hamilton is a leading provider of management and technology consulting services to the US and international governments in defense, intelligence, and civil sectors, and to major corporations, institutions, and not-for-profit organizations. In the commercial sector, the firm focuses on leveraging its existing expertise for clients in the financial services, healthcare, and energy markets, and to international clients in the Middle East. Booz Allen Hamilton offers clients deep functional knowledge spanning strategy and organization, engineering and operations, technology, and analytics — which it combines with specialized expertise in clients’ mission and domain areas to help solve their toughest problems.

The firm’s management consulting heritage is the basis for its unique collaborative culture and operating model, enabling Booz Allen Hamilton to anticipate needs and opportunities, rapidly deploy talent and resources, and deliver enduring results. By combining a consultant’s problem-solving orientation with deep technical knowledge and strong execution, Booz Allen Hamilton helps clients achieve success in their most critical missions — as evidenced by the firm’s many client relationships that span decades. Booz Allen Hamilton helps shape thinking and prepare for future developments in areas of national importance, including cybersecurity, homeland security, healthcare, and information technology.

Booz Allen is headquartered in McLean, Virginia, employs approximately 25,000 people, and had revenue of $5.86 billion for the 12 months ended March 31, 2012. For over a decade, Booz Allen’s high standing as a business and an employer has been recognized by dozens of organizations and publications, including Fortune, Working Mother, G.I. Jobs, and DiversityInc. More information is available at (NYSE: BAH)

Contacts

Jason Escaravage
Principal
703-902-5635

Peter Guerra
Senior Associate
301-497-6754
Principal Offices

The most complete, recent list of offices and their addresses and telephone numbers can be found on

Huntsville, Alabama; Montgomery, Alabama; Sierra Vista, Arizona; Los Angeles, California; San Diego, California; San Francisco, California; Colorado Springs, Colorado; Denver, Colorado; District of Columbia; Pensacola, Florida; Sarasota, Florida; Tampa, Florida; Atlanta, Georgia; Honolulu, Hawaii; O’Fallon, Illinois; Indianapolis, Indiana; Leavenworth, Kansas; Radcliff, Kentucky; Aberdeen, Maryland; Annapolis Junction, Maryland; Lexington Park, Maryland; Linthicum, Maryland; Rockville, Maryland; Troy, Michigan; Kansas City, Missouri; Omaha, Nebraska; Red Bank, New Jersey; New York, New York; Rome, New York; Fayetteville, North Carolina; Cleveland, Ohio; Dayton, Ohio; Philadelphia, Pennsylvania; Charleston, South Carolina; Houston, Texas; San Antonio, Texas; Abu Dhabi, UAE; Alexandria, Virginia; Arlington, Virginia; Chantilly, Virginia; Charlottesville, Virginia; Falls Church, Virginia; Herndon, Virginia; Lorton, Virginia; McLean, Virginia; Norfolk, Virginia; Stafford, Virginia; Seattle, Washington

©2013 Booz Allen Hamilton Inc. 12.032.12M