Data Lake-based Approaches to Regulatory-Driven Technology Challenges - Booz Allen Hamilton
Booz Allen Hamilton has found that a data lake-based approach to CA3 requirements is scalable, extensible, and improves the range and sophistication of analyses that can be supported while providing higher levels of data control and security.
Shifting Risks and IT Complexities Create Demands for New Enterprise Security... - Booz Allen Hamilton
Holistic Cyber Risk Management Programs in the Financial Industry Must "Predict and Prevent" in Today's Complex Threat Environment, says new White Paper.
Building an Infrastructure that Secures and Protects
In June and July 2011, the Economist Intelligence Unit conducted a global survey, sponsored by Booz Allen Hamilton, of 387 executives to assess attitudes toward cybersecurity, and their progress towards implementing resilience strategies. Learn more: http://www.boozallen.com/insights/expertvoices/cyber-power
Booz Allen’s data lake approach enables agencies to embed security controls within each individual piece of data to reinforce existing layers of security and dramatically reduce risk. Government agencies – including military and intelligence agencies – are using this proven security approach to secure data and fully capitalize on the promise of big data and the cloud.
Enabling Big Data with Data-Level Security: The Cloud Analytics Reference Arch... - Booz Allen Hamilton
Cloud computing offers an important approach to achieving lasting strategic advantages by rapidly adapting to complex challenges in IT management and data analytics. This paper discusses the business impact and analytic transformation opportunities of cloud computing. Moreover, it highlights the differences between two cloud architectures—Utility Clouds and Data Clouds—with illustrative examples of how Data Clouds are shaping new advances in Intelligence Analysis.
Booz Allen's Cloud cost model offers a total-value perspective on IT cost that evaluates the explicit and implicit value of a migration to cloud-based services.
IABE Big Data information paper - An actuarial perspective - Mateusz Maj
We look closely at the insurance value chain and assess the impact of Big Data on underwriting, pricing, and claims reserving. We examine the ethics of Big Data, including data privacy, customer identification, data ownership, and the legal aspects. We also discuss new frontiers for insurance and its impact on the actuarial profession. Will actuaries be able to leverage Big Data, create sophisticated risk models and more personalized insurance offers, and bring a new wave of innovation to the market?
Another great content/horrendous stock photo "presentation" from IT Business Edge about Big Data. (http://www.itbusinessedge.com/slideshows/big-data-eight-facts-and-eight-fictions.html)
Python's Role in the Future of Data Analysis - Peter Wang
Why is "big data" a challenge, and what roles do high-level languages like Python have to play in this space?
The video of this talk is at: https://vimeo.com/79826022
Big data security and privacy issues in the... - IJNSA Journal
Many organizations demand efficient solutions to store and analyze huge amounts of information. Cloud computing as an enabler provides scalable resources and significant economic benefits in the form of reduced operational costs. This paradigm raises a broad range of security and privacy issues that must be taken into consideration. Multi-tenancy, loss of control, and trust are key challenges in cloud computing environments. This paper reviews the existing technologies and a wide array of both earlier and state-of-the-art projects on cloud security and privacy. We categorize the existing research according to the cloud reference architecture layers (orchestration, resource control, physical resource, and cloud service management), in addition to reviewing recent developments for enhancing the security of Apache Hadoop, one of the most widely deployed big data infrastructures. We also outline frontier research on privacy-preserving data-intensive applications in cloud computing, such as privacy threat modeling and privacy-enhancing solutions.
As we enter the digital economy, it becomes increasingly clear that the information and data ecosphere will remain a complex environment for the foreseeable future, with information provided from a variety of internal and external sources in the form of files, messages, queries, and streams. It would be foolish for any organization to place its bets on any one platform as its platform of choice, because doing so is incongruent with the thought patterns of the consumers, suppliers, regulators, partners, and financiers who will participate in its information ecosphere through data feeds, information requests, and a host of other interfaces.
Rather, each of these platforms has a role, serving as a conduit for data and for the transformation of data into information aligned with the value propositions of the organization. This writing is focused on the big data platform because the big data environment has some unique characteristics that require an approach different from many of the legacy environments that exist in organizations. Furthermore, while big data is the one environment that is new and requires this special handling today, future platforms will bring the same requirements, and hopefully the lessons learned will be carried forward so these challenges need not be revisited as the next transformational information ecosphere becomes available.
Figure 1 The Fourth Industrial Revolution, World Economic Forum, InfoSight Partners, 2016
This time is different, in that information is the catalyst to achieving value, and the platform ideally suited to house information that is not optimal for storage in rows and columns is the big data environment. Understanding which information is delivered with intended consequences, and having the management prowess to tune the information shared with customers, prospects, suppliers, partners, regulators, and financiers, is critical for the digital economy. Additionally, it is important to understand the challenges each platform housing information brings to the equation. This writing will focus on big data.
The advent of Big Data has presented new challenges in terms of data security. There is an increasing need for research into technologies that can handle the vast volume of data and secure it efficiently. Current technologies for securing data are slow when applied to huge amounts of data. This paper discusses the security aspects of Big Data.
We are optimistic that the United States can strengthen critical infrastructure cybersecurity through a government-industry partnership that builds a robust Cybersecurity Framework, shares threat data, and collaborates on achieving national cyber goals. Although we don’t discount the challenges of bringing together such large and diverse groups of stakeholders, we believe that emerging cyber technologies and capabilities have created opportunities for success that did not exist 15 years ago, when government first initiated "whole of government" efforts similar to the Executive Order.
Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo... - Booz Allen Hamilton
Quickly and efficiently pushing quality candidates through the federal hiring process is a good first step, but those efforts will be wasted if agencies don’t invest in comprehensive candidate development plans. Agencies need value propositions that speak to the next generation of IT leaders, because the governments that are successful in acquiring the top cyber talent will position themselves as the political and economic leaders of the Cyber Age.
Booz Allen Mission Engineering® enables clients to reduce costs by eliminating redundant and irrelevant systems, while enhancing mission capabilities through enterprise system integration.
Booz Allen Hamilton has extensive experience conducting digital forensics investigations for diverse clients in the defense, civil, commercial, and intelligence sectors.
Booz Allen Hamilton’s cross-disciplinary expertise in program management includes systems integration, technology, strategic planning, stakeholder analysis and management, wargaming, and other capabilities for implementing 3-D Program Management to help US government agencies successfully manage and deliver complex programs.
Booz Allen Hamilton offers an integrated suite of cloud capabilities, deep subject matter expertise, and unparalleled hands-on experience with a broad range of cloud technology products.
Theft of intellectual property is troubling, no matter what the victim’s identity. But theft of IP from the defense industry can be terrifying. IP that falls into the wrong hands can have devastating security and espionage repercussions, troublesome competitiveness implications, and can even be used to target employees and families for blackmail or kidnapping. Learn more: http://www.cyberhub.com/research/IP_threat
Booz Allen Hamilton is helping military organizations develop and implement readiness decision-support solutions that provide a clear understanding of the relationships and trade-offs among requirements, resources, capabilities, capacities, costs, and risks.
Affidavit of Eligibility and Release Associated with the Degas/Cassatt Like t... - Booz Allen Hamilton
This is the Affidavit of Eligibility and Release Associated with Booz Allen Hamilton's Degas/Cassatt Like to Win Contest. For complete contest rules, visit: slidesha.re/1oRIFEd.
Leading reform isn't easy, but for the past 100 years, Booz Allen has helped both public and private sector clients do exactly that. Our goal is to walk with you on this journey to help bring clarity to the chaos and spark some innovation along the way.
We have entered an era of Big Data. Through better analysis of the large volumes of data that are becoming available, there is the potential for making faster advances in many scientific disciplines and improving the profitability and success of many enterprises. However, many technical challenges described in this paper must be addressed before this potential can be realized fully. The challenges include not just the obvious issues of scale, but also heterogeneity, lack of structure, error-handling, privacy, timeliness, provenance, and visualization, at all stages of the analysis pipeline from data acquisition to result interpretation.
The success of an organization increasingly depends on its ability to draw conclusions from the various types of data available. Staying ahead of competitors often means identifying a trend, problem, or opportunity microseconds before anyone else. That's why organizations must be able to analyze this information if they want to find the insights that will help them identify the new opportunities underlying this phenomenon.
People are spontaneously uploading large amounts of information to the internet, and this represents a great opportunity for companies to segment according to behavior and not only socio-demographic factors. Companies store transactional information from their customers by having them fill in forms, but the challenge for brands is to enrich these databases with information describing their customers’ behavior and daily habits. This information can be obtained through online conversation and can be processed, crossed, and enriched with many other types of information through different models based on Big Data. Following this procedure, we can complement the information we already have about our customers without having to ask them directly, and therefore provide more value-added proposals to clients from a brand perspective.
Using the same technology with the right platform and the correct tactic, companies can achieve more ambitious goals that provide valuable information for the brand, which in turn could also enrich the customer’s experience, improving the customer journey for all types of clients.
This article is useful for anyone who wants an introduction to Big Data and to how Oracle architects a Big Data solution using Oracle Big Data Cloud.
"Data pipelines" are a collection of processes that transmit data from one location to another location.
The end-to-end process of gathering data, turning it into insights and models, disseminating insights, and applying the model whenever and wherever the action is required to achieve the business goal is stitched together by a data pipeline.
Architects and developers have had to adjust to "big data" because of the significantly increased volume, diversity, and velocity of data in recent years.
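To make the stitching concrete, here is a minimal sketch in Python of a pipeline whose stages are plain functions composed end to end. The stage names and sample records are invented for illustration and assume no particular pipeline framework.

def gather(source_rows):
    # Collect raw records from a source (here, an in-memory list).
    return list(source_rows)

def transform(rows):
    # Turn raw records into a modeled form (normalized names, computed totals).
    return [{"customer": r["customer"].upper(), "total": r["qty"] * r["price"]}
            for r in rows]

def deliver(rows):
    # Disseminate the result wherever action is required (here, stdout).
    for r in rows:
        print(f'{r["customer"]}: {r["total"]:.2f}')

def run_pipeline(source_rows):
    # The pipeline is just the composition of its stages.
    deliver(transform(gather(source_rows)))

raw = [{"customer": "acme", "qty": 3, "price": 9.5},
       {"customer": "globex", "qty": 1, "price": 120.0}]
run_pipeline(raw)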
Innovation with Big Data – Chr. Hansen's Experiences - Microsoft
In many places, Big Data is still the new and unknown, and it is not a top priority for IT because "we don't have large volumes of data." But Big Data is much more than large volumes of data. At Chr. Hansen A/S, the Research and Development (Innovation) department has worked with the value of data and, as a result, established a cross-disciplinary BioInformatics program built on Big Data technologies from Microsoft.
Big Data has recently gained relevance because companies are realizing what it can do for them and that it is a gold mine for finding competitive advantages. Proximity’s Juan Manuel Ramírez, Director of Strategy and...
Future Trends in the Modern Data Stack Landscape - Ciente
As we embrace the future, staying abreast of emerging technologies will be crucial for organizations seeking to harness the full potential of their data.
You Can Hack That: How to Use Hackathons to Solve Your Toughest Challenges - Booz Allen Hamilton
“Hackathon” has become a trendy word in today’s business vernacular, and for good reason. The word “hackathon” comes from both “hack” and “marathon.” If you think of a “hack” as a creative solution and “marathon” as a continuous, often competitive event, you’re at the heart of what a hackathon is about. Hackathons enable creative problem solving through an innovative and often competitive structure that engages stakeholders to come up with unconventional solutions to pressing challenges. Hackathons can be used to develop new processes, products, ways of thinking, or ways of engaging stakeholders and partners, with benefits ranging from solving tough problems to broader cultural and organizational improvements.
This playbook was designed to make hackathons accessible to everyone. That means not only can all kinds of organizations benefit from hackathons, but that all kinds of employees inside those groups—executives, project managers, designers, or engineers—should participate and can benefit, too. Use this playbook as a reference and allow the best practices we outline to guide you in designing a hackathon structure that works for you and enables your organization to achieve its desired outcomes. Give yourself anywhere from six weeks to a few months to plan your hackathon, depending on the components, approach, number of participants, and desired outcomes.
Contact Director Brian MacCarthy at MacCarthy_Brian2@bah.com for more information about Booz Allen’s hackathon offering.
Booz Allen's U.S. Commercial Leader and Executive Vice President, Bill Phelps, recently released his list of 10 Cyber Priorities for Boards of Directors. As we peer into how business, technology, regulatory, and cyber threat realities are evolving in the coming year, here is a reference guide for board members to use in validating their company's cybersecurity approach.
We looked at the data. Here’s a breakdown of some key statistics about the nation’s incoming presidents’ addresses, how long they spoke, how well, and more.
Our Military Spouse Forum built a roadmap to help you navigate your career between deployments, moves, and the unpredictable. Interested in how Booz Allen can help you navigate your career? Check out our opportunities at www.boozallen.com/careers
In August 2016, Booz Allen partnered with Market Connections to conduct a survey of National Security Leaders and the General Public to understand their perspectives on the current threats. Fifteen years after the September 11 attacks, we wanted to know what keeps them up at night today, and what they will be worried about in 15 years. This infographic provides the high-level results of our survey and we will be releasing a more detailed report later in the month of September – so stay tuned. #NationalSecurity2031
Booz Allen convened some of the smartest minds to explore making healthcare more accessible. This report shares the latest healthcare payment trends and what policy experts discovered when planning for different health reform scenarios.
An interactive workshop that guides you through the many relationships that exist in an agile team, with a business value emphasis. Team members gain empathy, discover expectations of others and the importance of these agile team relationships.
An immersive environment allows students to be completely “immersed” in a self-contained simulated or artificial environment while experiencing it as real. With immersive learning, you can show realistic visual and training environments to teach complex tasks and concepts.
Nuclear Promise: Reducing Cost While Improving Performance - Booz Allen Hamilton
To remain competitive, nuclear operators must take aim at all addressable costs, ensuring maintenance is optimized, taking proactive steps to minimize unplanned outages and, where possible, reducing administrative and other overhead costs. There are multiple opportunities to reduce capital and operational spending, while improving safety and reliability.
General Motors and Lyft; Target and Walmart; Netflix and Amazon - we call these “frenemies”. A strange trend is emerging as unlikely partner companies join forces, and they’re transforming industries around the world. Understanding what's driving the frenemies trend, knowing what options best fit your needs, and making yourself an effective partner are all critical to success.
Threats to industrial control systems are on the rise. This briefing explores potential threats and vulnerabilities as well as what organizations can do to guard against them.
Booz Allen Hamilton and Market Connections: C4ISR Survey Report - Booz Allen Hamilton
Booz Allen Hamilton partnered with government market research firm Market Connections, Inc. to conduct the survey of military decision-makers. The research examined the main features of Integrated C4ISR through Enterprise Integration: engineering, operations and acquisition. Two-thirds of respondents (65 percent) agree agile incremental delivery of modular systems with integrated capabilities can enable rapid insertion of new technologies.
Modern C4ISR Integrates, Innovates and Secures Military Networks - Booz Allen Hamilton
A majority of the military believe Integrated C4ISR through Enterprise Integration would provide utility to their organization. Check out other key findings from our study in this infographic http://bit.ly/1OZOjG2
Agile and Open C4ISR Systems - Helping the Military Integrate, Innovate and S... - Booz Allen Hamilton
Integrated C4ISR is a force multiplier that significantly improves situational awareness and decision making to give warfighters a decisive battlefield advantage. This advantage stems from Booz Allen Hamilton’s Enterprise Integration approach http://bit.ly/25nDBRg: bringing together three disciplines and their communities—engineering, operations, and acquisition.
Booz Allen Hamilton created the Field Guide to Data Science to help organizations and missions understand how to make use of data as a resource. The Second Edition of the Field Guide, updated with new features and content, delivers our latest insights in a fast-changing field. http://bit.ly/1O78U42
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method... - 2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank... Compressed Sparse Row (CSR) is an adjacency-list-based graph representation that...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round-table discussion of vector databases, unstructured data, AI, big data, real-time, robots, and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects to see demand grow and supply continue to evolve, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the need for data storage keeps expanding as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
…an expanding set of analytic services to help them gain critical mission insights.

The Cloud Analytics Reference Architecture, which is being adapted to the larger business and government communities, removes the traditional constraints by bringing together innovations in two areas of current technology. First, it uses the power of the cloud to put an organization’s entire storehouse of data into a common pool, or “data lake,” making all of it easily accessible for the first time. It then uses sophisticated computer analytics, such as machine learning and natural language processing, to help extract the kind of knowledge and insight that creates value, guides strategy, and drives business and mission success. Although the Cloud Analytics Reference Architecture builds upon current techniques, it is not an incremental step forward. It is an entirely new approach—one specifically designed for our new age of data.
One way to understand how the Reference Architecture works is to view it in layers (see Figure 1). Its foundation is the cloud computing and network infrastructure, which supports the methods by which data is managed—most notably, the data lake. The data lake, in turn, supports a two-step process to analyze the data. In the first step, special tools known as pre-analytics filter information from the data lake, and give it an underlying organization. That sets the stage for computer analytics—in the next layer up—to search for valuable knowledge. These elements support the final phase, the visualization and interaction, where the human insights and action take place.
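As a rough aid to reading the layers, the following minimal Python sketch walks through them with invented records: an in-memory "data lake" of tagged items, a pre-analytic that filters and orders them, an analytic that searches the organized data, and a visualization step that presents the result to a person. None of this reflects the actual implementation; it only mirrors the layered flow described above.

# Toy walk-through: data lake -> pre-analytics -> analytics -> visualization.
DATA_LAKE = [
    {"tags": {"type": "login", "host": "web-01"}, "failures": 2},
    {"tags": {"type": "login", "host": "web-02"}, "failures": 9},
    {"tags": {"type": "billing", "host": "db-01"}, "failures": 0},
]

def pre_analytic(lake, record_type):
    # Filter the lake by tag and give the records an underlying order.
    hits = [r for r in lake if r["tags"]["type"] == record_type]
    return sorted(hits, key=lambda r: r["failures"], reverse=True)

def analytic(records, threshold):
    # Search the organized records for hosts with unusually many failures.
    return [r["tags"]["host"] for r in records if r["failures"] > threshold]

def visualize(hosts):
    # Present the finding to the human analyst.
    print("Hosts needing attention:", ", ".join(hosts) or "none")

visualize(analytic(pre_analytic(DATA_LAKE, "login"), threshold=5))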
THE POWER OF THE CLOUD ANALYTICS REFERENCE ARCHITECTURE

The Reference Architecture opens up the enormous potential of big data by allowing us to search for insight in new ways. It enables us to look for overarching patterns, and ask intuitive questions of all the data, rather than limiting us to narrowly defined queries within data sets. The Reference Architecture allows computers to take over much of the work humans are doing now—freeing people to focus on the search for insight. It makes it possible for non-computer experts, for the first time, to frame the questions, look for patterns, and follow hunches.
This is not some kind of magical solution—far from it. The Reference Architecture is simply a new way of looking at data, but one that revolutionizes our ability to gain knowledge and insight. With conventional techniques, the data and analytics are locked into stovepipes, or silos. We can explore only limited amounts of data at any one time—and then only with predetermined questions that have already been built in. The Reference Architecture removes these constraints by eliminating the silos, and consolidating all the information in the data lake. What results is not chaotic or overwhelming. Rather, the rich diversity of information in the data lake becomes a powerful force. The data lake is more than a means of storage—it is a medium expressly designed to foster connections in data. And the Reference Architecture explores those connections to search for valuable correlations and patterns. This actually reduces the complexity of big data, making it manageable and useful, and creating efficiencies.

Figure 1. Primary Elements of the Cloud Analytics Reference Architecture
Instead of using data to ask “canned” questions that test what we may already know, the Reference Architecture uses data to discover new possibilities—solutions and answers that we have not even considered. The power of the Reference Architecture is that it constantly evolves and adapts as we search for insight, taking us beyond the limits of our imagination.

WHAT THE CLOUD ANALYTICS REFERENCE ARCHITECTURE DOES
The Cloud Analytics Reference Architecture removes the constraints created by data silos. While the rigid structures used in conventional techniques provide ease of storage, they carry severe disadvantages. They give us an artificial view of the world based on data models, rather than on reality and meaning. It is akin to reading a map through a tube—we can never immerse ourselves in the diversity of big data, and instead make decisions based on limited and constrained information. Much of data science in the last ten years has been devoted to improving access to the silos and building bridges between them. But that does not solve the underlying problem—that the data is regimented and locked in.

Eliminating the need for silos gives us access to all the data at once—including data from multiple outside sources. Users no longer need to move from database to database, pulling out specific information. And, because there are no data silos, there is no need to build complex bridges between them.
If we want to know, for example, which parts of our computer network are most vulnerable to attack in the next six hours, we can take into account a wide variety of data sources at the same time. We might look at whether today is a holiday in certain foreign countries, which means that the young hackers known as “script kiddies” are more likely to be out of school and so have time on their hands to launch an attack. If we determine that a particular group is targeting us, we might examine how its members are connected, asking whether they had a common professor at a university, and if so, what techniques did he or she teach. The Reference Architecture gives us the ability to ask a full suite of questions rather than a pre-selected few.
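A hypothetical Python sketch of such a question may help make this concrete: several unrelated sources (probe counts, probe origins, and a holiday calendar, all invented here) feed one intuitive question about near-term exposure, with no predefined schema joining them.

from datetime import date

# Source 1: holiday calendar by country (invented data).
HOLIDAYS = {("XX", date(2012, 11, 5)), ("YY", date(2012, 11, 5))}

# Source 2: recent probe counts per network segment (invented data).
PROBES = {"dmz": 340, "corp-lan": 12, "vpn-gateway": 95}

# Source 3: country most of each segment's probes originate from (invented data).
PROBE_ORIGINS = {"dmz": "XX", "corp-lan": "YY", "vpn-gateway": "ZZ"}

def risk_score(segment, today):
    # Combine the sources: probe volume, boosted if the origin country is on holiday.
    score = PROBES[segment]
    if (PROBE_ORIGINS[segment], today) in HOLIDAYS:
        score *= 1.5  # idle attackers are more likely to be active
    return score

today = date(2012, 11, 5)
ranked = sorted(PROBES, key=lambda s: risk_score(s, today), reverse=True)
print("Most exposed segments first:", ranked)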
The Cloud Analytics Reference Architecture allows us to experiment more with the data. The Reference Architecture’s flexibility provides a new kind of freedom—to follow hunches wherever they may lead, to quickly shift direction to pursue promising avenues of inquiry, to easily factor in new knowledge and insights as they arise.

With the conventional approach, it is difficult to add or switch variables that are not already part of a dataset or database. That typically requires tearing apart and rebuilding both the structure that the data is in and the computer analytics that are custom-designed to handle specific lines of inquiry. The process is expensive and time consuming, and consequently, we tend to focus instead on doing better analysis with the limited tools available on our narrow slices of data.
With the Reference Architecture, we might decide, in the network security example above, to add new variables to the mix, such as the current propagation speed of commonly used viruses and botnets. Even if those variables come from outside data sources, we do not have to tear down and rebuild our data structures and analytics to consider them—they seamlessly become part of our inquiry.

The Cloud Analytics Reference Architecture allows us to ask more intuitive questions. With the conventional approach, we do not really ask questions of the data—we create hypotheses, and then test the data to see whether we are right. In order to pose these hypotheses, we have to guess in advance what the answers might be, often a difficult proposition.
To determine where our network is most vulnerable, for example, we would need to start with a hypothesis—say, that any attacks will occur through outdated operating systems. That hypothesis, accurate or not, would drive our initial line of inquiry.

With the conventional approach, we also need to be familiar with the data we are considering, including where it is (in what specific datasets or databases), what format it is in, and even to a large extent what the data itself contains. That level of knowledge might be achievable when we are working with a limited number of datasets or databases, but not with the vast amounts of information now becoming available to us. We often have to put aside, or assume away, factors that we might actually believe are critical.

Add to these handicaps our inability to go beyond the pre-selected questions or easily change variables, and it becomes an impossible task. And so we never try it. We end up settling for marginal questions, and marginal answers.
With the Reference Architecture, however, we can structure an inquiry around a single, intuitive, big-picture question: What part of our computer network is most vulnerable to attack in the next six hours? We do not need to know much about any of the data sources we are consulting—the data will point us to the answer.
The Cloud Analytics Reference Architecture allows us to more readily look for unexpected patterns—it lets the data talk to us, so to speak. Even if we could ask all the questions we want, the way we want, there is simply too much data to formulate every question that might be important. Our questions can also be limited by our biases about the issues we are researching. We may not know what areas to explore, or what we should be looking at. To get the full picture, and help guide our inquiries, we need to see what patterns naturally emerge in the data.

While we can look for patterns with the conventional approach, there are two significant drawbacks. We can only do such searches within our narrowly defined datasets and databases, rather than with the entire range of data available to us. We also must first guess what those specific patterns might be, and then test them out with hypotheses. But what about the patterns we do not even know might exist? How do we get to the hidden knowledge that often proves so valuable?

Because there are no limiting data and analytic structures in the Reference Architecture, we do not need to pose hypotheses, and our search for patterns encompasses the entire range of data. For example, the U.S. military is now using the Reference Architecture to search for patterns in war zone intelligence data, to map out convoy routes least likely to encounter improvised explosive devices (IEDs).
The Cloud Analytics Reference Architecture allows computers to take over much of the work humans are doing now—enabling people to focus on creating value. Conventional methods require that people play a large role in processing the data—including selecting samples to be analyzed, creating data structures, posing hypotheses, and sifting through and refining results. That intense level of effort may be workable for small amounts of data, but no organization has the personnel or resources to use that method to process big data.

The Cloud Analytics Reference Architecture solves this problem by giving a great deal of that work to the computers, particularly tasks that are repetitive and computationally intensive. This reduces human error, and substantially speeds up the work.

When we use the Reference Architecture to pose more intuitive questions, or to find patterns, we are essentially asking the computer to take us as close as it can to finding the answers we want. It is then up to us, using our cognitive skills, to find meaning in those answers.
By separating out what the computer can do—the analytics—and what only people can do—the actual analysis—the Cloud Analytics Reference Architecture greatly eases the human workload. It is a division of labor that frees subject-matter experts to look at the larger picture. At the same time, the Reference Architecture rapidly highlights areas that analysts should not waste their time exploring—enabling them to focus their time and attention in the right direction.

For example, agencies that investigate consumer complaints against financial institutions often do not know which individual complaints are indicative of a broader pattern of consumer abuse, and so deserve the most attention. Investigators rarely have the time to sort through the vast array of sources that might provide valuable clues, such as blogs and social media sites where consumers commonly air their grievances. With a data lake that included all such available information, the Reference Architecture’s analytics could quickly identify patterns, such as consumer abuse affecting large numbers of people. Investigators could then focus their resources on the most serious cases.
The Cloud Analytics Reference Architecture’s analysis capability enables subject matter experts to explore the data. If we are to drive business and mission success, we must give direct access to the data to the analysts, or subject matter experts, who understand what that success might mean. However, because of the high level of computer expertise needed to design custom data storage structures and analytics, much of the analysis today is conducted by computer scientists, computer engineers, and mathematicians acting as agents for the subject matter experts. They are typically the ones who translate the overall goals of the business and government analysts into the language of the machine. Whenever there is a middleman in any field, things tend to get lost in the translation, and data analysis is no exception. Here, it leads to a disconnect between the people who need knowledge and insight (the subject matter experts) and the data itself. It also substantially slows the process.

In the top layers of the Reference Architecture, the middleman syndrome goes away. The ability to ask intuitive questions, and to look for patterns, provides the analysts with direct access to the data. That gives them the flexibility they need to experiment and explore, and allows the system to reach maximum velocity. The computer scientists, computer engineers and mathematicians still play a key role, but now are no longer the ones who drive the inquiries into the data.
For example, investigators who suspect fraud may be occurring are often hampered by the need to go through computer experts to query the data. Their request may be one of many, and by the time they get back the information they need to act, the criminals have often long since committed the fraud and disappeared. With the Reference Architecture, however, investigators could query the data themselves, quickly pinpoint the fraud, and take action in time to stop the activity.
THE FOUNDATION OF THE REFERENCE ARCHITECTURE: A NEW APPROACH TO INFRASTRUCTURE

The Reference Architecture takes advantage of the immense storage ability of the cloud, though in a different way than in the past. With the conventional approach, cloud storage does not eliminate the data silos—it simply makes them fatter. Organizations must continually reinvest in infrastructure as analytic needs change. Building bridges between silos, for example, typically requires reconfiguring and even expanding the infrastructure.
The Reference Architecture, by contrast, has an inherent flexibility that enables organizations to pursue new analytical approaches with few if any changes to the underlying infrastructure. One reason is that the data lake is easily expandable. Because it stores information so efficiently, it can accommodate both the natural growth of an organization’s data, as well as the addition of data from multiple outside sources. At the same time, the Reference Architecture replaces the current, custom-built analytics with a new generation of tools that are highly reusable for almost any number of inquiries. With the Reference Architecture, organizations do not need to rebuild infrastructure as their levels of data and analytics increase. An organization’s initial investment in infrastructure is therefore both enduring and cost-effective.
HOW THE DATA LAKE WORKS

With the conventional approach, the computer is able to locate the information it needs because it knows precisely where it is—in one database or another. The information is identified largely by its location. With the data lake, information is still identified for use, but now in a way other than by location. Specific pieces of information are identified by “tags”—details that have been embedded in them for sorting and identification.

For example, an investor’s portfolio balance (the data) is generally stored with identifying information such as the name of the investor, the account number, one or more dates, the location of the account, the types of investments, the country the investor lives in, and so on. This “metadata” is what gets tagged, and is located by the computer during inquiries.

The process of tagging information is not new—it is commonly done within specific datasets and databases. What is new is using the technique to eliminate the need for datasets and databases altogether.
The tags themselves are also a way of gaining knowledge from the data. In the example above, they might allow us to look for, say, connections between investors’ countries and their types of investments. The basic data—the portfolio balance—might not even be part of the inquiry. Such connections can be made with the conventional approach, but only if the custom-built databases and computer analytics have already been designed to take them into consideration. With the data lake, all the data, metadata and identifying tags are available for any inquiry or search for patterns. And, such inquiries or searches can pivot off of any one of those pieces of information. This greatly expands the usability of the data available to an organization. It actually makes big data even bigger.
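A minimal Python sketch, with invented records, of what it means for an inquiry to pivot off any tag: the same pool of tagged records answers a question about countries and investment types without ever touching the portfolio balances.

from collections import Counter

# Toy "data lake" of tagged records; the tags (metadata) travel with each record.
LAKE = [
    {"balance": 120000, "tags": {"investor": "A. Chen", "country": "SG", "type": "equities"}},
    {"balance": 54000,  "tags": {"investor": "B. Osei", "country": "GH", "type": "bonds"}},
    {"balance": 310000, "tags": {"investor": "C. Varga", "country": "SG", "type": "equities"}},
    {"balance": 88000,  "tags": {"investor": "D. Ruiz", "country": "MX", "type": "bonds"}},
]

def pivot(lake, *tag_names):
    # Count how often each combination of the requested tags occurs.
    return Counter(tuple(r["tags"][t] for t in tag_names) for r in lake)

# Connections between investors' countries and their types of investments,
# found without querying the balances themselves.
print(pivot(LAKE, "country", "type"))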
An important advantage of the data lake is that there is no need to build, tear down, and rebuild rigid data structures. For example, suppose we develop an improved approach to translating English into Chinese. With conventional techniques, the database is the translation. To make major changes, we would have to go back to the original data (the English and Chinese words), and build a completely new structure. With the Reference Architecture, however, we would simply pull out the data in a new way, easily reusing it.

In addition, the data lake smoothly accepts every type of data, including “unstructured” data—information that has not been organized for inclusion in a database. An example might be the doctors’ and nurses’ notes that accompany a patient’s electronic health records.
Two other critical emerging data types are batch and streaming. Batch data is typically collected on an automated basis and then delivered for analysis en masse—for example, the utility meter readings from homes. Streaming data is information from a continuous feed, such as video surveillance.

Most of the flood of big data is unstructured, batch and streaming, and so it is essential that organizations have the ability to make full use of all types. With the data lake, there is no second-class or third-class data. All of it, including structured, unstructured, batch and streaming, is equally “ingested” into the data lake, and available for every inquiry.
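The following sketch, again with invented record shapes, illustrates equal ingestion: a structured meter reading, an unstructured clinical note, and a frame from a continuous feed all land in the same pool, each wrapped with the tags that later inquiries can pivot on.

import time

DATA_LAKE = []

def ingest(payload, **tags):
    # Wrap any payload, whatever its shape, with its tags and add it to the pool.
    DATA_LAKE.append({"ingested_at": time.time(), "tags": tags, "payload": payload})

ingest({"meter_id": 17, "kwh": 4.2}, source="utility-batch", kind="structured")
ingest("Patient resting comfortably, BP trending down.", source="ehr-notes", kind="unstructured")
ingest(b"\x00\x01\x02", source="camera-12", kind="streaming")

print(len(DATA_LAKE), "records ingested; kinds:",
      sorted({r["tags"]["kind"] for r in DATA_LAKE}))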
It is an environment that is not random and chaotic, but rather is purposeful. The data lake is like a viscous medium that holds the data in place, and at the same time fosters connections. Because the data is all in one place, it is, in a sense, all connected.

GATHERING INFORMATION FROM THE DATA LAKE: THE PRE-ANALYTICS

In the first step in analyzing the data, the Reference Architecture uses tools known as pre-analytics to filter data from the data lake and then give it an underlying organization. For example, a recent study by Booz Allen and a large hospital chain in the Midwest analyzed the electronic medical records of hundreds of patients, to track the progression of a life-threatening condition known as severe sepsis. Pre-analytics were used to first pull patients’ vital signs from a version of a data lake, and—using the time-and-date stamps embedded in the records—organize them in chronological order. Once that was accomplished, computer analytics could then search for patterns in the way the patients’ vital signs changed over time.
Pre-analytics accomplish a number of tasks at once. Using the tags, they locate and pull out the relevant data from the data lake. They then prepare that data for the analytics, sorting and organizing the information in any number of ways. The pre-analytics allow great flexibility in the inquiries—for example, one such tool might transliterate a name like Muhammad into every possible spelling (e.g., Mohammad, Mahamed, Muhamet). This would enable the computer to collect and analyze information about a particular person, even if that person’s name is spelled differently in different sources of data.
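Two toy pre-analytics in Python, with invented records and a deliberately tiny variant table, illustrate the ideas above: one pulls a patient's tagged vital-sign readings from a pool and orders them by their embedded timestamps; the other maps spelling variants of a name onto one canonical form so later analytics can treat them as the same person.

from datetime import datetime

LAKE = [
    {"tags": {"patient": "p-07", "signal": "heart_rate"}, "ts": "2012-03-02T09:15", "value": 88},
    {"tags": {"patient": "p-07", "signal": "heart_rate"}, "ts": "2012-03-02T07:40", "value": 72},
    {"tags": {"patient": "p-09", "signal": "heart_rate"}, "ts": "2012-03-02T08:05", "value": 95},
]

def vitals_in_order(lake, patient, signal):
    # Pre-analytic 1: pull one patient's readings and sort them chronologically.
    hits = [r for r in lake
            if r["tags"]["patient"] == patient and r["tags"]["signal"] == signal]
    return sorted(hits, key=lambda r: datetime.fromisoformat(r["ts"]))

# A deliberately small variant table; a real transliteration tool would be far richer.
NAME_VARIANTS = {"mohammad": "muhammad", "mahamed": "muhammad", "muhamet": "muhammad"}

def normalize_name(name):
    # Pre-analytic 2: map spelling variants onto one canonical form.
    key = name.strip().lower()
    return NAME_VARIANTS.get(key, key)

print([r["value"] for r in vitals_in_order(LAKE, "p-07", "heart_rate")])  # [72, 88]
print(normalize_name("Mohammad") == normalize_name("Muhamet"))            # True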
Although pre-analytical tools are commonly used in the conventional approach, they are typically part of the rigid structure that must be torn down and rebuilt as inquiries change. Generally, they cannot be reused—for example, each name to be transliterated would require an entirely new pre-analytic. Because such work is resource-intensive, only a limited number of such tools can be built, severely hampering an organization’s ability to make full use of its data. By contrast, the pre-analytics in the Cloud Analytics Reference Architecture are designed for use with the data lake, and so are not part of a custom-built structure. They are both flexible and reusable, giving organizations almost endless windows into their data. Moreover, they are designed to be interoperable from the moment they come on-line, creating a set of easily shared services for all users of the data.

THE POWER OF COMPUTER ANALYTICS

Once the data has been prepared, the search for knowledge and insight can begin. As with the other elements of the Reference Architecture, computer analytics are used in an entirely new way.

An analogy might be the difference between the smartphones of today and the separate functions for telephones, personal digital assistants and computers of the not-so-distant past. Smartphones do more than just combine those functions—they create a new world of possibilities. The computer analytics in the Cloud Analytics Reference Architecture do the same.

There are several types of analytics in the Reference Architecture, including:
Ad hoc queries. These are the analytics that ask questions of the data. While in the conventional approach the analytics are part of the narrow, custom-built structure, here they are free to pursue any line of inquiry. For example, a financial institution might want to know which of its foreign investors are at greatest risk of switching to another firm, based on dozens of characteristics of current and former customers. Later, analysts might want to change the question somewhat, asking the extent to which the political turmoil in certain countries plays a role. They can use the same analytic to ask the second question, and any number of other questions—like the pre-analytics, they are flexible and reusable. And they enable the kinds of improvised, intuitive questions that can yield particularly valuable results.

Machine learning. This is the search for patterns. Because all of the data is available at once, and because there is no need to hypothesize in advance what patterns might exist, these analytics can look for patterns that emerge anywhere across the data.

Alerting. This type of analytic sends an alert when something unexpected appears in the patterns (a minimal sketch of the idea follows this list). Such anomalies are often clues to the kind of hidden knowledge that can provide business with a competitive advantage, and help government organizations achieve their missions.
Pre-Computation. These analytics enable organizations to do much of the analyzing in advance, creating efficiencies. For example, an auto insurance company might pre-compute the policy price for every individual vehicle in the U.S., so that, with a few additional details, a potential customer can be given an instant quote.
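As a small illustration of the alerting idea referenced above, the Python sketch below watches a sequence of readings and raises an alert when a new value falls far outside the pattern seen so far. The threshold rule is an invented, simplistic one; real alerting analytics would be considerably more sophisticated.

from statistics import mean, pstdev

def alerts(stream, window=10, k=3.0):
    # Yield (index, value) whenever a value sits more than k standard deviations
    # away from the mean of the previous `window` values.
    history = []
    for i, x in enumerate(stream):
        if len(history) >= window:
            recent = history[-window:]
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(x - mu) > k * sigma:
                yield i, x
        history.append(x)

readings = [101, 99, 100, 102, 98, 100, 101, 99, 100, 102, 180, 100, 99]
for idx, value in alerts(readings):
    print(f"Alert: reading {idx} = {value} is far outside the recent pattern")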
PUTTING IT ALL TOGETHER: VISUALIZATION AND INTERACTION

Decision-makers may be understandably concerned that all this big data will be overwhelming, that removing the tube from the map will simply lead to information overload. Quite the opposite is true. The Cloud Analytics Reference Architecture addresses the issue head-on by incorporating the visualization—how the knowledge is presented to us—into the analytics from the outset. That is, the analytics not only conduct the inquiries, they help contextualize and focus the results.

At the visualization and interaction level of the Reference Architecture, this focus enables the analysts to more easily make sense of the information, to frame better, more intuitive inquiries, and to gain deeper insights. Building the visualization into the analytics has another advantage—it provides the ability for quick and effective feedback between the two layers, so that the presentation of the findings can be continually refined for the decision-maker.
With the Reference Architecture, the flood of information is not overwhelming—it is readied for action as never before. This breakthrough in visualization could have as profound an effect on decision-making as bar graphs and pie charts did in the 1950s and 1960s, when statistics became widely used in business. Those visuals presented all the essential information at a glance, changing the nature of decision-making. The Reference Architecture will do the same—but this time with big data.

DELIVERING ON THE PROMISE

The possibilities of big data and the cloud are not pipe dreams. But they will not be fulfilled on their own—conscious effort and deliberate planning are needed. Unless organizations make the right infrastructure decisions, they cannot hope to build a data lake. Unless they make the right data management decisions, they will never break free from the rigid data and analytic structures that are so limiting. The Cloud Analytics Reference Architecture can be seen as a road map for that decision-making, one that shows the importance of a holistic, rather than piecemeal, haphazard approach. Each element is closely tied to each of the other elements, and so all must be considered together.
The Cloud Analytics Reference Architecture is no more expensive to build than a traditional approach, and it is considerably more cost-effective in the long run. Because the elements of the Cloud Analytics Reference Architecture are largely reusable, they allow an organization’s big data capabilities to scale in an affordable way.

The Cloud Analytics Reference Architecture is already being used by the U.S. government to make our nation safer, and it can help other organizations in government and business create value, solve real-world problems, and drive success. The grand promise of big data and the cloud is now within reach.
FOR MORE INFORMATION
Mark Jacobsohn
jacobsohn_mark@bah.com
301-497-6989
Joshua Sullivan, PhD
sullivan_joshua@bah.com
301-543-4611
www.boozallen.com/cloud
This document is part of a collection of papers developed by Booz Allen Hamilton to introduce new concepts and ideas spanning cloud solutions, challenges, and opportunities across government and business. For media inquiries or more information on reproducing this document, please contact:
James Fisher—Senior Manager, Media Relations, 703-377-7595, fisher_james_w@bah.com
Carrie Lake—Manager, Media Relations, 703-377-7785, lake_carrie@bah.com