Marshaling Data for Enterprise Insights


Published on

This article explains a 10-year vision for the US Department of Homeland Security in regards to marshaling data.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Marshaling Data for Enterprise Insights

  1. 1. Marshaling Data for Enterprise Insights A 10-Year Vision for the US Department of Homeland Security
  2. 2. 1 As owners of one of the most comprehensive data environments in the US federal system, the US Department of Homeland Security (DHS) has an asset like no other for conducting its basic mission of safeguarding America’s citizens, infrastructure, and borders. The information it collects is as diverse as the missions of the many agencies and offices under its purview, ranging from the Transportation Security Administration (TSA) to the Federal Emergency Management Agency (FEMA), US Customs and Border Protection (CPB), Immigration and Customs Enforcement (ICE), the US Secret Service, and the Coast Guard. Because the mission of DHS is so broad and all consuming, it is challenged with ushering in the kinds of changes that its data regime needs to render new insights and drive down costs. Each of the extensive data sets that DHS and its components maintain supports specific mission needs and exists under established legal and regulatory authorities. Their owners naturally strive to improve the efficiency and effectiveness of data analysis for their intended purposes. But even when users recognize the mission benefits of sharing data and conducting analysis across multiple data sets, they too often run into technical, procedural, legal, and cultural barriers to shared analysis. These barriers are preventing DHS from realizing the full potential of the data it has as its disposal. Intelligence and law enforcement analysts searching for terrorists, and other threats, need the ability to paint a comprehensive picture that considers all kinds of data at once—such as travel and entry and exit records, evidence that ties suspected terrorists or criminals to illicit shipments of cash or contraband, or involvement of specific individuals or groups in diverse criminal activities, from weapons trafficking to cyber crime. The need to learn from and capitalize on multiple sources of data is no less urgent for DHS teams responsible for responding to emergencies, enforcing trade agreements, combating transnational criminal organizations, managing the flow of people through US borders, protecting intellectual property rights, and safeguarding America’s critical infrastructure. Over the coming years, DHS will need to master the ability to marshal its tremendous repository of data while still living up to its manifold day-to-day responsibilities. These tasks are mutually dependent— the bridge must be rebuilt while remaining open to traffic. We believe it will be easier for DHS to manage its transformation into a data-enabled enterprise if it attacks the problem from a mission perspective and taps emerging cloud-based analytics to gain new insights from the information it already has at its disposal. Through this effort, DHS can significantly elevate its role in the nation’s homeland security ecosystem, position itself as a valuable and insight-driven partner to other crime enforcement agencies, and increase the efficiency of its mission-support functions. Opening the Aperture When Michael Chertoff became the second secretary of homeland security in 2005, one of his first priorities was to change a departmental policy of only taking two fingerprints from each person either wanting a visa to come into the country or for those traveling without visas. Chertoff reasoned that by only taking two fingerprints—one from each index finger—homeland security personnel were potentially missing scores of Marshaling Data for Enterprise Insights A 10-Year Vision for the US Department of Homeland Security
  3. 3. 2 latent fingerprints, or the fingerprint residue collected all over the world at crime scenes, in safehouses where terrorists plan, and even on battlefields. By expanding the information collected from travelers to 10 fingerprints, Chertoff said that the DHS significantly enhanced its capability to identify those who weren’t included on watch lists but “left a little piece of themselves somewhere and some place that suggests we ought to take a closer look.”1 Chertoff’s policy change was predicated on the belief that information is power. By opening the department’s aperture to collect more information, he reasoned that the department was strengthening its powers to detect bad actors before they performed bad deeds. This premise is at the heart of all homeland security efforts, and yet, there are real and significant constraints to expanding the amount of pertinent data that each DHS entity has at its disposal. The most significant constraint is rooted in the department’s origins—bringing 22 different agencies under one organization necessarily meant bringing their legacy information technology systems with them. While considerable progress has been made in improving these systems over the last 10 years, too little has been done to address the challenges of sharing data and conducting analysis across these systems. As a result, DHS has limited the usefulness of its data. Users have to mine a number of databases to find answers to problems identified at the policy level or pursue particular cases, and these efforts have historically been susceptible to redundancy and lack of coordination. In addition, the people operating at the collection end of those stovepipes have little incentive to acquire information that isn’t directly relevant to their own priorities, even though it might be very important to others. But the issue also goes to a lack of awareness that data exist in the first place. In the homeland security setting, the real power of data is not limited to helping you solve problems identified elsewhere—it’s identifying new problems so you can get ahead of them. Information sharing relies on the owner of that information to recognize its value to others, and this is very seldom the case. As a result of this shortcoming, information just stays where it is, where it has no value at all. It’s as if 10 different DHS entities were each collecting a print sample from a different finger, but nobody could ever put them together because there was no mechanism for matching them up. These organizational and other barriers are particularly problematic for homeland security leaders trying to secure the nation’s borders, keep ahead of criminal organizations, enforce and facilitate trade and travel, and plan for emergency response. For instance, Customs and Border Protection has developed reasonably effective processes for physically and virtually screening cargo shipments by analyzing trade data supplied by shippers and other information on the flow of goods into and out of ports of entry. But it is difficult to link that data to information on individuals who may be involved in those shipments, the past 1 Remarks by Michael Chertoff, shtm
  4. 4. 3 histories of the conveyances used in the shipments, and the money and data flows that accompany all movement of goods—whether legal or not. Part of this difficulty stems from the fact that different agencies relying on different databases are responsible for different aspects of securing and facilitating the movement of people, goods, money, and data. A data point as plain as a phone number on a bill of lading could be associated with a known terrorist, but it won’t be flagged for follow-up if nobody inside CBP understands its value and those outside of CBP remain unaware that this information is even being tracked. Only by revealing the presence of data in the first place can DHS and its partners expect its analysts at the edge to make these critical connections. More Tooth, Less Tail DHS faces the added challenge of evolving into a data- centric organization at a time when every cost is being scrutinized. While mission challenges remain high, the department is facing budget constraints unprecedented in its 10-year history. As in the military, concern is growing that the “tail” of back-office functions has grown too big and unwieldy to support DHS’s mission in a timely and cost-effective manner. One of the biggest complaints coming from inside the organization is that staffers have to supply the same data to five or six different entities. That level of duplication is expensive and ties up valuable resources. This spending seems particularly out of synch when analysts working at the edge of the enterprise have a limited view of the information the back-office support groups are helping to maintain. Our view is that DHS has the opportunity to become a leaner, more productive organization in the future if it harnesses the power of the data it already collects today to support its many mission objectives. In 10 years' time, we see DHS components thinking of themselves as collective sources and sharers of information rather than separate parts of a larger bureaucracy. DHS must tap the potential of its data much in the way that retailers, banks, and healthcare providers of today are tapping every available piece of information about their customers to predict their behavior and help set strategy. Investigators, inspectors, and analysts in the field will know what kinds of data are being collected across the organization, so they know where to go to mine for information and learn more about the cases they are pursuing. Ultimately, this capability will be extended from mission data to mission-support activities to help DHS more efficiently manage its personnel, property, fixed assets, and finances. Mining Big Data for Insights The first step in developing this capability and transforming DHS into a data-enabled enterprise is to embrace a new method of keeping track of and accessing information—one specifically designed for big data that is distributed in a classified or law enforcement environment. The emergence of cloud computing technologies has paved the way for organizations to unlock new insights by coupling technological infrastructure with analytical tools to gain better access to all of the data they have at their disposal. To support this new capability, DHS must have a full and working knowledge of all the types of data its many entities collect on a regular basis. As it stands now, a DHS analyst is only able to locate the information he or she needs if they know precisely where it is housed—in one database or another. But DHS’s experience reveals that this is seldom the case; users move from database to database, searching for specific information that may or may not be there. If users want to ask different kinds of questions, they often have to reengineer both the databases and the analytics involved—a process that can be prohibitively long and expensive. This tends to limit both the complexity of the questions that are asked and the utility of the results.
  5. 5. 4 What DHS needs to derive actionable insights from its data is to evolve toward a platform where the data can be staged for advanced analytics. Cloud-based technology solutions now avail organizations with an effective and efficient way to load, store, and access multiple data sources to enable multivariate analytics in a mission-specific setting. The evolution will systematically convert the DHS’s collection of individual databases into a consolidated and connected pool of information—or “data lake”—making all of it readily accessible for all types of analysis. With the data lake, users can easily tap all of the available data in a variety of constantly changing ways. A key feature of the data lake is that information is no longer defined strictly by its location. Specific pieces of supporting information—known as meta-data, or “data about the data”—are identified by embedded “tags” that allow for sorting and identification. In the shipping container example illustrated above, the user could find the phone number listed on the bill of lading simply by searching for it, regardless of which DHS agency captured the information. The process of tagging information is not new—it is commonly done within specific datasets or databases. What is new is using the cloud in combination with the technique to make it available to a wide array of users. This is an important distinction, as an important advantage of the data lake is there is no need to build, tear down, and rebuild rigid data structures. The most transformative aspect of a cloud analytics reference architecture that incorporates a data lake is that users do not need to have the possible answers in mind when they ask questions. Instead, they can let the data talk to them. The ability to make complex inquiries, easily switching in and out any number of variables, allows users to look for patterns, and then follow them wherever they may lead. This is particularly important in predictive analytics, when people may not know exactly what they are seeking. This capability strengthens the power of inquiry by empowering data to help reveal new or broader questions and correlations, expanding insight into scenarios and challenges. Importantly, a data lake also helps draw from unstructured and streaming data, which exist in forms that cannot be directly placed in structured databases and data sets. Our prior work with another intelligence-driven organization shows that a data lake can be employed even when classified information is at issue by putting appropriate levels of security and authorization in place. The solution called for predictive analytics to use existing data to forecast potential events and detect anomalies in order to extract potentially significant information and patterns. Our design focused on keeping transactional-based queries in the current relational databases, while doing the “heavy lifting” in the cloud and outputting the results into relational data stores for quick access. The new cloud solution provided immediate and striking improvements across the increasing volume of structured and unstructured data using aggressive indexing techniques, on-demand analytics, and pre- computed results for common analytics. By combining sophistication with scalability, the solution helped move the organization from a situation in which analysts stitched together sparse bits of data to a platform for distilling real-time, actionable information from the full aggregation of data. Many such opportunities now exist in Homeland Security. For example, the National Cybersecurity and Communications Integration Center (NCCIC) faces the challenge of making full use of the numerous data feeds it receives from a variety of government, private- sector, and other organizations. Because much of the data is unstructured, or resides in discrete data sets that are difficult to connect, NCCIC may get only a partial, delayed picture of cyber and communications incidents. Booz Allen Hamilton’s Cloud-based Services would enable NCCIC to consolidate all its available data—of every type, from all sources—and then automatically search through its entirety for important correlations and patterns. With this ability to put all the
  6. 6. 5 pieces together and see the full picture, NCCIC could better respond to incidents in real time, and provide more accurate situational awareness to stakeholders. In another example, Cloud-based Services would enable DHS and the Department of Justice (DOJ) to integrate the intelligence they gather in their shared Southwest border mission. Currently, the critical ability of the two agencies to gain a common operating picture is hampered not just by the wide range of structured and unstructured data collected, but also by the various restrictions, authorities and security issues around storing, managing and accessing that data. Through Cloud-based Services, DHS and DOJ can create a common operating picture while maintaining the different sets of rules for the data—all in one system, rather than in many. This ultimately frees the mission operators and the IT organizations supporting them from the need for proprietary solutions, and from the resource-depleting operations and maintenance costs tied to those solutions. Cloud-based Services would also allow DHS to break through major cost barriers with a shared data environment for the department’s component agencies. Currently, agencies often share space at data centers. Cloud-based Services takes a major step forward by creating an environment in which the agencies share a common data pool, as well as reusable analytic and visualization software that taps into the pool. What occurs is a decoupling of the data from the software, substantially driving down costs and creating previously unobtainable economies of scale. Oversight and Backing Of course, DHS will need to maintain adequate governance of its data to help decide which information is included in the data lake and who is authorized to access it. To this end, DHS will need a dedicated team to provide this level of governance. Many commercial enterprises have turned to a chief data officer (CDO) to serve this function and increase the transparency of their data. This idea, or something like it, could serve DHS well. The CDO and their staff should not be confused with a chief information officer or chief technology officer. Where these other two functions provide general infrastructure support to the rest of the organization in a commercial setting, CDOs work much more closely with the lines of business to tap data for specific objectives. Financial services firms, among the first to create such a position, have turned to CDOs to drive their data management strategy and establish a data governance structure to determine who owns and manages the information. Their experiences could be particularly instructive in a homeland security setting, given the vast array of data they collect, as well as their need to safeguard sensitive information. Indeed, this last consideration has proved problematic for DHS in the past; it’s easier to protect information when security is at stake than risk exposing it to someone without the proper authorization. A critical part of the CDO’s role would be to define the parameters for data ownership and stewardship. The CDO would also help determine legal restrictions on the use of data. When certain data are combined, it can create a collision of legal authorities. On its own, one piece of information may not be classified or law enforcement sensitive, but combining that information with other data may yield insights that need to be protected for national security or prosecutorial reasons. A strong governance function may also be influential in securing support from outside the organization. Whether the solution takes the form of a data lake or some other vehicle, it likely won’t go far without a legislative mandate. The legacy structure of its seven operating components and other agencies has supported a view that information is a commodity that must be protected, and only disseminated on a need-to-know basis. The agency will also face the reality of convincing incoming political leaders to make short-term investments for long-term gains. Change
  7. 7. 6 management is tough enough for private companies—it is that much more difficult for federal agencies whose senior leaders are in a state of constant flux. Even the best-laid plans can get disrupted or derailed when those who call the shots inevitably change. Without a legislative or statutory mandate to serve as a forcing event, DHS will not likely be able to foment the degree of change that is needed—at least not at an acceptable rate. In some ways, the position DHS finds itself in now is not dissimilar to that faced by the US Treasury Department in the aftermath of the credit crisis. At the time, it was clear that the financial services industry had major gaps in trying to get its hands around all of the risks being taken with the advent of credit default swaps and other advanced financial derivatives. The Treasury Department needed financial data that was actionable and would support decision-making at the highest levels within the department. To help fill these gaps, Congress created an Office of Financial Research (OFR) within the Treasury Department through the Wall Street Reform and Consumer Protection Act of 2010 (“Dodd-Frank”). The law created two units within OFR—a data center and a research and analysis center—to continually gather up and analyze detailed financial information collected from a variety of banks and other financial firms. One of its first steps was to issue a request for proposals on an industry utility to create and generate identification codes to help it obtain more granular data on counterparty risk, long and short positions, collateral and collateral calls, and derivative positions. Given the complexity involved with the effort, and the reliance on private sector contributions, OFR would not have been possible were it not provided for in statute. It may be too early to judge whether OFR will be a success or not, but it does show the power of a mandate to force organizational change where the availability of information is restricted. Conclusion: A Data-Driven Future The political feasibility of winning such a mandate for a data-driven approach at DHS is an open question. To win support for such a mandate, it will be critical that DHS and its components present the opportunity to be more effective and cost-efficient in real terms. DHS will continue to collect and store data regardless of whether such a mandate exists. The cost of those efforts will not go away. The question is whether DHS is empowered to make better use of those resources by increasing its awareness of the information that exists. After all, you can’t fight an enemy you don’t know. And, creating new channels to share information only works if you know what information you have at the ready. The enterprise insights that come from these early efforts will likely produce some quick victories and attract the necessary buy-in to build incrementally from there and start developing a data repository with actual answers—not just questions. By thinking big, acting small, and going methodically step-by-step, DHS has the capability to lead where others continue to squabble. Information is the key to this transformation. Treated as a commodity, it will continue to be stockpiled and stored away. Viewed as an asset, it will strengthen the nation’s homeland security enterprise and help DHS deliver on its many far-reaching mandates. Contacts Suzanne Storc Principal Mike Delurey Principal
  8. 8. About Booz Allen To learn more about the firm and to download digital versions of this article and other Booz Allen Hamilton publications, visit Booz Allen Hamilton has been at the forefront of strategy and technology consulting for nearly a century. Today, Booz Allen is a leading provider of management and technology consulting services to the US government in defense, intelligence, and civil markets, and to major corporations, institutions, and not-for-profit organizations. In the commercial sector, the firm focuses on leveraging its existing expertise for clients in the financial services, healthcare, and energy markets, and to international clients in the Middle East. Booz Allen offers clients deep functional knowledge spanning strategy and organization, engineering and operations, technology, and analytics—which it combines with specialized expertise in clients’ mission and domain areas to help solve their toughest problems. The firm’s management consulting heritage is the basis for its unique collaborative culture and operating model, enabling Booz Allen to anticipate needs and opportunities, rapidly deploy talent and resources, and deliver enduring results. By combining a consultant’s problem-solving orientation with deep technical knowledge and strong execution, Booz Allen helps clients achieve success in their most critical missions—as evidenced by the firm’s many client relationships that span decades. Booz Allen helps shape thinking and prepare for future developments in areas of national importance, including cybersecurity, homeland security, healthcare, and information technology. Booz Allen is headquartered in McLean, Virginia, employs approximately 25,000 people, and had revenue of $5.86 billion for the 12 months ended March 31, 2012. Fortune has named Booz Allen one of its “100 Best Companies to Work For” for eight consecutive years. Working Mother has ranked the firm among its “100 Best Companies for Working Mothers” annually since 1999. More information is available at (NYSE: BAH) 7
  9. 9. The most complete, recent list of offices and their addresses and telephone numbers can be found on Principal Offices Huntsville, Alabama Sierra Vista, Arizona Los Angeles, California San Diego, California San Francisco, California Colorado Springs, Colorado Denver, Colorado District of Columbia Orlando, Florida Pensacola, Florida Sarasota, Florida Tampa, Florida Atlanta, Georgia Honolulu, Hawaii O’Fallon, Illinois Indianapolis, Indiana Leavenworth, Kansas Aberdeen, Maryland Annapolis Junction, Maryland Hanover, Maryland Lexington Park, Maryland Linthicum, Maryland Rockville, Maryland Troy, Michigan Kansas City, Missouri Omaha, Nebraska Red Bank, New Jersey New York, New York Rome, New York Dayton, Ohio Philadelphia, Pennsylvania Charleston, South Carolina Houston, Texas San Antonio, Texas Abu Dhabi, United Arab Emirates Alexandria, Virginia Arlington, Virginia Chantilly, Virginia Charlottesville, Virginia Falls Church, Virginia Herndon, Virginia McLean, Virginia Norfolk, Virginia Stafford, Virginia Seattle, Washington ©2012 Booz Allen Hamilton Inc. BA12-269Use of DoD imagery does not constitute or imply endorsement.