Big Data Challenges, Presented by Wes Caldwell at SolrExchange DC

  1. Big Data Challenges in the DoD and IC
     Wes Caldwell, Chief Architect, Intelligent Software Solutions
  2. Topics
     • Introduction to ISS
     • The growth of data
     • Our customer’s data environment
     • The need for effective big-data management
     • Search as the cornerstone of a big-data strategy
  3. About ISS
     • Headquartered in Colorado Springs
     • Other offices located in Washington DC, Hampton VA, Tampa FL, and Rome NY
     • Innovative Solutions from “Space to Mud and Everything Between”
     • Sole prime on multiple Air Force Research Labs programs IDIQ
     • Currently Executing More Than 100 Software Development Projects
     • Over 800 employees
     • Strength in Solutions Development and Deployment
     • Consistently Recognized as a Leader
     • Recognized as a Deloitte Fast 50 Colorado company and a Deloitte Fast 500 company over eight consecutive years
     • Three-time Inc. Magazine 500 winner
     • 2009 Defense Company of the Year
  4. ISS Solution Space/Value Proposition
     • Reusable and license-free to US Federal Government (GOTS)
     • Committed to providing best ROI to our customers by integrating leading open-source solutions into our products and services
     • Scalable from a single desktop solution to large distributed networks with thousands of users
     • Customizable to each organization’s unique analytical and information technology infrastructure
     • Operationally proven, secure and accredited for all major classified networks
  5. ISS Business Strategy
     Government Off The Shelf (GOTS), Commercial Off The Shelf (COTS), Subject Matter Experts (SMEs)
     • Low Barrier to Entry: No license fees to US Government Agencies
     • Fast: Proven baseline provides immediate capability
     • Turnkey: Highly customizable solutions can be implemented quickly with no development
     • Solutions Oriented: Subject Matter Experts support implementation in each domain
     • Low Cost: Cost of adding features is shared across a large customer base; all customers benefit
     Blending the best elements of each industry model to provide low-risk, nonproprietary, high-payoff solutions, fast!
  6. The growth of data
     • Most electronic information is not relational, but unstructured (textual, binary) or semi-structured (spreadsheet, RSS feed, etc.)
       – In 2007, the estimated information content of all human knowledge was 295 exabytes (295 million terabytes)
       – Data production will be 44 times greater in 2020 than in 2009
         • Approx. 35 zettabytes total (35 billion terabytes)
     • A majority of the data produced in the future will be unstructured
       – A tremendous amount of information and knowledge is dormant within unstructured data
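The unit conversions on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, which is what the slide's "million terabytes" and "billion terabytes" figures imply; the 2009 baseline is not stated on the slide and is only what the 44x and 35 zettabyte figures imply when taken together.

```python
# Decimal (SI) byte units, expressed as powers of ten.
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21


def to_terabytes(n_bytes: int) -> int:
    """Convert a byte count to whole terabytes."""
    return n_bytes // TERABYTE


# Figures quoted on the slide.
stored_2007_tb = to_terabytes(295 * EXABYTE)      # 295 exabytes in terabytes
projected_2020_tb = to_terabytes(35 * ZETTABYTE)  # 35 zettabytes in terabytes

# Assumption: the 35 ZB figure is the 2020 total and is 44x the 2009 level,
# so the implied 2009 level is 35 / 44 zettabytes (a derived value, not a
# number taken from the slide).
implied_2009_zb = 35 / 44

print(f"{stored_2007_tb:,} TB stored in 2007")
print(f"{projected_2020_tb:,} TB projected for 2020")
print(f"{implied_2009_zb:.2f} ZB implied for 2009")
```

Under these assumptions the two parenthetical figures on the slide are consistent with their headline numbers.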
  7. Our customer’s data environment
     • Literally thousands of data sources/feeds from a variety of strategic, national, and tactical sources
       – Media (documents, images, etc.)
       – Human interactions
       – Geospatial
       – Open Source (News feeds, RSS)
       – Imagery/Video
       – Many more…
  8. How our analysts feel
  9. The need for effective “big-data” management
     • Analysts are looking to extract knowledge from massive heterogeneous data sets and turn it into “actionable intelligence”
     • Tactical environments absolutely demand effective management of data
       – The time to live on the relevance of collected data can be very short
       – Communications pipes are not as capable as those of large CONUS-based data centers, so reducing data based on tactical conditions (e.g., AOR, problem domain) is critical
     • Search and analytics are key enablers: they let an analyst reliably search through large amounts of information and focus on a subset of it for deeper analysis
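The data reduction described on this slide can be illustrated with a minimal sketch: keep only records that match the tactical conditions (AOR and problem domain) and whose time to live has not expired. The `Record` fields, the function name, and the sample values are all hypothetical and chosen for illustration; the slides do not describe an actual schema or implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Record:
    """Hypothetical collected item; field names are illustrative only."""
    record_id: str
    aor: str                # area of responsibility the record relates to
    domain: str             # problem domain, e.g. "logistics"
    collected_at: datetime
    ttl: timedelta          # how long the record stays relevant


def reduce_for_tactical_use(records, aor, domains, now):
    """Keep records for the given AOR and domains that are still within TTL."""
    return [
        r for r in records
        if r.aor == aor
        and r.domain in domains
        and now - r.collected_at <= r.ttl
    ]


now = datetime(2012, 6, 1, 12, 0, tzinfo=timezone.utc)
records = [
    Record("r1", "AOR-A", "logistics", now - timedelta(hours=1), timedelta(hours=6)),
    Record("r2", "AOR-A", "logistics", now - timedelta(days=2), timedelta(hours=6)),
    Record("r3", "AOR-B", "logistics", now - timedelta(hours=1), timedelta(hours=6)),
    Record("r4", "AOR-A", "weather", now - timedelta(hours=1), timedelta(hours=6)),
]

kept = reduce_for_tactical_use(records, "AOR-A", {"logistics"}, now)
kept_ids = [r.record_id for r in kept]
print(kept_ids)
```

Filtering before transmission, rather than after, is the point of the slide: only the reduced subset has to cross a constrained communications link.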
  10. Search IS the cornerstone of an effective big-data strategy
     Inputs: Structured Content, Semi-Structured Content, Un-Structured Content
     Content Acquisition and Content Cache (Haystacks). Tenets:
     • Connector architecture
     • Data normalization
     • Data staging
     • Data Compartmenting (Multiple Haystacks)
     Search/Discovery and Content Index. Tenets:
     • Optimized Index of Content for Search and Discovery of Big Data
     • Analyst Topics that “Shrink the Haystack”
     • Search Features (Facets, Auto-Complete, Tagging, Comments, etc.)
     • Semantic (Synonym) Search based on pluggable taxonomies
     NLP Pipeline and Semantic Enrichment (Categorization, Named Entity Recognition, Clustering, Gazetteers). Tenets:
     • “Domain Spaces” that support pluggable entity recognition and categorization
     • Continuous feedback loop that improves the system over time with analyst input
     • Lexicon-based analytics that allows for targeted categorization across corpus of data
     Data Perspectives. Tenets:
     • Data Reduction into focused “Data Perspectives”
     • Data perspectives stored in optimized formats (e.g. Graph, Time Series, Geo, etc.) for the questions being asked
     • Leveraging industry-standard parallel processing frameworks for scalable analytics
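Two of the search tenets on this slide, synonym search driven by a pluggable taxonomy and facets, can be sketched in plain Python. This is a toy illustration and not how ISS or any particular search engine implements them: the taxonomy, the documents, and every function name below are hypothetical, and the inverted index is a simple in-memory dictionary.

```python
from collections import Counter, defaultdict

# Hypothetical pluggable taxonomy: a query term maps to extra terms to match.
TAXONOMY = {"vehicle": {"truck", "car"}}

# Hypothetical documents; "source" is the field used for faceting.
DOCS = [
    {"id": "d1", "source": "news", "text": "Truck convoy seen near the bridge"},
    {"id": "d2", "source": "report", "text": "Vehicle checkpoint established"},
    {"id": "d3", "source": "news", "text": "Bridge repairs completed"},
]


def tokenize(text):
    """Lowercase whitespace tokenizer; real analyzers do far more."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index mapping each token to the ids of docs containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in tokenize(doc["text"]):
            index[token].add(doc["id"])
    return index


def expand(term, taxonomy):
    """Expand a query term with its taxonomy synonyms (one direction only)."""
    term = term.lower()
    return {term} | taxonomy.get(term, set())


def search(term, index, taxonomy):
    """Return ids of documents matching the term or any of its synonyms."""
    hits = set()
    for candidate in expand(term, taxonomy):
        hits |= index.get(candidate, set())
    return hits


def facet_counts(doc_ids, docs, field):
    """Count matching documents per value of the given field."""
    return Counter(d[field] for d in docs if d["id"] in doc_ids)


index = build_index(DOCS)
hits = search("vehicle", index, TAXONOMY)
facets = facet_counts(hits, DOCS, "source")
print(sorted(hits), dict(facets))
```

Swapping in a different `TAXONOMY` changes which documents a query reaches without touching the index, which is the sense in which the taxonomy is pluggable in this sketch. The facet counts then give an analyst a way to narrow the result set by source.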
  11. How can Search help you?
     Have a great conference!!!