
Big Data Challenges, Presented by Wes Caldwell at SolrExchange DC



Published in: Technology, Education

  1. Big Data Challenges in the DoD and IC
     Wes Caldwell, Chief Architect, Intelligent Software Solutions
  2. Topics
     • Introduction to ISS
     • The growth of data
     • Our customer’s data environment
     • The need for effective big-data management
     • Search as the cornerstone of a big-data strategy
  3. About ISS
     • Headquartered in Colorado Springs
     • Other offices located in Washington DC, Hampton VA, Tampa FL, and Rome NY
     • Innovative Solutions from “Space to Mud and Everything Between”
     • Sole prime on multiple Air Force Research Labs IDIQ programs
     • Currently Executing More Than 100 Software Development Projects
     • Over 800 employees
     • Strength in Solutions Development and Deployment
     • Consistently Recognized as a Leader
     • Recognized as a Deloitte Fast 50 Colorado company and a Deloitte Fast 500 company over eight consecutive years
     • Three-time Inc. Magazine 500 winner
     • 2009 Defense Company of the Year
  4. ISS Solution Space/Value Proposition
     • Reusable and license-free to US Federal Government (GOTS)
     • Committed to providing best ROI to our customers by integrating leading open-source solutions into our products and services
     • Scalable from a single desktop solution to large distributed networks with thousands of users
     • Customizable to each organization’s unique analytical and information technology infrastructure
     • Operationally proven, secure and accredited for all major classified networks
  5. ISS Business Strategy
     Diagram labels: Government Off The Shelf (GOTS), Commercial Off The Shelf (COTS), Subject Matter Experts (SMEs)
     • Low Barrier to Entry: No license fees to US Government Agencies
     • Fast: Proven baseline provides immediate capability
     • Turnkey: Highly customizable solutions can be implemented quickly with no development
     • Solutions Oriented: Subject Matter Experts support implementation in each domain
     • Low Cost: Cost of adding features is shared across a large customer base; all customers benefit
     Blending the best elements of each industry model to provide low-risk, nonproprietary, high-payoff solutions, fast!
  6. The growth of data
     • Most electronic information is not relational, but unstructured (textual, binary) or semi-structured (spreadsheet, RSS feed, etc.)
       – In 2007, the estimated information content of all human knowledge was 295 exabytes (295 million terabytes)
       – Data production will be 44 times greater in 2020 than in 2009
         • Approx. 35 zettabytes total (35 billion terabytes)
     • A majority of the data produced in the future will be unstructured
       – A tremendous amount of information and knowledge is dormant within unstructured data
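The unit conversions on slide 6 can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) prefixes, which the slide does not state explicitly, and the "implied 2009 volume" is only what follows from dividing the slide's 2020 estimate by its growth factor; it is not a figure given in the deck.

```python
# Decimal (SI) byte units. This is an assumption: the slide does not say
# whether it uses decimal or binary prefixes.
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21


def to_terabytes(num_bytes: int) -> float:
    """Convert a byte count to terabytes."""
    return num_bytes / TERABYTE


# Figures quoted on the slide.
stored_2007_tb = to_terabytes(295 * EXABYTE)
produced_2020_tb = to_terabytes(35 * ZETTABYTE)

# Derived, not quoted: 2020 estimate divided by the 44x growth factor.
implied_2009_zb = 35 / 44

print(f"295 EB = {stored_2007_tb:,.0f} TB")
print(f"35 ZB = {produced_2020_tb:,.0f} TB")
print(f"Implied 2009 volume: about {implied_2009_zb:.2f} ZB")
```

Under these assumptions, 295 exabytes is 295 million terabytes and 35 zettabytes is 35 billion terabytes, matching the parenthetical figures on the slide.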
  7. Our customer’s data environment
     • Literally thousands of data sources/feeds from a variety of strategic, national, and tactical sources
       – Media (documents, images, etc.)
       – Human interactions
       – Geospatial
       – Open Source (News feeds, RSS)
       – Imagery/Video
       – Many more…
  8. How our analysts feel
  9. The need for effective “big-data” management
     • Analysts are looking to extract knowledge from massive heterogeneous data sets, providing “actionable intelligence”
     • Tactical environments absolutely demand effective management of data
       – The time to live (relevance window) of collected data can be very short
       – Communications pipes are not as capable as those of large CONUS-based data centers, so reducing data based on tactical conditions (e.g., AOR, problem domain) is critical
     • Search and Analytics are key enablers that allow an analyst to reliably search through large amounts of information, and to focus their efforts on a subset of that information to perform deeper analysis
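The data reduction described on slide 9 (filtering by tactical conditions and discarding data whose relevance has expired) can be sketched as a simple filter. This is a minimal illustration only: the `Report` record, its field names, the AOR (area of responsibility) labels, and the fixed timestamp are all hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Report:
    """Hypothetical record shape; field names are illustrative only."""
    report_id: str
    aor: str                # area of responsibility the report belongs to
    domain: str             # problem domain, e.g. "logistics"
    collected_at: datetime
    ttl: timedelta          # how long the report stays relevant


def reduce_for_tactical_use(reports, aor, domains, now):
    """Keep reports for one AOR and set of domains that have not expired."""
    return [
        r for r in reports
        if r.aor == aor
        and r.domain in domains
        and now - r.collected_at <= r.ttl
    ]


# Arbitrary fixed "current time" so the example is reproducible.
now = datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc)
reports = [
    Report("r1", "AOR-A", "logistics", now - timedelta(hours=1), timedelta(hours=6)),
    Report("r2", "AOR-A", "logistics", now - timedelta(hours=12), timedelta(hours=6)),  # expired
    Report("r3", "AOR-B", "logistics", now - timedelta(hours=1), timedelta(hours=6)),   # other AOR
    Report("r4", "AOR-A", "weather", now - timedelta(minutes=30), timedelta(hours=2)),  # other domain
]

kept = reduce_for_tactical_use(reports, "AOR-A", {"logistics"}, now)
print([r.report_id for r in kept])
```

Only the report that matches the requested AOR and domain and is still within its time to live survives, which is the kind of reduction that keeps traffic over constrained links small.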
  10. Search IS the cornerstone of an effective big-data strategy
      Diagram labels: Structured Content, Semi-Structured Content, Unstructured Content; Content Acquisition; Content Cache (Haystacks); NLP Pipeline (Semantic Enrichment, Categorization, Named Entity Recognition, Clustering, Gazetteers); Content Index; Search/Discovery; Data Perspectives; Data
      Tenets (content acquisition and content cache)
      • Connector architecture
      • Data normalization
      • Data staging
      • Data Compartmenting (Multiple Haystacks)
      Tenets (content index and search/discovery)
      • Optimized Index of Content for Search and Discovery of Big Data
      • Analyst Topics that “Shrink the Haystack”
      • Search Features (Facets, Auto-Complete, Tagging, Comments, etc.)
      • Semantic (Synonym) Search based on pluggable taxonomies
      Tenets (NLP pipeline and semantic enrichment)
      • “Domain Spaces” that support pluggable entity recognition and categorization
      • Continuous feedback loop that improves the system over time with analyst input
      • Lexicon-based analytics that allow for targeted categorization across a corpus of data
      Tenets (data perspectives)
      • Data Reduction into focused “Data Perspectives”
      • Data perspectives stored in optimized formats (e.g. Graph, Time Series, Geo, etc.) for the questions being asked
      • Leveraging industry-standard parallel processing frameworks for scalable analytics
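Several of the tenets on slide 10 (data normalization, gazetteer-based entity recognition, an index of content, synonym search from a pluggable taxonomy, and facets) can be illustrated together with a toy in-memory index. This is only a sketch under stated assumptions: the `Haystack` class, the gazetteer entries, and the synonym taxonomy are hypothetical, and a production search engine implements indexing and faceting very differently.

```python
import re
from collections import Counter, defaultdict

# Hypothetical gazetteer and synonym taxonomy; a real deployment would plug
# in domain-specific ones.
GAZETTEER = {"tampa": "LOCATION", "rome": "LOCATION", "convoy": "VEHICLE"}
SYNONYMS = {"truck": {"truck", "lorry", "convoy"}}


def normalize(text):
    """Lowercase and tokenize; a stand-in for data normalization."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_entities(tokens):
    """Gazetteer lookup as a minimal form of named entity recognition."""
    return sorted({GAZETTEER[t] for t in tokens if t in GAZETTEER})


class Haystack:
    """Toy content index: inverted index plus per-document entity types."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of document ids
        self.entity_types = {}            # document id -> entity types

    def add(self, doc_id, text):
        tokens = normalize(text)
        for token in tokens:
            self.postings[token].add(doc_id)
        self.entity_types[doc_id] = tag_entities(tokens)

    def search(self, term):
        """Return matching document ids and facet counts by entity type."""
        term = term.lower()
        expanded = SYNONYMS.get(term, {term})  # synonym expansion
        hits = set()
        for token in expanded:
            hits |= self.postings.get(token, set())
        facets = Counter(e for d in hits for e in self.entity_types[d])
        return sorted(hits), dict(facets)


haystack = Haystack()
haystack.add("d1", "Convoy sighted near Tampa")
haystack.add("d2", "Lorry delayed at the border")
haystack.add("d3", "Weather report for Rome")

hits, facets = haystack.search("truck")
print(hits, facets)
```

Searching for "truck" matches documents that never contain that word, because the query is expanded through the taxonomy before the index lookup. The facet counts then summarize which entity types appear in the matching documents, which is one way an analyst could narrow the result set further.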
  11. How can Search help you?
      Have a great conference!!!