SlideShare a Scribd company logo
1 of 12
Download to read offline
Big Data Challenges
in the DoD and IC
Wes Caldwell
Chief Architect
Intelligent Software Solutions
Topics
• Introduction to ISS
• The growth of data
• Our customer’s data environment
• The need for effective big-data management
• Search as the cornerstone of a big-data strategy
About ISS
• Headquartered in Colorado Springs
• Other offices located in Washington DC, Hampton VA,
Tampa FL, and Rome NY
• Innovative Solutions from “Space to Mud
and Everything Between”
• Sole prime on multiple Air Force Research Labs
programs IDIQ
• Currently Executing More Than 100 Software
Development Projects
• Over 800 employees
• Strength in Solutions Development and
Deployment
• Consistently Recognized as a Leader
• Recognized as a Deloitte Fast 50 Colorado
company and a Deloitte Fast 500 company over
eight consecutive years
• Three-time Inc. Magazine 500 winner
• 2009 Defense Company of the Year
ISS Solution Space/Value Proposition
• Reusable and license-free to US
Federal Government (GOTS)
• Committed to providing best ROI
to our customers by integrating
leading open-source solutions into
our products and services
• Scalable from a single desktop
solution to large distributed
networks with thousands of users
• Customizable to each
organization’s unique analytical and
information technology
infrastructure
• Operationally proven, secure and
accredited for all major classified
networks
ISS Business Strategy
Government
Off The Shelf
(GOTS)
Commercial
Off The Shelf
(COTS)
Subject
MatterExperts
(SMEs)
• Low Barrier to Entry: No license
fees to US Government Agencies
• Fast: Proven baseline provides
immediate capability
• Turnkey: Highly customizable
solutions can be implemented
quickly with no development
• Solutions Oriented: Subject Matter
Experts support implementation in
each domain
• Low Cost: Cost of Adding Features
is shared across large customer
base; all customers benefit
Blending the best elements of each industry model to provide low risk,
nonproprietary, high payoff solutions—fast! 6
The growth of data
• Most electronic information is not relational,
but unstructured (textual, binary) or semi-
structured (spreadsheet, RSS feed, etc.)
– In 2007, the estimated information content of all
human knowledge was 295 exabytes(295 million
terabytes)
– Data production will be 44 times greater in 2020
than in 2009
• Approx 35 zetabytes total (35 billion terabytes)
• A majority of the data produced in the future will
be unstructured
– A tremendous amount of information and
knowledge is dormant within unstructured data
Our customer’s data environment
• Literally thousands of data sources/feeds
from a variety of strategic, national, and
tactical sources
– Media (documents, images, etc.)
– Human interactions
– Geospatial
– Open Source (News feeds, RSS)
– Imagery/Video
– Many more…
How our analysts feel
The need for effective “big-data” management
• Analysts are looking to extract knowledge from the massive heterogeneous
data sets, providing “actionable intelligence”
• Tactical environments absolutely demand effective management of data
– Time to live on the relevance of data collected can be very short
– Communications pipes aren’t as optimal as large CONUS-based data
centers, so reduction of data based on tactical conditions (i.e. AOR,
Problem Domain, etc.) is critical
• Search and Analytics are key enablers to allow an analyst to reliably search
through large amounts of information, and to focus their efforts around a
subset of that information to perform deeper analysis
Search IS the cornerstone of an effective big-data strategy
Structured Content
Semi-Structured
Content
Un-Structured
Content
Content Cache
(Haystacks)
Content Acquisition
Tenets
• Connector architecture
• Data normalization
• Data staging
• Data Compartmenting
(Multiple Haystacks)
Tenets
• Optimized Index of Content
for Search and Discovery of
Big Data
• Analyst Topics that “Shrink
the Haystack” Search
Features (Facets, Auto-
Complete, Tagging,
Comments, etc.)
• Semantic (Synonym) Search
based on pluggable
taxonomies
Search/Discovery
Content Index
NLP Pipeline
Semantic Enrichment
Categorization
Named
Entity
Recognition
Clustering
Gazetteers
Tenets
• “Domain Spaces” that
support pluggable entity
recognition and
categorization
• Continuous feedback loop
that improves the system
over time with analyst
input
• Lexicon-based analytics
that allows for targeted
categorization across
corpus of data
Tenets
• Data Reduction into
focused “Data
Perspectives”
• Data perspectives stored
in optimized formats
(e.g. Graph, Time Series,
Geo, etc.) for the
questions being asked
• Leveraging industry-
standard parallel
processing frameworks
for scalable analytics
Data Perspectives
Data
How can Search help you?
Have a great conference!!!

More Related Content

What's hot

MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB
 
AzureDay - Introduction Big Data Analytics.
AzureDay  - Introduction Big Data Analytics.AzureDay  - Introduction Big Data Analytics.
AzureDay - Introduction Big Data Analytics.Łukasz Grala
 
Oh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataOh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataPrakalp Agarwal
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantStuart Miniman
 
Cortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogCortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogMSAdvAnalytics
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorDataWorks Summit
 
There's Gold in Them Thar Data
There's Gold in Them Thar DataThere's Gold in Them Thar Data
There's Gold in Them Thar Datadclsocialmedia
 
Big data – An Introduction, July 2013
Big data – An Introduction, July 2013Big data – An Introduction, July 2013
Big data – An Introduction, July 2013Peter Morgan
 
KNOWLEDGE ARCHITECTURE: IT’S IMPORTANCE TO AN ORGANIZATION
KNOWLEDGE ARCHITECTURE: IT’S IMPORTANCE TO AN ORGANIZATIONKNOWLEDGE ARCHITECTURE: IT’S IMPORTANCE TO AN ORGANIZATION
KNOWLEDGE ARCHITECTURE: IT’S IMPORTANCE TO AN ORGANIZATIONConnected Data World
 
Building A Self Service Analytics Platform on Hadoop
Building A Self Service Analytics Platform on HadoopBuilding A Self Service Analytics Platform on Hadoop
Building A Self Service Analytics Platform on HadoopCraig Warman
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big datakk1718
 
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...2013 DataCite Summer Meeting - Elsevier's program to support research data (H...
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...datacite
 

What's hot (20)

MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
AzureDay - Introduction Big Data Analytics.
AzureDay  - Introduction Big Data Analytics.AzureDay  - Introduction Big Data Analytics.
AzureDay - Introduction Big Data Analytics.
 
Oh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataOh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG Data
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You Want
 
Cortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogCortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data Catalog
 
Big data
Big dataBig data
Big data
 
What is Big Data ?
What is Big Data ?What is Big Data ?
What is Big Data ?
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services Sector
 
There's Gold in Them Thar Data
There's Gold in Them Thar DataThere's Gold in Them Thar Data
There's Gold in Them Thar Data
 
Big data(1st presentation)
Big data(1st presentation)Big data(1st presentation)
Big data(1st presentation)
 
Big data – An Introduction, July 2013
Big data – An Introduction, July 2013Big data – An Introduction, July 2013
Big data – An Introduction, July 2013
 
KNOWLEDGE ARCHITECTURE: IT’S IMPORTANCE TO AN ORGANIZATION
KNOWLEDGE ARCHITECTURE: IT’S IMPORTANCE TO AN ORGANIZATIONKNOWLEDGE ARCHITECTURE: IT’S IMPORTANCE TO AN ORGANIZATION
KNOWLEDGE ARCHITECTURE: IT’S IMPORTANCE TO AN ORGANIZATION
 
Big Data
Big DataBig Data
Big Data
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Building A Self Service Analytics Platform on Hadoop
Building A Self Service Analytics Platform on HadoopBuilding A Self Service Analytics Platform on Hadoop
Building A Self Service Analytics Platform on Hadoop
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data tools
Big data toolsBig data tools
Big data tools
 
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...2013 DataCite Summer Meeting - Elsevier's program to support research data (H...
2013 DataCite Summer Meeting - Elsevier's program to support research data (H...
 

Viewers also liked

IE のサポート変更が Azure に及ぼす影響
IE のサポート変更が Azure に及ぼす影響IE のサポート変更が Azure に及ぼす影響
IE のサポート変更が Azure に及ぼす影響彰 村地
 
Network Forensics Puzzle Contest に挑戦 #1
Network Forensics Puzzle Contest に挑戦 #1Network Forensics Puzzle Contest に挑戦 #1
Network Forensics Puzzle Contest に挑戦 #1彰 村地
 
Lucene rev preso busch realtime search lr1010
Lucene rev preso busch realtime search lr1010Lucene rev preso busch realtime search lr1010
Lucene rev preso busch realtime search lr1010Lucidworks (Archived)
 
Windows 8 で魅力的なWeb サイトを作る
Windows 8 で魅力的なWeb サイトを作るWindows 8 で魅力的なWeb サイトを作る
Windows 8 で魅力的なWeb サイトを作る彰 村地
 
2010 10-building-global-listening-platform-with-solr
2010 10-building-global-listening-platform-with-solr2010 10-building-global-listening-platform-with-solr
2010 10-building-global-listening-platform-with-solrLucidworks (Archived)
 
Tennis
TennisTennis
Tennisaritz
 
Integration of apache solr with crawlers
Integration of apache solr with crawlersIntegration of apache solr with crawlers
Integration of apache solr with crawlersLucidworks (Archived)
 
O asis1 2[1]
O asis1 2[1]O asis1 2[1]
O asis1 2[1]tanica
 
In The Annals Of Rock History The Who
In The Annals Of Rock History The WhoIn The Annals Of Rock History The Who
In The Annals Of Rock History The Whotanica
 
Lucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucidworks (Archived)
 
Davis mark advanced search analytics in 20 minutes
Davis mark   advanced search analytics in 20 minutesDavis mark   advanced search analytics in 20 minutes
Davis mark advanced search analytics in 20 minutesLucidworks (Archived)
 
Indexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrIndexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrLucidworks (Archived)
 
Нестандартные методы интернет рекламы
Нестандартные методы интернет рекламыНестандартные методы интернет рекламы
Нестандартные методы интернет рекламыVladimir
 

Viewers also liked (20)

Real Time Search at Yammer
Real Time Search at YammerReal Time Search at Yammer
Real Time Search at Yammer
 
La Pensadora
La PensadoraLa Pensadora
La Pensadora
 
Portades
PortadesPortades
Portades
 
IE のサポート変更が Azure に及ぼす影響
IE のサポート変更が Azure に及ぼす影響IE のサポート変更が Azure に及ぼす影響
IE のサポート変更が Azure に及ぼす影響
 
Network Forensics Puzzle Contest に挑戦 #1
Network Forensics Puzzle Contest に挑戦 #1Network Forensics Puzzle Contest に挑戦 #1
Network Forensics Puzzle Contest に挑戦 #1
 
Lucene rev preso busch realtime search lr1010
Lucene rev preso busch realtime search lr1010Lucene rev preso busch realtime search lr1010
Lucene rev preso busch realtime search lr1010
 
All Data Big and Small
All Data Big and SmallAll Data Big and Small
All Data Big and Small
 
Windows 8 で魅力的なWeb サイトを作る
Windows 8 で魅力的なWeb サイトを作るWindows 8 で魅力的なWeb サイトを作る
Windows 8 で魅力的なWeb サイトを作る
 
2010 10-building-global-listening-platform-with-solr
2010 10-building-global-listening-platform-with-solr2010 10-building-global-listening-platform-with-solr
2010 10-building-global-listening-platform-with-solr
 
Tennis
TennisTennis
Tennis
 
Integration of apache solr with crawlers
Integration of apache solr with crawlersIntegration of apache solr with crawlers
Integration of apache solr with crawlers
 
O asis1 2[1]
O asis1 2[1]O asis1 2[1]
O asis1 2[1]
 
What’s New in Apache Lucene 2.9
What’s New in Apache Lucene 2.9What’s New in Apache Lucene 2.9
What’s New in Apache Lucene 2.9
 
In The Annals Of Rock History The Who
In The Annals Of Rock History The WhoIn The Annals Of Rock History The Who
In The Annals Of Rock History The Who
 
Lucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lr
 
Davis mark advanced search analytics in 20 minutes
Davis mark   advanced search analytics in 20 minutesDavis mark   advanced search analytics in 20 minutes
Davis mark advanced search analytics in 20 minutes
 
Indexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrIndexing Text and HTML Files with Solr
Indexing Text and HTML Files with Solr
 
What’s New in Apache Lucene 2.9
What’s New in Apache Lucene 2.9What’s New in Apache Lucene 2.9
What’s New in Apache Lucene 2.9
 
Нестандартные методы интернет рекламы
Нестандартные методы интернет рекламыНестандартные методы интернет рекламы
Нестандартные методы интернет рекламы
 
Web Design Course Overview
Web Design Course OverviewWeb Design Course Overview
Web Design Course Overview
 

Similar to Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC

Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...Pedro Mac Dowell Innecco
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementTony Bain
 
Zen and the Art of Datanauting
Zen and the Art of DatanautingZen and the Art of Datanauting
Zen and the Art of DatanautingOntologySystems
 
Competency framework: engineers, statisticians, data scientists, librarians, ...
Competency framework: engineers, statisticians, data scientists, librarians, ...Competency framework: engineers, statisticians, data scientists, librarians, ...
Competency framework: engineers, statisticians, data scientists, librarians, ...African Open Science Platform
 
Data Science Overview
Data Science OverviewData Science Overview
Data Science OverviewDavide Mauri
 
Data Structures - The Cornerstone of Your Data’s Home
Data Structures - The Cornerstone of Your Data’s HomeData Structures - The Cornerstone of Your Data’s Home
Data Structures - The Cornerstone of Your Data’s HomeDATAVERSITY
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatiaSatish Bhatia
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 

Similar to Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC (20)

Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
Auxilion - The Implications of Big Data on the Roadmap Towards Business Intel...
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
Zen and the Art of Datanauting
Zen and the Art of DatanautingZen and the Art of Datanauting
Zen and the Art of Datanauting
 
Competency framework: engineers, statisticians, data scientists, librarians, ...
Competency framework: engineers, statisticians, data scientists, librarians, ...Competency framework: engineers, statisticians, data scientists, librarians, ...
Competency framework: engineers, statisticians, data scientists, librarians, ...
 
Data Science Overview
Data Science OverviewData Science Overview
Data Science Overview
 
Big data
Big dataBig data
Big data
 
Data Structures - The Cornerstone of Your Data’s Home
Data Structures - The Cornerstone of Your Data’s HomeData Structures - The Cornerstone of Your Data’s Home
Data Structures - The Cornerstone of Your Data’s Home
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatia
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in Graphdatenbanken
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 

More from Lucidworks (Archived)

Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Lucidworks (Archived)
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and SolrLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchLucidworks (Archived)
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Lucidworks (Archived)
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...Lucidworks (Archived)
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Lucidworks (Archived)
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCLucidworks (Archived)
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCLucidworks (Archived)
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCLucidworks (Archived)
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKLucidworks (Archived)
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarLucidworks (Archived)
 

More from Lucidworks (Archived) (20)

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
The Data-Driven Paradigm
The Data-Driven ParadigmThe Data-Driven Paradigm
The Data-Driven Paradigm
 
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLK
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinar
 

Recently uploaded

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC

  • 1.
  • 2. Big Data Challenges in the DoD and IC Wes Caldwell Chief Architect Intelligent Software Solutions
  • 3. Topics • Introduction to ISS • The growth of data • Our customer’s data environment • The need for effective big-data management • Search as the cornerstone of a big-data strategy
  • 4. About ISS • Headquartered in Colorado Springs • Other offices located in Washington DC, Hampton VA, Tampa FL, and Rome NY • Innovative Solutions from “Space to Mud and Everything Between” • Sole prime on multiple Air Force Research Labs programs IDIQ • Currently Executing More Than 100 Software Development Projects • Over 800 employees • Strength in Solutions Development and Deployment • Consistently Recognized as a Leader • Recognized as a Deloitte Fast 50 Colorado company and a Deloitte Fast 500 company over eight consecutive years • Three-time Inc. Magazine 500 winner • 2009 Defense Company of the Year
  • 5. ISS Solution Space/Value Proposition • Reusable and license-free to US Federal Government (GOTS) • Committed to providing best ROI to our customers by integrating leading open-source solutions into our products and services • Scalable from a single desktop solution to large distributed networks with thousands of users • Customizable to each organization’s unique analytical and information technology infrastructure • Operationally proven, secure and accredited for all major classified networks
  • 6. ISS Business Strategy Government Off The Shelf (GOTS) Commercial Off The Shelf (COTS) Subject MatterExperts (SMEs) • Low Barrier to Entry: No license fees to US Government Agencies • Fast: Proven baseline provides immediate capability • Turnkey: Highly customizable solutions can be implemented quickly with no development • Solutions Oriented: Subject Matter Experts support implementation in each domain • Low Cost: Cost of Adding Features is shared across large customer base; all customers benefit Blending the best elements of each industry model to provide low risk, nonproprietary, high payoff solutions—fast! 6
  • 7. The growth of data • Most electronic information is not relational, but unstructured (textual, binary) or semi- structured (spreadsheet, RSS feed, etc.) – In 2007, the estimated information content of all human knowledge was 295 exabytes(295 million terabytes) – Data production will be 44 times greater in 2020 than in 2009 • Approx 35 zetabytes total (35 billion terabytes) • A majority of the data produced in the future will be unstructured – A tremendous amount of information and knowledge is dormant within unstructured data
  • 8. Our customer’s data environment • Literally thousands of data sources/feeds from a variety of strategic, national, and tactical sources – Media (documents, images, etc.) – Human interactions – Geospatial – Open Source (News feeds, RSS) – Imagery/Video – Many more…
  • 10. The need for effective “big-data” management • Analysts are looking to extract knowledge from the massive heterogeneous data sets, providing “actionable intelligence” • Tactical environments absolutely demand effective management of data – Time to live on the relevance of data collected can be very short – Communications pipes aren’t as optimal as large CONUS-based data centers, so reduction of data based on tactical conditions (i.e. AOR, Problem Domain, etc.) is critical • Search and Analytics are key enablers to allow an analyst to reliably search through large amounts of information, and to focus their efforts around a subset of that information to perform deeper analysis
  • 11. Search IS the cornerstone of an effective big-data strategy Structured Content Semi-Structured Content Un-Structured Content Content Cache (Haystacks) Content Acquisition Tenets • Connector architecture • Data normalization • Data staging • Data Compartmenting (Multiple Haystacks) Tenets • Optimized Index of Content for Search and Discovery of Big Data • Analyst Topics that “Shrink the Haystack” Search Features (Facets, Auto- Complete, Tagging, Comments, etc.) • Semantic (Synonym) Search based on pluggable taxonomies Search/Discovery Content Index NLP Pipeline Semantic Enrichment Categorization Named Entity Recognition Clustering Gazetteers Tenets • “Domain Spaces” that support pluggable entity recognition and categorization • Continuous feedback loop that improves the system over time with analyst input • Lexicon-based analytics that allows for targeted categorization across corpus of data Tenets • Data Reduction into focused “Data Perspectives” • Data perspectives stored in optimized formats (e.g. Graph, Time Series, Geo, etc.) for the questions being asked • Leveraging industry- standard parallel processing frameworks for scalable analytics Data Perspectives Data
  • 12. How can Search help you? Have a great conference!!!