SlideShare a Scribd company logo
1 of 22
Use Cases For Cassandra in
Federal and State
Government
Chris Bradford and Matt Overstreet
Matt Overstreet
● Software Architect
● Search relevancy engineer
● Has worked on systems ranging
from Tractor Trailer weigh stations
to celebrity websites
● Likes Cassandra
GitHub: omnifroodle
● DataStax Cassandra Architect
● Contributor to CQLEngine -
Python C* ORM
● Developed Trireme -
a C* migration engine
● Created the world’s smallest C*
cluster
Chris Bradford
Twitter: @bradfordcp
GitHub: bradfordcp
Who we are
● Consulting firm based in Charlottesville
Virginia
● Founded in 2005
● 30 consultants delivering projects
● Focused on Search in 2010, specifically Solr
and Lucene
● Delivering Cassandra Consulting since 2012
● Datastax Gold partner
● Great with Search, Analytics and Discovery
Blog & Publications
● Blog: http://o19s.com/blog/
● Twitter: @o19s
● Books
o Relevant Search
(Manning)
o Building a Search
Server with
Elasticsearch (Packt)
o Apache Solr
Enterprise Search
Server (Packt)
How we got here
OpenSource Connections started with a deep
expertise in full text search.
As the size and velocity of the data we interact
with grew, so did our toolset for storing,
presenting and processing that data.
OSC Toolkit
Some Use Cases
- Analytics Workloads
- Welfare Fraud Detection
- Intrusion Detection
- Distributed Data Warehousing
- Data Warehouse/Sink
- Replication & Recovery
Analytics Workloads
Look for patterns of user error, fraud and abuse
in forms submitted to an agency.
Requires the ability to compare submissions to
look for similar identifiers such like name, street
address, etc
Welfare Fraud Detection
● Massive amounts of data
● Hard to compare and find patterns
● Difficult to incorporate human analysis
Welfare Fraud Detection
● Ingest data into the system or work on data
in place
● Fraud Score Generation
o Automated rules
o Manually
● Employees can now focus on reviewing the
flagged records
Intrusion Detection
● Stream log data in to C* from applications
● Surface metrics through a security
dashboard
● Perform analysis on records looking for
anomalies (Optional) CREATE TABLE ids (
window TIMESTAMP,
route VARCHAR,
status_code VARCHAR,
request_id TIMEUUID,
PRIMARY KEY ((window, route,
Intrusion Detection
Distributed Data Warehouse
● Cassandra is designed in a peer
to peer architecture. There are no
“masters” or “slaves”.
● True distributed load, write anywhere, read
anywhere.
● Built-in replication between data centers.
Simple Distributed Applications
Data Warehouse
● Cassandra is used to house case data from
disparate systems
● Data is then pushed into a full text search
index
● Cases may now be searched through an
intuitive web interface
Operations
● Widely compatible with programming
languages used in enterprise development
● OpsCenter monitoring tool
● Cassandra scales predictably
● Fault-tolerant
Use Case Review
● Analytics Workloads
○ Welfare Fraud Detection
○ Intrusion Detection
● Distributed Data Warehousing
○ Data Warehouse/Sink
○ Replication & Recovery
Q & A

More Related Content

What's hot

Introducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing FrameworkIntroducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing Frameworklucenerevolution
 
Search Analytics with ELK (Elastic Stack)
Search Analytics with ELK (Elastic Stack)Search Analytics with ELK (Elastic Stack)
Search Analytics with ELK (Elastic Stack)MC+A
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
 
Webinar: Fusion for Data Science
Webinar: Fusion for Data ScienceWebinar: Fusion for Data Science
Webinar: Fusion for Data ScienceLucidworks
 
This Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineGrant Ingersoll
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchRuslan Zavacky
 
So we all have ORCID integrations, now what?
So we all have ORCID integrations, now what?So we all have ORCID integrations, now what?
So we all have ORCID integrations, now what?Bram Luyten
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaAvinash Ramineni
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stackVikrant Chauhan
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureObjectRocket
 
Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Grant Ingersoll
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...Edureka!
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseRobert Lujo
 
Webinar: Rapid Solr Development with Fusion
Webinar: Rapid Solr Development with FusionWebinar: Rapid Solr Development with Fusion
Webinar: Rapid Solr Development with FusionLucidworks
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012Amazon Web Services
 
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Caserta
 

What's hot (20)

Introducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing FrameworkIntroducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing Framework
 
Intro to Search
Intro to SearchIntro to Search
Intro to Search
 
Search Analytics with ELK (Elastic Stack)
Search Analytics with ELK (Elastic Stack)Search Analytics with ELK (Elastic Stack)
Search Analytics with ELK (Elastic Stack)
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
Webinar: Fusion for Data Science
Webinar: Fusion for Data ScienceWebinar: Fusion for Data Science
Webinar: Fusion for Data Science
 
This Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search Engine
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
So we all have ORCID integrations, now what?
So we all have ORCID integrations, now what?So we all have ORCID integrations, now what?
So we all have ORCID integrations, now what?
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and Kibana
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the future
 
Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4
 
Elastic Stack Roadmap
Elastic Stack RoadmapElastic Stack Roadmap
Elastic Stack Roadmap
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
 
Webinar: Rapid Solr Development with Fusion
Webinar: Rapid Solr Development with FusionWebinar: Rapid Solr Development with Fusion
Webinar: Rapid Solr Development with Fusion
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
 
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
 

Viewers also liked

Lucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesLucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesSylvain Wallez
 
Lessons Learned with Spark at the US Patent & Trademark Office
Lessons Learned with Spark at the US Patent & Trademark OfficeLessons Learned with Spark at the US Patent & Trademark Office
Lessons Learned with Spark at the US Patent & Trademark OfficeOpenSource Connections
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
Core Techs Et Lucene
Core Techs Et LuceneCore Techs Et Lucene
Core Techs Et LuceneCore-Techs
 
User Experience Design Considerations for Multi-Museum Collaborations
User Experience Design Considerations for Multi-Museum CollaborationsUser Experience Design Considerations for Multi-Museum Collaborations
User Experience Design Considerations for Multi-Museum CollaborationsDesign for Context
 
Presentation Lucene / Solr / Datafari - Nantes JUG
Presentation Lucene / Solr / Datafari - Nantes JUGPresentation Lucene / Solr / Datafari - Nantes JUG
Presentation Lucene / Solr / Datafari - Nantes JUGfrancelabs
 

Viewers also liked (6)

Lucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesLucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiques
 
Lessons Learned with Spark at the US Patent & Trademark Office
Lessons Learned with Spark at the US Patent & Trademark OfficeLessons Learned with Spark at the US Patent & Trademark Office
Lessons Learned with Spark at the US Patent & Trademark Office
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Core Techs Et Lucene
Core Techs Et LuceneCore Techs Et Lucene
Core Techs Et Lucene
 
User Experience Design Considerations for Multi-Museum Collaborations
User Experience Design Considerations for Multi-Museum CollaborationsUser Experience Design Considerations for Multi-Museum Collaborations
User Experience Design Considerations for Multi-Museum Collaborations
 
Presentation Lucene / Solr / Datafari - Nantes JUG
Presentation Lucene / Solr / Datafari - Nantes JUGPresentation Lucene / Solr / Datafari - Nantes JUG
Presentation Lucene / Solr / Datafari - Nantes JUG
 

Similar to Use cases for cassandra in federal and state government

Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucFraugster
 
Big data at scrapinghub
Big data at scrapinghubBig data at scrapinghub
Big data at scrapinghubDana Brophy
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discoverymarkgrover
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricCambridge Semantics
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Neo4j
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at TwitterPrasad Wagle
 
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSDataStax Academy
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Databricks
 
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...VMware Tanzu
 
Confluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & LearnConfluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & Learnconfluent
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB
 
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerApache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerStratio
 
Key Skills Required for Data Engineering
Key Skills Required for Data EngineeringKey Skills Required for Data Engineering
Key Skills Required for Data EngineeringFibonalabs
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Databricks
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadatamarkgrover
 

Similar to Use cases for cassandra in federal and state government (20)

Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
 
Big data at scrapinghub
Big data at scrapinghubBig data at scrapinghub
Big data at scrapinghub
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
DataStax
DataStaxDataStax
DataStax
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
 
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
 
Confluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & LearnConfluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & Learn
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
 
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerApache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
 
Key Skills Required for Data Engineering
Key Skills Required for Data EngineeringKey Skills Required for Data Engineering
Key Skills Required for Data Engineering
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 

More from OpenSource Connections

How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessOpenSource Connections
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019OpenSource Connections
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullOpenSource Connections
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonOpenSource Connections
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...OpenSource Connections
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajOpenSource Connections
 
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...OpenSource Connections
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlHaystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlOpenSource Connections
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesOpenSource Connections
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...OpenSource Connections
 
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...OpenSource Connections
 
Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...OpenSource Connections
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...OpenSource Connections
 
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...OpenSource Connections
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...OpenSource Connections
 
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah ViaOpenSource Connections
 

More from OpenSource Connections (20)

Encores
EncoresEncores
Encores
 
Test driven relevancy
Test driven relevancyTest driven relevancy
Test driven relevancy
 
How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for Success
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019
 
Payloads and OCR with Solr
Payloads and OCR with SolrPayloads and OCR with Solr
Payloads and OCR with Solr
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
 
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlHaystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
 
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
 
Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
 
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
 
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
 

Recently uploaded

(ANIKA) Call Girls Wadki ( 7001035870 ) HI-Fi Pune Escorts Service
(ANIKA) Call Girls Wadki ( 7001035870 ) HI-Fi Pune Escorts Service(ANIKA) Call Girls Wadki ( 7001035870 ) HI-Fi Pune Escorts Service
(ANIKA) Call Girls Wadki ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VIP Kolkata Call Girl Jatin Das Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jatin Das Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jatin Das Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jatin Das Park 👉 8250192130 Available With Roomishabajaj13
 
Cunningham Road Call Girls Bangalore WhatsApp 8250192130 High Profile Service
Cunningham Road Call Girls Bangalore WhatsApp 8250192130 High Profile ServiceCunningham Road Call Girls Bangalore WhatsApp 8250192130 High Profile Service
Cunningham Road Call Girls Bangalore WhatsApp 8250192130 High Profile ServiceHigh Profile Call Girls
 
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas Whats Up Number
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas  Whats Up Number##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas  Whats Up Number
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas Whats Up NumberMs Riya
 
Lucknow 💋 Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
Lucknow 💋 Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...Lucknow 💋 Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
Lucknow 💋 Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...anilsa9823
 
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...Suhani Kapoor
 
Item # 4 - 231 Encino Ave (Significance Only).pdf
Item # 4 - 231 Encino Ave (Significance Only).pdfItem # 4 - 231 Encino Ave (Significance Only).pdf
Item # 4 - 231 Encino Ave (Significance Only).pdfahcitycouncil
 
2024: The FAR, Federal Acquisition Regulations - Part 28
2024: The FAR, Federal Acquisition Regulations - Part 282024: The FAR, Federal Acquisition Regulations - Part 28
2024: The FAR, Federal Acquisition Regulations - Part 28JSchaus & Associates
 
(SUHANI) Call Girls Pimple Saudagar ( 7001035870 ) HI-Fi Pune Escorts Service
(SUHANI) Call Girls Pimple Saudagar ( 7001035870 ) HI-Fi Pune Escorts Service(SUHANI) Call Girls Pimple Saudagar ( 7001035870 ) HI-Fi Pune Escorts Service
(SUHANI) Call Girls Pimple Saudagar ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
2024 Zoom Reinstein Legacy Asbestos Webinar
2024 Zoom Reinstein Legacy Asbestos Webinar2024 Zoom Reinstein Legacy Asbestos Webinar
2024 Zoom Reinstein Legacy Asbestos WebinarLinda Reinstein
 
Fair Trash Reduction - West Hartford, CT
Fair Trash Reduction - West Hartford, CTFair Trash Reduction - West Hartford, CT
Fair Trash Reduction - West Hartford, CTaccounts329278
 
VIP High Profile Call Girls Gorakhpur Aarushi 8250192130 Independent Escort S...
VIP High Profile Call Girls Gorakhpur Aarushi 8250192130 Independent Escort S...VIP High Profile Call Girls Gorakhpur Aarushi 8250192130 Independent Escort S...
VIP High Profile Call Girls Gorakhpur Aarushi 8250192130 Independent Escort S...Suhani Kapoor
 
Precarious profits? Why firms use insecure contracts, and what would change t...
Precarious profits? Why firms use insecure contracts, and what would change t...Precarious profits? Why firms use insecure contracts, and what would change t...
Precarious profits? Why firms use insecure contracts, and what would change t...ResolutionFoundation
 
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...ranjana rawat
 
PPT Item # 4 - 231 Encino Ave (Significance Only)
PPT Item # 4 - 231 Encino Ave (Significance Only)PPT Item # 4 - 231 Encino Ave (Significance Only)
PPT Item # 4 - 231 Encino Ave (Significance Only)ahcitycouncil
 
Goa Escorts WhatsApp Number South Goa Call Girl … 8588052666…
Goa Escorts WhatsApp Number South Goa Call Girl … 8588052666…Goa Escorts WhatsApp Number South Goa Call Girl … 8588052666…
Goa Escorts WhatsApp Number South Goa Call Girl … 8588052666…nishakur201
 

Recently uploaded (20)

(ANIKA) Call Girls Wadki ( 7001035870 ) HI-Fi Pune Escorts Service
(ANIKA) Call Girls Wadki ( 7001035870 ) HI-Fi Pune Escorts Service(ANIKA) Call Girls Wadki ( 7001035870 ) HI-Fi Pune Escorts Service
(ANIKA) Call Girls Wadki ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Kolkata Call Girl Jatin Das Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jatin Das Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jatin Das Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jatin Das Park 👉 8250192130 Available With Room
 
Cunningham Road Call Girls Bangalore WhatsApp 8250192130 High Profile Service
Cunningham Road Call Girls Bangalore WhatsApp 8250192130 High Profile ServiceCunningham Road Call Girls Bangalore WhatsApp 8250192130 High Profile Service
Cunningham Road Call Girls Bangalore WhatsApp 8250192130 High Profile Service
 
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas Whats Up Number
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas  Whats Up Number##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas  Whats Up Number
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas Whats Up Number
 
Lucknow 💋 Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
Lucknow 💋 Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...Lucknow 💋 Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
Lucknow 💋 Russian Call Girls Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
 
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...
 
Item # 4 - 231 Encino Ave (Significance Only).pdf
Item # 4 - 231 Encino Ave (Significance Only).pdfItem # 4 - 231 Encino Ave (Significance Only).pdf
Item # 4 - 231 Encino Ave (Significance Only).pdf
 
9953330565 Low Rate Call Girls In Adarsh Nagar Delhi NCR
9953330565 Low Rate Call Girls In Adarsh Nagar Delhi NCR9953330565 Low Rate Call Girls In Adarsh Nagar Delhi NCR
9953330565 Low Rate Call Girls In Adarsh Nagar Delhi NCR
 
2024: The FAR, Federal Acquisition Regulations - Part 28
2024: The FAR, Federal Acquisition Regulations - Part 282024: The FAR, Federal Acquisition Regulations - Part 28
2024: The FAR, Federal Acquisition Regulations - Part 28
 
(SUHANI) Call Girls Pimple Saudagar ( 7001035870 ) HI-Fi Pune Escorts Service
(SUHANI) Call Girls Pimple Saudagar ( 7001035870 ) HI-Fi Pune Escorts Service(SUHANI) Call Girls Pimple Saudagar ( 7001035870 ) HI-Fi Pune Escorts Service
(SUHANI) Call Girls Pimple Saudagar ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call Girls In Rohini ꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In  Rohini ꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCeCall Girls In  Rohini ꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In Rohini ꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
 
Call Girls Service Connaught Place @9999965857 Delhi 🫦 No Advance VVIP 🍎 SER...
Call Girls Service Connaught Place @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SER...Call Girls Service Connaught Place @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SER...
Call Girls Service Connaught Place @9999965857 Delhi 🫦 No Advance VVIP 🍎 SER...
 
How to Save a Place: 12 Tips To Research & Know the Threat
How to Save a Place: 12 Tips To Research & Know the ThreatHow to Save a Place: 12 Tips To Research & Know the Threat
How to Save a Place: 12 Tips To Research & Know the Threat
 
2024 Zoom Reinstein Legacy Asbestos Webinar
2024 Zoom Reinstein Legacy Asbestos Webinar2024 Zoom Reinstein Legacy Asbestos Webinar
2024 Zoom Reinstein Legacy Asbestos Webinar
 
Fair Trash Reduction - West Hartford, CT
Fair Trash Reduction - West Hartford, CTFair Trash Reduction - West Hartford, CT
Fair Trash Reduction - West Hartford, CT
 
VIP High Profile Call Girls Gorakhpur Aarushi 8250192130 Independent Escort S...
VIP High Profile Call Girls Gorakhpur Aarushi 8250192130 Independent Escort S...VIP High Profile Call Girls Gorakhpur Aarushi 8250192130 Independent Escort S...
VIP High Profile Call Girls Gorakhpur Aarushi 8250192130 Independent Escort S...
 
Precarious profits? Why firms use insecure contracts, and what would change t...
Precarious profits? Why firms use insecure contracts, and what would change t...Precarious profits? Why firms use insecure contracts, and what would change t...
Precarious profits? Why firms use insecure contracts, and what would change t...
 
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...
 
PPT Item # 4 - 231 Encino Ave (Significance Only)
PPT Item # 4 - 231 Encino Ave (Significance Only)PPT Item # 4 - 231 Encino Ave (Significance Only)
PPT Item # 4 - 231 Encino Ave (Significance Only)
 
Goa Escorts WhatsApp Number South Goa Call Girl … 8588052666…
Goa Escorts WhatsApp Number South Goa Call Girl … 8588052666…Goa Escorts WhatsApp Number South Goa Call Girl … 8588052666…
Goa Escorts WhatsApp Number South Goa Call Girl … 8588052666…
 

Use cases for cassandra in federal and state government

  • 1. Use Cases For Cassandra in Federal and State Government Chris Bradford and Matt Overstreet
  • 2. Matt Overstreet ● Software Architect ● Search relevancy engineer ● Has worked on systems ranging from Tractor Trailer weigh stations to celebrity websites ● Likes Cassandra GitHub: omnifroodle
  • 3. ● DataStax Cassandra Architect ● Contributor to CQLEngine - Python C* ORM ● Developed Trireme - a C* migration engine ● Created the world’s smallest C* cluster Chris Bradford Twitter: @bradfordcp GitHub: bradfordcp
  • 4. Who we are ● Consulting firm based in Charlottesville Virginia ● Founded in 2005 ● 30 consultants delivering projects ● Focused on Search in 2010, specifically Solr and Lucene ● Delivering Cassandra Consulting since 2012 ● Datastax Gold partner ● Great with Search, Analytics and Discovery
  • 5. Blog & Publications ● Blog: http://o19s.com/blog/ ● Twitter: @o19s ● Books o Relevant Search (Manning) o Building a Search Server with Elasticsearch (Packt) o Apache Solr Enterprise Search Server (Packt)
  • 6. How we got here OpenSource Connections started with a deep expertise in full text search. As the size and velocity of the data we interact with grew, so did our toolset for storing, presenting and processing that data.
  • 8. Some Use Cases - Analytics Workloads - Welfare Fraud Detection - Intrusion Detection - Distributed Data Warehousing - Data Warehouse/Sink - Replication & Recovery
  • 9. Analytics Workloads Look for patterns of user error, fraud and abuse in forms submitted to an agency. Requires the ability to compare submissions to look for similar identifiers such like name, street address, etc
  • 10. Welfare Fraud Detection ● Massive amounts of data ● Hard to compare and find patterns ● Difficult to incorporate human analysis
  • 11. Welfare Fraud Detection ● Ingest data into the system or work on data in place ● Fraud Score Generation o Automated rules o Manually ● Employees can now focus on reviewing the flagged records
  • 12.
  • 13. Intrusion Detection ● Stream log data in to C* from applications ● Surface metrics through a security dashboard ● Perform analysis on records looking for anomalies (Optional) CREATE TABLE ids ( window TIMESTAMP, route VARCHAR, status_code VARCHAR, request_id TIMEUUID, PRIMARY KEY ((window, route,
  • 15. Distributed Data Warehouse ● Cassandra is designed in a peer to peer architecture. There are no “masters” or “slaves”. ● True distributed load, write anywhere, read anywhere. ● Built-in replication between data centers.
  • 17.
  • 18. Data Warehouse ● Cassandra is used to house case data from disparate systems ● Data is then pushed into a full text search index ● Cases may now be searched through an intuitive web interface
  • 19.
  • 20. Operations ● Widely compatible with programming languages used in enterprise development ● OpsCenter monitoring tool ● Cassandra scales predictably ● Fault-tolerant
  • 21. Use Case Review ● Analytics Workloads ○ Welfare Fraud Detection ○ Intrusion Detection ● Distributed Data Warehousing ○ Data Warehouse/Sink ○ Replication & Recovery
  • 22. Q & A

Editor's Notes

  1. Matt - We are based in Charlottesville Virginia. (and big fans of the amtrak line to DC) We’ve always been interested in search, (one of our founders wrote the book on it - see next slide). In 2010 we really made search our focus and have been adding related technologies to really help deliver on full text search. In 2012 we also started delivering Cassandra consulting, and we are currently a Datastax Gold Partner.
  2. Relevant search will be out soon, great book about the art of tuning search results. Building a search server with ElasticSearch -> is a great video introduction to both the Angular javascript framework and ElasticSearch. Apache Solr Enterprise is the definitive guide for planning, building and maintaining Apache Solr
  3. OpenSource connections started with a deep expertise in full text search. As the size and velocity of the data we interact with grew, so did our toolset for storing and processing that data. The size of the documents we needed to search over grew, as did the demands for better pre-processing of those documents. As we were storing and searching increasing millions of documents we needed a better place to store and process them. Apache Cassandra has been a great tool for that purpose, particularly with Datastax Enterprise. DSE brings along Apache Spark and Apache Solr, both of which we’ll talk about a bit here.
  4. Here is an idea of the breadth of knowledge we have in the “Search, Analytics and Discovery” stack. This includes multiple search systems (Elasticsearch, Solr), Big Data stores (Cassandra, Spark), and frontend systems (Angular, Ember)
  5. We’ll cover a few cases where Cassandra has been a great solution. Loosely we can break the examples down into two categories, Analytics Workloads like Fraud Detection Intrusion Detection and Distributed Data Warehousing
  6. Why is Cassandra a good choice for analytics workloads? Great for time series data, which is often the core of analytics data. Cassandra is incredibly fast at writing data, which is often an issue with analytics data. Cassandra has no single point of failure, which means analytics data isn’t dropped. It scales linearly. Also, Datastax has create an Apache Spark connector. Apache Spark is data processing engine. It is capable of running on a cluster of machines, and smartely scheduling work accross them. It also supports processing “streaming” data, which is great when dealing with analytics data.
  7. Data may be ingested in batches or streamed in as data is acquired Automated rules may be run during ingestion or periodic batch jobs Manually flagged entries may be used to tune and generate automated rules Look for patterns in new data including existing data
  8. Velocity and data locality are the big stories here Spark performs some automated rule checks in both streaming and batch configurations Streaming - good for small window based checks Batch - ideal for larger jobs against the bigger dataset Machine learning may be used to develop new classifications and groups of records
  9. Why Cassandra for Intrusion Detection: Blazing fast write speed. No single point of failure. How it works: data is streamed to Cassandra into a wide row based timeslice/route/status_code data can then be monitored by timeslice to look for spikes Warning, make sure someone attends the datamodeling talk before trying this at home, you’ll need to understand how cassandra stores and access data to get the most out of this approach
  10. --- Repeat from last slide ---- Why Cassandra for Intrusion Detection: Blazing fast write speed. No single point of failure. How it works: data is streamed to Cassandra into a wide row based timeslice/route/status_code data can then be monitored by timeslice to look for spikes Warning, make sure someone attends the datamodeling talk before trying this at home, you’ll need to understand how cassandra stores and access data to get the most out of this approach
  11. Why Cassandra for this: Data replication, both locally in the data center on between data centers “tunable” consistency Cassandra is highly available as soon as you have two nodes. Data is automatically copied between nodes. Other solutions require special configuration for multi-master configurations or are only available as a commercial product. Cassandra gives you true multi-master out of the box.
  12. Netflix Example: They set up a Cassandra Cluster with nodes in Oregon and Northern Virginia. Load was simulated to a production level. To test the speed of replication they wrote 1 million records in one region. 500ms later they read all records from the data center in VA.
  13. Within the scope of a datacenter application developers interact with the cluster as though it’s a local data store. Should the local cluster go down the driver automatically routes requests to another datacenter if available.
  14. 225 YEARS of data spanning tens of millions documents Each document has over 250 fields Note that columns without data do not consume storage space Compare this to dealing with distributed Master-Slave in MS SQL or other
  15. Source documents are coming from various systems with information about part of the claim. In this case there were 10 different types of source documents including metadata about the cases.
  16. Drivers in C# .Net C++ Java Node PHP Ruby