Cassandra summit
Friday, August 10, 12
Flexible schema
Easy to scale, increased redundancy
Fast enough for web requests
Consolidate existing services
Hadoop support



FUD
No ad-hoc queries
No indexes
No range queries
Limited tooling
Code complexity

REST
CQL
Thrift




Solr Schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="my_column_family" version="1.0">

  <!-- Field types that the fields below can reference -->
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="date" class="solr.DateField"/>
  </types>

  <!-- One entry per indexed column -->
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="name" type="string" indexed="true" stored="true"/>
    <field name="released_at" type="date" indexed="true" stored="true"/>
  </fields>

  <!-- Unique document key, and the field searched when a query names none -->
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
</schema>




Basic Queries

http://localhost:8983/solr/my_keyspace.my_column_family/select?q=name:foo

SELECT * FROM my_column_family WHERE solr_query='name:foo';
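
The two query forms above follow a regular shape, so a client can assemble them from the keyspace, column family and query string. The following is a minimal sketch under that assumption, not a client library: the helper names `solr_select_url` and `solr_cql` are hypothetical, and the host and port defaults are taken from the slide.

```python
from urllib.parse import urlencode


def solr_select_url(keyspace, column_family, query, host="localhost", port=8983):
    """Build a select URL of the shape shown on the slide.

    The core name is assumed to be "<keyspace>.<column_family>".
    """
    core = f"{keyspace}.{column_family}"
    # safe=":" keeps the field:value colon readable instead of %3A
    params = urlencode({"q": query}, safe=":")
    return f"http://{host}:{port}/solr/{core}/select?{params}"


def solr_cql(column_family, query):
    """Build the CQL form of the same search.

    Single quotes inside the query are doubled, which is how CQL
    escapes them inside a string literal.
    """
    escaped = query.replace("'", "''")
    return f"SELECT * FROM {column_family} WHERE solr_query='{escaped}';"


if __name__ == "__main__":
    print(solr_select_url("my_keyspace", "my_column_family", "name:foo"))
    print(solr_cql("my_column_family", "name:foo"))
```

String building is used here only to show the shape of the two requests. With user-supplied input, a driver's bound parameters would be the safer choice.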




Wide Rows
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="my_column_family" version="1.0">

  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="date" class="solr.DateField"/>
  </types>

  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="name" type="string" indexed="true" stored="true"/>
    <field name="released_at" type="date" indexed="true" stored="true"/>
    <!-- Matches any column whose name starts with "wide_" -->
    <dynamicField name="wide_*" type="string" indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
</schema>
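
To make the `wide_*` rule concrete, the sketch below mimics how a column name is resolved to a field type: explicitly declared fields are checked first, then the dynamic field patterns. This is an illustration of the lookup order, not Solr code. The function name `resolve_field_type` and the sample column names are hypothetical.

```python
from fnmatch import fnmatchcase

# Mirrors the <field> and <dynamicField> entries in the schema above.
EXPLICIT_FIELDS = {"id": "string", "name": "string", "released_at": "date"}
DYNAMIC_FIELDS = [("wide_*", "string")]


def resolve_field_type(column_name):
    """Return the field type for a column, or None if nothing matches."""
    # Explicitly declared fields take precedence over dynamic patterns.
    if column_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[column_name]
    # Otherwise try each dynamic pattern in declaration order.
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(column_name, pattern):
            return field_type
    return None


if __name__ == "__main__":
    for column in ("released_at", "wide_2012-08-10", "unmapped"):
        print(column, "->", resolve_field_type(column))
```

A wide row can therefore add columns such as `wide_2012-08-10` at write time without a schema change, while columns outside the pattern stay unindexed.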




Fuzzy Search
<schema name="my_column_family" version="1.0">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
      <!-- Index time: keep the value as one token, split it on word
           delimiters, then emit n-grams of 2 to 15 characters -->
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="1" preserveOriginal="1"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <!-- Query time: split on whitespace only -->
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="name" type="string" indexed="true" stored="true"/>
    <field name="name_fuzzy" type="ngram" indexed="true" stored="true"/>
  </fields>
  <!-- Index every name a second time through the n-gram analyzer -->
  <copyField source="name" dest="name_fuzzy"/>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
</schema>
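
The reason this schema gives substring-style matching is that every 2 to 15 character slice of the name is indexed as its own term, so a query term only has to equal one of those slices. The sketch below reproduces just the n-gram step with the sizes from the schema. It deliberately leaves out the word delimiter filter, and the helper names `ngrams` and `term_matches` are hypothetical.

```python
def ngrams(token, min_size=2, max_size=15):
    """Return every substring of token whose length is within the bounds."""
    grams = set()
    longest = min(max_size, len(token))
    for size in range(min_size, longest + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def term_matches(indexed_name, term):
    """True if a query term equals one of the n-grams indexed for the name.

    The query analyzer in the schema only splits on whitespace, so the
    comparison is exact and case-sensitive.
    """
    return term in ngrams(indexed_name)


if __name__ == "__main__":
    print(sorted(ngrams("solr")))
    print(term_matches("cassandra", "sand"))
    print(term_matches("cassandra", "Sand"))
```

Under these assumptions a query such as `name_fuzzy:sand` would find a row named `cassandra`, while a one-letter term would not, because it is shorter than `minGramSize`. The trade-off is a larger index, since each value produces many terms.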


• Full-text indexing
• Trigrams
• Rich data formats (PDF, Word, HTML)
• Easy interop (REST, CSV, XML, JSON)
• Geo-spatial search
• Highlighting
• Auto-suggest
• Faceted search and filtering

Storm




Increased performance by 700% while growing data by 500%



Reduced operational costs by 40%



Deleted 15,000 lines of code




