Cavity
Background
• Project created on Google Code in June 2010 (under MIT licence)
• https://code.google.com/p/cavity/

• Very much a “personal” open source project
  • I didn’t want to be writing the same plumbing repeatedly
  • I figured other people might get mileage from the code and packages
  • It’s probably rather idiosyncratic
  • Everything is TDD with as near to total coverage as I can get
  • Releases are pushed to NuGet
    • https://www.nuget.org/packages?q=Cavity

• At some point I will probably port to GitHub
Visual Studio Solutions
• There are 43 solutions, but the ones I use heavily are:
  • Cavity Configuration
  • Cavity Core
  • Cavity Data
  • Cavity Diagnostics log4net
  • Cavity Domain
  • Cavity Domain (Royal Mail)
  • Cavity Service Location
  • Cavity Unit Testing
Cavity Data
• Depends on:
  • Cavity Core
  • .NET 2.0 / 3.5 / 4.0 (all three frameworks are separately targeted)

• Comma-separation and tab-separation are currently implemented
• Designed as forward-only, read-only StreamReader implementations
• Expectation that data files are immutable (write-once, read-many)

• Reader implementations are wrapped by DataSheet encapsulations
• Data is primarily exposed as IEnumerable<KeyStringDictionary>
• Enables the power of System.Linq (sketched below)
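
As an illustration of that reading model, here is a minimal, self-contained sketch of the same pattern: a forward-only reader that yields one column-name → value dictionary per row, queried lazily with LINQ. The names here (SimpleCsvReader, ReadEntries, the "TOWN" column, addresses.csv) are hypothetical stand-ins rather than Cavity’s actual API; in Cavity the equivalent enumeration comes from CsvStreamReader and the DataSheet wrappers.

```csharp
// Illustrative only: hypothetical names, not Cavity's actual API.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class SimpleCsvReader
{
    // Forward-only, read-only enumeration: each data row becomes a
    // column-name -> string dictionary (cf. Cavity's KeyStringDictionary).
    public static IEnumerable<Dictionary<string, string>> ReadEntries(string path)
    {
        using (var reader = new StreamReader(path))
        {
            var headerLine = reader.ReadLine();
            if (headerLine == null)
            {
                yield break;
            }

            var headers = headerLine.Split(',');   // naive split: no quoting rules

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var values = line.Split(',');
                var entry = new Dictionary<string, string>();
                for (var i = 0; i < headers.Length && i < values.Length; i++)
                {
                    entry[headers[i]] = values[i];
                }

                yield return entry;
            }
        }
    }
}

static class Example
{
    static void Main()
    {
        // System.Linq over the lazy enumeration: nothing is loaded up front.
        var london = SimpleCsvReader.ReadEntries("addresses.csv")
                                    .Count(entry => entry["TOWN"] == "LONDON");

        Console.WriteLine(london);
    }
}
```

Because the enumeration is lazy, operators such as Where and Count stream over the file without materialising it in memory, which is what makes the write-once, read-many model workable at the volumes quoted below.
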
• Uses Cavity extensively
• Drives much of the current development
• Processes between 200 and 300 million records per day (≈ 50 GB of data)
• Example read rate (whole-of-UK property model); a timing sketch follows below
  • 30 million records (12½ GB), with 180 columns (5.4 billion data points)
  • StreamReader ReadLine()
    • 2 minutes total read time = 15,000,000 records/min (≈ 250,000 records/sec)
  • CsvStreamReader ReadEntry<T>()
    • 10 minutes total read time = 50,000 records/sec

• The philosophy is to squeeze maximum value from dedicated tin
• Predictable fixed cost with near-zero marginal cost
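
To make the read-rate comparison concrete, here is a rough, self-contained timing harness for the raw ReadLine() pass. The file name is a placeholder, and swapping the counting loop for an enumeration of CsvStreamReader.ReadEntry<T>() (or the DataSheet wrapper) would give the fully parsed figure; this is a sketch of how such numbers can be measured, not the benchmark behind the figures quoted above.

```csharp
// Rough timing harness (illustrative): counts how fast StreamReader.ReadLine()
// can walk a large delimited file. The path below is a placeholder.
using System;
using System.Diagnostics;
using System.IO;

static class ReadRateCheck
{
    static void Main()
    {
        const string path = "uk-property-model.csv";   // placeholder file name

        var stopwatch = Stopwatch.StartNew();
        long records = 0;
        using (var reader = new StreamReader(path))
        {
            while (reader.ReadLine() != null)
            {
                records++;   // raw scan only: no splitting, no dictionaries
            }
        }

        stopwatch.Stop();
        var perSecond = records / stopwatch.Elapsed.TotalSeconds;
        Console.WriteLine("{0:N0} records in {1:N1}s ({2:N0} records/sec)",
                          records,
                          stopwatch.Elapsed.TotalSeconds,
                          perSecond);
    }
}
```
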
Use Cases
The Good
• Consuming feeds
• Producing feeds
• (Near-) non-volatile data
• Excellent match to CQRS (+ REST)
• When storage is cheap

The Bad
• Volatile data
• Very large numbers of consumers
• Mismatch to classic n-tier architectures
• Limited tooling for ad hoc queries
