Cavity
Background
• Project created on Google Code in June 2010 (under MIT licence)
• https://code.google.com/p/cavity/

• Very much a “personal” open source project
  • I didn’t want to be writing the same plumbing repeatedly
  • I figured other people might get mileage from the code and packages
  • It’s probably rather idiosyncratic
  • Everything is TDD with as near to total coverage as I can get
  • Releases are pushed to NuGet
    • https://www.nuget.org/packages?q=Cavity

• At some point I will probably port to GitHub
Visual Studio Solutions
• There are 43 solutions, but the ones I use heavily are:
  • Cavity Configuration
  • Cavity Core
  • Cavity Data
  • Cavity Diagnostics log4net
  • Cavity Domain
  • Cavity Domain (Royal Mail)
  • Cavity Service Location
  • Cavity Unit Testing
Cavity Data
• Depends on:
  • Cavity Core
  • .NET 2.0 / 3.5 / 4.0 (all three frameworks are separately targeted)

• Comma-separation and tab-separation are currently implemented
• Designed as forward-only, read-only StreamReader implementations
• Expectation that data files are immutable (write-once, read-many)

• Reader implementations are wrapped by DataSheet encapsulations
• Data is primarily exposed as IEnumerable<KeyStringDictionary>
• Enables the power of System.Linq (sketched below)
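
As an illustration of that reading model, here is a minimal, self-contained sketch of the same pattern: a forward-only reader that yields one column-name → value dictionary per row, queried lazily with LINQ. The names here (SimpleCsvReader, ReadEntries, the "TOWN" column, addresses.csv) are hypothetical stand-ins rather than Cavity’s actual API; in Cavity the equivalent enumeration comes from CsvStreamReader and the DataSheet wrappers.

```csharp
// Illustrative only: hypothetical names, not Cavity's actual API.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class SimpleCsvReader
{
    // Forward-only, read-only enumeration: each data row becomes a
    // column-name -> string dictionary (cf. Cavity's KeyStringDictionary).
    public static IEnumerable<Dictionary<string, string>> ReadEntries(string path)
    {
        using (var reader = new StreamReader(path))
        {
            var headerLine = reader.ReadLine();
            if (headerLine == null)
            {
                yield break;
            }

            var headers = headerLine.Split(',');   // naive split: no quoting rules

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var values = line.Split(',');
                var entry = new Dictionary<string, string>();
                for (var i = 0; i < headers.Length && i < values.Length; i++)
                {
                    entry[headers[i]] = values[i];
                }

                yield return entry;
            }
        }
    }
}

static class Example
{
    static void Main()
    {
        // System.Linq over the lazy enumeration: nothing is loaded up front.
        var london = SimpleCsvReader.ReadEntries("addresses.csv")
                                    .Count(entry => entry["TOWN"] == "LONDON");

        Console.WriteLine(london);
    }
}
```

Because the enumeration is lazy, operators such as Where and Count stream over the file without materialising it in memory, which is what makes the write-once, read-many model workable at the volumes quoted below.
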
• Uses Cavity extensively
• Drives much of the current development
• Processes between 200 and 300 million records per day (≈ 50 GB of data)
• Example read rate (whole-of-UK property model); a timing sketch follows below
  • 30 million records (12½ GB), with 180 columns (5.4 billion data points)
  • StreamReader ReadLine()
    • 2 minutes total read time = 15,000,000 records/min (≈ 250,000 records/sec)
  • CsvStreamReader ReadEntry<T>()
    • 10 minutes total read time = 50,000 records/sec

• The philosophy is to squeeze maximum value from dedicated tin
• Predictable fixed cost with near-zero marginal cost
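
To make the read-rate comparison concrete, here is a rough, self-contained timing harness for the raw ReadLine() pass. The file name is a placeholder, and swapping the counting loop for an enumeration of CsvStreamReader.ReadEntry<T>() (or the DataSheet wrapper) would give the fully parsed figure; this is a sketch of how such numbers can be measured, not the benchmark behind the figures quoted above.

```csharp
// Rough timing harness (illustrative): counts how fast StreamReader.ReadLine()
// can walk a large delimited file. The path below is a placeholder.
using System;
using System.Diagnostics;
using System.IO;

static class ReadRateCheck
{
    static void Main()
    {
        const string path = "uk-property-model.csv";   // placeholder file name

        var stopwatch = Stopwatch.StartNew();
        long records = 0;
        using (var reader = new StreamReader(path))
        {
            while (reader.ReadLine() != null)
            {
                records++;   // raw scan only: no splitting, no dictionaries
            }
        }

        stopwatch.Stop();
        var perSecond = records / stopwatch.Elapsed.TotalSeconds;
        Console.WriteLine("{0:N0} records in {1:N1}s ({2:N0} records/sec)",
                          records,
                          stopwatch.Elapsed.TotalSeconds,
                          perSecond);
    }
}
```
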
Use Cases
The Good
• Consuming feeds
• Producing feeds
• (Near-) non-volatile data
• Excellent match to CQRS (+ REST)
• When storage is cheap

The Bad
• Volatile data
• Very large numbers of consumers
• Mismatch to classic n-tier architectures
• Limited tooling for ad hoc queries
