Big data guru John A. De Goes, CTO of Precog, presents an overview of Quirrel, a high-level, statistically oriented, open-source query language designed for advanced analytics and statistics on large-scale JSON data sets. John discusses how the language can be used to solve a variety of common problems encountered by modern application developers, and then reviews ongoing efforts to port the language to MongoDB as part of a pure open-source distribution.
3. mongoDB
I want to get and put data: MongoDB Query Language
I want aggregates: MongoDB Aggregation Framework
I want deep insight: ???
SQL
(axis: data storage to data intelligence)
4. mongoDB
I want to get and put data: MongoDB Query Language
I want aggregates: MongoDB Aggregation Framework
I want deep insight: MapReduce
SQL
(axis: data storage to data intelligence)
5. mongoDB
function map() {
  emit(1, // Or put a GROUP BY key here
    {
      sum: this.value, // the field you want stats for
      min: this.value,
      max: this.value,
      count: 1,
      diff: 0 // M2,n: sum((val-mean)^2)
    });
}

function reduce(key, values) {
  var a = values[0]; // will reduce into here
  // start at 1: values[0] is already held in 'a'
  for (var i = 1; i < values.length; i++) {
    var b = values[i]; // will merge 'b' into 'a'
    // temp helpers
    var delta = a.sum / a.count - b.sum / b.count; // a.mean - b.mean
    var weight = (a.count * b.count) / (a.count + b.count);
    // do the reducing
    a.diff += b.diff + delta * delta * weight;
    a.sum += b.sum;
    a.count += b.count;
    a.min = Math.min(a.min, b.min);
    a.max = Math.max(a.max, b.max);
  }
  return a;
}

function finalize(key, value) {
  value.avg = value.sum / value.count;
  value.variance = value.diff / value.count; // population variance (divides by n)
  value.stddev = Math.sqrt(value.variance);
  return value;
}
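The merge step inside reduce() can be tried outside MongoDB. The following is a minimal sketch in plain JavaScript, assuming nothing beyond the formulas on the slide; the names toStats, mergeStats, finalizeStats and summarize are hypothetical and not part of the talk or of any MongoDB API. It shows that merging partial results from two chunks gives the same statistics as reducing the whole data set in one pass, which is the property that lets MapReduce call reduce() on arbitrary groupings.

```javascript
// Hypothetical helper: what map() emits for a single value.
function toStats(value) {
  return { sum: value, min: value, max: value, count: 1, diff: 0 };
}

// Hypothetical helper: one merge step of reduce(), written without mutation.
// 'diff' is the running sum of squared deviations from the mean (M2).
function mergeStats(a, b) {
  var delta = a.sum / a.count - b.sum / b.count; // a.mean - b.mean
  var weight = (a.count * b.count) / (a.count + b.count);
  return {
    sum: a.sum + b.sum,
    min: Math.min(a.min, b.min),
    max: Math.max(a.max, b.max),
    count: a.count + b.count,
    diff: a.diff + b.diff + delta * delta * weight
  };
}

// Hypothetical helper: same derivations as finalize() on the slide.
function finalizeStats(s) {
  var variance = s.diff / s.count; // population variance
  return {
    avg: s.sum / s.count,
    variance: variance,
    stddev: Math.sqrt(variance),
    min: s.min,
    max: s.max,
    count: s.count
  };
}

// Reduce a non-empty array of numbers into one partial result.
function summarize(values) {
  return values.map(toStats).reduce(function (a, b) {
    return mergeStats(a, b);
  });
}

// Made-up sample data, for illustration only.
var data = [2, 4, 4, 4, 5, 5, 7, 9];

// One pass over everything.
var whole = finalizeStats(summarize(data));

// Two chunks reduced separately, then merged.
var left = summarize(data.slice(0, 3));
var right = summarize(data.slice(3));
var merged = finalizeStats(mergeStats(left, right));

// Both standard deviations should be approximately 2 for this data.
console.log(whole.stddev, merged.stddev);
```

Returning a new object from mergeStats instead of mutating values[0] is a choice made for this sketch only; the slide's version accumulates in place.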
7. mongoDB
introducing quirrel
• Statistical query language for JSON data
• Purely declarative
• Implicitly parallel
• Inherently composable
8. mongoDB
a taste of quirrel
pageViews := //pageViews
bound := 1.5 * stdDev(pageViews.duration)
avg := mean(pageViews.duration)
lengthyPageViews :=
pageViews where pageViews.duration > (avg + bound)
lengthyPageViews.userId
9. mongoDB
a taste of quirrel
pageViews := //pageViews
bound := 1.5 * stdDev(pageViews.duration)
avg := mean(pageViews.duration)
lengthyPageViews :=
pageViews where pageViews.duration > (avg + bound)
lengthyPageViews.userId
Users who spend an unusually long time looking at a page!
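For readers who do not know Quirrel, here is a rough plain-JavaScript reading of the query above, run against a small in-memory array. This is a sketch under assumptions, not a description of how Quirrel evaluates anything: the sample records are made up, the helpers mean and stdDev are hypothetical local functions, and stdDev here uses the population formula (divide by n), which may or may not match Quirrel's stdDev.

```javascript
// Made-up records standing in for the //pageViews data set.
var pageViews = [
  { userId: "u1", duration: 10 },
  { userId: "u2", duration: 12 },
  { userId: "u3", duration: 11 },
  { userId: "u4", duration: 9 },
  { userId: "u5", duration: 60 }
];

// Arithmetic mean of a non-empty array of numbers.
function mean(xs) {
  var total = xs.reduce(function (acc, x) { return acc + x; }, 0);
  return total / xs.length;
}

// Population standard deviation (assumption: divides by n, not n - 1).
function stdDev(xs) {
  var m = mean(xs);
  var squares = xs.map(function (x) { return (x - m) * (x - m); });
  return Math.sqrt(mean(squares));
}

var durations = pageViews.map(function (v) { return v.duration; });

// bound := 1.5 * stdDev(pageViews.duration)
var bound = 1.5 * stdDev(durations);

// avg := mean(pageViews.duration)
var avg = mean(durations);

// lengthyPageViews := pageViews where pageViews.duration > (avg + bound)
var lengthyPageViews = pageViews.filter(function (v) {
  return v.duration > avg + bound;
});

// lengthyPageViews.userId
var lengthyUserIds = lengthyPageViews.map(function (v) { return v.userId; });

console.log(lengthyUserIds); // [ 'u5' ] for the sample data above
```

The imperative version needs explicit map and filter steps and an intermediate durations array, where the Quirrel query states only the relationships between the named values.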
25. mongoDB
quirrel -> mongodb
• Quirrel is extremely expressive
• Aggregation framework insufficient
• Working with 10gen on new primitives
• Backup plan: AF + MapReduce