Streams of social
consciousness
Real-time data transformation
Who am I?
Marielle Lange (@widged)
‣ 2000: Psycholinguist. Research / data analysis.
‣ 2008: Flex programmer. OO, enterprise.
‣ 2013: Interactive developer. Browser + server.
Stream expertise
Fairly recent and rather limited:
๏ Gulp -> custom modules written by adapting other modules.
๏ Data analysis -> using streams to process large data sets.
➡ I will attempt to provide the minimal orientation to get started, staying clear of complex topics like back-pressure handling.
Streams for data analysis
Garden Data. Aggregating data scraped from a large number of websites. Parsing them. Normalizing them (Fahrenheit vs Celsius, March in the Northern vs Southern Hemisphere). Reducing them (converting [55-65] to 55 #1, 60 #1, 65 #1). Rendering them (average vs visualisation).
Streams manage a data flow.
‣ Sources. Where data pours from (ReadStream).
‣ Sinks. Where results pour to (WriteStream).
‣ Throughs. Where data gets manipulated and transformed.
What are they good for?
๏ Gulp - writing your own modules.
๏ Real-time data obtained from remote servers that would be too impractical to buffer in a device with limited memory.
๏ Map-reduce types of computations - a programming model for processing and generating large data sets. A map function generates a set of intermediate key/value pairs ({word: 'hello', length: 5}) and a reduce function merges all intermediate values associated with the same intermediate key (['agile', 'greet', 'hello'] - the list of words of length 5). Great if you want to run computations on distributed systems (see the sketch below).
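As an illustration of that last bullet, here is a minimal, in-memory sketch of the map/reduce idea in plain JavaScript (no distributed framework; the data is made up for the example):

var words = ['agile', 'hi', 'greet', 'hello', 'to'];

// map: emit an intermediate key/value pair for every word
var pairs = words.map(function (word) {
  return { key: word.length, value: word };
});

// reduce: merge all values that share the same intermediate key
var byLength = pairs.reduce(function (acc, pair) {
  (acc[pair.key] = acc[pair.key] || []).push(pair.value);
  return acc;
}, {});

console.log(byLength[5]); // [ 'agile', 'greet', 'hello' ]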
Streams 101
Readable Streams
Abstraction for a source of data that you are reading from.
‣ http responses, on the client
‣ http requests, on the server
‣ fs read streams
‣ zlib streams
‣ crypto streams
‣ tcp sockets
‣ child process stdout and stderr
‣ process.stdin
Notes
๏ A readable stream will not start emitting data until you indicate that you are ready to receive it.
๏ Readable streams have two "modes": a flowing mode and a non-flowing (paused) mode. In non-flowing mode you pull data explicitly:
var flappyStream = readable.read();
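A minimal sketch of the non-flowing mode, reusing the foo.txt file from the other examples:

var fs = require('fs');
var readable = fs.createReadStream('foo.txt');

// Non-flowing (paused) mode: nothing is emitted until we ask for it.
readable.on('readable', function () {
  var chunk;
  while ((chunk = readable.read()) !== null) {
    console.log('Got %d bytes', chunk.length);
  }
});
readable.on('end', function () { console.log('No more data.'); });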
Writable Streams
Abstraction for a destination that you are writing data to.
‣ http requests, on the client
‣ http responses, on the server
‣ fs write streams
‣ zlib streams
‣ crypto streams
‣ tcp sockets
‣ child process stdin
‣ process.stdout, process.stderr
writable.write(flappyBird);
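A minimal sketch of writing to a writable stream (out.txt is just an illustrative destination):

var fs = require('fs');
var writable = fs.createWriteStream('out.txt');

writable.write('first line\n');  // queue a chunk for writing
writable.end('last line\n');     // write a final chunk, then close the destination

writable.on('finish', function () {
  console.log('All data has been flushed to out.txt');
});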
Transforms
Abstraction for a stream that is both readable and writable, where the input is related to the output (a map or filter step). Conceptually, you write to its input side and read from its output side:
transform.input.write(flappyBird);
var evilStream = transform.output.read();
Compressing a file using gzip:
var fs   = require("fs"),
    zlib = require("zlib");
var readable = fs.createReadStream("foo.txt"),
    writable = fs.createWriteStream("foo.txt.gz"),
    gzip     = zlib.createGzip();
readable
   .pipe(gzip)
   .pipe(writable);
Dominic Tarr's `through` module provides similar functionality.
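For instance, a minimal sketch of a custom through stream that upper-cases everything flowing past (the variable names are illustrative):

var through = require('through');

// A trivial through stream: upper-case every chunk.
var upcase = through(function write(chunk) {
  this.queue(String(chunk).toUpperCase()); // push the transformed chunk downstream
}, function end() {
  this.queue(null); // signal that the output is finished
});

process.stdin.pipe(upcase).pipe(process.stdout);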
Basic API
Readable stream
var fs = require('fs');
var readable = fs.createReadStream('foo.txt');
// this is the classic api
readable
  .on('data', function (data) { console.log('Data!', data); })
  .on('error', function (err) { console.error('Error', err); })
  .on('end', function () { console.log('All done!'); });
Writable stream
var fs = require('fs');
var readable = fs.createReadStream('foo.txt')
  , writable = fs.createWriteStream('copy.txt');
// { end: false } keeps the writable open so we can append once the copy is done
readable.pipe(writable, { end: false });
readable.on('end', function () { writable.end('an extra line\n'); });
Toolbox
event-stream (D. Tarr), along with JSONStream and map-stream
var fs         = require("fs"),
    JSONStream = require('JSONStream'),
    map        = require('map-stream');
var input  = fs.createReadStream("twitter-feed.json"),
    output = fs.createWriteStream("twitter-sentiments.json");
input
  .pipe(JSONStream.parse("*"))
  .pipe(map(computeSentiments))
  .pipe(JSONStream.stringify()) // serialise the objects back to JSON before writing
  .pipe(output);
(computeSentiments is sketched after this list.)
Stream playground (J. Resig)
Stream handbook (@Substack)
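The computeSentiments step used in the event-stream example above is not defined on the slide; a plausible sketch, assuming the sentiment module's synchronous, callable form (the API varies across versions) and a text field on each parsed item:

var sentiment = require('sentiment'); // assumes the callable, synchronous API

// map-stream callback: annotate each parsed tweet with a sentiment score.
function computeSentiments(tweet, asyncReturn) {
  var result = sentiment(tweet.text || '');
  tweet.sentiment = result.score;  // negative-to-positive integer score
  asyncReturn(null, tweet);        // pass the augmented object downstream
}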
Vinyl
Rapidly define a list of files to read from with glob strings.
var vinyl = require('vinyl-fs'), map = require('map-stream'), JSONStream = require('JSONStream');
vinyl.src('./data/*/quad/*.comp.json', { buffer: false }).pipe(map(mapSource));
function mapSource(file, asyncReturn) {
  // with { buffer: false }, file.contents is a readable stream
  var srcStream = file.contents;
  srcStream
    .pipe(JSONStream.parse("*"))
    .pipe(SomeAnalysis)
    .pipe(vinyl.dest("./out"));
  asyncReturn(null, file); // let map-stream move on to the next file
}
Example
Twitter Sentiments
Register an application with the Twitter API (https://dev.twitter.com/) and create an access token.
In your project, add a file "secret_keys.js" with:
consumer_secret: "YOUR_CONSUMER_SECRET", access_token_key: "USER_ACCESS_TOKEN", access_token_secret: "USER_ACCESS_TOKEN_SECRET"
Takes advantage of the sentiment module: https://github.com/thisandagain/sentiment
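A minimal sketch of the pipeline, assuming the `twitter` npm client and the sentiment module's callable form (the module choice, stream parameters, and the `track` keyword are illustrative assumptions, not the talk's exact code):

var Twitter   = require('twitter'),
    sentiment = require('sentiment'),
    keys      = require('./secret_keys');

var client = new Twitter(keys); // expects the consumer and access token keys

// Open a filtered statuses stream and score each tweet as it arrives.
client.stream('statuses/filter', { track: 'javascript', language: 'en' }, function (tweets) {
  tweets.on('data', function (tweet) {
    console.log(sentiment(tweet.text || '').score, tweet.text);
  });
  tweets.on('error', function (err) { console.error(err); });
});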
Programming Style
Separation of concerns
The #1 reason to use streams, for me, is that the piping structure encourages writing programs as bite-size modules that are highly interchangeable.
In the early stages of writing the example program, I had:
tweets
  .pipe(map(englishOnly))
  .pipe(map(addSentiment))
Then I found out that the API gave you the option to specify a language filter. All I had to do was drop one line of code.
Functional Programming
A more functional style of programming encourages the avoidance of side effects and state mutation.
var fs  = require("fs"),
    map = require("map-stream");
var readable = fs.createReadStream("foo.txt");
readable
   .pipe(map(filterEnglish));
function filterEnglish(data, asyncReturn) {
   if (data.language === "en") {
      // write these data to the output stream
      asyncReturn(null, data);
   } else {
      // but don't write these.
      asyncReturn();
   }
}
๏ Single Responsibility Principle: "A function should do one thing, and do it well."
๏ Pure functions. No knowledge of the external world whatsoever. Every bit of information required for the function to run is explicitly passed as a parameter.
๏ Immutable data. A function returns new data that captures the transformation, rather than a reference to the old data.
๏ Higher Order Functions. Functions that return functions (partials, currying). A way to capture local state (see the sketch below).
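A minimal sketch of a higher-order function in this style, generalizing the filterEnglish example above (the function name and usage line are illustrative):

var map = require('map-stream');

// Higher-order function: returns a stream step specialised for one language.
// The `lang` argument is captured in the returned function's closure.
function languageFilter(lang) {
  return map(function (data, asyncReturn) {
    if (data.language === lang) {
      asyncReturn(null, data); // keep
    } else {
      asyncReturn();           // drop
    }
  });
}

// usage: tweets.pipe(languageFilter('en')).pipe(map(addSentiment));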
