Intro To Cascading

    Intro To Cascading - Presentation Transcript

    1. Introduction to Cascading
    4. What is Cascading? • abstraction over MapReduce • API for data-processing workflows
    8. Why use Cascading? • MapReduce can be: • the wrong level of granularity • cumbersome to chain together
    13. Why use Cascading? • Cascading helps create • higher-level data processing abstractions • sophisticated data pipelines • reusable components
    14. Credits • Cascading written by Chris Wensel • presentation based on his user guide http://bit.ly/cascading
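    15. package cascadingtutorial.wordcount;

        // Imports assume the Cascading 1.x package layout.
        import cascading.flow.Flow;
        import cascading.flow.FlowConnector;
        import cascading.operation.aggregator.Count;
        import cascading.operation.regex.RegexSplitGenerator;
        import cascading.pipe.Each;
        import cascading.pipe.Every;
        import cascading.pipe.GroupBy;
        import cascading.pipe.Pipe;
        import cascading.scheme.Scheme;
        import cascading.scheme.TextLine;
        import cascading.tap.Hfs;
        import cascading.tap.Lfs;
        import cascading.tap.Tap;
        import cascading.tuple.Fields;
        import java.util.Properties;

        /**
         * Wordcount example in Cascading
         */
        public class Main {
            public static void main(String[] args) {
                String inputPath = args[0];
                String outputPath = args[1];

                // Each input line becomes an (offset, line) Tuple.
                Scheme inputScheme = new TextLine(new Fields("offset", "line"));
                Scheme outputScheme = new TextLine();

                // Paths with a URI scheme (e.g. hdfs://) use Hfs; plain paths use Lfs.
                Tap sourceTap = inputPath.matches("^[^:]+://.*")
                    ? new Hfs(inputScheme, inputPath)
                    : new Lfs(inputScheme, inputPath);
                Tap sinkTap = outputPath.matches("^[^:]+://.*")
                    ? new Hfs(outputScheme, outputPath)
                    : new Lfs(outputScheme, outputPath);

                // Split each line on whitespace, emitting one "word" Tuple per token.
                Pipe wcPipe = new Each("wordcount",
                    new Fields("line"),
                    new RegexSplitGenerator(new Fields("word"), "\\s+"),
                    new Fields("word"));

                // Group by word, then count the Tuples in each group.
                wcPipe = new GroupBy(wcPipe, new Fields("word"));
                wcPipe = new Every(wcPipe, new Count(), new Fields("count", "word"));

                Properties properties = new Properties();
                FlowConnector.setApplicationJarClass(properties, Main.class);

                // Connect source, sink and pipe assembly into a Flow and run it.
                Flow parsedLogFlow = new FlowConnector(properties)
                    .connect(sourceTap, sinkTap, wcPipe);
                parsedLogFlow.start();
                parsedLogFlow.complete();
            }
        }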
    16. Data Model Pipes Filters
    17. Data Model file
    23. Data Model file file file
    24. Data Model file file file Filters
    25. Tuple
    26. Pipe Assembly
    27. Pipe Assembly Tuple Stream
    28. Taps: sources & sinks
    29. Taps: sources & sinks Taps
    32. Taps: sources & sinks Taps Source
    33. Taps: sources & sinks Taps Source Sink
    34. Pipe + Taps = Flow Flow
    35. Flow Flow Flow Flow Flow
    36. n Flows Flow Flow Flow Flow Flow
    37. n Flows = Cascade Flow Flow Flow Flow Flow
    38. n Flows = Cascade Flow Flow Flow Flow Flow complex
    39. Example
    40. package cascadingtutorial.wordcount;

        // Imports assume the Cascading 1.x package layout.
        import cascading.flow.Flow;
        import cascading.flow.FlowConnector;
        import cascading.operation.aggregator.Count;
        import cascading.operation.regex.RegexSplitGenerator;
        import cascading.pipe.Each;
        import cascading.pipe.Every;
        import cascading.pipe.GroupBy;
        import cascading.pipe.Pipe;
        import cascading.scheme.Scheme;
        import cascading.scheme.TextLine;
        import cascading.tap.Hfs;
        import cascading.tap.Lfs;
        import cascading.tap.Tap;
        import cascading.tuple.Fields;
        import java.util.Properties;

        /**
         * Wordcount example in Cascading
         */
        public class Main {
            public static void main(String[] args) {
                String inputPath = args[0];
                String outputPath = args[1];

                // Each input line becomes an (offset, line) Tuple.
                Scheme inputScheme = new TextLine(new Fields("offset", "line"));
                Scheme outputScheme = new TextLine();

                // Paths with a URI scheme (e.g. hdfs://) use Hfs; plain paths use Lfs.
                Tap sourceTap = inputPath.matches("^[^:]+://.*")
                    ? new Hfs(inputScheme, inputPath)
                    : new Lfs(inputScheme, inputPath);
                Tap sinkTap = outputPath.matches("^[^:]+://.*")
                    ? new Hfs(outputScheme, outputPath)
                    : new Lfs(outputScheme, outputPath);

                // Split each line on whitespace, emitting one "word" Tuple per token.
                Pipe wcPipe = new Each("wordcount",
                    new Fields("line"),
                    new RegexSplitGenerator(new Fields("word"), "\\s+"),
                    new Fields("word"));

                // Group by word, then count the Tuples in each group.
                wcPipe = new GroupBy(wcPipe, new Fields("word"));
                wcPipe = new Every(wcPipe, new Count(), new Fields("count", "word"));

                Properties properties = new Properties();
                FlowConnector.setApplicationJarClass(properties, Main.class);

                // Connect source, sink and pipe assembly into a Flow and run it.
                Flow parsedLogFlow = new FlowConnector(properties)
                    .connect(sourceTap, sinkTap, wcPipe);
                parsedLogFlow.start();
                parsedLogFlow.complete();
            }
        }
    45. Listing excerpt with callouts: TextLine() • SequenceFile()
        Scheme inputScheme = new TextLine(new Fields("offset", "line"));
        Scheme outputScheme = new TextLine();
    53. Listing excerpt with callouts:
        hdfs://master0:54310/user/nmurray/data.txt -> new Hfs()
        data/sources/obama-inaugural-address.txt -> new Lfs()
        S3fs() • GlobHfs()
        Tap sourceTap = inputPath.matches("^[^:]+://.*")
            ? new Hfs(inputScheme, inputPath)
            : new Lfs(inputScheme, inputPath);
        Tap sinkTap = outputPath.matches("^[^:]+://.*")
            ? new Hfs(outputScheme, outputPath)
            : new Lfs(outputScheme, outputPath);
    56. Listing excerpt with callout: Pipe Assembly
        Pipe wcPipe = new Each("wordcount",
            new Fields("line"),
            new RegexSplitGenerator(new Fields("word"), "\\s+"),
            new Fields("word"));
        wcPipe = new GroupBy(wcPipe, new Fields("word"));
        wcPipe = new Every(wcPipe, new Count(), new Fields("count", "word"));
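The Tap choice in the listing hinges on one regular expression: a path that starts with a URI scheme goes to Hfs, anything else to Lfs. The sketch below isolates that test in plain Java, with no Cascading dependency. The class name `TapChoiceSketch` and the method `tapKindFor` are hypothetical helpers for illustration, and the returned strings merely stand in for the Tap classes.

```java
import java.util.regex.Pattern;

/** Plain-Java sketch of the path test used in the listing to pick a Tap. */
final class TapChoiceSketch {
    // Same pattern as in the listing: "<scheme>://<anything>".
    private static final Pattern HAS_URI_SCHEME = Pattern.compile("^[^:]+://.*");

    /** Returns "Hfs" for paths that carry a URI scheme, "Lfs" otherwise. */
    static String tapKindFor(String path) {
        return HAS_URI_SCHEME.matcher(path).matches() ? "Hfs" : "Lfs";
    }

    public static void main(String[] args) {
        // The two example paths shown on the slides.
        System.out.println(tapKindFor("hdfs://master0:54310/user/nmurray/data.txt")); // Hfs
        System.out.println(tapKindFor("data/sources/obama-inaugural-address.txt"));   // Lfs
    }
}
```

A relative or absolute local path contains no `://`, so it falls through to the local-file-system branch.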
    57. Pipe Assemblies
    63. Pipe Assemblies • Define work against a Tuple Stream • May have multiple sources and sinks • Splits • Merges • Joins
    64. Pipe Assemblies • Pipe • Each • GroupBy • CoGroup • Every • SubAssembly
    68. Pipe Assemblies • Each: applies a Function or Filter Operation to each Tuple
    71. Pipe Assemblies • GroupBy: group (& merge)
    72. Pipe Assemblies • CoGroup: joins (inner, outer, left, right)
    74. Pipe Assemblies • Every: applies an Aggregator (count, sum) to every group of Tuples
    75. Pipe Assemblies • SubAssembly: nesting
    78. Each vs. Every • Each() is for individual Tuples • Every() is for groups of Tuples
    79. Each
        new Each(previousPipe, argumentSelector, operation, outputSelector)
    80. Each
        new Each(previousPipe, argumentSelector, operation, outputSelector)

        Pipe wcPipe = new Each("wordcount",
            new Fields("line"),
            new RegexSplitGenerator(new Fields("word"), "\\s+"),
            new Fields("word"));
    87. Each
        Callout: Fields()
    88. Each
        Input Tuple:
            offset | line
            0      | the lazy brown fox
    90. Each
        offset is crossed out; only line is passed on:
            line
            the lazy brown fox
    91. Each
        Output, one word Tuple per token:
            word
            the
            lazy
            brown
            fox
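The split that the Each slides illustrate can be sketched in plain Java, without Cascading: take the `line` value and emit one `word` value per whitespace-separated token. This is only an analogy for what the slide shows, not RegexSplitGenerator itself. The class `EachSplitSketch` and its method are hypothetical, and skipping empty tokens is a choice made here rather than a claim about the library. Note that in Java source the whitespace pattern is written `"\\s+"`.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java analogy for the Each + split step shown on the slides. */
final class EachSplitSketch {
    /** Emits one "word" value per whitespace-separated token of the line. */
    static List<String> splitLine(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            // Leading whitespace yields an empty first token; skip it.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // The (offset, line) tuple from the slide; only "line" is selected as the argument.
        String line = "the lazy brown fox";
        System.out.println(splitLine(line)); // [the, lazy, brown, fox]
    }
}
```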
    98. Operations: Functions • Insert() • RegexParser() / RegexGenerator() / RegexReplace() • DateParser() / DateFormatter() • XPathGenerator() • Identity() • etc...
    104. Operations: Filters • RegexFilter() • FilterNull() • And() / Or() / Not() / Xor() • ExpressionFilter() • etc...
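As a rough illustration of how a Filter differs from a Function, the plain-Java sketch below models a filter as a predicate that answers "should this Tuple be removed?", which is the sense of `isRemove` on Cascading's Filter interface as far as I know. Everything else here is an assumption for illustration: tuples are modelled as `List<Object>`, `HAS_NULL` is a stand-in for a null check in the spirit of FilterNull(), and `negate()` plays the role of Not().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

/** Plain-Java sketch of filter semantics: the predicate says which tuples to drop. */
final class FilterSketch {
    // true means "remove this tuple"; here: remove tuples containing a null value.
    static final Predicate<List<Object>> HAS_NULL =
        tuple -> tuple.stream().anyMatch(Objects::isNull);

    /** Keeps every tuple for which isRemove answers false. */
    static List<List<Object>> applyFilter(List<List<Object>> stream,
                                          Predicate<List<Object>> isRemove) {
        List<List<Object>> kept = new ArrayList<>();
        for (List<Object> tuple : stream) {
            if (!isRemove.test(tuple)) {
                kept.add(tuple);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<Object>> stream = Arrays.asList(
            Arrays.<Object>asList("fox", 1),
            Arrays.<Object>asList("dog", null));
        System.out.println(applyFilter(stream, HAS_NULL));          // [[fox, 1]]
        System.out.println(applyFilter(stream, HAS_NULL.negate())); // [[dog, null]]
    }
}
```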
    105. Each
         new Each(previousPipe, argumentSelector, operation, outputSelector)

         Pipe wcPipe = new Each("wordcount",
             new Fields("line"),
             new RegexSplitGenerator(new Fields("word"), "\\s+"),
             new Fields("word"));
    107. Pipe wcPipe = new Each("wordcount",
             new Fields("line"),
             new RegexSplitGenerator(new Fields("word"), "\\s+"),
             new Fields("word"));
         wcPipe = new GroupBy(wcPipe, new Fields("word"));
         wcPipe = new Every(wcPipe, new Count(), new Fields("count", "word"));

         Properties properties = new Properties();
         FlowConnector.setApplicationJarClass(properties, Main.class);

         Flow parsedLogFlow = new FlowConnector(properties)
             .connect(sourceTap, sinkTap, wcPipe);
         parsedLogFlow.start();
         parsedLogFlow.complete();
    111. Every new Every(previousPipe, argumentSelector, operation, outputSelector)
    112. Every new Every(previousPipe, argumentSelector, operation, outputSelector) new Every(wcPipe, new Count(), new Fields("count", "word"));
    113. new Every(wcPipe, new Count(), new Fields("count", "word"));
    119. Operations: Aggregator • Average() • Count() • First() / Last() • Min() / Max() • Sum()
    122. Every
         new Every(previousPipe, argumentSelector, operation, outputSelector)

         new Every(wcPipe, new Count(), new Fields("count", "word"));
         new Every(wcPipe, Fields.ALL, new Count(), new Fields("count", "word"));
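To make the GroupBy and Every(Count) pair concrete, here is a plain-Java analogy that needs no Cascading: group the word values and count the members of each group. The class `GroupCountSketch` and its method are hypothetical, and the sorted map is only a convenient stand-in for grouping. It says nothing about how Cascading or Hadoop orders or distributes groups.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java analogy for GroupBy on "word" followed by Every(Count). */
final class GroupCountSketch {
    /** Groups equal words together and counts the members of each group. */
    static Map<String, Long> countByWord(List<String> words) {
        Map<String, Long> counts = new TreeMap<>(); // keys kept sorted for stable output
        for (String word : words) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "lazy", "brown", "fox", "the");
        System.out.println(countByWord(words)); // {brown=1, fox=1, lazy=1, the=2}
    }
}
```

The aggregator only ever sees one group at a time, which is the distinction the deck draws between Each (individual Tuples) and Every (groups of Tuples).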
    123. Field Selection
    124. Predefined Field Sets
    125. Predefined Field Sets
         Fields.ALL      all available fields
         Fields.GROUP    fields used for last grouping
         Fields.VALUES   fields not used for last grouping
         Fields.ARGS     fields of argument Tuple (for Operations)
         Fields.RESULTS  replaces input with Operation result (for Pipes)
    130. Field Selection new Each(previousPipe, argumentSelector, operation, outputSelector) pipe = new Each(pipe, new Fields("timestamp"), new DateParser("yyyy-MM-dd HH:mm:ss"), new Fields("timestamp"));
    131. Field Selection new Each(previousPipe, argumentSelector, operation, outputSelector) Operation Input Tuple = Original Tuple Argument Selector
    132. Field Selection pipe = new Each(pipe, new Fields("timestamp"), new DateParser("yyyy-MM-dd HH:mm:ss"), new Fields("ts", "search_term", "visitor_id"));
    133. Field Selection Original Tuple visitor_ search_ time page_ id id term stamp number pipe = new Each(pipe, new Fields("timestamp"), new DateParser("yyyy-MM-dd HH:mm:ss"), new Fields("ts", "search_term", "visitor_id"));
    134. Field Selection Original Tuple Argument Selector visitor_ search_ time page_ id timestamp id term stamp number pipe = new Each(pipe, new Fields("timestamp"), new DateParser("yyyy-MM-dd HH:mm:ss"), new Fields("ts", "search_term", "visitor_id"));
    135. Field Selection Original Tuple Argument Selector visitor_ search_ time page_ id timestamp id term stamp number pipe = new Each(pipe, new Fields("timestamp"), new DateParser("yyyy-MM-dd HH:mm:ss"), new Fields("ts", "search_term", "visitor_id"));
    136. Field Selection Original Tuple Argument Selector visitor_ search_ time page_ id timestamp id term stamp number pipe = new Each(pipe, new Fields("timestamp"), new DateParser("yyyy-MM-dd HH:mm:ss"), new Fields("ts", "search_term", "visitor_id")); Input to DateParser will be: timestamp
    137. Field Selection new Each(previousPipe, argumentSelector, operation, outputSelector) Output Tuple = Original Tuple ⊕ Operation Tuple Output Selector
    149. Field Selection • Original Tuple: id, visitor_id, search_term, timestamp, page_number • DateParser Output: ts • Original Tuple ⊕ DateParser Output: id, visitor_id, search_term, timestamp, page_number, ts • Output Selector: ts, search_term, visitor_id (id, timestamp and page_number are crossed out) • Output of Each will be: ts, search_term, visitor_id
        pipe = new Each(pipe,
            new Fields("timestamp"),
            new DateParser("yyyy-MM-dd HH:mm:ss"),
            new Fields("ts", "search_term", "visitor_id"));
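The selection rule on these slides can be tried out without Cascading. What follows is a minimal plain-Java sketch, assuming a tuple can be modelled as an ordered map from field name to value. The names `FieldSelectionSketch`, `select`, `each`, `parseDate` and `sampleTuple` are hypothetical and are not part of the Cascading API; `parseDate` only stands in for `DateParser` and assumes UTC.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the Each pipe; NOT the Cascading API.
class FieldSelectionSketch {

    // Keep only the named fields, in the order the selector lists them.
    static Map<String, Object> select(Map<String, Object> tuple, List<String> selector) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String name : selector) {
            if (!tuple.containsKey(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            out.put(name, tuple.get(name));
        }
        return out;
    }

    // argument selector -> operation -> (original ⊕ result) -> output selector
    static Map<String, Object> each(Map<String, Object> original,
                                    List<String> argumentSelector,
                                    Function<Map<String, Object>, Map<String, Object>> operation,
                                    List<String> outputSelector) {
        Map<String, Object> arguments = select(original, argumentSelector);
        Map<String, Object> result = operation.apply(arguments);
        Map<String, Object> combined = new LinkedHashMap<>(original);
        combined.putAll(result); // simplification: a clashing field name is overwritten here
        return select(combined, outputSelector);
    }

    // Stand-in for DateParser("yyyy-MM-dd HH:mm:ss"): emits one field, "ts", in epoch millis (UTC assumed).
    static Map<String, Object> parseDate(Map<String, Object> arguments) {
        String text = (String) arguments.values().iterator().next();
        long ts = LocalDateTime.parse(text, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("ts", ts);
        return out;
    }

    // The example tuple used on the slides.
    static Map<String, Object> sampleTuple() {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 123);
        tuple.put("visitor_id", "vis-abc");
        tuple.put("search_term", "tacos");
        tuple.put("timestamp", "2009-10-08 03:21:23");
        tuple.put("page_number", 1);
        return tuple;
    }

    public static void main(String[] args) {
        Map<String, Object> output = each(
                sampleTuple(),
                List.of("timestamp"),
                FieldSelectionSketch::parseDate,
                List.of("ts", "search_term", "visitor_id"));
        System.out.println(output.keySet()); // prints [ts, search_term, visitor_id]
    }
}
```

Only `timestamp` reaches the operation, and only the three selected fields survive; `id`, `timestamp` and `page_number` are dropped, as on the slide.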
    151. grouped input (word): hello, hello | world • new Every(wcPipe, new Count(), new Fields("count", "word")); • output (count, word): 2 hello | 1 world
    152. new Lfs(inputScheme, inputPath);
        Tap sinkTap = outputPath.matches("^[^:]+://.*")
            ? new Hfs(outputScheme, outputPath)
            : new Lfs(outputScheme, outputPath);
        // split each line into words on runs of whitespace
        Pipe wcPipe = new Each("wordcount", new Fields("line"),
            new RegexSplitGenerator(new Fields("word"), "\\s+"),
            new Fields("word"));
        // group identical words, then count each group
        wcPipe = new GroupBy(wcPipe, new Fields("word"));
        wcPipe = new Every(wcPipe, new Count(), new Fields("count", "word"));
        Properties properties = new Properties();
        FlowConnector.setApplicationJarClass(properties, Main.class);
        Flow parsedLogFlow = new FlowConnector(properties)
            .connect(sourceTap, sinkTap, wcPipe);
        parsedLogFlow.start();
        parsedLogFlow.complete();
        }
        }
    157. Run
    159. input: mary had a little lamb little lamb little lamb mary had a little lamb whose fleece was white as snow • command: hadoop jar ./target/cascading-tutorial-1.0.0.jar data/sources/misc/mary.txt data/output
    160. output (count, word): 2 a • 1 as • 1 fleece • 2 had • 4 lamb • 4 little • 2 mary • 1 snow • 1 was • 1 white • 1 whose
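For comparison, the same split, group and count logic can be written in plain Java with no Hadoop or Cascading involved. This is only a sketch for checking the counts above: `WordCountSketch` and `count` are hypothetical names, and the sorted `TreeMap` is an assumption chosen to mimic the alphabetical order of the slide's output.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the word count; NOT a Cascading flow.
class WordCountSketch {

    // Split on runs of whitespace, group by word, count each group.
    // A TreeMap keeps the words in sorted order.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "mary had a little lamb little lamb little lamb "
                + "mary had a little lamb whose fleece was white as snow";
        // One "count<TAB>word" line per distinct word.
        count(input).forEach((word, n) -> System.out.println(n + "\t" + word));
    }
}
```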
    161. What’s next?
    163. apply operations to tuple streams • repeat
    164. Cookbook
    165. Cookbook
        // remove all tuples where search_term is '-'
        pipe = new Each(pipe, new Fields("search_term"),
            new RegexFilter("^(-)$", true));
    166. Cookbook
        // convert "timestamp" String field to "ts" long field
        pipe = new Each(pipe, new Fields("timestamp"),
            new DateParser("yyyy-MM-dd HH:mm:ss"), Fields.ALL);
    167. Cookbook
        // td - timestamp of day-level granularity
        pipe = new Each(pipe, new Fields("ts"),
            new ExpressionFunction(new Fields("td"),
                "ts - (ts % (24 * 60 * 60 * 1000))", long.class),
            Fields.ALL);
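The day-truncation expression is ordinary integer arithmetic, so it can be checked in isolation. A small sketch, assuming `ts` is milliseconds since the epoch in UTC and is non-negative; `DayTruncationSketch` and `truncateToDay` are hypothetical names used only for this illustration.

```java
import java.time.Instant;

// Worked example of the "td" expression; NOT part of Cascading.
class DayTruncationSketch {

    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Same arithmetic as the ExpressionFunction: subtract the time-of-day remainder.
    // Assumes ts >= 0; Java's % keeps the sign of the dividend, so pre-1970 values
    // would round toward the epoch instead of down to the start of the day.
    static long truncateToDay(long ts) {
        return ts - (ts % MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2009-10-08T03:21:23Z").toEpochMilli();
        long td = truncateToDay(ts);
        System.out.println(Instant.ofEpochMilli(td)); // prints 2009-10-08T00:00:00Z
    }
}
```

Every timestamp on the same UTC day maps to the same `td`, which is what makes `td` usable as a grouping key.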
    168. Cookbook
        // SELECT DISTINCT visitor_id
        // GROUP BY visitor_id ORDER BY ts
        pipe = new GroupBy(pipe, new Fields("visitor_id"), new Fields("ts"));
        // take the First() tuple of every grouping
        pipe = new Every(pipe, Fields.ALL, new First(), Fields.RESULTS);
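The group, sort and take-first recipe has a direct plain-Java analogue, which may help when reasoning about what comes out of the pipe. The sketch below assumes a simple hypothetical `Event` record with three of the fields from the slides; none of these names come from Cascading.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of "GroupBy visitor_id, sort by ts, keep First"; NOT the Cascading API.
class FirstPerGroupSketch {

    // Hypothetical record of one search event.
    static final class Event {
        final String visitorId;
        final long ts;
        final String searchTerm;

        Event(String visitorId, long ts, String searchTerm) {
            this.visitorId = visitorId;
            this.ts = ts;
            this.searchTerm = searchTerm;
        }
    }

    // Sort by (visitor_id, ts), then keep the first event seen for each visitor.
    static List<Event> firstPerVisitor(List<Event> events) {
        List<Event> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparing((Event e) -> e.visitorId)
                .thenComparingLong(e -> e.ts));
        Map<String, Event> first = new LinkedHashMap<>();
        for (Event e : sorted) {
            first.putIfAbsent(e.visitorId, e);
        }
        return new ArrayList<>(first.values());
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("vis-b", 30L, "pizza"),
                new Event("vis-a", 20L, "tacos"),
                new Event("vis-a", 10L, "burritos"));
        for (Event e : firstPerVisitor(events)) {
            System.out.println(e.visitorId + " " + e.ts + " " + e.searchTerm);
        }
    }
}
```

Each visitor appears once in the result, represented by that visitor's earliest event.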
    169. Custom Operations
    175. Desired Operation • input (line): mary had a little lamb • output (ngram), one tuple each: "mary had", "had a", "a little", "little lamb"
        Pipe pipe = new Each("ngram pipe",
            new Fields("line"),
            new NGramTokenizer(2),
            new Fields("ngram"));
    179. Custom Operations • extend BaseOperation • implement Function (or Filter) • implement operate
    180.
        import cascading.flow.FlowProcess;
        import cascading.operation.BaseOperation;
        import cascading.operation.Function;
        import cascading.operation.FunctionCall;
        import cascading.tuple.Fields;
        import cascading.tuple.Tuple;
        import cascading.tuple.TupleEntry;

        public class NGramTokenizer extends BaseOperation implements Function {
            private int size;

            public NGramTokenizer(int size) {
                super(new Fields("ngram")); // declare the single output field
                this.size = size;
            }

            public void operate(FlowProcess flowProcess, FunctionCall functionCall) {
                String token;
                TupleEntry arguments = functionCall.getArguments();
                // tokenize the first (and only) argument field
                com.attinteractive.util.NGramTokenizer t =
                    new com.attinteractive.util.NGramTokenizer(arguments.getString(0), this.size);
                // emit one output tuple per n-gram until next() returns null
                while ((token = t.next()) != null) {
                    TupleEntry result = new TupleEntry(new Fields("ngram"), new Tuple(token));
                    functionCall.getOutputCollector().add(result);
                }
            }
        }
    189. Tuple • 123, vis-abc, tacos, 2009-10-08 03:21:23, 1 • zero-indexed, ordered values
    190. Fields • id, visitor_id, search_term, timestamp, page_number • list of strings representing field names
    192. TupleEntry • id = 123 • visitor_id = vis-abc • search_term = tacos • timestamp = 2009-10-08 03:21:23 • page_number = 1
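One way to picture the relationship between these three types is a pair of parallel lists: the Tuple holds the values, the Fields hold the names, and the TupleEntry pairs them so a value can be fetched by position or by name. The class below is a simplified model written for this note, not Cascading's own `TupleEntry`; the name `TupleEntrySketch` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of Tuple + Fields = TupleEntry; NOT Cascading's TupleEntry class.
class TupleEntrySketch {

    private final List<String> fields; // field names, in order
    private final List<Object> values; // zero-indexed, ordered values

    TupleEntrySketch(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same length");
        }
        this.fields = fields;
        this.values = values;
    }

    // Positional access, as a bare Tuple allows.
    Object get(int position) {
        return values.get(position);
    }

    // Named access, which needs the Fields to translate a name into a position.
    Object get(String fieldName) {
        int position = fields.indexOf(fieldName);
        if (position < 0) {
            throw new IllegalArgumentException("unknown field: " + fieldName);
        }
        return values.get(position);
    }

    // The example from the slides.
    static TupleEntrySketch sample() {
        return new TupleEntrySketch(
                Arrays.asList("id", "visitor_id", "search_term", "timestamp", "page_number"),
                Arrays.<Object>asList(123, "vis-abc", "tacos", "2009-10-08 03:21:23", 1));
    }

    public static void main(String[] args) {
        TupleEntrySketch entry = sample();
        System.out.println(entry.get(2));             // prints tacos
        System.out.println(entry.get("search_term")); // prints tacos
    }
}
```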
    196. Success • input (line): mary had a little lamb • output (ngram), one tuple each: "mary had", "had a", "a little", "little lamb"
        Pipe pipe = new Each("ngram pipe",
            new Fields("line"),
            new NGramTokenizer(2),
            new Fields("ngram"));
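The slides do not show the source of `com.attinteractive.util.NGramTokenizer`, the helper that the custom operation delegates to. The class below is a guess at a minimal equivalent, assuming words are separated by whitespace and that `next()` returns `null` once the n-grams run out, which is the contract the `while` loop in `operate` relies on. The name `SimpleNGramTokenizer` and the `all` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the unseen com.attinteractive.util.NGramTokenizer.
class SimpleNGramTokenizer {

    private final String[] words;
    private final int size;
    private int position = 0;

    SimpleNGramTokenizer(String line, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        String trimmed = line.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        this.size = size;
    }

    // Returns the next n-gram, or null when no full n-gram remains.
    String next() {
        if (position + size > words.length) {
            return null;
        }
        String ngram = String.join(" ", Arrays.copyOfRange(words, position, position + size));
        position++;
        return ngram;
    }

    // Convenience helper: drain the tokenizer into a list.
    static List<String> all(String line, int size) {
        SimpleNGramTokenizer tokenizer = new SimpleNGramTokenizer(line, size);
        List<String> out = new ArrayList<>();
        String token;
        while ((token = tokenizer.next()) != null) {
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(all("mary had a little lamb", 2));
        // prints [mary had, had a, a little, little lamb]
    }
}
```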
    197. code examples
    199. dot
        Flow importFlow = flowConnector.connect(sourceTap, tmpIntermediateTap, importPipe);
        importFlow.writeDOT("import-flow.dot");
    203. What next? • http://www.cascading.org/ • http://bit.ly/cascading
    204. Questions? <nmurray@att.com>

    Nate Murray


    A practical introduction to Chris Wensel's Cascading ...
