Hydra
Chris Birchall
2014/2/17
M3 Tech Talk #m3dev
What is it?
https://github.com/addthis/hydra
● Hadoop-style distrib processing framework, optimised for trees
● The Big Idea: data processing = building and navigating tree data structures
Components
● Spawn: Job control (+ UI)
○ (think JobTracker, in Hadoop-speak)
● Minion: task runner
○ (think TaskTracker)
● QueryMaster + QueryWorker
● Meshy: Distrib filesystem
○ (think read-only HDFS)
● Zookeeper, RabbitMQ
Getting started (OSX)
# Prerequisites
brew install rabbitmq maven coreutils wget
# Check this works without a passphrase
ssh localhost
# Check that the GNU coreutils cmds
# (grm, gcp, gln, gmv) are on your PATH
# Clone & build
git clone https://github.com/addthis/hydra.git
cd hydra
mvn package
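The slide asks you to check that the g-prefixed coreutils commands are on your PATH; a minimal way to verify that, assuming a standard Homebrew install:
# Rough sanity check for the g-prefixed coreutils commands
for cmd in grm gcp gln gmv; do
  command -v "$cmd" >/dev/null || echo "missing: $cmd"
done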
Getting started (2)
# Start local stack
hydra-uber/bin/local-stack.sh start
hydra-uber/bin/local-stack.sh start
# yes, twice!
hydra-uber/bin/local-stack.sh seed
# UI should now be running
open http://localhost:5052
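Before opening the browser it can help to confirm the stack actually came up; a small sketch that polls the Spawn UI port (5052, as used above), not part of the local-stack script itself:
# Poll until the Spawn UI answers on port 5052 (give up after ~30s)
for i in $(seq 1 30); do
  curl -sf http://localhost:5052 > /dev/null && { echo "Spawn UI is up"; break; }
  sleep 1
done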
Hello world
# Sample job definition file available at
hydra-uber/local/sample/self-gen-tree.json
# Click ‘Create’, copy-paste the job config,
# save the job and click ‘Kick’ to run it.
# Click the ‘Q’ button to open the query UI
# and see the resulting data.
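Since this walkthrough is on OSX, you can copy the sample job config straight to the clipboard before clicking ‘Create’ (pbcopy ships with the OS):
# Copy the sample job config, ready to paste into the UI
pbcopy < hydra-uber/local/sample/self-gen-tree.json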
Analysing text files
# Tips:
## “files” source is broken. Use “mesh2”.
## Docs are out of date. Read the source code!
# Mesh filesystem root is here:
hydra-local/streams/
# Here’s an example job config I used to parse
# some TSV-formatted Apache logs:
https://gist.github.com/cb372/9046464
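For the mesh2 source to find your input, the files need to live under the mesh root shown above. A hedged example of staging a TSV log there; the logs/ subdirectory and the file path are made up, and the matching source config is in the linked gist:
# Stage an Apache TSV log under the mesh filesystem root
# (directory layout and file name are illustrative, not prescribed by Hydra)
mkdir -p hydra-local/streams/logs
cp /path/to/access_log.tsv hydra-local/streams/logs/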
Conclusions
● If you have Small Data, use grep, awk, sort, uniq
● If you have Big Data, use Hadoop
● If you really like trees, use Hydra ;)
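For a sense of scale, the Small Data route from the first bullet really is a one-liner; for example, a top-ten report over a TSV access log (the file name and column number are assumptions about the log format):
# Top 10 most frequent values in column 7 of a TSV log
awk -F'\t' '{print $7}' access_log.tsv | sort | uniq -c | sort -rn | head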