Azkaban in my use case
2017/03/09
@wyukawa
Workflow Engines Meetup #1
#wfemeetup
https://connpass.com/event/50900/
Azkaban

  1. Azkaban in my use case. 2017/03/09, @wyukawa, Workflow Engines Meetup #1, #wfemeetup, https://connpass.com/event/50900/
  2. Azkaban
     • Implemented at LinkedIn to solve the problem of Hadoop job dependencies
     • Written in Java
       – Not modern Java (raw servlet, Velocity, ...)
  3. Azkaban features
     • Simple job management tool
       – Define job dependencies
       – Retry
       – Scheduling
       – Web UI
         • See dependencies, execution time, and logs
         • Logs are stored in the DB as blobs
       – Single point of failure (SPOF)
       – Cannot register holidays
       – Cannot be triggered by file-creation events
     • Email notification only
       – HTTP Job Callback
     • No binary distribution
       – Must be built from source
     • Development is not very active
     • The mailing list doesn’t function very well
  4. Job File
       # foo.job
       type=command
       command=echo foo
       retries=1
       retry.backoff=300000

       # bar.job
       type=command
       dependencies=foo
       command=echo bar
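To illustrate how the `dependencies` key orders the two jobs on slide 4, here is a minimal Python sketch that parses the job files and derives a valid run order. `parse_job` and `run_order` are hypothetical helpers written for this example, not Azkaban code (Azkaban resolves dependencies inside its Java server). The sketch assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# The two job files from slide 4, keyed by job name.
JOB_FILES = {
    "foo": "type=command\ncommand=echo foo\nretries=1\nretry.backoff=300000\n",
    "bar": "type=command\ndependencies=foo\ncommand=echo bar\n",
}


def parse_job(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def run_order(job_files):
    """Return job names ordered so each job follows its dependencies."""
    graph = {}
    for name, text in job_files.items():
        deps = parse_job(text).get("dependencies", "")
        # Map each job to the set of jobs it depends on.
        graph[name] = {d.strip() for d in deps.split(",") if d.strip()}
    return list(TopologicalSorter(graph).static_order())


print(run_order(JOB_FILES))  # ['foo', 'bar']
```

Because `bar` lists `foo` under `dependencies`, `foo` always comes first in the computed order.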
  5. Job History
  6. Scheduling
  7. Failure Options
     • Finish Current Running
       – Finishes only the currently running jobs. It will not start any new jobs.
     • Cancel All
       – Immediately kills all jobs and fails the flow.
     • Finish All Possible
       – Keeps executing jobs as long as their dependencies are met.
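The effect of "Finish All Possible" can be sketched as a graph question: after one job fails, which jobs do not depend on it, directly or transitively? The Python below is a rough model of that rule only, not Azkaban's scheduler. The flow shape in `DEPS` is an assumption made up for illustration; only the idea of a failing job named `ccc` comes from the slides.

```python
def downstream(deps, failed):
    """Return every job that depends on `failed`, directly or transitively."""
    blocked = set()
    changed = True
    while changed:
        changed = False
        for job, parents in deps.items():
            if job in blocked:
                continue
            # A job is blocked if any parent failed or is itself blocked.
            if failed in parents or blocked & set(parents):
                blocked.add(job)
                changed = True
    return blocked


def unaffected(deps, failed):
    """Jobs whose dependencies can still be met after `failed` fails."""
    return set(deps) - downstream(deps, failed) - {failed}


# Hypothetical flow: job name -> jobs it depends on.
DEPS = {
    "aaa": [],
    "bbb": ["aaa"],
    "ccc": ["aaa"],
    "ddd": ["ccc"],
    "eee": ["bbb"],
}

print(sorted(downstream(DEPS, "ccc")))  # ['ddd']
print(sorted(unaffected(DEPS, "ccc")))  # ['aaa', 'bbb', 'eee']
```

In this model, "Finish All Possible" would still run `bbb` and `eee` after `ccc` fails, while `ddd` can never start. The other two options stop earlier, so jobs like `eee` would not run if they had not started yet.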
  8. Difference when ccc failed
  9. Why is “Finish All Possible” not the default?
  10. Re-run when flow failed
      • A user can re-execute only the failed jobs by pressing the “prepare execution” button. It’s convenient!
  11. Concurrent Execution Options
      • Skip Execution
        – Do not run the flow if it is already running.
      • Run Concurrently
        – Run the flow anyway. The previous execution is unaffected.
      • Pipeline
  12. SLA Notification
      • If the duration threshold is exceeded, an alert email can be sent or the flow can be killed automatically.
  13. Flow parameter
      • Parameters (for example, a date) can be set when Azkaban executes a flow.
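As a rough sketch of passing a flow parameter through the Azkaban AJAX API, the Python below only builds the request parameters and sends nothing. The parameter names (`ajax=executeFlow`, `session.id`, `project`, `flow`, `flowOverride[...]`) are written from my recollection of the Azkaban API documentation, so treat them as assumptions and check the documentation for your Azkaban version. `execute_flow_params`, the project and flow names, and the session value are hypothetical.

```python
from urllib.parse import urlencode


def execute_flow_params(session_id, project, flow, overrides=None):
    """Build query parameters for an executeFlow call (names are assumptions)."""
    params = {
        "ajax": "executeFlow",
        "session.id": session_id,
        "project": project,
        "flow": flow,
    }
    # Each flow parameter is sent as flowOverride[<name>]=<value>.
    for key, value in (overrides or {}).items():
        params[f"flowOverride[{key}]"] = value
    return params


params = execute_flow_params(
    "SESSION_ID",   # placeholder; a real id comes from a login call
    "my_project",   # hypothetical project name
    "my_flow",      # hypothetical flow name
    overrides={"date": "20170309"},
)
print("/executor?" + urlencode(params))
```

Keeping the parameter building separate from the HTTP call makes it easy to check what would be sent before pointing it at a real server.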
  14. Q/A https://connpass.com/event/50900/ ○ △ ❌ ○ ○ △
  15. My use case
      • Use Azkaban to manage Hadoop jobs
        – Write batches in Python
      • Use the Azkaban API
        – I created a client: https://github.com/wyukawa/eboshi
        – Commit scheduling information to GHE
      • Writing job files is painful
        – I created a generation tool: https://github.com/wyukawa/ayd
        – It generates one flow from one YAML file
  16. Python batch example
        def validate_before(self):
            hive.exists("access_log", "yyyymmdd='%s'" % (...))

        def process(self):
            insert_query = """
            INSERT OVERWRITE TABLE aggregate PARTITION(yyyymmdd='%s')
            SELECT ...
            FROM access_log
            WHERE ...
            GROUP BY ...
            """ % (...)
            hiveCli.query(insert_query)

        def validate_after(self):
            hive.exists("aggregate", "yyyymmdd='%s'" % (...))
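The three hooks on slide 16 suggest a template-method pattern: check the input, run the query, check the output. The slide does not show the code that calls them, so the `Batch.run` driver below is an assumption, as is the idea that the validation hooks return a boolean. The toy subclass uses an in-memory set in place of Hive partitions.

```python
class Batch:
    """Hypothetical base class; the slide shows only the three hooks."""

    def validate_before(self):
        return True

    def process(self):
        raise NotImplementedError

    def validate_after(self):
        return True

    def run(self):
        # Fail fast if the input is missing, then verify the output exists.
        if not self.validate_before():
            raise RuntimeError("input check failed")
        self.process()
        if not self.validate_after():
            raise RuntimeError("output check failed")


class AggregateBatch(Batch):
    """Toy batch: a set of (table, day) pairs stands in for Hive partitions."""

    def __init__(self, partitions, day):
        self.partitions = partitions
        self.day = day

    def validate_before(self):
        return ("access_log", self.day) in self.partitions

    def process(self):
        self.partitions.add(("aggregate", self.day))

    def validate_after(self):
        return ("aggregate", self.day) in self.partitions


partitions = {("access_log", "20170309")}
AggregateBatch(partitions, "20170309").run()
print(sorted(partitions))
```

If a check fails, the uncaught exception makes the Python process exit with a non-zero status, which as far as I know is what lets a `command` job be marked as failed and retried.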
  17. Yaml example
        foo:
          type: command
          command: echo "foo"
          retries: 1
          retry.backoff: 300000
        bar:
          type: command
          command: echo "bar"
          dependencies: foo
          retries: 1
          retry.backoff: 300000
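Generating one flow from one YAML file can be sketched as a mapping from each top-level key to a `.job` file of `key=value` lines. The Python below is a minimal illustration of that idea and not how ayd is actually implemented. To stay self-contained it uses a plain dict, `FLOW`, in place of the parsed YAML; `render_job` and `render_flow` are hypothetical names.

```python
def render_job(props):
    """Render one job's properties as key=value lines."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def render_flow(flow):
    """Map each job name to the contents of its <name>.job file."""
    return {f"{name}.job": render_job(props) for name, props in flow.items()}


# Stand-in for the parsed YAML from slide 17.
FLOW = {
    "foo": {
        "type": "command",
        "command": 'echo "foo"',
        "retries": 1,
        "retry.backoff": 300000,
    },
    "bar": {
        "type": "command",
        "command": 'echo "bar"',
        "dependencies": "foo",
        "retries": 1,
        "retry.backoff": 300000,
    },
}

files = render_flow(FLOW)
for filename, content in files.items():
    print(f"# {filename}")
    print(content)
```

A real tool would also need to write the files to disk and package them for upload; those steps are left out here.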
  18. Job Management Overview (diagram labels: git push; push button; upload job; register schedule; git pull; generate job file)
  19. Log Analysis Platform (diagram labels: Hadoop, Hive of HDP2.5.3; Azkaban 3.15.0-1-g77411d7; Presto 0.166; Cognos; Prestogres; Netezza; DBDB; ETL with Python 2.7.13; InfiniDB; Pentaho; Saiku)
  20. My usage situation
      • More than 120 Azkaban flows
      • Many daily batches; a few hourly, weekly, and monthly batches
      • Most flows are related to Hive
      • Azkaban runs on the batch server
      • I prepare template Azkaban flows to re-aggregate past data
        – Set the job name and date as parameters
        – Set Run Concurrently
      • I don’t use SLA, but I may in the future
        – https://github.com/azkaban/azkaban/pull/911
      • I don’t use HTTP Job Callback
        – I use HipChat from the Python ETL
  21. My feeling
      • Simple
      • Easy to use
      • Web UI is convenient
      • API is useful
      • There is no reason to replace Azkaban
      • I hope development becomes more active
  22. Podcast
      • https://itunes.apple.com/jp/podcast/wyukawas-podcast/id1152456701
      • http://wyukawa.tumblr.com/