SlideShare a Scribd company logo
1 of 9
Azkaban and Pig Richard Park, Russell Jurney LinkedIn Search, Network, Analytics
Installing & Running Azkaban wgethttp://github.com/downloads/azkaban/azkaban/azkaban-0.04.tar.gz tar –xvzf azkaban-0.04.tar.gz mkdir /some-dir/azkaban-jobs cd azkaban-0.04 bin/azkaban-server.sh –job-dir /some-dir/azkaban-jobs
Azkaban @ localhost:8080
Pig Configuration myproject.properties – Global Configuration hadoop.job.ugi=rjurney,hadoop udf.import.list=org.apache.pig.builtin.,com.linkedin.pig.,com.linkedin… cc_0_compute_title_counts.job – Pig Job type=pig pig.script=cc_0_compute_title_counts.pig cc_1_reverse_engineer_durak_3000 – Pig Job with Dependency type=pig pig.script=cc_1_reverse_engineer_durak_3000.pig dependencies=cc_0_compute_title_counts
Running Job > bin/run-jobs.sh –job-dir /some-dir/azkaban-jobs my-job
Scheduling Jobs
Viewing Jobs
Editing Jobs
Azkaban Pig Docs

More Related Content

What's hot

Kapil Thangavelu - Cloud Custodian
Kapil Thangavelu - Cloud CustodianKapil Thangavelu - Cloud Custodian
Kapil Thangavelu - Cloud CustodianServerlessConf
 
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...ServerlessConf
 
Ben Kehoe - Serverless Architecture for the Internet of Things
Ben Kehoe - Serverless Architecture for the Internet of ThingsBen Kehoe - Serverless Architecture for the Internet of Things
Ben Kehoe - Serverless Architecture for the Internet of ThingsServerlessConf
 
Era of server less computing
Era of server less computingEra of server less computing
Era of server less computingBaskar rao Dsn
 
Akka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsAkka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsNLJUG
 
Apache Zeppelin & Cluster
Apache Zeppelin & ClusterApache Zeppelin & Cluster
Apache Zeppelin & ClusterJongyoul Lee
 
E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyRikin Tanna
 
Reacting to the Isomorphic Buzz
Reacting to the Isomorphic BuzzReacting to the Isomorphic Buzz
Reacting to the Isomorphic BuzzBruce Coddington
 
Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3JustinHolt20
 
Era of server less computing final
Era of server less computing finalEra of server less computing final
Era of server less computing finalBaskar rao Dsn
 
Serverless Architecture Patterns - Manoj Ganapathi - Serverless Summit
Serverless Architecture Patterns - Manoj Ganapathi - Serverless SummitServerless Architecture Patterns - Manoj Ganapathi - Serverless Summit
Serverless Architecture Patterns - Manoj Ganapathi - Serverless SummitCodeOps Technologies LLP
 
Infrastructure as code
Infrastructure as codeInfrastructure as code
Infrastructure as codeNaseath Saly
 
Monitoring Open Source Databases with Icinga
Monitoring Open Source Databases with IcingaMonitoring Open Source Databases with Icinga
Monitoring Open Source Databases with IcingaIcinga
 
Ryan Brown - Open Community
Ryan Brown - Open CommunityRyan Brown - Open Community
Ryan Brown - Open CommunityServerlessConf
 
How Serverless Changes DevOps
How Serverless Changes DevOpsHow Serverless Changes DevOps
How Serverless Changes DevOpsRichard Donkin
 
How to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionHow to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionLecole Cole
 
R2DBC - Good Enough for Production?
R2DBC - Good Enough for Production?R2DBC - Good Enough for Production?
R2DBC - Good Enough for Production?Olexandra Dmytrenko
 

What's hot (20)

Kapil Thangavelu - Cloud Custodian
Kapil Thangavelu - Cloud CustodianKapil Thangavelu - Cloud Custodian
Kapil Thangavelu - Cloud Custodian
 
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
 
Ben Kehoe - Serverless Architecture for the Internet of Things
Ben Kehoe - Serverless Architecture for the Internet of ThingsBen Kehoe - Serverless Architecture for the Internet of Things
Ben Kehoe - Serverless Architecture for the Internet of Things
 
Serverless by examples and case studies
Serverless by examples and case studiesServerless by examples and case studies
Serverless by examples and case studies
 
Era of server less computing
Era of server less computingEra of server less computing
Era of server less computing
 
Akka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsAkka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based Applications
 
Capybara
CapybaraCapybara
Capybara
 
Apache Zeppelin & Cluster
Apache Zeppelin & ClusterApache Zeppelin & Cluster
Apache Zeppelin & Cluster
 
E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/Livy
 
Reacting to the Isomorphic Buzz
Reacting to the Isomorphic BuzzReacting to the Isomorphic Buzz
Reacting to the Isomorphic Buzz
 
Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3
 
Serverless Summit - Quiz
Serverless Summit - QuizServerless Summit - Quiz
Serverless Summit - Quiz
 
Era of server less computing final
Era of server less computing finalEra of server less computing final
Era of server less computing final
 
Serverless Architecture Patterns - Manoj Ganapathi - Serverless Summit
Serverless Architecture Patterns - Manoj Ganapathi - Serverless SummitServerless Architecture Patterns - Manoj Ganapathi - Serverless Summit
Serverless Architecture Patterns - Manoj Ganapathi - Serverless Summit
 
Infrastructure as code
Infrastructure as codeInfrastructure as code
Infrastructure as code
 
Monitoring Open Source Databases with Icinga
Monitoring Open Source Databases with IcingaMonitoring Open Source Databases with Icinga
Monitoring Open Source Databases with Icinga
 
Ryan Brown - Open Community
Ryan Brown - Open CommunityRyan Brown - Open Community
Ryan Brown - Open Community
 
How Serverless Changes DevOps
How Serverless Changes DevOpsHow Serverless Changes DevOps
How Serverless Changes DevOps
 
How to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionHow to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless Edition
 
R2DBC - Good Enough for Production?
R2DBC - Good Enough for Production?R2DBC - Good Enough for Production?
R2DBC - Good Enough for Production?
 

Viewers also liked

Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
 
Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkabandatamantra
 
Архитектура продукта Thumbtack RTB Bidder
Архитектура продукта Thumbtack RTB BidderАрхитектура продукта Thumbtack RTB Bidder
Архитектура продукта Thumbtack RTB BidderAnatoliy Nikulin
 
Vaadin thinking of u and i. Или как писать Rich Internet Applications, в стар...
Vaadin thinking of u and i. Или как писать Rich Internet Applications, в стар...Vaadin thinking of u and i. Или как писать Rich Internet Applications, в стар...
Vaadin thinking of u and i. Или как писать Rich Internet Applications, в стар...Anatoliy Nikulin
 
Куда мы катимся. Анализ многолетних наблюдений омской ИТ отрасли в пяти минутах
Куда мы катимся. Анализ многолетних наблюдений омской ИТ отрасли  в пяти минутахКуда мы катимся. Анализ многолетних наблюдений омской ИТ отрасли  в пяти минутах
Куда мы катимся. Анализ многолетних наблюдений омской ИТ отрасли в пяти минутахAnatoliy Nikulin
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanDataWorks Summit
 
Конференция Юкон. Процессинг данных на лямбда архитектуре.
Конференция Юкон. Процессинг данных на лямбда архитектуре.Конференция Юкон. Процессинг данных на лямбда архитектуре.
Конференция Юкон. Процессинг данных на лямбда архитектуре.Anatoliy Nikulin
 
NoSQL thumbtack experience, Анатолий Никулин
NoSQL thumbtack experience, Анатолий НикулинNoSQL thumbtack experience, Анатолий Никулин
NoSQL thumbtack experience, Анатолий НикулинAnatoliy Nikulin
 
俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkabanhdhappy001
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentationVlad Orlov
 
Luigi Presentation at OSCON 2013
Luigi Presentation at OSCON 2013Luigi Presentation at OSCON 2013
Luigi Presentation at OSCON 2013Erik Bernhardsson
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...David Chen
 
Networks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainNetworks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainRussell Jurney
 
Networks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainNetworks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainRussell Jurney
 
Social Network Analysis in Your Problem Domain
Social Network Analysis in Your Problem DomainSocial Network Analysis in Your Problem Domain
Social Network Analysis in Your Problem DomainRussell Jurney
 

Viewers also liked (20)

Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkaban
 
Hive vs Pig
Hive vs PigHive vs Pig
Hive vs Pig
 
Архитектура продукта Thumbtack RTB Bidder
Архитектура продукта Thumbtack RTB BidderАрхитектура продукта Thumbtack RTB Bidder
Архитектура продукта Thumbtack RTB Bidder
 
Vaadin thinking of u and i. Или как писать Rich Internet Applications, в стар...
Vaadin thinking of u and i. Или как писать Rich Internet Applications, в стар...Vaadin thinking of u and i. Или как писать Rich Internet Applications, в стар...
Vaadin thinking of u and i. Или как писать Rich Internet Applications, в стар...
 
Куда мы катимся. Анализ многолетних наблюдений омской ИТ отрасли в пяти минутах
Куда мы катимся. Анализ многолетних наблюдений омской ИТ отрасли  в пяти минутахКуда мы катимся. Анализ многолетних наблюдений омской ИТ отрасли  в пяти минутах
Куда мы катимся. Анализ многолетних наблюдений омской ИТ отрасли в пяти минутах
 
HBase inside
HBase insideHBase inside
HBase inside
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
 
Конференция Юкон. Процессинг данных на лямбда архитектуре.
Конференция Юкон. Процессинг данных на лямбда архитектуре.Конференция Юкон. Процессинг данных на лямбда архитектуре.
Конференция Юкон. Процессинг данных на лямбда архитектуре.
 
NoSQL thumbtack experience, Анатолий Никулин
NoSQL thumbtack experience, Анатолий НикулинNoSQL thumbtack experience, Анатолий Никулин
NoSQL thumbtack experience, Анатолий Никулин
 
俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban
 
Quartz
QuartzQuartz
Quartz
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentation
 
Luigi future
Luigi futureLuigi future
Luigi future
 
Luigi Presentation at OSCON 2013
Luigi Presentation at OSCON 2013Luigi Presentation at OSCON 2013
Luigi Presentation at OSCON 2013
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
 
Networks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainNetworks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domain
 
Networks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainNetworks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domain
 
Social Network Analysis in Your Problem Domain
Social Network Analysis in Your Problem DomainSocial Network Analysis in Your Problem Domain
Social Network Analysis in Your Problem Domain
 

Similar to Azkaban and Pig at LinkedIn

Scaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesScaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesAndrew Turner
 
ChefConf 2012 Spiceweasel
ChefConf 2012 SpiceweaselChefConf 2012 Spiceweasel
ChefConf 2012 SpiceweaselMatt Ray
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
konfigurasi freeradius + daloradius in debian 9
konfigurasi freeradius + daloradius in debian 9konfigurasi freeradius + daloradius in debian 9
konfigurasi freeradius + daloradius in debian 9Walid Umar
 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache PigAvkash Chauhan
 
Introduction to cloudforecast
Introduction to cloudforecastIntroduction to cloudforecast
Introduction to cloudforecastMasahiro Nagano
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Dive into Spark Streaming
Dive into Spark StreamingDive into Spark Streaming
Dive into Spark StreamingGerard Maas
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Railsfreelancing_god
 
Puppet at Pinterest
Puppet at PinterestPuppet at Pinterest
Puppet at PinterestPuppet
 
Rancher最速セットアップ理論 プロジェクトr to the next stage
Rancher最速セットアップ理論 プロジェクトr to the next stageRancher最速セットアップ理論 プロジェクトr to the next stage
Rancher最速セットアップ理論 プロジェクトr to the next stagenmrmsys
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark Aakashdata
 
Unlocked Nov 2013: Main Slide Pack
Unlocked Nov 2013: Main Slide PackUnlocked Nov 2013: Main Slide Pack
Unlocked Nov 2013: Main Slide PackRackspace Academy
 
10 things I learned building Nomad packs
10 things I learned building Nomad packs10 things I learned building Nomad packs
10 things I learned building Nomad packsBram Vogelaar
 
[FrontDays'2017] Леонид Блохин (Big Data Engineer): Мист. Сервис для работы с...
[FrontDays'2017] Леонид Блохин (Big Data Engineer): Мист. Сервис для работы с...[FrontDays'2017] Леонид Блохин (Big Data Engineer): Мист. Сервис для работы с...
[FrontDays'2017] Леонид Блохин (Big Data Engineer): Мист. Сервис для работы с...Provectus
 
DistributingSoftwareKnowledgeForDevOps
DistributingSoftwareKnowledgeForDevOpsDistributingSoftwareKnowledgeForDevOps
DistributingSoftwareKnowledgeForDevOpsPaul Worrall
 

Similar to Azkaban and Pig at LinkedIn (20)

Scaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesScaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web Services
 
ChefConf 2012 Spiceweasel
ChefConf 2012 SpiceweaselChefConf 2012 Spiceweasel
ChefConf 2012 Spiceweasel
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
konfigurasi freeradius + daloradius in debian 9
konfigurasi freeradius + daloradius in debian 9konfigurasi freeradius + daloradius in debian 9
konfigurasi freeradius + daloradius in debian 9
 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache Pig
 
Introduction to cloudforecast
Introduction to cloudforecastIntroduction to cloudforecast
Introduction to cloudforecast
 
infra-as-code
infra-as-codeinfra-as-code
infra-as-code
 
PYSPARK PROGRAMMING.pdf
PYSPARK PROGRAMMING.pdfPYSPARK PROGRAMMING.pdf
PYSPARK PROGRAMMING.pdf
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Dive into Spark Streaming
Dive into Spark StreamingDive into Spark Streaming
Dive into Spark Streaming
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Rails
 
Puppet at Pinterest
Puppet at PinterestPuppet at Pinterest
Puppet at Pinterest
 
Rancher最速セットアップ理論 プロジェクトr to the next stage
Rancher最速セットアップ理論 プロジェクトr to the next stageRancher最速セットアップ理論 プロジェクトr to the next stage
Rancher最速セットアップ理論 プロジェクトr to the next stage
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
 
Unlocked Nov 2013: Main Slide Pack
Unlocked Nov 2013: Main Slide PackUnlocked Nov 2013: Main Slide Pack
Unlocked Nov 2013: Main Slide Pack
 
10 things I learned building Nomad packs
10 things I learned building Nomad packs10 things I learned building Nomad packs
10 things I learned building Nomad packs
 
爬虫点滴
爬虫点滴爬虫点滴
爬虫点滴
 
[FrontDays'2017] Леонид Блохин (Big Data Engineer): Мист. Сервис для работы с...
[FrontDays'2017] Леонид Блохин (Big Data Engineer): Мист. Сервис для работы с...[FrontDays'2017] Леонид Блохин (Big Data Engineer): Мист. Сервис для работы с...
[FrontDays'2017] Леонид Блохин (Big Data Engineer): Мист. Сервис для работы с...
 
My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
 
DistributingSoftwareKnowledgeForDevOps
DistributingSoftwareKnowledgeForDevOpsDistributingSoftwareKnowledgeForDevOps
DistributingSoftwareKnowledgeForDevOps
 

More from Russell Jurney

Agile Data Science 2.0: Using Spark with MongoDB
Agile Data Science 2.0: Using Spark with MongoDBAgile Data Science 2.0: Using Spark with MongoDB
Agile Data Science 2.0: Using Spark with MongoDBRussell Jurney
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0Russell Jurney
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0Russell Jurney
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0Russell Jurney
 
Predictive Analytics with Airflow and PySpark
Predictive Analytics with Airflow and PySparkPredictive Analytics with Airflow and PySpark
Predictive Analytics with Airflow and PySparkRussell Jurney
 
Agile Data Science 2.0 - Big Data Science Meetup
Agile Data Science 2.0 - Big Data Science MeetupAgile Data Science 2.0 - Big Data Science Meetup
Agile Data Science 2.0 - Big Data Science MeetupRussell Jurney
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySparkRussell Jurney
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0Russell Jurney
 
Networks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainNetworks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainRussell Jurney
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsRussell Jurney
 
Agile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics ApplicationsAgile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics ApplicationsRussell Jurney
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoopRussell Jurney
 

More from Russell Jurney (13)

Agile Data Science 2.0: Using Spark with MongoDB
Agile Data Science 2.0: Using Spark with MongoDBAgile Data Science 2.0: Using Spark with MongoDB
Agile Data Science 2.0: Using Spark with MongoDB
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Predictive Analytics with Airflow and PySpark
Predictive Analytics with Airflow and PySparkPredictive Analytics with Airflow and PySpark
Predictive Analytics with Airflow and PySpark
 
Agile Data Science 2.0 - Big Data Science Meetup
Agile Data Science 2.0 - Big Data Science MeetupAgile Data Science 2.0 - Big Data Science Meetup
Agile Data Science 2.0 - Big Data Science Meetup
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
Networks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domainNetworks All Around Us: Extracting networks from your problem domain
Networks All Around Us: Extracting networks from your problem domain
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
 
Agile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics ApplicationsAgile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics Applications
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 

Recently uploaded

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Recently uploaded (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Azkaban and Pig at LinkedIn

  • 1. Azkaban and Pig Richard Park, Russell Jurney LinkedIn Search, Network, Analytics
  • 2. Installing & Running Azkaban wgethttp://github.com/downloads/azkaban/azkaban/azkaban-0.04.tar.gz tar –xvzf azkaban-0.04.tar.gz mkdir /some-dir/azkaban-jobs cd azkaban-0.04 bin/azkaban-server.sh –job-dir /some-dir/azkaban-jobs
  • 4. Pig Configuration myproject.properties – Global Configuration hadoop.job.ugi=rjurney,hadoop udf.import.list=org.apache.pig.builtin.,com.linkedin.pig.,com.linkedin… cc_0_compute_title_counts.job – Pig Job type=pig pig.script=cc_0_compute_title_counts.pig cc_1_reverse_engineer_durak_3000 – Pig Job with Dependency type=pig pig.script=cc_1_reverse_engineer_durak_3000.pig dependencies=cc_0_compute_title_counts
  • 5. Running Job > bin/run-jobs.sh –job-dir /some-dir/azkaban-jobs my-job