Jean-Pierre König, MeMo News AG



  OPENING THE TOOL BOX
  DEVELOPMENT, TESTING AND DEPLOYMENT IN THE HADOOP
  ECOSYSTEM

  14.05.12

http://www.flickr.com/photos/theaucitron/5810163712/sizes/l/in/photostream/
Development

 THE APPLICATION


http://www.flickr.com/photos/oskay/2523189273/sizes/l/in/photostream/
Development

The Application is a ...
  • Distributed newsagent
  • GUI-less Java Application
  • Spring-based 2-layer architecture
     • Services and data access objects
  • Client of Hadoop
     • Dependencies to Zookeeper and HBase




Development (2)

We use Maven 3 for
  • Project structure - Corporate POM & Modules
  • Dependency Management
  • Build the artifact

[Diagram: Corporate POM with the modules global, newsagent, tools and mapred; boxes shown: Model, Utils, Infrastructure, Loader (Client), Services, Data Access Objects]

Development

 MAPREDUCE JOBS


http://www.flickr.com/photos/elasticsoul/61062372/sizes/l/in/photostream/
MapReduce


    • Java MR jobs for business processes
      • Input and output paths either HDFS or HBase
      • MR job chaining by Azkaban
    • PIG, HIVE for ad-hoc queries
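
Job chaining, as mentioned above, comes down to running each job only after the jobs it depends on have finished. The following is a conceptual sketch in plain Java, not Azkaban's API (Azkaban declares dependencies in its own job definitions); the class `JobChain` and the job names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Conceptual sketch only: orders jobs so that each runs after its dependencies. */
final class JobChain {
    // job name -> names of the jobs that must finish first
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    JobChain add(String job, String... dependsOn) {
        dependencies.put(job, List.of(dependsOn));
        return this;
    }

    /** Returns the jobs in an order that respects all declared dependencies. */
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String job : dependencies.keySet()) {
            visit(job, done, inProgress, order);
        }
        return order;
    }

    // Depth-first visit: schedule all dependencies before the job itself.
    private void visit(String job, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(job)) {
            return;
        }
        if (!inProgress.add(job)) {
            throw new IllegalStateException("Cyclic dependency at job: " + job);
        }
        for (String dependency : dependencies.getOrDefault(job, List.of())) {
            visit(dependency, done, inProgress, order);
        }
        inProgress.remove(job);
        done.add(job);
        order.add(job);
    }
}
```

For example, `new JobChain().add("export", "aggregate").add("aggregate", "import").add("import").executionOrder()` yields the jobs with `import` first and `export` last, regardless of the order in which they were declared.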




Development

 HBASE


http://www.flickr.com/photos/isherwoodchris/6902155937/sizes/l/in/photostream/
HBase

• HBase Schema Manager
  • github.com/jkoenig/hbase-schema-manager
• Utilities to copy/move/rename column families
  and copy complete tables with their data
  • github.com/memonews/hbase-utils
• Stargate REST API without compression
  • github.com/memonews/hbase-stargate



Hadoop, HBase, Zookeeper

 TESTING


http://www.flickr.com/photos/42106306@N00/4380803535/sizes/m/in/photostream/
HBase

• We use the Apache HBaseTestingUtility
• It’s in-memory: a complete Hadoop instance
  with DFS, Zookeeper and HBase
• It’s very slow, so consider these tests long-running integration tests (IT)
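import static org.junit.Assert.fail;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseTestingUtility;

public class ConfigurableHBaseClient {
    // Shared mini cluster, started once when the class is loaded.
    protected static HBaseTestingUtility TEST_UTIL;

    static {
        final Configuration conf = HBaseConfiguration.create();
        conf.addResource("hbase-default-test.xml");
        try {
            // HBaseTestingUtilityFactory appears to be a project helper, not an HBase class.
            TEST_UTIL = HBaseTestingUtilityFactory.getMiniCluster(1, conf);
        } catch (final Exception e) {
            fail("Could not start hadoop mini cluster.");
        }
    }
}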

MapReduce

• Since business logic is involved, we use hadoop-mrunit
  for testing Map/Reduce jobs
• It’s in-memory testing
    • Parameterized Mapper/Reducer with a driver


@Test
public void reduceShouldWriteExactlyOneLinePerMap() throws IOException {
    final List<DoubleWritable> values = new ArrayList<DoubleWritable>();
    values.add(new DoubleWritable(399287729));
    this.driver.withInput(new Text("de.t-online/nachrichten"), values);
    this.driver.run();
    assertEquals(1, this.driver.getCounters().findCounter(
            MeMoCounters.SIGNALS_WRITTEN).getValue());
}
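
The driver idea behind this kind of test can be illustrated without any Hadoop dependency: the driver feeds one key and its values to the reduce logic, then exposes the written records and counters for assertions. The sketch below is a plain-Java stand-in, not the MRUnit API; `InMemoryReduceDriver`, `ReduceFunction` and the counter name used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual stand-in for a reduce driver: runs reduce logic fully in memory. */
final class InMemoryReduceDriver<K, V, O> {

    /** The reduce logic under test; it writes records and counters to the driver. */
    interface ReduceFunction<K, V, O> {
        void reduce(K key, List<V> values, InMemoryReduceDriver<K, V, O> context);
    }

    private final ReduceFunction<K, V, O> reducer;
    private final List<O> output = new ArrayList<>();
    private final Map<String, Long> counters = new HashMap<>();

    InMemoryReduceDriver(ReduceFunction<K, V, O> reducer) {
        this.reducer = reducer;
    }

    /** Called by the reduce logic to emit one output record. */
    void write(O record) {
        output.add(record);
    }

    /** Called by the reduce logic to increment a named counter by one. */
    void increment(String counter) {
        counters.merge(counter, 1L, Long::sum);
    }

    /** Returns the current value of a counter, or 0 if it was never incremented. */
    long counter(String name) {
        return counters.getOrDefault(name, 0L);
    }

    /** Runs the reduce logic once for the given key and returns everything written so far. */
    List<O> run(K key, List<V> values) {
        reducer.reduce(key, values, this);
        return output;
    }
}
```

A test then constructs the driver with the reduce logic, calls `run` with one key and its values, and asserts on the returned records and on `counter(...)`, mirroring the structure of the test above.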

Zookeeper

• We use the Apache Zookeeper ClientBase
• It’s not in-memory but runs against the staging
  cluster
    • Prefixed paths, e.g. /test/memo/subscribers



@Test
public void getNumberOfSubscribersShouldSetWatchFlag()
        throws KeeperException, InterruptedException {
    final SubscriberDaoImpl subscriberDao =
            new SubscriberDaoImpl(zookeeperDao, DIR, null);
    subscriberDao.getNumberOfSubscribers(listener);
    verify(this.zookeeper, times(1)).getChildren(eq(DIR), eq(subscriberDao));
}
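
Because these tests run against the shared staging cluster, the path prefix is what keeps test data apart from real data. A minimal sketch of such prefixing follows; the class `ZkPaths` and its method are hypothetical and not part of the Zookeeper client or of the code shown above.

```java
/** Hypothetical helper: places every znode path under a test namespace such as /test. */
final class ZkPaths {
    private final String prefix;

    ZkPaths(String prefix) {
        if (!prefix.startsWith("/") || prefix.endsWith("/")) {
            throw new IllegalArgumentException(
                    "Prefix must start with '/' and must not end with '/': " + prefix);
        }
        this.prefix = prefix;
    }

    /** Maps an absolute path to the corresponding path below the prefix. */
    String prefixed(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + path);
        }
        // The root path maps to the prefix itself, everything else is appended.
        return path.equals("/") ? prefix : prefix + path;
    }
}
```

With a prefix of `/test`, the path `/memo/subscribers` becomes `/test/memo/subscribers`. Zookeeper connection strings also accept a chroot suffix (for example `host:2181/test`), which achieves a similar separation on the client side.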

Deployment

 THE APPLICATION


http://www.flickr.com/photos/navalsurfaceforces/5553412190/sizes/l/in/photostream/
The Application

• Automated build and restart via Capistrano
• Build on every machine
    • There is a .m2 repository everywhere

set :deploy_to, "/usr/share/memo-newsagent"
set :keep_releases, 1

after "deploy:setup" do
  run "mkdir -p /var/run/memo #{shared_path}/logs /var/log/memo/"
  ...
end

after "deploy:update_code" do
  run "cd #{current_release} && mvn install -Pfast > #{shared_path}/logs/build.log"
end

after "deploy", "rowlog:stop", "newsagent:restart", "rowlog:start"

Deployment

 MAPREDUCE JOBS


http://www.flickr.com/photos/navalsurfaceforces/6257239933/sizes/l/in/photostream/
Map Reduce Jobs

• We use a Maven Hadoop plugin
    • hadoop:pack, à la mvn package
    • hadoop:deploy to HDFS and target folder
• All dependencies are packed in. Careful: huge
  JARs without dependency management



see github.com/memonews/maven-hadoop

DevOps

 OTHER TOOLS IN USE


http://www.flickr.com/photos/damongman/4979871047/sizes/l/in/photostream/
Other Tools

• Staging environment in-house, a 1:1 copy
  of production (virtualized)
• Azkaban for MR job scheduling
• Jenkins for (Integration-) Tests and Metrics
• Git
• Icinga for Monitoring & Alerting
• Ganglia / Graphite for Hadoop Metrics
• Fliwi for automated cluster provisioning

jean-pierre.koenig@menonews.com

THANKS!
14.05.2012 Opening the tool box: Development, testing and deployment in the Hadoop ecosystem (Jean-Pierre König, MeMo News AG)
