Jean-Pierre König, MeMo News AG



  OPENING THE TOOL BOX
  DEVELOPMENT, TESTING AND DEPLOYMENT IN THE HADOOP
  ECOSYSTEM

  14.05.12

http://www.flickr.com/photos/theaucitron/5810163712/sizes/l/in/photostream/
Development

 THE APPLICATION


http://www.flickr.com/photos/oskay/2523189273/sizes/l/in/photostream/
Development

The Application is a ...
  • Distributed newsagent
  • GUI-less Java Application
  • Spring-based 2-layer architecture
     • Services and data access objects
  • Client of Hadoop
     • Dependencies to Zookeeper and HBase
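
The two layers named above can be sketched in plain Java, without Spring or HBase on the classpath. All names here (ArticleDao, InMemoryArticleDao, NewsService) are hypothetical and not taken from the talk; the in-memory DAO only stands in for a real HBase-backed one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these types come from the talk.
class TwoLayerSketch {

    /** Data access layer: hides the storage backend (HBase in the talk). */
    interface ArticleDao {
        void save(String url, String title);
        List<String> findTitlesByHost(String host);
    }

    /** In-memory stand-in so the sketch runs without a cluster. */
    static class InMemoryArticleDao implements ArticleDao {
        private final Map<String, String> rows = new HashMap<>();

        @Override
        public void save(String url, String title) {
            rows.put(url, title);
        }

        @Override
        public List<String> findTitlesByHost(String host) {
            List<String> titles = new ArrayList<>();
            for (Map.Entry<String, String> row : rows.entrySet()) {
                if (row.getKey().startsWith(host + "/")) {
                    titles.add(row.getValue());
                }
            }
            return titles;
        }
    }

    /** Service layer: business logic only, storage is reached through the DAO. */
    static class NewsService {
        private final ArticleDao dao;

        NewsService(ArticleDao dao) { // constructor injection, as a container would do
            this.dao = dao;
        }

        int countArticles(String host) {
            return dao.findTitlesByHost(host).size();
        }
    }

    public static void main(String[] args) {
        ArticleDao dao = new InMemoryArticleDao();
        dao.save("example.org/a", "First");
        dao.save("example.org/b", "Second");
        dao.save("other.net/c", "Third");
        System.out.println(new NewsService(dao).countArticles("example.org")); // prints 2
    }
}
```

Because the service only sees the interface, a test can hand it the in-memory DAO while production wiring supplies the real one.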




Development (2)

We use Maven 3 for
  • Project structure - Corporate POM & Modules
  • Dependency Management
  • Build the artifact

[Diagram: Corporate POM at the top; modules global, newsagent, tools, mapred; components Model, Utils, Infrastructure, Loader (Client), Services, Data Access Objects]

Development

 MAPREDUCE JOBS


http://www.flickr.com/photos/elasticsoul/61062372/sizes/l/in/photostream/
MapReduce


    • Java MR jobs for business processes
      • Input and output paths either HDFS or HBase
      • MR job chaining by Azkaban
    • PIG, HIVE for ad-hoc queries




Development

 HBASE


http://www.flickr.com/photos/isherwoodchris/6902155937/sizes/l/in/photostream/
HBase

• HBase Schema Manager
  • github.com/jkoenig/hbase-schema-manager
• Utilities to copy/move/rename column families
  and copy complete tables with their data
  • github.com/memonews/hbase-utils
• Stargate REST API without compression
  • github.com/memonews/hbase-stargate



Hadoop, HBase, Zookeeper

 TESTING


http://www.flickr.com/photos/42106306@N00/4380803535/sizes/m/in/photostream/
HBase

• We use the Apache HBaseTestingUtility
• It's in-memory: a complete Hadoop instance
  with DFS, ZK and HBase
• It's very slow, so consider it a long-running integration test (IT)
public class ConfigurableHBaseClient {
  protected static HBaseTestingUtility TEST_UTIL;
  static {
    final Configuration conf = HBaseConfiguration.create();
    conf.addResource("hbase-default-test.xml");
    try {
      TEST_UTIL = HBaseTestingUtilityFactory.getMiniCluster(1, conf);
    } catch (final Exception e) {
      fail("Could not start hadoop mini cluster.");
    }
  }
}
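
Since the mini cluster is slow to start, it pays to start it once and share it across all tests in the JVM. The following is a minimal plain-Java sketch of that idea using the initialization-on-demand holder idiom. FakeCluster is a hypothetical stand-in for the real mini cluster, and nothing here is HBase API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in: FakeCluster plays the role of the slow mini cluster.
class SharedFixtureSketch {

    static final AtomicInteger STARTS = new AtomicInteger();

    /** Pretend cluster whose construction is expensive. */
    static class FakeCluster {
        FakeCluster() {
            STARTS.incrementAndGet(); // count how often the startup cost is paid
        }

        String status() {
            return "running";
        }
    }

    /** Holder idiom: the JVM creates CLUSTER exactly once, on first access. */
    private static class Holder {
        static final FakeCluster CLUSTER = new FakeCluster();
    }

    static FakeCluster cluster() {
        return Holder.CLUSTER;
    }

    public static void main(String[] args) {
        // Two "tests" ask for the cluster; only the first one triggers a start.
        System.out.println(cluster().status());
        System.out.println(cluster().status());
        System.out.println("starts=" + STARTS.get()); // prints starts=1
    }
}
```

Compared with a static initializer in a base class, the holder delays startup until a test actually asks for the cluster, so fast unit tests in the same module do not pay for it.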

MapReduce

• Since business logic is involved, we use hadoop-mrunit
  for testing Map/Reduce jobs
• It's in-memory testing
    • Parameterized Mapper/Reducer with a driver

@Test
public void reduceShouldWriteExactlyOneLinePerMap() throws IOException {
  final List<DoubleWritable> values = new ArrayList<DoubleWritable>();
  values.add(new DoubleWritable(399287729));
  this.driver.withInput(new Text("de.t-online/nachrichten"), values);
  this.driver.run();
  assertEquals(1, this.driver.getCounters().findCounter(
      MeMoCounters.SIGNALS_WRITTEN).getValue());
}
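
To show what "a driver" does in such a test, here is a plain-Java imitation of the pattern: the driver feeds one key with its values to a reducer and exposes the emitted output and counters for assertions. This is not the MRUnit API; Context, SumReducer and run are hypothetical names, and the summing logic is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of the driver pattern; this is NOT the MRUnit API.
class ReduceDriverSketch {

    /** Receives what the reducer emits and counts named events. */
    static class Context {
        final List<String> output = new ArrayList<>();
        final Map<String, Long> counters = new HashMap<>();

        void write(String key, double value) {
            output.add(key + "\t" + value);
        }

        void increment(String counter) {
            counters.merge(counter, 1L, Long::sum);
        }
    }

    /** Hypothetical reducer: sums all values of a key into one output line. */
    static class SumReducer {
        void reduce(String key, List<Double> values, Context context) {
            double sum = 0;
            for (double v : values) {
                sum += v;
            }
            context.write(key, sum);
            context.increment("SIGNALS_WRITTEN");
        }
    }

    /** Driver: feeds one key and its values to the reducer, returns the context. */
    static Context run(SumReducer reducer, String key, List<Double> values) {
        Context context = new Context();
        reducer.reduce(key, values, context);
        return context;
    }

    public static void main(String[] args) {
        Context result = run(new SumReducer(), "de.t-online/nachrichten",
                Arrays.asList(1.5, 2.5));
        System.out.println(result.output.size() + " line(s) written");
        System.out.println(result.counters);
    }
}
```

The test never touches a cluster or the file system: input goes in as Java objects, and output and counters come back as collections that can be asserted on directly.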

Zookeeper

• We use the Apache Zookeeper ClientBase
• It's not in-memory; tests run against the staging
  cluster
    • Paths are prefixed, e.g.: /test/memo/subscribers



@Test
public void getNumberOfSubscribersShouldSetWatchFlag()
    throws KeeperException, InterruptedException {
  final SubscriberDaoImpl subscriberDao =
      new SubscriberDaoImpl(zookeeperDao, DIR, null);
  subscriberDao.getNumberOfSubscribers(listener);
  verify(this.zookeeper, times(1)).getChildren(eq(DIR), eq(subscriberDao));
}
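
Prefixing keeps test znodes apart from real ones on a shared cluster. One way this could look is a small helper that puts a prefix such as /test in front of every path. The helper below is hypothetical; the talk only states that paths are prefixed, not how.

```java
// Hypothetical helper; the talk only states that test paths are prefixed.
class ZnodePathSketch {

    /** Joins a prefix such as "/test" with an absolute znode path. */
    static String prefixed(String prefix, String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("znode paths must be absolute: " + path);
        }
        if (prefix == null || prefix.isEmpty()) {
            return path; // no prefix configured, e.g. in production
        }
        // Drop a trailing slash so the result never contains "//".
        String cleaned = prefix.endsWith("/")
                ? prefix.substring(0, prefix.length() - 1)
                : prefix;
        return cleaned + path;
    }

    public static void main(String[] args) {
        System.out.println(prefixed("/test", "/memo/subscribers")); // /test/memo/subscribers
        System.out.println(prefixed("", "/memo/subscribers"));      // /memo/subscribers
    }
}
```

Routing every path through one function means the prefix lives in configuration only, and a test run cannot write outside its own subtree by accident.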

Deployment

 THE APPLICATION


http://www.flickr.com/photos/navalsurfaceforces/5553412190/sizes/l/in/photostream/
The Application

• Automated build and restart via Capistrano
• Build on every machine
    • There is a local .m2 repository on every machine

set :deploy_to, "/usr/share/memo-newsagent"
set :keep_releases, 1

after "deploy:setup" do
  run "mkdir -p /var/run/memo #{shared_path}/logs /var/log/memo/"
  ...
end

after "deploy:update_code" do
  run "cd #{current_release} && mvn install -Pfast > #{shared_path}/logs/build.log"
end

after "deploy", "rowlog:stop", "newsagent:restart", "rowlog:start"

Deployment

 MAPREDUCE JOBS


http://www.flickr.com/photos/navalsurfaceforces/6257239933/sizes/l/in/photostream/
Map Reduce Jobs

• We use a Maven Hadoop plugin
    • hadoop:pack, à la mvn package
    • hadoop:deploy, to HDFS and the target folder
• All dependencies are packed in. Careful: huge
  JARs without dependency management



see github.com/memonews/maven-hadoop

DevOps

 OTHER TOOLS IN USE


http://www.flickr.com/photos/damongman/4979871047/sizes/l/in/photostream/
Other Tools

• Staging environment in-house, a 1:1 copy
  of production (virtualized)
• Azkaban for MR job scheduling
• Jenkins for (integration) tests and metrics
• Git
• Icinga for monitoring & alerting
• Ganglia / Graphite for Hadoop metrics
• Fliwi for automated cluster provisioning

jean-pierre.koenig@menonews.com

THANKS!