Analysis Software Development


  1. 1. The EventView Analysis Framework. Akira Shibata (QMUL), Amir Farbin (CERN), Kyle Cranmer (BNL)
  2. 2. Outline 1. Event Data Model and Analysis Model • Where we are. • Current issues. 2. What is EventView? • The EventView EDM class. • EventView framework. • PhysicsView packages. 3. Where is EventView? • The scope of EventView in Physics Analysis Tools (PAT). • The scope of EventView in distributed analysis (DA). • Current use cases. 4. Future developments • Future developments in EDM. • Future developments in PANDA.
  3. 3. Our Goal • To do physics analysis to the level at which we can publish our results. • A great deal of effort has gone in, with lots of trial and error, and it is still evolving very fast. (Computing System Commissioning, CSC, is one such place.) • Thousands of people are trying to analyse huge amounts of data, and we need the right environment to do this. • Physics analysis tools and distributed analysis are trying to solve these problems.
  4. 4. Acronyms • Athena: The ATLAS software (algorithms in C++, steering with Python). Used for Monte Carlo, simulation, reconstruction, and analysis. Also used by the High Level Trigger. • RAW: Events as output by the Event Filter for reconstruction. The computing model assumes it is kept only at the Tier0 site (1.6 MB/event at a rate of 200 Hz). • CBNT: “ComBined NTuple”, summarizes the output of reconstruction. Read into Root; can’t be read in Athena. Not part of the computing model. • ESD: Event Summary Data, the output of reconstruction; can be read into Athena. Only to be stored at Tier1 sites (500 kB/event). • AOD: Analysis Object Data, distributed to everyone and meant for analysis; can be read in Athena (100 kB/event). • DPD: Derived Physics Data, included in the computing model, something analogous to an ntuple. • Tags: Event-level metadata used to quickly select “interesting events” (1 kB/event). This talk focuses mainly on DPD production from AOD.
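The event sizes and the 200 Hz rate quoted above translate directly into data rates. The following is a small worked calculation, not part of the original slides; it assumes every format is written at the same 200 Hz rate as RAW and uses decimal units (1 MB = 1000 kB). The function names are illustrative.

```python
# Event sizes in kB, as quoted on the Acronyms slide.
EVENT_SIZE_KB = {"RAW": 1600.0, "ESD": 500.0, "AOD": 100.0, "TAG": 1.0}

# Event Filter output rate quoted for RAW; applying it to the other
# formats as well is an assumption made for this illustration.
EVENT_RATE_HZ = 200.0


def bandwidth_mb_per_s(fmt, rate_hz=EVENT_RATE_HZ):
    """Data rate in MB/s for one format (decimal units, 1 MB = 1000 kB)."""
    return EVENT_SIZE_KB[fmt] * rate_hz / 1000.0


def reduction_factor(fmt, reference="RAW"):
    """How many times smaller one event of `fmt` is than one of `reference`."""
    return EVENT_SIZE_KB[reference] / EVENT_SIZE_KB[fmt]


if __name__ == "__main__":
    for fmt in EVENT_SIZE_KB:
        print(fmt, bandwidth_mb_per_s(fmt), "MB/s,",
              "RAW/%s size ratio:" % fmt, reduction_factor(fmt))
```

Under these assumptions RAW corresponds to 320 MB/s and AOD to 20 MB/s, which is one way to see why only the smaller formats are distributed widely.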
  5. 5. [Figure: data flow through the computing tiers. Tier 0: RAW, production of ESD. Tier 1: production of AOD, slim AOD and Tag. Tier 2: Athena code (private Athena dump, private ntuple, CBNT) or EventView code (EV dump, EV ntuple). Home: ntuple (DPD) analysis in ROOT. Annotations on the routes: impossible; no standard, personalized, waste of resource; standardized, common production, less analysis in ROOT.]
  6. 6. Part 1: Event Data Model and Analysis Model
  7. 7. Event Data Model: History and Evolution of Workflow (Kyle Cranmer (BNL), Analysis Model Workshop, October 26, 2006) • Ancient history (DC1): Athena.exe, then Root or paw/hbook. Before we used POOL to save files, the input to reconstruction was a ZEBRA file, and the EDM (egamma, etc.) had two purposes: • an object used by reconstruction algorithms to carry state between various tools; • to carry all information to the CBNT dumper, to be dumped into a “flat” ntuple. • DC2/Rome/CSC: Athena.py, then Root or PyRoot from athena -i. • Much of our current situation is related to this historical situation. [Figure: workflow diagrams. DC1: Geant3, Zebra, Reconstruction, CBNT, Event Selection, Analysis, PostScript. DC2: Geant4, SimulDigi (pool), Reconstruction, ESD (pool), Builders, AOD and Tags (pool), Event Selection, RootTuple, PostScript. Example plots: ATLAS Higgs signal significance versus mH (GeV/c²) for ∫L dt = 30 fb⁻¹ (no K-factors); hadronic top mass for ttbar (MC@NLO) with W + jets (Alpgen).]
  8. 8. • CBNT • Flat ntuple with all information from reconstruction. • No common look and feel. • No official mechanism to share work done after reconstruction. • AOD/ESD is nice! • It has a common interface to all objects. • With it, we can do analysis using Athena. • We need to publish and share analysis code for cross-checks and reproduction of results. • But why Athena? • All of our reconstruction algorithms are in Athena. • We’ll need to be able to re-run clustering, vertexing, track extrapolation, calibration etc. on AOD. • How about Root? • We always need to use Root for our final analysis. • Athena is good for event-level processing but not for sample-level processing. • Two-stage analysis • We cannot do everything in one place: some in Athena, the rest in Root. • Supports flexibility in analysis. But what to do where?
  9. 9. Where to do Analysis? Root lovers / Athena purists; physicists / computing specialists. 1. Pre-physics analysis: reconstruction. 2. Baseline analysis: preselection, overlap removal, further processing such as jet re-calibration etc. 3. Event-level analysis: object reconstruction, event variable calculation etc. 4. Sample-level analysis: plotting histograms, fitting templates, toy MC etc. [Diagram labels: Central Athena Production; Athena Analysis on Distributed Analysis; Root Analysis, local or DA.]
  10. 10. Two-Stage Analysis: A global view. [Diagram: Athena analysis development on Lxplus (AOD, Athena, ntuple); DPD production on DA (DA submission, monitor and retrieve, re-submission, publish); Root analysis at home (ntuple, histogram); feedback / port analysis to Athena; back-navigation to AOD.] Scope of this workshop: Athena analysis and distributed analysis.
  11. 11. • Athena analysis should offer: • The official place for common analysis. • Ease of use: it should be easy enough for physicists! • A simple mechanism for sharing code: modular analysis. • More than an ntuple dumper: it needs to do “analysis” (e.g. tune PID, re-calibrate). • AOD object definition is loose (and needs to be): • It has overlapping objects (e.g. electrons and jets). • It has loose object selection (e.g. no isolation cut). • It has multiple algorithms (e.g. Staco/Muid muons). • First step in analysis: define your “view” of events: • Do this by constructing an EventView. • Do further baseline and event-level analysis using EventView tools and modules. • Produce Derived Physics Data, DPD (ntuple), for Root analysis: • Private ntuple. • Group standard DPD.
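Because the AOD keeps overlapping objects (the same energy deposit can appear as both an electron and a jet), defining a "view" means choosing which interpretation wins. Below is a minimal sketch of one common way to do that: insert objects in priority order and skip any candidate within a ΔR cone of an object already accepted. It is not the EventView inserter code; the `Particle` class, the function names, the pT thresholds and the cone size of 0.4 are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    kind: str    # e.g. "electron" or "jet"
    pt: float    # transverse momentum in GeV
    eta: float
    phi: float


def delta_r(a, b):
    """Angular distance in (eta, phi), with phi wrapped into [-pi, pi]."""
    dphi = math.remainder(a.phi - b.phi, 2.0 * math.pi)
    return math.hypot(a.eta - b.eta, dphi)


def insert_without_overlap(view, candidates, min_pt, cone=0.4):
    """Append candidates passing the pT cut that do not overlap the view."""
    for cand in candidates:
        if cand.pt < min_pt:
            continue  # fails the (illustrative) preselection
        if any(delta_r(cand, kept) < cone for kept in view):
            continue  # already in the view under another interpretation
        view.append(cand)
    return view


electrons = [Particle("electron", 40.0, 0.50, 1.00)]
jets = [
    Particle("jet", 45.0, 0.52, 1.05),    # same deposit as the electron
    Particle("jet", 60.0, -1.20, -2.00),  # a genuine, separate jet
]

view = []
insert_without_overlap(view, electrons, min_pt=20.0)  # electrons take priority
insert_without_overlap(view, jets, min_pt=20.0)
print([p.kind for p in view])
```

Swapping the order of the two calls would give a different view of the same event, which is why insertion order is part of the analysis definition.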
  12. 12. Part 2: What is EventView?
  13. 13. What is EventView? EventView is a container which holds the objects (or links to them) needed in physics analysis.
  14. 14. EventView Modular Analysis Framework. [Diagram: a chain of tools acting on AOD and truth AOD; tool labels include Insert Electron, Insert Muon, Preselect B-Jet, Remove Overlap, Calibrate Elec, Calibrate MET, Reconst. W and nu, Calculate Mt, Match Truth and Associate MET.] Once we have defined our container, the analysis can be divided into pieces. Each step of the analysis is a separate tool. Each tool is made as independent and general as possible. Analysis information is propagated through the EventView: each tool receives the EventView and executes something on it, so the EventView is gradually built up with more information.
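The idea of passing one container through a chain of independent tools can be sketched in a few lines. This is a toy model in plain Python, not the Athena/EventView API: the class names, the `execute(view, event)` signature and the dictionary-based event are assumptions made so that the example is self-contained. The transverse mass uses the standard formula mT² = 2 pT ET^miss (1 − cos Δφ).

```python
import math


class EventView:
    """Hypothetical stand-in for the EventView container."""

    def __init__(self):
        self.final_state = []   # objects selected for this view
        self.user_data = {}     # derived quantities added by tools


class ElectronInserter:
    """Insert electrons above a pT threshold (GeV) into the view."""

    def __init__(self, min_pt):
        self.min_pt = min_pt

    def execute(self, view, event):
        for electron in event["electrons"]:
            if electron["pt"] >= self.min_pt:
                view.final_state.append(electron)


class TransverseMassCalculator:
    """Compute mT from the leading inserted electron and the missing ET."""

    def execute(self, view, event):
        if not view.final_state:
            return  # nothing inserted, so nothing to calculate
        lepton = max(view.final_state, key=lambda obj: obj["pt"])
        met = event["met"]
        dphi = lepton["phi"] - met["phi"]
        view.user_data["mt"] = math.sqrt(
            2.0 * lepton["pt"] * met["et"] * (1.0 - math.cos(dphi))
        )


def run_tools(tools, event):
    """Play the role of the looper: pass one view through each tool in order."""
    view = EventView()
    for tool in tools:
        tool.execute(view, event)
    return view


event = {
    "electrons": [{"pt": 40.0, "phi": 0.0}, {"pt": 8.0, "phi": 1.0}],
    "met": {"et": 40.0, "phi": math.pi},
}
view = run_tools([ElectronInserter(min_pt=20.0), TransverseMassCalculator()], event)
print(len(view.final_state), view.user_data["mt"])
```

Each tool only reads and writes the view, so tools can be reordered, replaced or reused in another analysis without touching the others.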
  15. 15. Components of EV Framework • EVToolLooper (or just looper) • Schedules all EV components. • EVTools (or just Tools) • Smallest component in EV. • Written in C++. • Mainly EV tools and EV obj tools. • Other types of tools too. • EVModules (or just Modules) • Set of tools and configurations. • Written in Python. • Can be EV level or obj level. [Diagram: an EVToolLooper containing EVTools (e.g. Inserter), EVObjTools (e.g. ObjLooper, ObjCalculator) and EVModules built from tools, acting on EV and Obj and producing DPD.]
  16. 16. Extending the Framework • It is crucial that the framework is easily extensible. • The EventView framework is a collection of tools, so extensibility is a natural feature. • Minimize the work needed to add components: • Tools are subdivided into different types according to their use. • Interface and functionality are inherited through the class hierarchy. • Skeleton code is generated for typical types of tools. [Figure captions: generate skeleton; just implement this function.]
  17. 17. Object Oriented Design: minimize effort in development. [Class diagram; the grouping of members by class below is approximate.] • AlgTool • EventViewBaseTool: bool MultipleOutput; bool MultipleInOut; virtual StatusCode execute(EventView*); virtual StatusCode executeMO(EventView*, EventViewContainer); virtual StatusCode executeMIMO(EventViewContainer, EventViewContainer). • EVUDCalcBase: bool m_EVUDTool; string m_prefix; string m_postfix; initializeTools(); execute(ev, prefix, postfix). • EVUDEVCalcBase: executeEV(EventView*, prefix, postfix). • EVUDObjCalcBase: execute(*ev, *UD, *part, prefix, postfix); executeDummy(*ev, *UD, prefix, postfix). • EVUDObjCalcBaseT (template on T): executeObj(*ev, *UD, *part, prefix, postfix). A type of tool which calculates object-level variables. • EVUDObjLooperBase: m_Labels; m_requireLabels; m_rejectLabels; copyUDtoEV. • Concrete tools: EVUDElectronAll, EVUDMuonAll, EVUDParticleJetAll, EVUDFinalStateLooper, EVUDInferredObjectLooper.
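The design idea behind the hierarchy, where the base class owns the looping, label filtering and variable naming and a new tool implements a single function, can be illustrated with a short Python sketch. This is not the C++ code of the EventView packages; the class and method names below are hypothetical, chosen only to echo the prefix/postfix and require/reject-label members shown in the diagram.

```python
class ObjectCalculatorBase:
    """Base class: owns the loop, the label filtering and the naming."""

    def __init__(self, prefix="", postfix="", require_labels=(), reject_labels=()):
        self.prefix = prefix
        self.postfix = postfix
        self.require_labels = set(require_labels)
        self.reject_labels = set(reject_labels)

    def execute(self, view):
        for obj in view["objects"]:
            labels = set(obj["labels"])
            if not self.require_labels <= labels:
                continue  # a required label is missing
            if self.reject_labels & labels:
                continue  # the object carries a rejected label
            for name, value in self.execute_obj(obj).items():
                key = f"{self.prefix}{name}{self.postfix}"
                view["user_data"].setdefault(key, []).append(value)

    def execute_obj(self, obj):
        """The one function a new tool has to implement."""
        raise NotImplementedError


class AbsEtaCalculator(ObjectCalculatorBase):
    """Example tool: one object-level variable, a few lines of code."""

    def execute_obj(self, obj):
        return {"absEta": abs(obj["eta"])}


view = {
    "objects": [
        {"labels": ["Electron"], "eta": -1.5},
        {"labels": ["Jet"], "eta": 2.0},
        {"labels": ["Electron", "Overlap"], "eta": 0.3},
    ],
    "user_data": {},
}
tool = AbsEtaCalculator(prefix="El_", require_labels=["Electron"],
                        reject_labels=["Overlap"])
tool.execute(view)
print(view["user_data"])
```

Adding another object-level variable then costs one small subclass, which is the "minimize effort in development" point of the slide.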
  18. 18. Standard EventView Tool/Module Library • EventViewBuilder (container package) • EventViewBuilderUtils: Base classes of all tools • EventViewBuilderAlgs: Tool loopers • EventViewConfiguration: High-level configuration interface • EventViewInserters: Tools for selection and overlap removal • EventViewDumpers: Output to screen, ntuple, and Atlantis • EventViewUserData: Calculators and associators • EventViewTrigger: Tools for reading trigger info • EventViewPerformance: Performance study modules • EventViewCombiners: Combinatorics tools • EventViewSelectors: Event selection tools. The list is growing thanks to contributions, but we need more people to work on these packages.
  19. 19. Multiple View (Branch): an example EV analysis • One can define multiple views of an event: • e.g. use a different jet collection in each view. • e.g. use a different object selection in each view. • The same analysis is executed on each EventView. • Views can talk to each other: • compare views. • order views. • match views. • Each view is saved into a separate Root tree in the ntuple. [Diagram: an EVToolLooper runs a branch tool with a Reco side and a Truth side. Electron Branch: Electron, Muon, Tau and Jet (cone04) inserters. Tau Branch: Tau, Electron, Muon and Jet (cone04) inserters. Tau Branch 2: Tau, Electron, Muon and Jet (cone07) inserters. Truth: Vtx filter and Truth inserter. Analysis steps: Combinatorics, Selection, KinematicCalc, TruthAssociator. One NtupleDumper per branch writes the output trees Truth, Electron, Tau and Tau2.]
  20. 20. Multiple View (Combinatorics): another example analysis • Multiple EVs are produced from combinatorics: • e.g. take all combinations of dijets. • e.g. combine one W and a b-jet to obtain a top quark. • Multiple EVs are manipulated through Multi Input (MI) tools: • e.g. sort EVs according to the pt of the reconstructed top. • e.g. remove EVs not satisfying a condition. • Each view is saved to a separate tree, or all views go in one tree. [Diagram: a Combination Tool turns one EventView into an EventView container (EV0, EV1, EV2); an EVSort tool with a Comparator turns the unsorted container into a sorted one: EV0 (ex EV2), EV1 (ex EV1), EV2 (ex EV0).]
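The combinatorics and multi-input steps described above can be sketched as follows: build one view per dijet combination, sort the views by the pT of the combined object, and drop views failing a cut. This is an illustration in plain Python, not the EventViewCombiners code; jets are represented as hypothetical (px, py, pz, E) tuples in GeV and all function names are made up for the example.

```python
import math
from itertools import combinations


def add_four_vectors(a, b):
    """Component-wise sum of two (px, py, pz, E) tuples."""
    return tuple(x + y for x, y in zip(a, b))


def transverse_momentum(p):
    return math.hypot(p[0], p[1])


def dijet_views(jets):
    """One view per unordered pair of jets (the combinatorics step)."""
    return [
        {"jets": pair, "dijet": add_four_vectors(*pair)}
        for pair in combinations(jets, 2)
    ]


def sort_views(views):
    """Multi-input step: order views by dijet pT, highest first."""
    return sorted(views, key=lambda v: transverse_momentum(v["dijet"]), reverse=True)


def filter_views(views, min_pt):
    """Multi-input step: remove views whose dijet pT is below the cut."""
    return [v for v in views if transverse_momentum(v["dijet"]) >= min_pt]


jets = [(50.0, 0.0, 0.0, 50.0), (0.0, 30.0, 0.0, 30.0), (-20.0, 0.0, 0.0, 20.0)]
views = sort_views(dijet_views(jets))
selected = filter_views(views, min_pt=35.0)
print(len(views), len(selected))
```

The same pattern extends to the W plus b-jet case by pairing candidates from two different lists instead of taking combinations within one list.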
  21. 21. DPD Production • EV analysis can easily produce a custom ntuple with the desired amount of information. • It is not just copying information; a lot more needs to be done: • Selection of objects • Slimming contents • Adding derived variables • Calibration / correction • you name it. • Various physics groups have started making their own ntuples: • They are based on the same tools but the format is slightly different. • We need a common package to provide the interface for producing DPD: HighPtView. • It provides the same look and feel but still supports differences in contents. • Central production of default ntuples: a good starting point and reference point.
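To make the list above concrete, here is a minimal sketch of turning one event into a flat ntuple row with object selection, slimming and one derived variable. It is not HighPtView or EventViewDumpers code; the event layout, the cut values, the branch names and the choice of the scalar pT sum as the derived variable are assumptions for illustration only.

```python
def make_dpd_row(event, min_jet_pt=20.0, max_jets=4):
    """Flatten one event into a dict of scalars (one ntuple row)."""
    # Selection: keep jets above threshold, leading ones first.
    jets = sorted(
        (jet for jet in event["jets"] if jet["pt"] >= min_jet_pt),
        key=lambda jet: jet["pt"],
        reverse=True,
    )[:max_jets]

    row = {
        "run": event["run"],
        "event": event["event"],
        "n_jets": len(jets),
        # Derived variable: scalar sum of the selected jet pT.
        "ht": sum(jet["pt"] for jet in jets),
    }
    # Slimming: only pt and eta are written out; everything else is dropped.
    for index, jet in enumerate(jets):
        row[f"jet{index}_pt"] = jet["pt"]
        row[f"jet{index}_eta"] = jet["eta"]
    return row


event = {
    "run": 5200,
    "event": 17,
    "jets": [
        {"pt": 50.0, "eta": 0.1, "cells": [1, 2, 3]},
        {"pt": 10.0, "eta": 2.4, "cells": [4]},
        {"pt": 30.0, "eta": -1.0, "cells": [5, 6]},
    ],
}
row = make_dpd_row(event)
print(row)
```

A common package can fix the branch naming and the default cuts while still letting each group override the thresholds, which is the role the slide assigns to HighPtView.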
  22. 22. HighPtView • HighPtView implements a set of configurations to manipulate tools/modules in EventViewBuilder. • It provides a common interface for DPD building for all PhysicsView packages. • Default physics contents (selection cuts) come from performance studies. • Physics groups may have their own “view” packages (e.g. TopView) for optimization (overriding configuration) and extra functionality (e.g. tools for analysis). • Group view packages will look similar, inheriting their interface from HighPtView. [Diagram: HighPtView on top of the EventViewBuilder packages (EventViewConfiguration, EventViewBuilderUtils, EventViewBuilderAlgs, EventViewPerformance, EventViewCombiners, EventViewUserData, EventViewDumpers, EventViewTransformation, EventViewSelectors, EventViewInserters); performance studies (Jet, Electron/Photon, Muon, Tau, Flavour Tagging) exchange tools and feedback with it; group view packages (SUSYView, TopView, HiggsToTauTau, HiggsToGamGam) interface to it for optimization.]
  23. 23. Part 3: Where is EventView?
  24. 24. People Involved. This is a VERY incomplete list of people. There are lots of other people contributing! • Core Developers: • Kyle Cranmer (BNL) • Amir Farbin (CERN) • Akira Shibata (QMUL) • Package Developers: • Attila Krasznahorkay (EventViewTrigger) • Jamie Boyd (EventViewPerformance) • Lorenzo Feligioni (Btagging) • Liza Mijovic (Development for release 13) • GroupArea management: • Akira (Lxplus), Fabien Tarrade (BNL), Sandrine Laplace (Lyon) • We have a lot of interaction with: • physics groups and performance groups • distributed analysis and Athena core developers.
  25. 25. Where is EventView? [Diagram: the EventViewCore and EventViewTools packages (EventView, EventViewBuilderAlgs, EventViewConfiguration, EventViewTransformation, EventViewTrigger, EventViewInserters, EventViewPerformance, EventViewUserData, EventViewCombiners, EventViewDumpers, EventViewSelectors) and the PhysicsView packages (HighPtView, TopView, SUSYView, HiggsToTauTau, HiggsToGammaGamma), surrounded by the groups they interact with: Physics Groups, Performance Groups, Physics Validation, Distributed Analysis, Central Production, Athena Core Developers, Analysis EDM and Physics Analysis Tools.]
  26. 26. • Interaction with various groups is VERY important to maintain a functional framework: • Performance groups: maintenance of relevant tools and input of physics contents (object selection, common performance studies). • EDM: compatibility of the tools with the developing EDM; optimal DPD format for the future. • Physics analysis tools: tool development and maintenance. • Core developers: support for new feature requests and help with core development. • Central production: production of HighPtView ntuples. • Distributed analysis: support for EV analysis. • Physics validation: validation of standard DPD and RTT on PhysicsView packages. • Physics groups: development of group tools and modules in PhysicsView packages. • These things are just beginning to happen. More input and collaboration is required for: • Development and maintenance of PhysicsView packages. • Standardised performance study and selection tools.
  27. 27. DA and EventView. EventView has long experience with the PANDA distributed analysis framework because: • It is highly integrated with the data management tool (DQ2). • It supports GroupArea and configurables. • It runs Athena jobs seamlessly and no bookkeeping is necessary. 1. Local test with a sample fetched with DQ2: athena TopView_Top_jobOptions.py 2. Set up grid/DQ2: source GridSetupLxplus.sh 3. Send the job (replace “athena” by “pathena”): pathena TopView_Top_jobOptions.py --inDS csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v11004103 --outDS user.top_wg.TopView21.T1_McAtNlo_Jimmy.001
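The point that a grid submission differs from a local test only by the executable name and the two dataset options can be captured in a small helper. This sketch only assembles the command line shown on the slide and does not run anything; the function name is hypothetical, and only the `--inDS` and `--outDS` options that appear above are used.

```python
import shlex


def build_command(job_options, in_ds=None, out_ds=None):
    """Return the argument list for a local (athena) or grid (pathena) run."""
    if in_ds is None and out_ds is None:
        # Local test: same job options, plain athena.
        return ["athena", job_options]
    if not in_ds or not out_ds:
        raise ValueError("a grid job needs both an input and an output dataset")
    return ["pathena", job_options, "--inDS", in_ds, "--outDS", out_ds]


def as_shell_line(command):
    """Quote the arguments so the line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in command)


local = build_command("TopView_Top_jobOptions.py")
grid = build_command(
    "TopView_Top_jobOptions.py",
    in_ds="csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v11004103",
    out_ds="user.top_wg.TopView21.T1_McAtNlo_Jimmy.001",
)
print(as_shell_line(local))
print(as_shell_line(grid))
```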
  28. 28. TopView Use Case. Athena analysis development: 1. Develop analysis based on EventView tools under the TopView runtime package: write C++ tools, write Python modules and write job options. 2. Run tests on lxplus with test samples fetched with the DQ2 end user tool. 3. Commit changes to CVS and tag. (Iterate 1 and 2 until I’m happy; use HN if I need help.) Athena analysis submission on DA: 4. Send a PANDA job specifying input/output datasets and a few other parameters. 5. Check status on the web. (If the job failed, report a Savannah bug and iterate 4 and 5.) Post-Athena analysis: 6. Fetch results (AANTuple) using the DQ2 end user tool and analyze with ROOT locally. 7. Go back to the AOD/ESD using the AANT and do further analysis (in future).
  29. 29. Two-stage Iterative Analysis Process Again! • 1st stage: Define a common optimal preparation for a physics group. Includes selection, calibration, b-tagging. Use the modular analysis framework and feedback/tools from performance studies. Parameterise using common tools and Python configuration. Group-wise central production on the DA system. • 2nd stage: Interactive analysis in ROOT with quick turnaround. Use TMVA, RooFit and other tools outside the framework. Produce final plots to be published. Back-navigate through DA if necessary. [Diagram: Athena analysis development on Lxplus (AOD, Athena, ntuple); DPD production on DA (DA submission, monitor and retrieve, re-submission, publish); Root analysis at home (ntuple, histogram); feedback / port analysis to Athena; back-navigation to AOD.]
  30. 30. Part 4: Future Developments
  31. 31. EDM in Release 13 • Faster AOD access: • Factor of ~10 in I/O speed expected thanks to transient/persistent separation. • Broadens the use case of Athena analysis. • AOD access from Root or structured DPD: • Different technologies now under test: pAOD, SAN, and Scott’s new recipe. • Broadens the use case of Root analysis and lowers the wall between Athena and Root: higher portability of analysis code. • We will still need a common framework just like before! EventView will write out the structured format. • ESD/AOD merger: • ESD / AOD classes are now merged: identical interface for ESD and AOD analysis. • Can use ESD-based tools on AOD objects, e.g. particle identification.
  32. 32. Future developments in PANDA • Root analysis support on PANDA: • Using PROOF or otherwise. • Support for more sites, with improvements in operation: • BNL has been fully operational for a long time. • Added LCG sites: ANALY_CERN, ANALY_LYON, ANALY_RAL, ANALY_FZK, ANALY_CNAF, ANALY_PIC, ANALY_SARA
  33. 33. Tutorial? Let’s see how this all works.
