DPF 2017: GPUs in LHCb for Analysis

Datasets with millions of events in charm decays at LHCb have prompted the development of powerful fitting and analysis tools capable of handling unbinned datasets using GPUs and multithreaded architectures.

GooFit, the original GPU fitting program with a familiar syntax resembling classic RooFit, has undergone significant redesign and has expanded physics and computing capabilities. The performance has been improved and tested on a variety of systems. GooFit 2.0 is easier than ever to install, develop, and use on any system.

A new templated header-only library, Hydra, provides a highly optimized general framework for fits, Monte Carlo generation, integration, and more. The design and benefits of this system, along with initial tests, will be shown.

Finally, a model-independent search for direct CP violation using an unbinned approach called an energy test was performed directly using the Thrust library (which both of the previous packages are based on). Public results from this analysis and performance comparisons will be presented.

Published in: Science

DPF 2017: GPUs in LHCb for Analysis

  1. GPUs in LHCb for Analysis. Henry F. Schreiner (University of Cincinnati), on behalf of the LHCb collaboration. DPF 2017, August 3, 2017.
  2. Lightning Introduction to GPUs
     NVIDIA GPUs
     • Programming language: CUDA
     • Massively parallel identical operations
     • Separate memory model (coprocessor)
     Class    Name          Stream processors   Clock      TFLOPS   Cost
     Gamer    GTX 1050 Ti   768                 1290 MHz   1.98     $150
     Gamer    GTX 1080 Ti   3,584               1596 MHz   11.3     $850
     Server   Tesla K40     2,880               745 MHz    4.29     $3,000
     Server   Tesla P100    3,584               1329 MHz   9.3      $10,000
     1/21 Henry F. Schreiner on behalf of the LHCb collaboration, GPUs in LHCb for Analysis, August 3, 2017
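The TFLOPS column is consistent with a common rule of thumb for peak single-precision throughput: two floating-point operations (one fused multiply-add) per stream processor per clock cycle. The slide does not state this rule, so it is an assumption here, and the function name is purely illustrative. Under that assumption the sketch reproduces the GTX 1050 Ti and Tesla K40 entries; the other two rows come out a few percent above the quoted values, presumably because a different clock figure was used for them.

```python
def peak_tflops(stream_processors, clock_mhz, flops_per_cycle=2):
    """Rule-of-thumb peak throughput in TFLOPS.

    Assumes each stream processor retires one fused multiply-add
    (counted as 2 floating-point operations) per clock cycle.
    """
    return stream_processors * clock_mhz * 1e6 * flops_per_cycle / 1e12


# (stream processors, clock in MHz) as listed in the table
cards = {
    "GTX 1050 Ti": (768, 1290),
    "GTX 1080 Ti": (3584, 1596),
    "Tesla K40": (2880, 745),
    "Tesla P100": (3584, 1329),
}

for name, (cores, mhz) in cards.items():
    print(f"{name:12s} {peak_tflops(cores, mhz):5.2f} TFLOPS")
```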
  3. GooFit: CPU and GPU fitting package. Hydra: CPU and GPU system for HEP computation. Manet: energy test GPU code.
  4. GooFit: Introduction
     Repository: /GooFit/GooFit (LGPLv3)
     • Designed for speed; resembles the popular RooFit package in ROOT
     • Built for CUDA or OpenMP using the Thrust library
     • Binned and unbinned fits; 3- and 4-body time-integrated and time-dependent analyses
     • Composed in C++ (Python coming soon) [2.1]
  5. GooFit: Reduce Time to Insight
     πππ0: 3-body fit with 16 amplitudes (original RooFit code: 19,489 s)
     ZachFit: D*+ − D0, BaBar measurement (142,576 events in unbinned fit)
     Platform                   πππ0 [s]   ZachFit [s]
     CPU Core 2 Duo             1,159      738
     GPU GeForce GTX 1050 Ti    86.4       60.3
     GPU Tesla K40              64.0       96.6
     MPI Tesla K40 ×2           39.3       54.3
     GPU Tesla P100             20.3       23.5
     Figure: time [s] vs. OpenMP threads on a 24-core Xeon (threads: 1, 2, 4, 12, 24, 48). Plotted values: 44, 85, 213, 418, 843 (πππ0); 36, 60, 118, 336, 625, 1,240 (ZachFit).
     [CHEP 2013]
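To make the timing figures easier to compare, the short sketch below converts the πππ0 times into speed-up factors relative to the original RooFit run. The numbers are copied from the slide; the variable and function names are illustrative only.

```python
ROOFIT_SECONDS = 19_489  # original RooFit code, πππ0 fit

# GooFit wall-clock times for the πππ0 fit, in seconds, per platform
pipipi0_seconds = {
    "CPU Core 2 Duo": 1159.0,
    "GPU GeForce GTX 1050 Ti": 86.4,
    "GPU Tesla K40": 64.0,
    "MPI Tesla K40 x2": 39.3,
    "GPU Tesla P100": 20.3,
}


def speedup(baseline_seconds, seconds):
    """Ratio of a baseline time to a measured time."""
    return baseline_seconds / seconds


for platform, seconds in pipipi0_seconds.items():
    factor = speedup(ROOFIT_SECONDS, seconds)
    print(f"{platform:24s} {seconds:7.1f} s  {factor:6.0f}x vs. RooFit")
```

On these figures the Tesla P100 run is roughly 960 times faster than the original RooFit fit and about 57 times faster than GooFit on the Core 2 Duo.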
  6. GooFit: New features
     Modernization timeline (2013 to 2017), labels in order: 0.1, 0.2, 0.3, OpenMP, 0.4, Work in forks, Minor updates, 1.0, CMake, 2.0
     CMake: New build features
     • IDEs, macOS, multiple backends
     • Datafiles auto-download
     • Auto-library download and discovery
     • Unit tests, Docker, CI builds
     • /CLIUtils/cmake
     • /GooFit/Minuit2
     New design features
     • C++11, code cleanup
     • Colorful logging
     • /CLIUtils/CLI11
     • Optimization warnings
     • MPI support
     • Optimizations for newer NVIDIA cards
  7. GooFit: New Physics Features
     Three-body time-dependent amplitude analyses
     • Mixing in D0 → π+π−π0 time-dependent amplitude analysis (BaBar) [Phys.Rev. D93 (2016) no.11, 112014]
     • Mixing and CP violation search in D0 → K0S π+π− [CERN-THESIS-2015-348] (paper in preparation)
     Four-body time-integrated and time-dependent amplitude analyses
     • Mixing parameters in D0 → K+π−π+π− [CHEP 2016]
     Toy Monte Carlo generation using /MultithreadCorner/MCBooster
     • MIPWA in GooFit, such as D+ → h+h+h+ [CHEP 2016]
  8. GooFit: Easy To Get Started
     docker run -it alpine
     apk add --no-cache make cmake g++ git
     git clone --branch=stable https://github.com/GooFit/GooFit.git
     cd GooFit
     make
     Simple installation
     • More systems available on
     • Or use Docker images: goofit/goofit-omp and goofit/goofit-cuda
     Python install [2.1]
     pip install scikit-build cmake
     pip install -v goofit
  9. GooFit: Plans
     Diagram: Compose, PDFs, Backend
     Python bindings
     • Interface to composition
     • Working prototype in GooFit 2.0
     • All PDFs added for 2.1
     • Pythonization of objects ongoing
     • Converting/adding examples
     PDF rework
     • Work by Bradley Hittle at Ohio Supercomputer Center
     • Simpler PDF authoring
     • Easier to alter backend
     Future work
     • Add Hydra (optional at first)
  10. GooFit: Python
      from goofit import *
      import numpy as np
      # Observable and an unbinned dataset filled from a NumPy array
      xvar = Variable("xvar", -10, 10)
      xdata = UnbinnedDataSet(xvar)
      npdata = np.random.normal(1, 2.5, 100000)
      xdata.from_numpy([npdata], filter=True)
      # Fit parameters: name, initial value, lower limit, upper limit
      mean = Variable("mean", 0, -10, 10)
      sigma = Variable("sigma", 1, 0, 5)
      gauss = GaussPdf("gauss", xvar, mean, sigma)
      # Fit the Gaussian to the dataset, then evaluate it on a grid for plotting
      gauss.fitTo(xdata)
      grid, values = gauss.evaluatePdf(xvar)
      Figure: PDF plot over xvar from −10 to 10; the output of evaluatePdf is the data for the red line.
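As a rough illustration of what an unbinned Gaussian fit like the one on this slide estimates, the sketch below writes out the negative log-likelihood in plain Python and uses the closed-form maximum-likelihood estimates. This is not GooFit code: GooFit minimizes the likelihood numerically and evaluates it in parallel, and the function names here are hypothetical. The sketch also ignores the small effect of normalizing the Gaussian over the truncated range [−10, 10].

```python
import math
import random


def gaussian_nll(data, mean, sigma):
    """Unbinned negative log-likelihood of a Gaussian model."""
    norm = len(data) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return norm + sum((x - mean) ** 2 for x in data) / (2.0 * sigma ** 2)


def gaussian_mle(data):
    """Closed-form maximum-likelihood estimates of mean and sigma."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return mean, sigma


# Toy data mirroring the slide: Gaussian with mean 1 and width 2.5,
# restricted to the observable range [-10, 10].
rng = random.Random(42)
data = [x for x in (rng.gauss(1.0, 2.5) for _ in range(20000)) if -10.0 <= x <= 10.0]

mean_hat, sigma_hat = gaussian_mle(data)
print(f"mean = {mean_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"NLL at estimate: {gaussian_nll(data, mean_hat, sigma_hat):.1f}")
```

Moving either parameter away from the estimate increases the negative log-likelihood, which is the quantity a numerical minimizer would drive down.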
  11. Hydra: Introduction
      Hydra: multithreaded data analysis framework
      Repository: /MultithreadCorner/Hydra (GPLv3)
      • Header-only templated C++11 library
      • For parallel HEP data analysis on GPUs and CPUs
      • Uses variadic version of Thrust and CUDA 8
      • Supports all Thrust backends: CUDA, OpenMP, TBB, CPP [2.0] (runtime selection)
      • Developed by A. Augusto Alves Jr.; replaces /MultithreadCorner/MCBooster
  12. Hydra: Design
      Speed-up: 15x to 250x depending on algorithm, problem size, and device
      Features
      • Phase-space Monte Carlo generation
      • Multidimensional PDF sampling
      • Function evaluation over multiple dimensions
      • Interface to Minuit2 minimization
      • Numerical integration [2.0] (advanced)
      Design
      • Designed using static polymorphism
      • Clean and concise
      • No explicit backend coding needed
      • Interfaces hard to use incorrectly
      • Single source for multiple backends
      • Structure of arrays (SOA) helper [2.0]
  13. Hydra: Composition Details
      User formulas as functors
      • Functors are created by the user
      • C++11 lambda functions wrapped
      • Supports caching
      • Arithmetic and composition overloaded
      • No limit to number of functors
      • Named parameters [2.0]
      Data
      • Organized in memory to support coalesced access and vectorization
      Integrators
      • Flat Monte Carlo sampling
      • Vegas-like self-adaptive importance
      • Gauss-Kronrod quadrature [2.0]
      • Genz-Malik quadrature [2.0]
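The functor ideas on this slide (wrapped lambdas, overloaded arithmetic and composition, flat Monte Carlo integration) can be sketched as a small Python analogy. This is not Hydra's API: Hydra is a C++ template library, and the `Functor` class and `flat_mc_integrate` function below are hypothetical names chosen only to show the pattern.

```python
import random


class Functor:
    """Wraps a callable so that functors can be combined with + and *."""

    def __init__(self, func):
        self.func = func

    def __call__(self, x):
        return self.func(x)

    def __add__(self, other):
        return Functor(lambda x: self(x) + other(x))

    def __mul__(self, other):
        return Functor(lambda x: self(x) * other(x))

    def compose(self, inner):
        """Return the functor x -> self(inner(x))."""
        return Functor(lambda x: self(inner(x)))


def flat_mc_integrate(f, lo, hi, n_samples, seed=0):
    """Flat Monte Carlo estimate of the integral of f over [lo, hi]."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(lo, hi)) for _ in range(n_samples))
    return (hi - lo) * total / n_samples


square = Functor(lambda x: x * x)
shift = Functor(lambda x: x + 1.0)

model = square + square * shift   # x^2 + x^2 (x + 1) = x^3 + 2 x^2
nested = square.compose(shift)    # (x + 1)^2

estimate = flat_mc_integrate(model, 0.0, 1.0, 100_000)
print(f"integral of x^3 + 2x^2 on [0, 1]: {estimate:.4f} (exact: {11 / 12:.4f})")
```

In this analogy the combined functor is built at run time from closures; a statically polymorphic C++ design would instead resolve the composition at compile time.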
  14. Hydra: Examples
      Unbinned maximum likelihood fit to 20M events
      Figure: histogram of the data, yield (×10³) vs. X from 0 to 14 (2e+07 entries, mean 5.499, std dev 3.694).
      • Tesla K40: 4.865 seconds
      • Xeon 2.5 GHz, 1 thread: 299.9 seconds
      • 63 times faster
      3-body phase space
      Figure: duration [ms] for GPU and CPU, and GPU vs. CPU speed-up, against number of events (1 to 10 ×10⁶).
      • Tesla K40
      • Xeon 2.5 GHz, 1 thread
      • Well over 200 times faster
  15. Hydra: Code Fragments
      // Creating a parameter: named arguments
      std::string Mean("Mean");
      auto mean = Parameter::Create().Name(Mean).Value(3).Limits(1, 4);
      // Registering parameters with Hydra
      UserParameters upar;
      upar.AddParameter(&mean);
      // Making a PDF and FCN
      Gauss gaussian(mean, sigma, 0, kFalse);
      auto modelFCN = make_loglikehood_fcn(gaussian, data_d.begin(), data_d.end());
      // Minuit2 minimization
      MnMinimize minimize(modelFCN, upar.GetState(), strategy);
      FunctionMinimum fmin(minimize(iterations, tolerance/1000));
  16. Manet: Introduction
      Manet: Manchester Energy Test
      Decays shown: D0 → π−π+π0 and D0 → π+π−π+π−
      Energy Test
      • An unbinned model-independent statistical method
      • Searches for time-integrated CP violation in multi-body decays
      • Made possible in reasonable computation time using GPUs
      • Two analyses published using Manet
  17. Manet: Procedure
      Test statistic
      T \approx \underbrace{\frac{1}{n(n-1)} \sum_{i,j>i}^{n} \psi_{ij}}_{\text{matter decay}} + \underbrace{\frac{1}{\bar{n}(\bar{n}-1)} \sum_{i,j>i}^{\bar{n}} \psi_{ij}}_{\text{antimatter decay}} - \underbrace{\frac{1}{n\bar{n}} \sum_{i,j}^{n,\bar{n}} \psi_{ij}}_{\text{between events}}
      • $\psi_{ij} \equiv e^{-d_{ij}^2 / 2\sigma^2}$ is Gaussian with tunable width $\sigma$
      • $d_{ij}$ is the distance between two events in 3-body phase space
      • Sum of weighted distances among events
      • $\psi$ goes down as distance increases, so $T$ is large for CP asymmetry
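The test statistic above can be written out directly as a short, CPU-only Python sketch. This is not Manet's code: it is a plain O(n²) loop with hypothetical function names, using toy two-dimensional points in place of phase-space coordinates and an arbitrary width σ = 0.5.

```python
import math
import random


def psi(a, b, sigma):
    """Gaussian weight exp(-d^2 / (2 sigma^2)) for two events a and b."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def energy_test(sample, sample_bar, sigma):
    """Test statistic T for a matter sample and an antimatter sample."""
    n, nbar = len(sample), len(sample_bar)
    # Pairs within the matter sample (j > i)
    same = sum(psi(sample[i], sample[j], sigma)
               for i in range(n) for j in range(i + 1, n)) / (n * (n - 1))
    # Pairs within the antimatter sample (j > i)
    same_bar = sum(psi(sample_bar[i], sample_bar[j], sigma)
                   for i in range(nbar) for j in range(i + 1, nbar)) / (nbar * (nbar - 1))
    # All pairs between the two samples
    cross = sum(psi(a, b, sigma) for a in sample for b in sample_bar) / (n * nbar)
    return same + same_bar - cross


def toy_sample(rng, n, shift):
    """n toy events in a 2D plane, Gaussian around (shift, shift)."""
    return [(rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)) for _ in range(n)]


rng = random.Random(1)
matter = toy_sample(rng, 200, 0.0)
antimatter_same = toy_sample(rng, 200, 0.0)     # same distribution
antimatter_shifted = toy_sample(rng, 200, 1.5)  # different distribution

t_same = energy_test(matter, antimatter_same, sigma=0.5)
t_shifted = energy_test(matter, antimatter_shifted, sigma=0.5)
print(f"T (same distributions):     {t_same:.4f}")
print(f"T (shifted distributions):  {t_shifted:.4f}")
```

When both samples follow the same distribution the three terms cancel on average and T stays near zero; when the distributions differ the cross term shrinks and T grows. The number of pair evaluations scales quadratically with the sample size, which is why large datasets call for a GPU implementation.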
  18. Manet: Test Statistic Simulation
      Simulation: D0 → π−π+π0 [Phys. Lett. B 740 (2015) 158]
      • 2% CP violation in amplitude: T (left) and significance (right)
  19. Manet: Three Body Results
      Results [Phys. Lett. B 740 (2015) 158]
      • CP symmetry: p = (2.6 ± 0.5)%
      • Best sensitivity in single experiment
      Manet [J.Phys. G44 (2017) no.8, 085001]
      • Tesla K40: 30 minutes for 1 M events
      • manet.hepforge.org
  20. Manet: Four Body Simulation
      Figure (LHCb simulation, panels c to f): candidates per 0.03 GeV/c² vs. m(π1π2π3) [GeV/c²] and candidates per 0.02 GeV/c² vs. m(π1π2) [GeV/c²], each with a significance [σ] panel spanning −3 to 3.
      Simulation
      • 3° phase CP violation (both)
      • P-even in S-wave a1(1260)+ (left)
      • P-odd in P-wave ρ0(770)ρ0(770) (right)
      [Phys.Lett. B769 (2017) 345-356]
      See "CP violation and mixing in charm at LHCb" by Riccardo Cenci: Quark and Lepton Flavor, 14:30
  21. Manet: Four Body Results
      Figure (LHCb data, panels c to f): candidates per 0.03 GeV/c² vs. m(π1π2π3) [GeV/c²] and candidates per 0.02 GeV/c² vs. m(π1π2) [GeV/c²], each with a significance [σ] panel spanning −3 to 3.
      Final results
      • 3.0 fb−1, Run 1
      • p-value: (4.6 ± 0.5)% P-even
      • p-value: (0.6 ± 0.2)% P-odd
      • CP non-conservation: 2.7σ
      • First test for P-odd
      [Phys.Lett. B769 (2017) 345-356]
  22. Summary
      GooFit
      • Now easier to use
      • Many examples & PDFs
      • Active development
      • Python bindings soon
      Hydra (multithreaded data analysis framework)
      • New lower-level library
      • Templated header only
      • Multiple backends
      • Versatile toolkit
      Manet (Manchester Energy Test)
      • Energy test method
      • High sensitivity for CP
      • Used in 3- and 4-body
      • Possible using GPUs
  23. Questions?
  24. Backup: Other Tools
      IPanema-β
      • A Python CUDA package for fits
      • A collection of examples and helpers
      • https://arxiv.org/abs/1706.01420
  25. Backup: Running Timing Examples
      General notes
      • You can pick cards with the prefix: CUDA_VISIBLE_DEVICES=0,1
      πππ0
      • time ./pipipi0DPFit canonical dataFiles/cocktail_pp_0.txt --blindSeed=0
      • time mpiexec -np 2 ./pipipi0DPFit canonical dataFiles/cocktail_pp_0.txt --blindSeed=0
      ZachFit
      • time ./zachFit 0 1
      • time mpiexec -np 2 ./zachFit 0 1
  26. Backup: CMake
      Build features
      • Travis CI build
      • Coverage, docs
      • Unit tests
      • Docker support
      CMake features
      • IDE support (Xcode, etc.)
      • Library configuration
      • Multiple compiler support
      • Debug/tidy/format...
      • Datafiles from releases
      Git submodules
      • Libraries are submodules
      • Automatic checkout by CMake build
      • Separate CMake folder (/CLIUtils/cmake)
  27. Backup: Cleanup
      C++11
      • Limited to CUDA 7.0+
      • Simpler code
      • Used Clang-Tidy to convert (CMake 3.6+ integration)
      Standalone: /GooFit/Minuit2
      • Newly forked from ROOT 6.08
      • CMake build, no other changes
      • Already being used outside GooFit
      Cleanup
      • Readability: Clang-Format
      • Moved all code to namespace
      • Compile-time logging choice: /fmtlib/fmt
      • Smart color output: /agauniyal/rang
      • Removed custom classes and iterators (complex, etc.)
  28. Backup: CLI11
      /CLIUtils/CLI11
      • No dependencies
      • Compiles to single header file
      • Nested subcommands
      • Configuration files
      • 100% test coverage
      • CI tests on macOS/Linux/Windows
      • + GooFit’s features
      ./MyAnalysis generate_toy --params=file.ini --release_K892_mass --A12=0.3 --plot
      GooFit::Application
      • Auto logging
      • Optimization warnings
      • GPU switches
      • MPI support
      • Completely optional
  29. Backup: New Features
      Expanded physics tools
      • Three-body time-dependent amplitude analyses
      • Four-body time-integrated and time-dependent amplitude analyses
      • Toy Monte Carlo generation using MCBooster
      Caching: /bryancatanzaro/generics
      • Support for LDG caching
      • LDG generalized form
      • Performance boost for mid-age cards
      MPI
      • Available for Application
      • Supports multiple GPUs
      /MultithreadCorner/MCBooster is deprecated in favor of /MultithreadCorner/Hydra
