Searching for dark matter in CERN's Large Hadron Collider dataset

Slides from talk at GDG.

Published in: Technology

  1. 1. Searching for dark matter in CERN's Large Hadron Collider dataset, by Lavanya Shukla
  2. 2. What's on the menu today? The Standard Model; Sneak Peek into the Large Hadron Collider; Machine Learning Challenges; Searching for Dark Matter; Particle Identification; How Can You Get Involved
  3. 3. You can play with datasets from CERN's LHC experiments: http://opendata.cern.ch
  4. 4. Particle Physics and the Standard Model
  5. 5. The Standard Model: All matter is made up of particles. These particles interact with each other by exchanging other particles associated with the fundamental forces. The Standard Model describes what the universe is made up of and how it holds together.
  6. 6. The Standard Model: Pretty much everything we see in the universe is made of up quarks, down quarks and electrons. All the other particles exist only for a short amount of time before decaying into other particles.
  7. 7. Fermions: The 12 fermions are the building blocks of matter. Quarks are a group of 6 particles that combine to make protons (2 up, 1 down) and neutrons (1 up, 2 down). They are dependent particles: they carry fractional electric charge and are never seen alone in the wild. They must combine with other quarks via the strong nuclear force to form composite particles, most commonly in the nucleus.
  8. 8. Fermions: The 12 fermions are the building blocks of matter. Leptons are a group of 6 particles, with electrons being the best known. They are independent particles with whole units of charge, and are not found in the nucleus.
  9. 9. Bosons: Fermions interact with each other through fundamental forces. The 5 bosons are the building blocks of fundamental forces. Strong force (very short range, typically smaller than an atomic nucleus): carried by gluons, which bind the quarks in protons and neutrons. Role: holds atomic nuclei and quarks together. Electromagnetic force: carried by photons, the particles of light.
  10. 10. Bosons: Fermions interact with each other through fundamental forces. The 5 bosons are the building blocks of fundamental forces. Weak force (very short range, and about a million times weaker than the strong force): carried by the W and Z bosons. Role: responsible for radioactive decay. Higgs boson: associated with the Higgs field, which gives elementary particles their mass.
  11. 11. Behind the curtains of the Large Hadron Collider
  12. 12. A roughly 16.6-mile-long (27 km) tunnel accelerates protons to almost the speed of light (0.9999999c) and smashes them into each other. 4 big detectors (ALICE, ATLAS, CMS and LHCb) detect particles resulting from proton-proton bunch collisions. 40 million bunch collisions per second. 10 petabytes of data per year. The detectors measure the energy and momentum of every particle flying out of a collision event and identify the types of those particles by calculating their mass.
  13. 13. Every particle leaves different traces in different sub-detectors of the experiment.
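
To get a feel for what "almost the speed of light" means, here is a minimal sketch of the relativistic arithmetic. It takes the 0.9999999c figure from the slide at face value and assumes a proton rest energy of about 0.938 GeV; the function names are illustrative only.

```python
import math

PROTON_REST_ENERGY_GEV = 0.938272  # approximate proton rest energy (assumed constant)


def lorentz_factor(beta: float) -> float:
    """Return gamma = 1 / sqrt(1 - beta^2) for a speed given as a fraction of c."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    return 1.0 / math.sqrt(1.0 - beta * beta)


def total_energy_gev(beta: float, rest_energy_gev: float = PROTON_REST_ENERGY_GEV) -> float:
    """Total relativistic energy E = gamma * (rest energy)."""
    return lorentz_factor(beta) * rest_energy_gev


gamma = lorentz_factor(0.9999999)
energy = total_energy_gev(0.9999999)
print(f"gamma = {gamma:.1f}, proton energy = {energy:.0f} GeV")
```

At this speed each proton carries roughly 2,200 times its rest energy. Every additional 9 in the speed raises the energy substantially, which is why the exact number of nines matters.
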
  14. 14. Simulating the LHCb's output http://clangenb.web.cern.ch/clangenb/
  15. 15. It's time for some Machine Learning!
  16. 16. Some Machine Learning Challenges in Particle Physics: anomaly detection (data quality and infrastructure monitoring); detector design optimization (using Bayesian optimization, surrogate modeling, etc.); precise and fast particle tracking (single tracks, showers, jets, etc.); fast and accurate data processing and design of triggers; particle identification.
  17. 17. Searching for dark matter
  18. 18. Dark 'matter' is a misnomer. If we add up all the stars, planets, galaxies, comets, black holes and dark clouds, everything out there, the gravity doesn't add up. There is something holding the universe and our galaxy together, without which our galaxy would fall apart. 85% of the gravity in the universe has an unknown source.
  19. 19. What do we know for sure? There is something. It interacts with gravity. Mass and gravity go together, so it has mass, maybe? It is probably invisible: it has no interaction with light or the electromagnetic force, and doesn't emit or reflect light. There is a lot of it. So... not much, really.
  20. 20. WIMP theory of Dark Matter: WIMP = weakly interacting massive particle. Massive: 100 to 1000 times the mass of a proton. Weakly interacting: it interacts with other matter the way a neutrino does, through the weak interaction (the force between subatomic particles responsible for radioactive decay). WIMPs neither absorb nor emit light and don't interact strongly with other particles. But when they encounter each other, they annihilate and make gamma rays, which we can detect.
  21. 21. THE CHALLENGE. WE DON'T KNOW WHAT WE'RE LOOKING FOR: There are plenty of theories that try to predict the presence of dark matter, but we have no inkling of the nature of the effect we should be looking for. LACK OF CLEAN, OBSERVABLE DATA: The direct and indirect targets that might reveal dark matter’s presence are laced with noise and background phenomena that might lead to misleading results.
  22. 22. We'll use ML to tackle the problem of signal extraction from background noise
  23. 23. 1. Eliminate background noise from base tracks. 2. Cluster tracks into neutrino and dark matter interactions. The scale of the problem is massive: 10,000 out of 10 million base tracks are the result of electromagnetic showers; the rest are noise.
  24. 24. Feynman diagram of a dark matter particle X scattering off an electron in the lead target. Neutrinos produce similar showers: when a neutrino interacts with a nucleus, it also produces an electron that gets boosted, and similar showers are produced. One of the key distinguishing qualities between them is the energy-angle correlation. We apply clustering to distinguish the two kinds of electromagnetic showers.
  25. 25. The Dataset
  26. 26. The OPERA dataset has the following features: 15 million base tracks. Each base track contains X, Y, Z coordinates and angles from the origin (TX, TY). Background consists of base tracks scattered randomly in space, and has label=0. Signal consists of base tracks forming a cone shape, and has label=1.
  27. 27. In addition we compute the following features: distance from the origin (dX, dY, dZ, dTX, dTY); alpha, the angle between directions; d, the projection of base track 1. Signal has a defined geometric structure. Noise is largely random vectors.
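
The derived features can be sketched for a pair of base tracks as follows. This is a minimal illustration, not the talk's actual code: it assumes TX and TY are slopes (dx/dz and dy/dz), that the d-prefixed features are differences between the two tracks, and that alpha is the angle between their direction vectors. The `BaseTrack` type and `pair_features` helper are hypothetical names, and the projection feature d is omitted because the slide does not define it.

```python
import math
from typing import Dict, NamedTuple, Tuple


class BaseTrack(NamedTuple):
    x: float
    y: float
    z: float
    tx: float  # assumed to be the slope dx/dz
    ty: float  # assumed to be the slope dy/dz


def direction(track: BaseTrack) -> Tuple[float, float, float]:
    """Unit direction vector built from the (assumed) slopes TX and TY."""
    norm = math.sqrt(track.tx ** 2 + track.ty ** 2 + 1.0)
    return (track.tx / norm, track.ty / norm, 1.0 / norm)


def pair_features(a: BaseTrack, b: BaseTrack) -> Dict[str, float]:
    """Differences in position and slope, plus the opening angle alpha (radians)."""
    cos_alpha = sum(p * q for p, q in zip(direction(a), direction(b)))
    cos_alpha = max(-1.0, min(1.0, cos_alpha))  # guard against rounding error
    return {
        "dX": b.x - a.x,
        "dY": b.y - a.y,
        "dZ": b.z - a.z,
        "dTX": b.tx - a.tx,
        "dTY": b.ty - a.ty,
        "alpha": math.acos(cos_alpha),
    }


features = pair_features(
    BaseTrack(0.0, 0.0, 0.0, 0.0, 0.0),
    BaseTrack(1.0, 2.0, 3.0, 1.0, 0.0),
)
print(features)
```

Tracks belonging to the same shower should tend to have small alpha and small position differences, while random background pairs should not.
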
  28. 28. 1. Eliminate background noise from base tracks.
  29. 29. Feature Engineering: add particle track info to the dataset. Detect particle track: signal base tracks are part of a larger vector of movement of a particle, which moves across many layers in a straight-line path. Detect parent in case of decay: look for other child particles in the same layer.
  30. 30. Train a model to classify signal from noise: Define Neural Network; Train Neural Network; Make Predictions.
  31. 31. Train a model to classify signal from noise. Train classifier: neural network to classify signal from noise. Define and train XGBoost. Output of classifier: probability of each id being a signal (higher is better).
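
The idea of a classifier that outputs a signal probability per base track can be illustrated with a much simpler stand-in. The sketch below is not the neural network or XGBoost model from the talk: it swaps in a plain logistic regression trained by gradient descent, on synthetic two-feature data invented for the example (not the OPERA dataset).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X, w, b):
    """Probability that each row of X is signal (label 1)."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]


def train_logistic(X, y, lr=1.0, epochs=500):
    """Fit weights by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        errors = [p - yi for p, yi in zip(predict_proba(X, w, b), y)]
        for j in range(n_features):
            w[j] -= lr * sum(e * row[j] for e, row in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b


# Synthetic stand-in data: two hypothetical features (e.g. an angle and a distance).
# Signal pairs are assumed to have small values, noise pairs larger ones.
rng = random.Random(0)
signal = [[rng.gauss(0.1, 0.05), rng.gauss(0.2, 0.1)] for _ in range(200)]
noise = [[rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)] for _ in range(200)]
X = signal + noise
y = [1] * 200 + [0] * 200

w, b = train_logistic(X, y)
probs = predict_proba(X, w, b)  # higher means more signal-like
accuracy = sum((p > 0.5) == (yi == 1) for p, yi in zip(probs, y)) / len(y)
print(f"training accuracy: {accuracy:.3f}")
```

Keeping only the base tracks whose probability exceeds a chosen threshold is then the noise-elimination step; a real analysis would evaluate on held-out data rather than the training set.
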
  32. 32. 1. Eliminate background noise from base tracks.
  33. 33. 1. Eliminate background noise from base tracks.
  34. 34. 2. Cluster tracks into electromagnetic showers that are good candidates for dark matter interactions.
  35. 35. Clustering: 2 Models (DBSCAN, K-Means). For our dataset, DBSCAN relies on Euclidean distance by default, which won’t work for our case because we care about relative angles between base tracks, in addition to relative positions.
  36. 36. Clustering: K-Means. Find the optimal number of clusters: the inertia's rate of decline flattens around k=6 clusters, so we'll train K-Means with 6 clusters.
  37. 37. Clustering: K-Means. Train K-Means with 6 clusters. Visualize clusters. Excellent! From all that background noise, we’ve extracted the signal base tracks that are the best candidates for dark matter particle interactions! w00t!
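
The elbow method for choosing k can be sketched with a small from-scratch K-Means (Lloyd's algorithm). This is an illustration under stated assumptions, not the talk's code: the data are synthetic 2D blobs rather than base tracks, and the `kmeans` helper is a hypothetical minimal implementation, not a library call.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm. Returns (centers, inertia), where inertia is the
    sum of squared distances from each point to its nearest center."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    inertia = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, inertia


# Synthetic data: three well-separated 2D blobs (invented for this example).
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [
    (rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
    for cx, cy in blob_centers
    for _ in range(50)
]

inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")
```

Plotting inertia against k and looking for the point where the decline flattens gives the "elbow". Because K-Means itself uses Euclidean distance, features such as angles and positions would usually need to be scaled to comparable ranges before clustering.
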
  38. 38. 2. Cluster tracks into electromagnetic showers that are good candidates for dark matter interactions.
  39. 39. 2. Cluster tracks into electromagnetic showers that are good candidates for dark matter interactions.
  40. 40. Looking ahead to SHiP
  41. 41. Search for Hidden Particles (SHiP): Expected to launch in 2025. Designed more specifically to search for dark matter. Would employ the same techniques we used above to reconstruct dark matter particle trajectories from data. Hopefully, the new data will allow us to determine with greater confidence and accuracy which of these signal clusters are a result of dark matter interactions! When SHiP starts up, I hope you'll use some of the techniques we've used today and join me in exploring the dataset!
  42. 42. Particle Identification
  43. 43. The Goal: Identify the type of a particle associated with a track using responses from different detector systems. There are five particle types: Electron, Proton, Kaon, Pion and Muon. Therefore, particle identification is a multi-class classification problem.
  44. 44. The Problem: The inputs to the classifier are the particle track responses from the 5 sub-detector systems: tracking system, Ring Imaging Cherenkov detector, electromagnetic calorimeter, hadron calorimeter and muon chambers. The outputs of the classifier are six labels: five correspond to the five particle types, and Ghost is a catch-all for noisy tracks and other particle types.
  45. 45. Getting a little in the weeds: Each particle has a certain energy, momentum and mass. Energy and momentum are determined by the speed of the particle, and mass by its type. From the laws of conservation of energy and momentum, we know that when a particle decays, the energy and momentum of the mother particle equal the sums of the energies and momenta of the daughter particles.
  46. 46. Getting a little in the weeds: The sub-detectors estimate each daughter particle’s trajectory, momentum, energy and type. From these we can reconstruct the mother particle’s parameters (e.g. the mass) and therefore detect the particle.
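
Reconstructing the mother particle's mass from its daughters follows directly from energy and momentum conservation: sum the daughters' energies and momenta, then apply M² = E² − |p|² (in natural units with c = 1). A minimal sketch, with an illustrative function name and made-up four-momenta:

```python
import math


def invariant_mass(daughters):
    """Mass of the mother particle from daughter four-momenta.

    Each daughter is a tuple (E, px, py, pz) in natural units (c = 1).
    """
    energy = sum(d[0] for d in daughters)
    px = sum(d[1] for d in daughters)
    py = sum(d[2] for d in daughters)
    pz = sum(d[3] for d in daughters)
    mass_squared = energy ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    # Measurement noise can push mass_squared slightly below zero; clamp it.
    return math.sqrt(max(mass_squared, 0.0))


# Mother at rest decaying into two massless, back-to-back daughters.
at_rest = invariant_mass([(1.0, 1.0, 0.0, 0.0), (1.0, -1.0, 0.0, 0.0)])

# Moving mother: the daughters' momenta no longer cancel.
moving = invariant_mass([(5.0, 3.0, 4.0, 0.0), (5.0, 3.0, -4.0, 0.0)])

print(at_rest, moving)
```

In the second case the total energy is 10 and the total momentum is 6, so the reconstructed mass is 8 regardless of how fast the mother was moving. That invariance is what makes the mass a useful identifier.
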
  47. 47. Training a neural net and a boosting model
  48. 48. The Dataset: 1.2 million data points, each representing a particle track. 50 features: measurements from the sub-detectors, and features derived from these measurements.
  49. 49. AdaBoost Classifier
  50. 50. Neural Net Classifier
  51. 51. Neural Net Classifier: Here’s a plot of the ROC curves for all particle classifiers (AdaBoost and Neural Network). The AdaBoost model performs slightly better than the neural network in this case.
  52. 52. Now it's your turn! http://kaggle.com
  53. 53. Thank you! Twitter.com/lavanyaai http://lavanya.ai
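
A ROC comparison like this is usually summarised by the area under the curve (AUC), computed one particle type at a time (one-vs-rest). The sketch below computes AUC from its pairwise definition: the probability that a randomly chosen positive track scores higher than a randomly chosen negative one. The labels and scores are made up for illustration and are not results from the talk.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Uses the pairwise definition; ties count as half a win. This is O(n^2),
    which is fine for a sketch but slow for millions of tracks.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical one-vs-rest example: is this track a muon (1) or not (0)?
is_muon = [0, 0, 1, 1]
muon_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(is_muon, muon_scores)
print(f"AUC = {auc:.2f}")
```

An AUC of 1.0 means the classifier ranks every true muon above every non-muon, while 0.5 is no better than random ranking. Comparing the per-type AUCs of two models is one way to quantify "slightly better".
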
