Playing with LHC Open Data in Python

Python can play with LHC data too
Taking on the Higgs ML challenge with Python
Yuan CHAO ( 趙元 )
(National Taiwan University, Taipei, Taiwan)
COSCUP
2014/07/19-20

Who am I?
Yuan CHAO (John)
YChao
...

Researcher
High-energy physics
Doing research with OSS ...

A large hadron-colliding accelerator?
Large Hadron Collider!

Organisation Européenne
pour la Recherche Nucléaire
CERN
Switzerland

Meyrin
Canton de
Genève
Border of CH/FR

LHC circumference:
27 km
50 to 150 m underground

LHC
Birthplace of the WWW!!!
SERN
Pitch dark at night, and no 24-hour supermarkets ...

LHC
CERN Ski Club

LHC
(Farmland ... cannot be expropriated)

ATLAS Detector
A Toroidal LHC Apparatus (超環面儀器)
General-purpose detector

CMS Detector (緊湊渺子線圈)
Compact Muon Solenoid
General-purpose detector
3.8

CMS Detector
Compact Muon Solenoid
A general-purpose detector
3.8

Learning to discover
A challenge from ATLAS
https://www.kaggle.com/c/higgs-boson
http://higgsml.lal.in2p3.fr/files/2014/04/documentation_v1.5.pdf

Searching for the Higgs boson
ATLAS Higgs ML Challenge
https://www.kaggle.com/c/higgs-boson
$13,000 & 876 teams

Learning to discover
A challenge from ATLAS
Provides 250000 simulated training events
and 550000 test events
https://www.kaggle.com/c/higgs-boson/leaderboard

Rumor has it that an impatient colleague ran off to take a shot at it on the second day ...
https://www.kaggle.com/c/higgs-boson/leaderboard

What is the Higgs boson?
Symmetry breaking??
The origin of mass???
http://en.wikipedia.org/wiki/Higgs_boson

Big News on 2012/07/04
Discovery of a New Boson
with Mass ~125 GeV
CERN-HI-1207136_92

Congrats to Profs. Englert and Higgs!

What is the Higgs boson?
BBHNNK?
A wild P. Higgs, captured
http://en.wikipedia.org/wiki/Higgs_boson
BEGHHK?
P. Higgs is ASGC prof. C.-C. Lin's advisor.

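Introduction to the Standard Model
Scale of the universe: http://htwins.net/scale2/
Length scales shown: ~10^-18 m, ~10^-1 m; nanometer = 10^-9 m
Force carriers and their forces: gluon (strong force), photon (electromagnetic force), W/Z bosons (weak force), graviton (gravity)
Matter particles: quarks, leptons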

Introduction to the Standard Model
http://atlas.kek.jp/sub/photos/Physics/PhotoPhysicsSM.htm
Hadrons
Leptons
Mediator particles
Cannot exist in isolation
The "God-damned" particle!
Make up
pingooo@FNAL

That is all the physics for today
... the focus is on how to play with the data

How do we train a machine?
Supervised vs.
Unsupervised Learning

Supervised Learning
徵音梅林 audio source processing
Vowel detection
Vowel classes: N, U, E, O, I, A
mei-ka-keng-ken-lian zhun-xi-lai-sou-pian

Unsupervised Learning
The Google Cat
@ ICML'12
Deep Learning
Trained on 16K cores
Done in 3 days
Over 10M YouTube stills
http://arxiv.org/abs/1112.6209

LHC Data
meets
Machine Learning

Before electronic readout, everything was done by hand
http://en.wikipedia.org/wiki/Cloud_chamber

Digitization
lets computers automatically process
huge volumes of data

Proton bunches cross
40 million times per second (40 MHz)
with about 15 collisions per crossing on average

Only about one in a million
collisions is truly interesting
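
The numbers on the last two slides can be combined into a rough rate estimate. The sketch below is only back-of-the-envelope arithmetic; it assumes the "one in a million" fraction applies to individual collisions, and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope rates from the numbers quoted on the slides.
BUNCH_CROSSING_RATE_HZ = 40_000_000   # 40 MHz bunch crossings
COLLISIONS_PER_CROSSING = 15          # average collisions per crossing
INTERESTING_FRACTION = 1e-6           # "about one in a million" (assumed per collision)


def collision_rates(crossing_rate_hz, collisions_per_crossing, interesting_fraction):
    """Return (total collisions per second, interesting collisions per second)."""
    total = crossing_rate_hz * collisions_per_crossing
    return total, total * interesting_fraction


total_per_s, interesting_per_s = collision_rates(
    BUNCH_CROSSING_RATE_HZ, COLLISIONS_PER_CROSSING, INTERESTING_FRACTION
)
print(f"collisions per second:  {total_per_s:.1e}")
print(f"interesting per second: {interesting_per_s:.0f}")
```

Under these assumptions, hundreds of millions of collisions per second reduce to a few hundred worth keeping, which is why the selection has to be automated.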

Looking at the Kaggle challenge data
Data files provided on the Kaggle website:
Training dataset
  In CSV format
  250000 events
  ID + 30 features
  Weighted events!!!
  Class label: s, b
Test dataset
  550000 events
  Same format
random_submission
  Sample for evaluation
AMS Metric
  Python script for competition evaluation metric
https://www.kaggle.com/c/higgs-boson/data
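
Because the events are weighted, a selection is scored on sums of weights rather than on raw counts. The sketch below is not the competition's own script; it assumes the approximate median significance form given in the challenge documentation, AMS = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) - s)) with b_reg = 10, and assumes the column names `EventId`, `Weight` and `Label`. The four sample rows are made up and the 30 feature columns are omitted.

```python
import csv
import io
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance for weighted signal s and background b."""
    if s < 0 or b < 0:
        raise ValueError("weighted sums must be non-negative")
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def weighted_counts(rows, selected_ids):
    """Sum the weights of the selected events, split by true label ('s' or 'b')."""
    s = b = 0.0
    for row in rows:
        if row["EventId"] in selected_ids:
            if row["Label"] == "s":
                s += float(row["Weight"])
            else:
                b += float(row["Weight"])
    return s, b


# Tiny made-up sample in the assumed training-file layout (features omitted).
SAMPLE_CSV = """EventId,Weight,Label
100000,0.5,s
100001,2.0,b
100002,1.5,s
100003,4.0,b
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = {"100000", "100002", "100003"}   # events a classifier called signal
s, b = weighted_counts(rows, selected)
print(f"s = {s}, b = {b}, AMS = {ams(s, b):.3f}")
```

For the real files, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle on the downloaded training CSV.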

ROOT
ROOT Object-Oriented Toolkit
Data Analysis tool
Written in C++ (millions of lines)
Open source
Integrated C++ interpreter
File formats
I/O handling, graphics, plotting, math, histogram binning, event display, geometric navigation
Powerful fitting (RooFit) and statistical (RooStats) packages
In use by most HEP experiments
Standard tool for producing physics results at LHC
New tools for model creation and combinations
http://root.cern.ch/drupal/

pyROOT
ROOT Object-Oriented Toolkit
Python binding for ROOT
No problem even if you are not used to C!
All the booking and plotting functions have corresponding Python bindings
You can also use the same data structures as in C++
http://root.cern.ch/drupal/

TMVA
Multi-variate analysis tool-kit
Based on supervised learning
Embedded in ROOT
Easy training and testing
Providing various classifiers
  Linear Discriminant (LD)
  Artificial Neural Networks (NN)
  Boosted Decision Trees (BDT)
  ...
http://tmva.sourceforge.net/
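
To give a feel for the simplest classifier in that list, here is a minimal pure-Python sketch of a Fisher-style linear discriminant on two input variables. It is not TMVA's implementation; the function names and the eight sample points are made up for illustration, and event weights are ignored.

```python
def mean_vector(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter_matrix(points, mean):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around the mean."""
    sxx = sxy = syy = 0.0
    for x, y in points:
        dx, dy = x - mean[0], y - mean[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return sxx, sxy, syy


def fisher_direction(signal, background):
    """Return w = Sw^-1 (mean_signal - mean_background) for 2-D inputs."""
    ms, mb = mean_vector(signal), mean_vector(background)
    a1, b1, c1 = scatter_matrix(signal, ms)
    a2, b2, c2 = scatter_matrix(background, mb)
    a, b, c = a1 + a2, b1 + b2, c1 + c2      # Sw = [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    d0, d1 = ms[0] - mb[0], ms[1] - mb[1]
    # Explicit inverse of a symmetric 2x2 matrix applied to the mean difference.
    return [(c * d0 - b * d1) / det, (a * d1 - b * d0) / det]


def score(w, point):
    """Project a point onto the discriminant direction."""
    return w[0] * point[0] + w[1] * point[1]


# Made-up training points: two variables per event.
signal = [(2.0, 2.1), (2.5, 1.8), (3.0, 2.6), (2.2, 2.9)]
background = [(0.5, 0.4), (1.0, 1.2), (0.2, 0.9), (0.8, 0.1)]

w = fisher_direction(signal, background)
# Cut halfway between the projected class means.
threshold = 0.5 * (score(w, mean_vector(signal)) + score(w, mean_vector(background)))
predictions = ["s" if score(w, p) > threshold else "b" for p in signal + background]
print(w, threshold, predictions)
```

Neural networks and boosted decision trees replace the single straight cut with more flexible decision boundaries, but the train-then-score workflow stays the same.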

pyTMVA
Multi-variate analysis tool-kit
You can do it in Python too!
Providing various classifiers
  Linear Discriminant (LD)
  Artificial Neural Networks (NN)
  Boosted Decision Trees (BDT)
  ...
http://tmva.sourceforge.net/

Input Variables

Correlation Matrix

TMVA Outputs
TMVA by default takes ½ of the sample for training and
the other ½ for performance tests.
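
The half-and-half idea can be sketched in plain Python as follows. This is only an illustration of a random 50/50 split, not how TMVA performs it internally; the function name and the stand-in event IDs are hypothetical.

```python
import random


def split_half(events, seed=0):
    """Shuffle a copy of the events and return (training half, testing half)."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


events = list(range(10))                    # stand-in event IDs
train_half, test_half = split_half(events)
print(len(train_half), len(test_half))
```

Keeping the testing half out of training is what makes the performance plots an honest estimate instead of a measure of memorization.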

What other tools are there?
Pure Python Tools
SciPy (NumPy, Matplotlib)
  Scientific computing with Python
  Interactive operation with IPython
  Creating & manipulating data
  Matlab-like plotting with Matplotlib
scikit-learn
  Machine learning in Python
  Cooperates with SciPy, NumPy, matplotlib...
  Multi-class classification
  Regression
  Clustering
  And more...

Visualization library
Matplotlib: plotting tools with a MATLAB-like syntax

Fork Me on GitHub!
https://github.com/yuanchao/pyHiggsML

You could also
Win a Prize!!!
You might even get the chance to co-author a paper with nearly three thousand people you have never met

Open Data
Open Access
Open Source
Open access to research results
Taken from the people, shared with the people

“Big data is like teenage sex: everyone talks
about it, nobody really knows how to do it,
everyone thinks everyone else is doing it, so
everyone claims they are doing it...”
- Dan Ariely (Duke)

Installing ROOT
Get the ROOT binary for Ubuntu
Go here:
http://sourceforge.net/projects/cernrootdebs/
Download the i386/x86_64 package:
Click on "Files" → "32bits!" → "root_5.32.00_i386.deb"
Open a terminal
Type in the following commands:
$ cd Download/
$ sudo dpkg -i root_5.32.00_i386.deb ← use your passwd!
$ sudo apt-get install libssl0.9.8
$ sudo apt-get install libjpeg62
$ source /opt/root/bin/thisroot.sh ← you can put in ~/.bashrc
You can run root now:
$ root -l ← " -l" means no splash window
root [0] TBrowser t ← make sure no error messages

LHC
The LHC has confirmed that the Higgs boson is compatible with the Standard Model ...
No microscopic black holes or supersymmetry have been found so far ...
http://cdsweb.cern.ch/record/1428128?ln=en
