You Can Play with LHC Data in Python, Too
Taking on the Higgs ML Challenge with Python
Yuan CHAO ( 趙元 )
(National Taiwan University, Taipei, Taiwan)
COSCUP
2014/07/19-20
Who am I?
Yuan CHAO (John)
YChao
...
Researcher
High-energy physics
Doing research with OSS ...
A large hadron-colliding accelerator?
Large Hadron Collider!
Organisation Européenne
pour la Recherche Nucléaire
CERN
Switzerland
Meyrin
Canton de
Genève
Border of CH/FR
LHC circumference:
27 km
50~150 m underground
LHC
Birthplace of the WWW !!!
SERN
Pitch dark at night, and no 24-hour supermarkets ...
LHC
CERN Ski Club
LHC
(Farmland ... cannot be expropriated)
ATLAS Detector
A Toroidal LHC Apparatus (超環面儀器)
General-purpose detector
CMS Detector
Compact Muon Solenoid (緊湊渺子線圈)
General-purpose detector
3.8
CMS Detector
Compact Muon Solenoid
A general-purpose detector
3.8
Learning to discover
A challenge from ATLAS
https://www.kaggle.com/c/higgs-boson
http://higgsml.lal.in2p3.fr/files/2014/04/documentation_v1.5.pdf
Searching for the Higgs boson
ATLAS Higgs ML
Challenge
https://www.kaggle.com/c/higgs-boson
$13,000 & 876 teams
Learning to discover
A challenge from ATLAS
250,000 simulated events provided for training
and 550,000 events for testing
https://www.kaggle.com/c/higgs-boson/leaderboard
I hear some impatient colleagues rushed in to take a shot at it on the second day ...
https://www.kaggle.com/c/higgs-boson/leaderboard
What is the Higgs boson?
Symmetry breaking??
The origin of mass???
http://en.wikipedia.org/wiki/Higgs_boson
Big News on 2012/07/04
Discovery of a New Boson
with Mass ~125 GeV
CERN-HI-1207136_92
Congrats to Profs. Englert and Higgs!
What is the Higgs boson?
BBHNNK?
A wild P. Higgs, captured
http://en.wikipedia.org/wiki/Higgs_boson
BEGHHK?
P. Higgs is ASGC prof. C.-C. Lin's advisor.
Introduction to the Standard Model
~10^-18 m
The scale of the universe: http://htwins.net/scale2/
~10^-1 m
Force carriers: gluon, photon, W/Z bosons, graviton
Forces: strong, electromagnetic, weak, gravity
Quarks
Leptons
Nanometer = 10^-9 m
Introduction to the Standard Model
http://atlas.kek.jp/sub/photos/Physics/PhotoPhysicsSM.htm
Hadrons
Leptons
Mediator particles (force carriers)
Cannot exist in isolation
The "God-damned" particle!
Make up
pingooo@FNAL
That's all the physics for today
... the focus is on how to play with the data
Machine learning?
What & Why?
How do we train a machine?
Supervised vs.
Unsupervised Learning
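
To make the contrast concrete, here is a minimal sketch on made-up one-dimensional data. Everything in it (the values, the "s"/"b" labels, the helper names `fit_supervised_cut` and `two_means`) is an illustrative assumption, not material from the talk. The supervised learner is given the labels and places a cut halfway between the class means; the unsupervised one only sees the values and groups them with a two-cluster k-means.

```python
from statistics import mean

# Toy 1-D measurements; values are invented for illustration only.
values = [0.9, 1.1, 1.3, 0.7, 4.8, 5.2, 5.0, 4.6]
labels = ["b", "b", "b", "b", "s", "s", "s", "s"]  # seen only by the supervised learner


def fit_supervised_cut(xs, ys):
    """Use the labels: put the cut halfway between the two class means."""
    mean_s = mean(x for x, y in zip(xs, ys) if y == "s")
    mean_b = mean(x for x, y in zip(xs, ys) if y == "b")
    return (mean_s + mean_b) / 2.0


def two_means(xs, n_iter=20):
    """No labels: 1-D k-means with k=2, seeded at the smallest and largest value."""
    lo, hi = min(xs), max(xs)
    for _ in range(n_iter):
        near_lo = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(near_lo), mean(near_hi)
    return lo, hi


cut = fit_supervised_cut(values, labels)
centers = two_means(values)
print("supervised cut:", cut)
print("unsupervised cluster centers:", centers)
```

On data this cleanly separated both approaches find the same two groups; the difference is that only the supervised one knows which group is called signal.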
Supervised Learning
Supervised Learning
Supervised Learning
徵音梅林 audio source processing
Vowel detection
Labels: N, U, E, O, I, A
mei-ka-keng-ken-lian zhun-xi-lai-sou-pian
Labels: N, U, E, O, I, A
Unsupervised Learning
The Google Cat
@ ICML'12
Deep Learning
Trained on 16K cores
Done in 3 days
Over 10M YouTube stills
http://arxiv.org/abs/1112.6209
LHC Data
meets
Machine Learning
Before electronics, everything was done by hand
http://en.wikipedia.org/wiki/Cloud_chamber
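Digitization
lets computers automatically process
huge amounts of data
Proton bunches cross
40 million times per second (40 MHz)
On average 15 collisions per crossing
Only about one in a million collisions
is actually interesting
It's all picked by computer!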
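
The arithmetic behind those figures can be checked in a few lines. This is only a back-of-the-envelope sketch using the numbers quoted on the slide; the variable names are made up here. It works out to about 600 million collisions per second, of which roughly 600 are interesting.

```python
# Figures quoted on the slide.
bunch_crossing_rate_hz = 40_000_000   # 40 MHz
collisions_per_crossing = 15          # average
interesting_fraction = 1e-6           # "about one in a million"

collisions_per_second = bunch_crossing_rate_hz * collisions_per_crossing
interesting_per_second = collisions_per_second * interesting_fraction

print(f"{collisions_per_second:,} collisions per second")
print(f"about {interesting_per_second:.0f} interesting collisions per second")
```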
A look at the Kaggle challenge data
Data files provided on the Kaggle website:
Training dataset
In CSV format
250000 events
ID + 30 features
Weighted events!!!
Class label: s, b
Test dataset
550000 events
Same format
random_submission
Sample for evaluation
AMS Metric
Python script for competition evaluation metric
https://www.kaggle.com/c/higgs-boson/data
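
As a rough sketch of how the weighted events and the AMS metric fit together, the snippet below parses a tiny inline stand-in for the training CSV and scores a selection. The rows and values are invented, only four of the real columns are shown, and the mass-window selection is a hypothetical example. The `ams` function follows the approximate median significance formula from the challenge documentation, with regularisation term b_reg = 10. It is a re-implementation for illustration, not the organisers' evaluation script.

```python
import csv
import io
import math

# Tiny stand-in for training.csv; rows and values are invented for illustration.
SAMPLE_CSV = """EventId,DER_mass_MMC,Weight,Label
100000,125.2,0.002,s
100001,91.4,2.5,b
100002,118.9,0.003,s
100003,-999.0,1.8,b
"""


def ams(s, b, b_reg=10.0):
    """Approximate median significance with regularisation term b_reg."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def weighted_counts(rows, selected_ids):
    """Sum the weights of selected signal (s) and selected background (b) events."""
    s = b = 0.0
    for row in rows:
        if row["EventId"] in selected_ids:
            if row["Label"] == "s":
                s += float(row["Weight"])
            else:
                b += float(row["Weight"])
    return s, b


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# Hypothetical selection: call an event "signal" if its mass lies between 110 and 140.
selected = {r["EventId"] for r in rows if 110.0 < float(r["DER_mass_MMC"]) < 140.0}
s, b = weighted_counts(rows, selected)
print("s =", s, "b =", b, "AMS =", ams(s, b))
```

Because the events are weighted, s and b are sums of weights rather than plain event counts, which is why the weight column cannot be ignored when evaluating a selection.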
ROOT
ROOT Object-Oriented Toolkit
Data analysis tool
Written in C++ (millions of lines)
Open source
Integrated C++ interpreter
File formats
I/O handling, graphics, plotting, math, histogram binning, event display, geometric navigation
Powerful fitting (RooFit) and statistical (RooStats) packages
In use by most HEP experiments
Standard tool for producing physics results at LHC
New tools for model creation and combinations
http://root.cern.ch/drupal/
PyROOT
ROOT Object-Oriented Toolkit
Python binding for ROOT
No problem even if you're not used to C!
All the booking and plotting functions have corresponding Python bindings
You can also use the same data structures as in C++
http://root.cern.ch/drupal/
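
A minimal sketch of what booking and filling a histogram from Python might look like. The toy data, the histogram name `h_mass`, and the helper `fill_histogram` are assumptions made for illustration. `TH1F`, `Fill`, `GetEntries` and `GetMean` are the usual ROOT histogram calls. The import is guarded so the snippet still runs, and simply skips the histogram, on a machine without ROOT.

```python
import random

try:
    import ROOT  # PyROOT; only available where ROOT was built with Python support
except ImportError:
    ROOT = None


def fill_histogram(values, nbins=50, xmin=0.0, xmax=200.0):
    """Book a TH1F and fill it; returns None when PyROOT is not installed."""
    if ROOT is None:
        return None
    # Title string uses ROOT's "title;x-axis label;y-axis label" convention.
    hist = ROOT.TH1F("h_mass", "Toy mass;mass [GeV];events", nbins, xmin, xmax)
    for v in values:
        hist.Fill(v)
    return hist


rng = random.Random(42)
toy_masses = [rng.gauss(125.0, 10.0) for _ in range(1000)]  # invented toy data
hist = fill_histogram(toy_masses)
if hist is None:
    print("PyROOT not found; skipping the histogram")
else:
    print("entries:", hist.GetEntries(), "mean:", hist.GetMean())
```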
TMVA
Multivariate analysis toolkit
Based on supervised learning
Embedded in ROOT
Easy training and testing
Providing various classifiers
Linear Discriminant (LD)
Artificial Neural Networks (NN)
Boosted Decision Trees (BDT)
...
http://tmva.sourceforge.net/
pyTMVA
Multivariate analysis toolkit
Works from Python, too!
Providing various classifiers
Linear Discriminant (LD)
Artificial Neural Networks (NN)
Boosted Decision Trees (BDT)
...
http://tmva.sourceforge.net/
Input Variables
Correlation Matrix
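
For readers who want to see what goes into such a matrix, here is a small pure-Python sketch that computes Pearson correlation coefficients between columns. The three columns `x`, `y`, `z` are invented toy data, not challenge variables, and this is not how TMVA computes its plot internally.

```python
import math

# Invented toy columns standing in for three input variables.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.0, 9.8],   # rises with x
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with x
}


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


names = list(columns)
matrix = [[pearson(columns[r], columns[c]) for c in names] for r in names]
for name, row in zip(names, matrix):
    print(name, " ".join(f"{v:+.2f}" for v in row))
```

Strongly correlated (or anti-correlated) inputs carry largely the same information, which is what the matrix helps spot before training.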
TMVA Outputs
TMVA by default takes ½ of sample for training and
the other ½ for performance tests.
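
The half-and-half idea can be sketched in plain Python as follows. This only illustrates the concept; the helper `split_half` and the seeded shuffle are assumptions, and TMVA's own splitting logic is not reproduced here.

```python
import random


def split_half(events, seed=0):
    """Shuffle a copy of the events and cut it into two equal-sized halves."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


event_ids = range(250000)  # same size as the challenge training sample
train, test = split_half(event_ids)
print(len(train), "events for training,", len(test), "for testing")
```

Keeping the test half untouched during training is what makes the performance plots an honest estimate rather than a measure of memorisation.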
What other tools are there?
Pure Python Tools
SciPy (NumPy, Matplotlib)
Scientific computing with Python
Interactive operation with IPython
Creating & manipulating data
Matlab-like plotting with Matplotlib
scikit-learn
Machine learning in Python
Works with SciPy, NumPy, matplotlib...
Multi-class classification
Regression
Clustering
And more...
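
As a hedged illustration of the scikit-learn route, the sketch below trains a gradient-boosted decision tree classifier on invented toy data (two Gaussian blobs standing in for background and signal). The data, the hyperparameters, and the choice of `GradientBoostingClassifier` are assumptions made for this example. It is not the configuration used in the talk or in the pyHiggsML repository.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Invented toy data: two Gaussian blobs standing in for background (0) and signal (1).
rng = np.random.RandomState(0)
n_per_class = 1000
background = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 2))
signal = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
X = np.vstack([background, signal])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Hold out half of the sample for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

clf = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
signal_probability = clf.predict_proba(X_test)[:, 1]  # per-event signal score
print(f"test accuracy: {accuracy:.3f}")
```

The per-event `signal_probability` is the quantity one would threshold to decide which events to submit as signal.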
Visualization libraries
Matplotlib -- plotting tools with a MATLAB-like syntax
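
A short sketch of the kind of plot this enables: overlaid histograms of one variable for the two classes. The distributions are invented toy data, and the output file name `toy_input_variable.png` is a hypothetical choice. The `Agg` backend is selected so the script also runs on a machine without a display.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

# Invented toy distributions for one input variable.
rng = random.Random(1)
background = [rng.gauss(90.0, 15.0) for _ in range(2000)]
signal = [rng.gauss(125.0, 10.0) for _ in range(2000)]

fig, ax = plt.subplots()
ax.hist(background, bins=40, range=(40, 180), histtype="step", label="background (b)")
ax.hist(signal, bins=40, range=(40, 180), histtype="step", label="signal (s)")
ax.set_xlabel("toy mass variable [GeV]")
ax.set_ylabel("events")
ax.legend()
fig.savefig("toy_input_variable.png")
```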
Fork Me on GitHub!
https://github.com/yuanchao/pyHiggsML
You could also
Win a Prize!!!
Maybe you'll get the chance to become a paper co-author with nearly three thousand people you've never met
Open Data
Open Access
Open Source
Open access to research results
Taken from the people, shared with the people
That's all
Thank you
Merci de votre attention
“Big data is like teenage sex: everyone talks
about it, nobody really knows how to do it,
everyone thinks everyone else is doing it, so
everyone claims they are doing it...”
- Dan Ariely (Duke)
Installing ROOT
Get the ROOT binary for Ubuntu
Go here:
http://sourceforge.net/projects/cernrootdebs/
Download the i386/x86_64 package:
Click on "Files" → "32bits!" → "root_5.32.00_i386.deb"
Open a terminal
Type in the following commands:
$ cd Download/
$ sudo dpkg -i root_5.32.00_i386.deb ← use your passwd!
$ sudo apt-get install libssl0.9.8
$ sudo apt-get install libjpeg62
$ source /opt/root/bin/thisroot.sh ← you can put in ~/.bashrc
You can run root now:
$ root -l ← " -l" means no splash window
root [0] TBrowser t ← make sure no error messages
LHC
LHC confirms the Higgs boson is compatible with the Standard Model ...
No microscopic black holes or supersymmetry found so far ...
http://cdsweb.cern.ch/record/1428128?ln=en

Playing with LHC open data in Python