Py “Baseball” Data
PyCon mini Hiroshima 2016
Python
Shinichi Nakagawa(Baseball Analyst&Pythonista)
Starting Member
• Who am I?( )
• PyData
• PyData / #
• Python
• PyData + (FIP/RC27)
•
Who am I?
• Shinichi Nakagawa(@shinyorke)
• Python , Hack ※ Python
• HR .
• Python/Agile/PyData/SABRmetrics( )
•
• ( ) .
• ( ) HR .
• 1 2
.
• (Django) Python .
• https://service.visasq.com
• https://tech.visasq.com
•
•
•
• &
• etc…
• Web Python
•
• IPython + pandas 

(Hello World )
•
• 

.

.
• Deep Learning , 

.
• (Pandas )

& .
PyData / #
PyData
“””
PyData
Python Python Library
“””
※@iktakahiro
http://www.slideshare.net/iktakahiro/pydata-67913897
PyData
• , ,Python 

&( ) .
• , or .
• Excel Python, Deep Learning,
etc… PyData 

PyData ( )
( )
“””
“””
https://ja.wikipedia.org/
wiki/
( )
• , 

• 1970 

, &
• 

( , )
•
• ( , ,FA)
• ( )
•
• ( , etc…)
• ( , J )
× ( )
※ × +
× ( )
※ × +
※
5
• ( - ) = 5 ( )
•
•
•
•
•
.
• ( - ) 5 5
(ry .
• Pythagorean winning percentage = (runs scored^2) ÷ (runs scored^2 + runs allowed^2)
•
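A minimal sketch of the Pythagorean expectation in plain Python, before moving to pandas. The function name `pythagorean_win_pct` and the run totals are hypothetical, chosen only for illustration; the exponent of 2 and the 143-game season match the pandas code in this deck.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    if scored + allowed == 0:
        return 0.0  # no runs either way: nothing to estimate
    return scored / (scored + allowed)


GAMES = 143  # season length used in the pandas code of this deck

# Hypothetical team: 600 runs scored, 500 runs allowed
pct = pythagorean_win_pct(600, 500)
expected_wins = round(pct * GAMES)
print(f"expected pct={pct:.3f}, expected wins={expected_wins}")
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check for the formula.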
Python×Pandas
Python×pandas
# Python 3 (3.4 or later)
$ pip install ipython pandas beautifulsoup4 numpy lxml html5lib
# Start ipython (Jupyter works as well)
$ ipython
Python×pandas
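# Import the libraries
import pandas as pd
import numpy as np
# Fetch the standings page; read_html returns a list of DataFrames (one per HTML table)
df = pd.read_html('http://baseball.yahoo.co.jp/npb/standings/')
# Take the first table on the page
df_cl = df[0].drop([0])  # drop row 0 (presumably a header row)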
Python×pandas
# Assign short English column names
df_cl.columns = ['rank', 'name', 'games', 'win', 'lose', 'draw',
                 'pct', 'gb', 're_games', 'r', 'er', 'hr', 'sb', 'ba', 'era']
# Convert the columns used below to numeric types (missing values become 0)
df_cl['win'] = df_cl['win'].fillna(0).astype(np.int64)  # wins
df_cl['lose'] = df_cl['lose'].fillna(0).astype(np.int64)  # losses
df_cl['pct'] = df_cl['pct'].fillna(0).astype(np.float64)  # winning percentage
df_cl['r'] = df_cl['r'].fillna(0).astype(np.int64)  # runs scored
df_cl['er'] = df_cl['er'].fillna(0).astype(np.int64)  # runs allowed
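Scraped tables sometimes carry placeholder strings that `astype` cannot parse. As a hedged alternative to the conversion above, `pd.to_numeric(..., errors="coerce")` turns unparseable cells into NaN first. The sample frame `raw` and its values are made up for illustration and are not taken from the standings page.

```python
import numpy as np
import pandas as pd

# Hypothetical scraped values: strings, with "-" and None standing in for missing data
raw = pd.DataFrame({"win": ["89", "71", "-"], "pct": [".631", ".507", None]})

clean = raw.copy()
# Unparseable cells become NaN, then 0, before the final cast
clean["win"] = pd.to_numeric(clean["win"], errors="coerce").fillna(0).astype(np.int64)
clean["pct"] = pd.to_numeric(clean["pct"], errors="coerce").fillna(0).astype(np.float64)
print(clean.dtypes)
```

Whether 0 is the right fill value depends on the column; for a real analysis, dropping incomplete rows may be safer than treating them as zero.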
Python×pandas
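# Run differential (runs scored minus runs allowed)
df_cl['difference'] = df_cl['r'] - df_cl['er']
# Pythagorean winning percentage: r^2 / (r^2 + er^2)
df_cl['pythagorean_win_per'] = (df_cl['r'] ** 2) / (df_cl['r'] ** 2
                                                    + df_cl['er'] ** 2)
# Expected wins and losses over a 143-game season (astype truncates the fraction)
df_cl['pythagorean_win'] = (df_cl['pythagorean_win_per'] *
                            143).fillna(0).astype(np.int64)
df_cl['pythagorean_lose'] = 143 - df_cl['pythagorean_win']
# Sort by Pythagorean winning percentage, best first
df_cl.sort_values(by='pythagorean_win_per', ascending=False)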
https://gist.github.com/Shinichi-Nakagawa/8ff55af83390fcd2e2dd34bcb914868c
×
•
• (+187)
• 5
• /
• DeNA ,
•
• ( )
?
( )
• & (& ) 

• , , ,
•
×PyData
• FIP
• (RC27)
• scrapy CSV
• CSV pandas, seaborn, jupyter & 
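A rough sketch of the CSV-to-pandas step, kept offline by reading from an in-memory string. The column names (`name`, `ip`, `fip`), the values, and the 50-inning cutoff are assumptions for illustration; the real scraper output may look different. The bin counts from `np.histogram` are the same numbers a histogram plot would draw.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV in the shape a scraper might emit (not real player data)
CSV_TEXT = """name,ip,fip
A,180.0,2.95
B,65.0,3.40
C,30.0,5.10
D,120.0,3.85
E,55.0,4.20
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Keep only pitchers with enough innings (the cutoff of 50 is an assumption)
qualified = df[df["ip"] >= 50]

# Count pitchers per FIP bin: [2, 3), [3, 4), [4, 5]
counts, edges = np.histogram(qualified["fip"], bins=[2, 3, 4, 5])
print(qualified.sort_values(by="fip"))
print(counts)
```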



( )
FIP(Fielding Independent Pitching)
• , ( )
• , (+ ),
• ( )
• xFIP 

FIP .
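A minimal sketch of the commonly cited FIP formula, which uses only home runs, walks, hit batters, and strikeouts per inning pitched. The league constant varies by league and season; the default of 3.10 here is an assumption, as are the function name and the sample stat line. The calculation used in this talk may differ in detail.

```python
def fip(hr, bb, hbp, so, ip, constant=3.10):
    """Fielding Independent Pitching (common form).

    constant is a league/season adjustment; 3.10 is only a placeholder.
    ip is innings pitched as a decimal (e.g. 180 and 1/3 innings -> 180.333).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Hypothetical pitcher: 10 HR, 40 BB, 5 HBP, 150 SO over 180 innings
value = fip(hr=10, bb=40, hbp=5, so=150, ip=180.0)
print(f"FIP = {value:.2f}")
```

Lower is better, as with ERA; the constant exists to put FIP on roughly the same scale as the league ERA.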
FIP( TOP 20)
FIP( & )
FIP(50 Histogram)
FIP
•
•
•
• FIP
• 

FIP ( )
RC27
• 9 1
?
• VS , ?
• RC(Runs Created, ) 1
•
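A hedged sketch of RC27 using the basic Runs Created formula, RC = (H + BB) × TB ÷ (AB + BB), scaled to 27 outs. More detailed RC variants exist and this talk may use one of them, so treat this as the simplest form only. Function names and the sample batting line are hypothetical.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    opportunities = ab + bb
    if opportunities == 0:
        return 0.0
    return (h + bb) * tb / opportunities


def rc27(h, bb, tb, ab, cs=0, sh=0, sf=0, gidp=0):
    """Runs Created per 27 outs: runs a lineup of nine copies of this batter would score per game."""
    outs = ab - h + cs + sh + sf + gidp
    if outs <= 0:
        raise ValueError("batter must have made at least one out")
    return runs_created(h, bb, tb, ab) / outs * 27


# Hypothetical batter: 150 H, 50 BB, 250 TB in 500 AB
value = rc27(h=150, bb=50, tb=250, ab=500)
print(f"RC27 = {value:.2f}")
```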
RC27 (350 )
RC27 TOP30(350 )
RC27(Histogram)
RC27
• 1-6
• RC27 Top30 6
•
•
•
• ( )
• 

6 Top30 

• ,
• ,
, FIP ( )
•
[ ]
• , 

FIP, WHIP, K/BB, etc…
• , 

RC27 3 ( 6 )
•
Py "Baseball" Data - Python
※pandas, Re:dash (& )
MonotaRO TechTalk #4
http://www.kokuchpro.com/event/monotarotech4/
&
Shinichi Nakagawa(Twitter/Facebook/visasQ:@shinyorke)

Introduction to Py "Baseball" Data - Hiroshima Toyo Carp Edition #pyconhiro