
PyCon JP 2016 Talk#024 en

Start hacking finance data with Python
https://pycon.jp/2016/ja/schedule/presentation/24/

sample code:
https://github.com/drillan/pyconjp2016

Published in: Economy & Finance

  1. 1. Start hacking finance data with Python driller@patraqushe PyConJP 2016 September 22, 2016
  2. 2. About me: driller @patraqushe; derivative trader; 1.5 year
  3. 3. Agenda: Are you still exhausted from Excel?; Don’t be afraid of time series data; Using Jupyter Notebook
  4. 4. Why Python for finance? Analyze data with simple code; a rich set of libraries (especially for deep learning), pandas, and Jupyter Notebook; integrates with scraping and crawling, web frameworks, and infrastructure
  5. 5. Why Python, in my case: I developed trading tools in Excel; adding new functions and keeping up with changes to exchange rules became gradually more difficult; Python is the most manageable option (in my opinion); pandas is similar to Excel; drastically cheaper using Jupyter Notebook
  6. 6. Are you still exhausted from Excel? ~ Migrate from Excel to Python to improve productivity ~
  7. 7. Generate stock prices using Monte Carlo simulation: stock price 1,000 yen; time remaining until expiration 30 days; risk-free interest rate 0.1%; annual volatility of the stock price 20%; sample paths 10,000 -> 50,000
  8. 8. Case1-1: Implement Monte Carlo simulation with Excel functions: (1) enter a formula into a cell to generate a random price following geometric Brownian motion, which satisfies the stochastic differential equation $dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$; (2) copy the formula once per sample path; (3) classify the results into bins; (4) count the entries in each bin, then visualize. Sample: Case1-1_1-2.xlsm
  9. 9. It's possible to implement Monte Carlo simulation using only worksheet formulas, however: more sample paths require more cells; adding cells means adjusting the existing cells that reference them; recalculation becomes heavy and slow
  10. 10. Case1-2: Implement Monte Carlo simulation in VBA: very long code (especially the histogram); when the layout of an Excel sheet changes, you have to change the addresses of all related cells (the Name Manager handles this only partially); very slow. Sample: Case1-1_1-2.xlsm
  11. 11. Case1-3: Implement Monte Carlo simulation in Python: very short code (especially the histogram); no need to think about where the data is stored; faster than VBA. Sample: Case1-3.ipynb
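A minimal sketch of what such a short simulation might look like, using the parameters from slide 7. It is not the code from Case1-3.ipynb. It assumes NumPy, a 365-day year, a drift equal to the risk-free rate, and an arbitrary random seed, and it samples the closed-form terminal price $S_T = S_0 \exp\left((r - \sigma^2/2)T + \sigma\sqrt{T}\,Z\right)$ with $Z \sim N(0, 1)$.

```python
import numpy as np

# Parameters from the slide
s0 = 1000.0        # current stock price (yen)
t = 30 / 365       # time to expiration in years (assumes a 365-day year)
r = 0.001          # risk-free interest rate (0.1% per year)
sigma = 0.20       # annual volatility
n_paths = 10_000   # number of sample paths

# The seed is arbitrary and only makes the run reproducible
rng = np.random.default_rng(seed=0)
z = rng.standard_normal(n_paths)

# Terminal price of geometric Brownian motion (assumption: drift mu = r)
prices = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)

# Classify the results into bins and count the entries in each bin
counts, bin_edges = np.histogram(prices, bins=50)

print(prices.mean(), prices.std())
```

Raising `n_paths` to 50,000 only changes one number; no cells need to be added or re-linked.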
  12. 12. Excel vs. Python: Excel: 105 lines of code, wall time 7.89 s, more complex; Python: 10 lines of code, wall time 0.83 s, simpler
  13. 13. That said, Excel has its advantages: easy data entry; easy to create templates; a huge number of users (high data compatibility)
  14. 14. Python packages for working with Excel files: xlrd, xlwt, XlsxWriter, xlutils, openpyxl, xlwings, ExcelPython
  15. 15. There are many packages, but… pandas.read_excel() is almost always what I want; use the other packages for operations that pandas cannot do: writing data to an open file, manipulating cells, drawing charts…
  16. 16. Case1-4: Relationship between economic indicators, the exchange rate, and stock prices: open the economic indicator and stock price Excel files from vdata.nikkei.com with pandas; economic indicators: real gross domestic product, diffusion index; currency pair: USD/JPY; stock: Nikkei index; visualize with seaborn. Sample: Case1-4.ipynb
  17. 17. Case1-5: Relationship between ETF/J-REIT purchases and stock prices: open Excel files from boj.or.jp with pandas; load TSE REIT index and stock index price data from the k-db site; stock indices: TOPIX, JPX400, Nikkei225; visualize the relationship with seaborn. Sample: Case1-5.ipynb
  18. 18. Use xlwings: read and write open Excel files; supports NumPy and pandas data types; call Python scripts from Excel; write Excel user-defined functions (UDFs) in Python. Use openpyxl for cell and chart operations
  19. 19. Call Python script from Excel (figure labels: module, function)
  20. 20. User-defined functions (UDFs): you can use custom functions written in Python!
  21. 21. A UDF can return multiple values, one per cell, using an array formula (Ctrl + Shift + Enter)
  22. 22. Case1-6: Download stock prices and store them in Excel cells: xlwings features: calling Python from Excel; putting pandas DataFrame data into Excel cells; syntax close to VBA. Stock prices are fetched with pandas_datareader. Sample: Case1-6.xlsm
  23. 23. Case1-7: Create user-defined functions in Python and use them in Excel: Windows only; install the add-in; call a function written in Python like an Excel function; fetch an Excel range (multiple cells) as array data (pandas or NumPy) in the UDF; write multiple return values into an Excel range (multiple cells). Sample: Case1-7.xlsm
  24. 24. Don’t be afraid of time series data ~ Get used to pandas ~
  25. 25. Why pandas? Wes McKinney built pandas during his tenure at AQR (a quantitative investment management firm) to enable all these things in one place: data structures with labeled axes supporting automatic or explicit data alignment; integrated time series functionality; one data structure to handle both time series and non-time series data; arithmetic operations and reductions; flexible handling of missing data; merge and other relational operations found in popular databases (SQL-based, for example). pandas was developed by financial specialists, so it is well suited to analyzing financial data
  26. 26. Case2-1: Use DatetimeIndex: pandas.date_range is very useful for creating continuous date data; advantages of DatetimeIndex: accepts various types when selecting a location (datetime.date, datetime.datetime, datetime.time, str, int, and so on); parses most known formats (similar to dateutil.parser); allows slicing by year, month, etc.; handles missing values. Sample: Case2-1_2.ipynb
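A small sketch of the selection styles listed above, assuming pandas. The series and its values are made up for illustration and are not taken from Case2-1_2.ipynb.

```python
import datetime

import pandas as pd

# One value per calendar day of 2016 (a leap year, so 366 days)
index = pd.date_range("2016-01-01", "2016-12-31", freq="D")
series = pd.Series(range(len(index)), index=index)

# Select a single day with a string or with a datetime.datetime
by_str = series.loc["2016-09-22"]
by_datetime = series.loc[datetime.datetime(2016, 9, 22)]

# Partial string indexing: select a whole month, or slice a range of months
february = series.loc["2016-02"]
spring = series.loc["2016-03":"2016-05"]

print(by_str, by_datetime, len(february), len(spring))
```

Both single-day lookups return the same scalar, and the month-level strings select every row that falls inside the named period.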
  27. 27. Case2-2: Create OHLC data and convert the time range: creating OHLC data by hand is not that easy; the .resample() method converts time series data to another frequency (daily, weekly, 30-minute, 1-hour, quarterly, etc.) and performs the resampling operation during the conversion; there are tips for converting OHLC data from one frequency to another. Sample: Case2-1_2.ipynb
  28. 28. Resampling image 1/4 (figure): the daily prices 100, 99, 102, 105, 102 are aggregated into the first weekly bar: Open 100, High 105, Low 99, Close 102
  29. 29. Resampling image 2/4 (figure): the next daily prices 103, 105, 106, 104, 102 are aggregated into the second weekly bar: Open 103, High 106, Low 102, Close 102
  30. 30. Resampling image 3/4 (figure): the weekly bars are aggregated into monthly bars. Weekly bars as listed (Open, High, Low, Close): (100, 105, 99, 102), (103, 106, 102, 102), (101, 102, 97, 98), (98, 100, 107, 105), (106, 110, 106, 108), (109, 115, 107, 112), (110, 120, 110, 115), (113, 117, 110, 115), (110, 111, 102, 103), (100, 101, 94, 96). Monthly bar shown: Open 100, High 106, Low 97, Close 105
  31. 31. Resampling image 4/4 (figure): same daily and weekly data as image 3/4. Monthly bars shown (Open, High, Low, Close): (100, 110, 97, 108), (106, 120, 106, 115)
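The daily-to-weekly step in the figures can be sketched with pandas as follows. The ten daily prices are the ones from the slides; the start date (a Monday) is an assumption made for illustration, and the grouping by calendar month is one possible way to roll OHLC bars up further, not necessarily the approach used in Case2-1_2.ipynb.

```python
import pandas as pd

# Ten daily prices from the slides, placed on consecutive business days.
# Assumption: the first day is Monday 2016-09-05 (the slides give no dates).
days = pd.date_range("2016-09-05", periods=10, freq="B")
daily = pd.Series([100, 99, 102, 105, 102, 103, 105, 106, 104, 102], index=days)

# Daily prices -> weekly OHLC bars (weeks ending on Friday)
weekly = daily.resample("W-FRI").ohlc()

# OHLC -> coarser OHLC: each column needs its own aggregation rule
rules = {"open": "first", "high": "max", "low": "min", "close": "last"}
monthly = weekly.groupby(weekly.index.to_period("M")).agg(rules)

print(weekly)
print(monthly)
```

The per-column rules matter: resampling OHLC bars with a single function such as `mean` or `last` would not give a valid bar, because the open must come from the first sub-period and the high from the maximum of the highs.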
  32. 32. Handling the last trading day of derivatives: exchange: JPX; products: futures, options; last trading day: the 2nd Friday of every month (* if the 2nd Friday is a holiday, the day before). http://www.jpx.co.jp/derivatives/rules/last-trading-day/
  33. 33. Example of last trading day, May 2017 – Aug 2017 (figure: monthly calendars for May, June, July, and August 2017)
  34. 34. Example of last trading day, May 2017 – Aug 2017 (figure: same calendars as slide 33)
  35. 35. Example of last trading day, May 2017 – Aug 2017 (figure: same calendars as slide 33)
  36. 36. Issues: be aware of holidays; picking the 2nd Friday
  37. 37. Dealing with Japanese public holidays: pandas.tseries.holiday only supports US holidays (as of Sep 22nd, 2016); it's possible to create your own holiday rules by inheriting from AbstractHolidayCalendar, but that does not handle holidays such as the Spring/Autumn Equinox; instead, use CustomBusinessDay to implement individual holidays; implement Japanese holidays in pandas using existing calendar data
  38. 38. Case2-3: Compute the last trading day using the CustomBusinessDay class: import holiday data from a YAML file; select the 2nd Friday of every month using pandas.date_range(freq='WOM-2FRI'); skip holidays using the CustomBusinessDay class. Sample: Case2-3.ipynb
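A rough sketch of these steps for the May 2017 to August 2017 example, assuming pandas. The holiday list is typed in by hand for illustration (the talk loads it from a YAML file), and using `rollback` to step to the previous business day is one possible choice, not necessarily what Case2-3.ipynb does.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Japanese public holidays between May and August 2017, entered by hand
# for illustration (the talk imports holiday data from a YAML file instead).
holidays = ["2017-05-03", "2017-05-04", "2017-05-05", "2017-07-17", "2017-08-11"]
jp_business_day = CustomBusinessDay(holidays=holidays)

# The 2nd Friday of every month in the period
second_fridays = pd.date_range("2017-05-01", "2017-08-31", freq="WOM-2FRI")

# Keep the 2nd Friday when it is a business day, otherwise move back to the
# previous business day
last_trading_days = [jp_business_day.rollback(day) for day in second_fridays]

for day in last_trading_days:
    print(day.strftime("%Y-%m-%d"))
```

In this period only August is affected: the 2nd Friday, 2017-08-11, is a public holiday (Mountain Day), so the result moves back to Thursday 2017-08-10.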
  39. 39. Using Jupyter Notebook ~ Don’t miss useful functions ~
  40. 40. Case3-1: Create your own magic commands: search for a stock price using a "line magic" and output the result to IPython.display.IFrame; paste data in various formats into notebook cells using a "cell magic" and convert it into a pandas DataFrame; save frequently used commands to a file and reuse them with %load_ext. Sample: Case3-1.ipynb
  41. 41. Case3-2: ipywidgets is the easiest way to create a UI: easy to implement a UI using the ipywidgets.interact decorator; automatically creates UI controls for function arguments (bool: check box; int: slider); creates an interactive visualization of moving averages and Bollinger Bands. Sample: Case3-2.ipynb
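As a hedged illustration of the calculation behind such a widget, the function below computes a moving average and Bollinger Bands with pandas. The function name, its parameters, and the synthetic prices are hypothetical and not taken from Case3-2.ipynb; the widget part is left out so that the block runs outside a notebook.

```python
import pandas as pd


def bollinger_bands(close, window=20, num_std=2.0):
    """Return the moving average and the upper/lower Bollinger Bands."""
    middle = close.rolling(window).mean()
    std = close.rolling(window).std()
    return pd.DataFrame({
        "close": close,
        "middle": middle,
        "upper": middle + num_std * std,
        "lower": middle - num_std * std,
    })


# Synthetic prices, for illustration only
close = pd.Series([100, 102, 101, 105, 107, 106, 110, 108, 111, 115], dtype=float)
bands = bollinger_bands(close, window=5, num_std=2.0)
print(bands.tail())
```

Inside a notebook with ipywidgets installed, a plotting function that takes `window` as an int and a flag such as `show_bands` as a bool could be passed to `ipywidgets.interact`, which would then render a slider and a check box for those arguments.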
  42. 42. Useful nbextensions: best installed using jupyter_contrib_nbextensions (https://github.com/ipython-contrib/jupyter_contrib_nbextensions); of course, it is also possible to install nbextensions individually; easy to enable or disable individual extensions from the Nbextensions edit menu; create your own extensions using JavaScript
  43. 43. Today’s summary: Python >>> Excel; suitable for handling time series data; easy to create commands and UI
  44. 44. Sample code and files: sample code and Excel files are on GitHub: https://github.com/drillan/pyconjp2016; some code is redundant due to Python 2/3 support and offline mode; no license limitations
  45. 45. Thank you. See you next year?