Developing and releasing SOFA Statistics - Presentation Transcript
SOFA Paton-Simpson & Associates Ltd Auckland, New Zealand SOFA Statistics Developing and releasing a Python open source application Grant Paton-Simpson sofastatistics.com
Overview
Making report tables from data (database, spreadsheet, directly entered)
Producing charts
Running key statistical tests
The slogan is “ease of use, learn as you go, and beautiful output”
May be useful for specialist statisticians but emphasis on supporting non-specialists, and learning statisticians
Currently version 0.8.10 and pushing on towards a version 1.0 release
SOFA Architecture ...
Linking not importing: SQLite, MySQL, MS Access, PostgreSQL, SQL Server
SOFA
Scripts from GUI or by hand (available for automation)
HTML output (spreadsheet-friendly)
Working with SQL
Not using an abstraction layer. Wrote my own code using MySQLdb module etc
Already experienced with SQL
Want control over the information I get about data configuration (e.g. character set)
Want control over how I interact with the databases for performance reasons
Adding other databases, e.g. Oracle, is a process of copying an existing module and changing the implementation
SQL databases do things very differently
SQLite with data type integrity (like dynamic typing)
PostgreSQL and SUM(expression)
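As one illustration of how differently SQL databases behave, here is a minimal sketch of SQLite's flexible typing using Python's built-in sqlite3 module. The table and column names are made up for the example and are not taken from SOFA itself. A column declared INTEGER will still accept text that cannot be converted to a number, so code reading the data cannot rely on the declared type alone.

```python
import sqlite3

# In-memory database; "demo" and "age" are illustrative names only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (age INTEGER)")

# SQLite converts "35" to an integer because of the column's INTEGER
# affinity, but stores "unknown" as text rather than rejecting it.
con.executemany(
    "INSERT INTO demo (age) VALUES (?)",
    [(21,), ("unknown",), ("35",)],
)

# typeof() reports the storage class actually used for each value.
rows = con.execute("SELECT age, typeof(age) FROM demo").fetchall()
for value, storage_class in rows:
    print(repr(value), storage_class)
con.close()
```

Most other databases would reject the second insert, which is one reason per-database handling can be simpler than a single generic layer.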
HTML Report Table Output
Tree-based for each dimension (rows, cols)
Created an artificial limit of 5000 cells
Scales linearly and Python not the bottleneck
Rendered locally using wxWebKit
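The tree-per-dimension idea and the cell limit could be sketched roughly as follows. This is not SOFA's actual code: DimNode, cell_count and check_table_size are hypothetical names, and the assumption is that the number of table cells equals the number of row-tree leaves multiplied by the number of column-tree leaves.

```python
from dataclasses import dataclass, field
from typing import List

MAX_CELLS = 5000  # the artificial limit mentioned in the talk


@dataclass
class DimNode:
    """One node in a dimension tree (hypothetical structure)."""
    label: str
    children: List["DimNode"] = field(default_factory=list)

    def leaf_count(self) -> int:
        # A node with no children is a leaf and contributes one row/column.
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)


def cell_count(row_tree: DimNode, col_tree: DimNode) -> int:
    """Assumed cell count: row leaves times column leaves."""
    return row_tree.leaf_count() * col_tree.leaf_count()


def check_table_size(row_tree: DimNode, col_tree: DimNode) -> int:
    """Return the cell count, or raise if it exceeds MAX_CELLS."""
    n_cells = cell_count(row_tree, col_tree)
    if n_cells > MAX_CELLS:
        raise ValueError(f"Table would have {n_cells} cells (limit {MAX_CELLS})")
    return n_cells


# Example: gender nested under country in rows, age group in columns.
rows = DimNode("Country", [
    DimNode("NZ", [DimNode("Male"), DimNode("Female")]),
    DimNode("Australia", [DimNode("Male"), DimNode("Female")]),
])
cols = DimNode("Age group", [DimNode("<20"), DimNode("20-39"), DimNode("40+")])
print(check_table_size(rows, cols))
```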
Statistics Modules
Looked at using existing libraries but ended up using a modified subset of their code
Was not my preferred approach. Benefits to plugging in a module:
Saving time and effort
Less risk (the results of statistical algorithms can wildly diverge because of small floating point errors compounding and multiplying)
Any issues found and fixed can help everyone
Reasons for creating own code (often based on existing)
Standard code didn't return results separated from formatting
No option of using decimal instead of floating point maths
Half-baked code in some places
Keeping the installer file size down
But I do use existing libraries to test my code against (using nosetests)
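A minimal sketch of the points above, under stated assumptions: the calculation returns raw numbers (here as Decimal values) and formatting lives in a separate function, with a test that checks the result against an existing library. The function names are hypothetical, and Python's standard statistics module stands in for the reference library; the talk does not say which libraries were used for comparison.

```python
import statistics
from decimal import Decimal


def mean_and_sd(values):
    """Return (mean, sample standard deviation) as Decimals, unformatted."""
    vals = [Decimal(str(v)) for v in values]
    n = len(vals)
    if n < 2:
        raise ValueError("Need at least two values")
    mean = sum(vals) / n
    sum_sq_dev = sum((v - mean) ** 2 for v in vals)
    sd = (sum_sq_dev / (n - 1)).sqrt()
    return mean, sd


def format_result(mean, sd, dp=3):
    """Formatting is kept apart from the calculation."""
    return f"Mean = {mean:.{dp}f}, SD = {sd:.{dp}f}"


def test_mean_and_sd_matches_reference():
    """Written so a runner such as nosetests could discover it."""
    data = [2.5, 3.1, 4.7, 5.0, 6.2]
    mean, sd = mean_and_sd(data)
    assert abs(float(mean) - statistics.mean(data)) < 1e-9
    assert abs(float(sd) - statistics.stdev(data)) < 1e-9


test_mean_and_sd_matches_reference()
print(format_result(*mean_and_sd([2.5, 3.1, 4.7, 5.0, 6.2])))
```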
Demonstration of key functionality
Background for discussion of the GUI toolkit and other topics
NB still lots more functionality to be added
Switch over to Jaunty
Interactive visualisations using matplotlib (and wxWebKit)
e.g. showing how a t-test works (ideas from Statistics Without Tears)
e.g. impact of altering your sample size
Output charting using Raphael JS (SVG & JS)
Mac OS X package, more flexible packaging
Add ability to import from Calc and SPSS
Other databases e.g. Oracle, DB2, Interbase
Increase test coverage
Plans
More languages e.g. French, Spanish, German
ROC, power calculations
Overall, not trying to compete with R (or RKWard)
The slogan is “ease of use, learn as you go, and beautiful output”
May be useful for specialist statisticians but emphasis on supporting non-specialists, and learning statisticians
Mailing list (with Robin Dunn a regular contributor)
Lots of online documentation (but googling and integration of different ideas often required)
There is a GUI for making GUIs but I prefer handcoding
Clean
Can reuse code across different forms
Can delegate parts of the GUI e.g. to database plugin modules
Lots of sophisticated, configurable widgets
Was able to make a data entry table work like I thought it should e.g. new row has column label of … , specific behaviour when tabbing
Focus on grid control
Documentation & Community
With flexibility and power comes some complexity to handle
Only 1100 lines of code to make the data grid you saw in the demonstration (inc validation, ability to add new rows and edit values etc)
May be sensible to have more lines of documentation than code in some modules
Resolving issues can take you to the edge of what is known/documented
Had data entry working like clockwork in Ubuntu
Found out Windows intercepted Tabs and Returns before they could be exposed and reacted to
But there was a solution
wx.Grid
Example of wx.Grid code

    self.frame.Bind(wx.grid.EVT_GRID_EDITOR_CREATED, self.OnGridEditorCreated)
    …
    def OnGridEditorCreated(self, event):
        """
        Need to bind KeyDown to the control itself e.g. a choice control.
        wx.WANTS_CHARS makes it work.
        """
        control = event.GetControl()
        control.WindowStyle |= wx.WANTS_CHARS
        control.Bind(wx.EVT_KEY_DOWN, self.OnGridKeyDown)
        event.Skip()
    …
    def OnGridKeyDown(self, event):
        keycode = event.GetKeyCode()
        if keycode in (wx.WXK_TAB, wx.WXK_RETURN):
            etc

The user clicks on a cell to edit a value. We bind to that event.
Now we can grab the control ...
… and respond to its key down event
Now we're away again :-)
Custom Controls
Option of label display e.g. “Male” not 1
Conditional formatting e.g. all values > 1000 red
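The two ideas above could be prototyped independently of any GUI toolkit. The sketch below is only an assumption about how such logic might look: display_text, cell_colour and the label mapping are hypothetical (only "Male" for 1 and the 1000 threshold come from the slide), and a real grid would plug functions like these into its own cell renderer.

```python
# Hypothetical value labels; "Female" for 2 is an assumed example.
VALUE_LABELS = {1: "Male", 2: "Female"}
RED_THRESHOLD = 1000  # values above this are shown in red


def display_text(value, labels=None, show_labels=True):
    """Return the label for a value if one exists, else the value as text."""
    if show_labels and labels and value in labels:
        return labels[value]
    return str(value)


def cell_colour(value, threshold=RED_THRESHOLD):
    """Return a colour name based on a simple conditional formatting rule."""
    if isinstance(value, (int, float)) and value > threshold:
        return "red"
    return "black"


print(display_text(1, VALUE_LABELS))                     # label shown
print(display_text(1, VALUE_LABELS, show_labels=False))  # raw value shown
print(cell_colour(1500), cell_colour(250))
```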
Choice of toolkit very important
Can it support what you want to do, or will you hit a wall?
If I wanted to display sparklines or pie charts as cells in a table … could I?
Hard to know whether a good choice until already committed
Extending the grid further
Lots of steps to get 100% right. My steps are:
Do preparatory clean up (debugging off, demo database tidy).
Make sure I have translations for any new strings I've added.
Make and test the new deb and Windows packages. I use VirtualBox to give me identical install environments each time.
Add the new files to Sourceforge (I wanted to consolidate downloads to help me measure usage).
Add a new release to Launchpad and Freshmeat complete with updated release notes and change log (used Bazaar to push to Launchpad so can browse my commit comments).
Make announcements in both Launchpad and Freshmeat.
Update the project homepage to account for the new download location, new features.
Add a blog item to the project site.
Update release version and release date on Wikipedia.
Revisit any important threads commenting on open source statistics packages.
Release process
Initially very daunting – where do you start?
Found ShowMeDo video by Austrian developer Horst Jens
His example was a Python project so very similar requirements
Ended up with very detailed step-by-step guidelines for packaging SOFA Statistics
NB installing application for all users, not a given user
Files are put into /usr/share/pyshared/sofa/...
Any files needed by an individual user are transferred during first use of application /home/username/sofa/...
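The "transfer on first use" step might look something like the sketch below. It is an assumption about the general approach, not the project's actual code: ensure_user_files is a hypothetical helper, and the real shared and per-user paths are passed in as arguments rather than guessed at here.

```python
import shutil
from pathlib import Path


def ensure_user_files(shared_dir: Path, user_dir: Path) -> bool:
    """Copy default per-user files on first run.

    Returns True if files were copied, False if the user folder
    already existed (i.e. this is not the first run).
    """
    if user_dir.exists():
        return False
    # copytree creates user_dir (and any missing parent folders).
    shutil.copytree(shared_dir, user_dir)
    return True
```

An application would call this once at startup, before opening any per-user files, so an install for all users still gives each user a private, writable copy.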
Making Debian Package (for Ubuntu)
Nullsoft Scriptable Install System. Free. Used by Firefox, OpenOffice etc
Weird language – cross between PHP and assembler
Plenty of documentation etc but best to start and then extend
Issue – file size. Including mysqldb, numpy, wxpython, sqlite, python
Put program in Program Files and user files in Documents and Settings\username\sofa\...
NSIS Windows Installer
Running an open source project can be very satisfying
Lots of new learning
A long-term commitment
Lots to do. Not just glamour coding - someone has to “take out the trash”
Phenomenal resources available in open source world – bazaar, loggerhead, nosetests, etc
Hands up if ever considered it (or doing it)
Final Thoughts
SOFA Statistics (http://www.sofastatistics.com) is an open source desktop Python statistics application with an emphasis on ease of use, learn as you go, and attractive output. This presentation covers the lead developer's experiences with some of the technical aspects of the project as well as project management issues.