Developing and releasing SOFA Statistics

SOFA Statistics (http://www.sofastatistics.com) is an open source desktop Python statistics application with an emphasis on ease of use, learn as you go, and attractive output. This presentation covers the lead developer's experiences with some of the technical aspects of the project as well as project management issues.

    Presentation Transcript

    • SOFA Statistics: Developing and releasing a Python open source application
      • Grant Paton-Simpson, Paton-Simpson & Associates Ltd, Auckland, New Zealand
      • sofastatistics.com
    • Overview
      • Introducing the SOFA Statistics application
      • How SOFA works with SQL databases
      • Using HTML for output (via wxWebKit)
      • Experience with existing statistics modules
      • wxPython GUI toolkit (esp the grid widget)
      • The release process (esp packaging)
      • In 30 minutes flat out!
    • Introducing SOFA Statistics
      • SOFA stands for Statistics Open For All
      • A cross platform desktop application for:
        • Making report tables from data (database, spreadsheet, directly entered)
        • Producing charts
        • Running key statistical tests
      • The slogan is “ease of use, learn as you go, and beautiful output”
      • May be useful for specialist statisticians, but the emphasis is on supporting non-specialists and learning statisticians
      • Currently version 0.8.10 and pushing on towards a version 1.0 release
    • SOFA Architecture ...
      • Linking, not importing: SQLite, MySQL, MS Access, PostgreSQL, SQL Server
      • SOFA
      • Scripts from the GUI or written by hand (available for automation)
      • HTML output (spreadsheet-friendly)
    • Working with SQL
      • Not using an abstraction layer; wrote my own code using the MySQLdb module etc.
        • Already experienced with SQL
        • Want control over the information I get about data configuration (e.g. character set)
        • Want control over how I interact with the databases for performance reasons
      • Adding other databases, e.g. Oracle, is a process of copying an existing module and changing the implementation
      • SQL databases do things very differently
        • SQLite and data type integrity (columns accept values of any type, like dynamic typing)
        • PostgreSQL and SUM(expression)
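A minimal sketch of the SQLite point, using Python's built-in `sqlite3` module: a column declared `INTEGER` still accepts text, so code reading linked SQLite data may need to check each value's storage class with `typeof()`. The `people` table and its values are hypothetical. The last query shows a `CASE`-based count; reading the PostgreSQL bullet as being about `SUM()` over a boolean expression is an assumption (PostgreSQL has no `sum(boolean)`, whereas SQLite and MySQL treat comparisons as 0/1).

```python
import sqlite3

# Hypothetical table: the column is declared INTEGER, but SQLite only applies
# "type affinity", so values that can't be converted are stored as-is.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (age INTEGER)")
con.executemany(
    "INSERT INTO people (age) VALUES (?)",
    [(42,), ("43",), ("unknown",)],
)

# typeof() reports the storage class actually used for each value:
# "43" is a well-formed integer so it was converted; "unknown" stayed TEXT.
rows = con.execute("SELECT age, typeof(age) FROM people ORDER BY rowid").fetchall()
print(rows)  # [(42, 'integer'), (43, 'integer'), ('unknown', 'text')]

# Values that don't match the declared type, e.g. to warn the user before analysis.
bad = [r[0] for r in con.execute(
    "SELECT age FROM people WHERE typeof(age) NOT IN ('integer', 'null')"
)]

# A count that should work in SQLite, MySQL and PostgreSQL alike: CASE yields
# 0/1 explicitly instead of relying on SUM() over a boolean comparison.
n_over_40 = con.execute(
    "SELECT SUM(CASE WHEN age > 40 THEN 1 ELSE 0 END) "
    "FROM people WHERE typeof(age) = 'integer'"
).fetchone()[0]
print(bad, n_over_40)  # ['unknown'] 2
```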
    • HTML Report Table Output
      • Tree-based for each dimension (rows, cols)
      • Created an artificial limit of 5000 cells
      • Scales linearly, and Python is not the bottleneck
      • Rendered locally using wxWebKit
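The slide gives only the outline of the report-table design. As a hedged illustration, here is one way a tree per dimension could drive both the cell limit and the HTML column headings. `DimNode`, `cell_count` and `column_header_html` are hypothetical names, not SOFA's own code, and the heading renderer assumes balanced trees (all leaves at the same depth).

```python
from __future__ import annotations

import html
from dataclasses import dataclass, field

MAX_CELLS = 5000  # the artificial limit from the slide


@dataclass
class DimNode:
    """A node in a row or column dimension tree (hypothetical structure)."""
    label: str
    children: list[DimNode] = field(default_factory=list)

    def add(self, label: str) -> DimNode:
        child = DimNode(label)
        self.children.append(child)
        return child

    def leaves(self) -> list[DimNode]:
        """Leaf nodes; each one becomes a single row (or column) of cells."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def cell_count(rows: DimNode, cols: DimNode) -> int:
    """Body cells = row leaves x column leaves; refuse tables over the limit."""
    n = len(rows.leaves()) * len(cols.leaves())
    if n > MAX_CELLS:
        raise ValueError(f"table would have {n} cells (limit {MAX_CELLS})")
    return n


def column_header_html(cols: DimNode) -> str:
    """One <tr> per tree level; each heading spans the leaves beneath it."""
    lines = []
    level = cols.children
    while level:
        cells = "".join(
            f'<th colspan="{len(node.leaves())}">{html.escape(node.label)}</th>'
            for node in level
        )
        lines.append(f"<tr>{cells}</tr>")
        level = [child for node in level for child in node.children]
    return "\n".join(lines)


rows = DimNode("rows")
gender = rows.add("Gender")
gender.add("Male")
gender.add("Female")

cols = DimNode("cols")
age = cols.add("Age group")
for group in ("<20", "20-39", "40+"):
    age.add(group)

print(cell_count(rows, cols))  # 6
print(column_header_html(cols))
```

Because the cell count is just the product of leaf counts, the limit can be checked before any HTML is generated.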
    • Statistics Modules
      • Looked at using existing libraries but ended up using a modified subset of their code
      • This was not my preferred approach. Benefits of plugging in a module:
        • Saving time and effort
        • Less risk (the results of statistical algorithms can wildly diverge because of small floating point errors compounding and multiplying)
        • Any issues found and fixed can help everyone
      • Reasons for creating my own code (often based on existing code)
        • Standard code didn't return results separated from formatting
        • No option of using decimal instead of floating point maths
        • Half-baked code in some places
        • Keeping the installer file size down
      • But I do use existing libraries to test my code against (using nosetests)
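Three of the reasons above (results separated from formatting, the option of decimal maths, and testing against existing libraries) can be illustrated together in a short, hedged sketch that is not SOFA's actual code. The calculation returns a bare number and leaves presentation to a separate function, it accepts `Decimal` as well as `float`, and a nose-style test compares it with an existing implementation, here Python's standard `statistics` module. The function names are hypothetical.

```python
import math
import statistics
from decimal import Decimal


def sample_variance(values):
    """Return the sample variance as a bare number, with no formatting.

    Works on floats or Decimals: Decimal in gives Decimal out, because every
    step (sum, division by an int, squaring) stays within one numeric type."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def format_cell(value, places=3):
    """Presentation is a separate step, e.g. for an HTML table cell."""
    return f"<td>{value:.{places}f}</td>"


# Why decimal maths can matter: tiny binary rounding errors accumulate in floats.
print(sum([0.1] * 10) == 1.0)                    # False
print(sum([Decimal("0.1")] * 10) == Decimal(1))  # True


# nose collects plain test_* functions; this one checks the hand-written code
# against an existing library (the stdlib statistics module).
def test_variance_matches_statistics_module():
    data = [2.5, 3.1, 4.7, 5.0, 6.2]
    assert math.isclose(sample_variance(data), statistics.variance(data))
    dec = [Decimal(str(x)) for x in data]
    assert sample_variance(dec) == Decimal("2.235")
```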
    • Demonstration of key functionality
      • Background for discussion of the GUI toolkit and other topics
      • NB still lots more functionality to be added
      • Switch over to Jaunty (Ubuntu 9.04) for the demonstration
    • Plans
      • Interactive visualisations using Matplotlib (and wxWebKit)
        • e.g. showing how a t-test works (ideas from Statistics Without Tears)
        • e.g. the impact of altering your sample size
      • Output charting using Raphael JS (SVG & JS)
      • Mac OS X package, more flexible packaging
      • Add the ability to import from Calc and SPSS
      • Other databases e.g. Oracle, DB2, Interbase
      • Increase test coverage
    • Plans cont ...
      • More languages e.g. French, Spanish, German
      • ROC, power calculations
      • Overall, not trying to compete with R (or RKWard)
        • The slogan is “ease of use, learn as you go, and beautiful output”
        • May be useful for specialist statisticians, but the emphasis is on supporting non-specialists and learning statisticians
        • Focus on making the most common needs easy to satisfy
        • Plugin extensions for the rest
    • wxPython GUI Toolkit
      • Cross-platform and native widgets
      [Screenshots: Ubuntu (Dust theme) and Windows XP]
    • Documentation & Community
      • wxPython in Action: a great book
      • Mailing list (with Robin Dunn a regular contributor)
      • Lots of online documentation (but searching and piecing together different ideas is often required)
      • There are GUI tools for designing GUIs, but I prefer hand-coding:
        • Clean
        • Can reuse code across different forms
        • Can delegate parts of the GUI e.g. to database plugin modules
      • Lots of sophisticated, configurable widgets
        • Was able to make a data entry table work the way I thought it should, e.g. the new row has a column label of …, and specific behaviour when tabbing
      • Focus on grid control
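The points about hand-coding and delegating parts of the GUI to database plugin modules (together with adding a database by copying an existing module and changing its implementation) can be sketched as a small plugin interface. This is a hedged illustration, not SOFA's code: `DbPlugin`, `SQLitePlugin`, `MySQLPlugin` and `build_settings_layout` are hypothetical, and plain strings stand in for wxPython widgets so the sketch runs without a GUI.

```python
from __future__ import annotations

from typing import Protocol


class DbPlugin(Protocol):
    """What each (hypothetical) database module provides to the rest of the app."""
    name: str

    def settings_fields(self) -> list[tuple[str, str]]: ...
    def quote_identifier(self, ident: str) -> str: ...


class SQLitePlugin:
    name = "SQLite"

    def settings_fields(self) -> list[tuple[str, str]]:
        return [("Database file", "")]

    def quote_identifier(self, ident: str) -> str:
        # Standard SQL quoting: double quotes, embedded quotes doubled.
        return '"' + ident.replace('"', '""') + '"'


class MySQLPlugin:
    name = "MySQL"

    def settings_fields(self) -> list[tuple[str, str]]:
        return [("Host", "localhost"), ("User", ""), ("Password", "")]

    def quote_identifier(self, ident: str) -> str:
        # MySQL quotes identifiers with backticks; embedded backticks are doubled.
        return "`" + ident.replace("`", "``") + "`"


def build_settings_layout(plugins: list[DbPlugin]) -> list[str]:
    """Hand-coded GUI side: it only asks each plugin what to show.

    In a wxPython form each row would become a label plus a text control."""
    rows = []
    for plugin in plugins:
        rows.append(f"[{plugin.name}]")
        rows.extend(f"  {label}: {default}" for label, default in plugin.settings_fields())
    return rows


PLUGINS: list[DbPlugin] = [SQLitePlugin(), MySQLPlugin()]
print("\n".join(build_settings_layout(PLUGINS)))
```

Under this design, adding another database means writing one more class with the same methods; the form-building code does not change.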
    • wx.Grid
      • With flexibility and power comes some complexity to handle
      • Only 1100 lines of code to make the data grid you saw in the demonstration (including validation, the ability to add new rows, edit values, etc.)
      • It may be sensible to have more lines of documentation than code in some modules
      • Resolving issues can take you to the edge of what is known or documented
        • Had data entry working like clockwork in Ubuntu
        • Found out that on Windows, Tab and Return key presses were intercepted before my code could see and react to them
        • But there was a solution
    • Example of wx.Grid code
        import wx
        import wx.grid
        # …
        # The user clicks on a cell to edit a value. We bind to that event.
        self.frame.Bind(wx.grid.EVT_GRID_EDITOR_CREATED, self.OnGridEditorCreated)
        # …
        def OnGridEditorCreated(self, event):
            """
            Need to bind KeyDown to the control itself e.g. a choice control.
            wx.WANTS_CHARS makes it work.
            """
            # Now we can grab the control ...
            control = event.GetControl()
            control.WindowStyle |= wx.WANTS_CHARS
            # ... and respond to its key down event
            control.Bind(wx.EVT_KEY_DOWN, self.OnGridKeyDown)
            event.Skip()
        # …
        def OnGridKeyDown(self, event):
            keycode = event.GetKeyCode()
            if keycode in (wx.WXK_TAB, wx.WXK_RETURN):
                pass  # etc: custom Tab/Return handling goes here
        # Now we're away again :-)
    • Custom Controls
    • Extending the grid further
      • Option of displaying labels, e.g. “Male” rather than 1
      • Conditional formatting, e.g. all values > 1000 in red
      • Choice of toolkit is very important
        • Can it support what you want to do, or will you hit a wall?
        • If I wanted to display sparklines or pie charts as cells in a table … could I?
      • Hard to know whether it was a good choice until you are already committed
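Both grid extensions above come down to per-cell rules. A minimal, toolkit-independent sketch follows; `VALUE_LABELS`, `display_text` and `text_colour` are hypothetical names. Wiring them into wxPython, for example by returning a `wx.grid.GridCellAttr` (with `SetTextColour`) from a custom `wx.grid.GridTableBase.GetAttr`, is one possible approach and is not shown or tested here.

```python
# Hypothetical value labels, keyed by field name and then by raw value.
VALUE_LABELS = {"gender": {1: "Male", 2: "Female"}}


def display_text(field_name, raw, show_labels=True):
    """Text for a grid cell: the label if one is defined, else the raw value."""
    if show_labels:
        label = VALUE_LABELS.get(field_name, {}).get(raw)
        if label is not None:
            return label
    return str(raw)


def text_colour(raw, threshold=1000):
    """Conditional formatting rule from the slide: values > threshold in red."""
    # bool is a subclass of int, so exclude it explicitly.
    is_number = isinstance(raw, (int, float)) and not isinstance(raw, bool)
    return "red" if is_number and raw > threshold else "black"


print(display_text("gender", 1), text_colour(1500))  # Male red
```

Keeping the rules as plain functions means they can be unit tested without creating any windows.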
    • Release process
      • Lots of steps to get 100% right. My steps are:
        • Do preparatory clean-up (turn debugging off, tidy the demo database).
        • Make sure I have translations for any new strings I've added.
        • Make and test the new deb and Windows packages. I use VirtualBox to give me identical install environments each time.
        • Add the new files to SourceForge (I wanted to consolidate downloads to help me measure usage).
        • Add a new release to Launchpad and Freshmeat, complete with updated release notes and change log (I used Bazaar to push to Launchpad so my commit comments can be browsed there).
        • Make announcements on both Launchpad and Freshmeat.
        • Update the project homepage to account for the new download location and new features.
        • Add a blog item to the project site.
        • Update the release version and release date on Wikipedia.
        • Revisit any important threads commenting on open source statistics packages.
    • Making Debian Package (for Ubuntu)
      • Initially very daunting – where do you start?
      • Found a ShowMeDo video by Austrian developer Horst Jens
        • His example was a Python project, so the requirements were very similar
        • Ended up with very detailed step-by-step guidelines for packaging SOFA Statistics
      • NB: installing the application for all users, not for a single user
        • Files are put into /usr/share/pyshared/sofa/...
        • Any files needed by an individual user are copied to /home/username/sofa/... the first time that user runs the application
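The all-users layout above implies a first-run step that copies per-user files into the user's home directory. A minimal sketch with Python's standard library follows; `ensure_user_files` and the idea of a separate read-only defaults folder are assumptions, and the real installed path (elided on the slide) is deliberately not filled in. The same approach would apply to a per-user folder under Documents and Settings on Windows.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_user_files(shared_defaults: Path, user_dir: Path) -> bool:
    """Copy per-user files on first run; return True if a copy was made.

    shared_defaults: read-only folder installed for all users (hypothetical);
    user_dir: the user's own folder, e.g. Path.home() / "sofa"."""
    if user_dir.exists():
        return False  # not the first run for this user
    shutil.copytree(shared_defaults, user_dir)  # also creates missing parents
    return True


# Demonstration in a throwaway directory instead of real install locations.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "shared"
    (shared / "reports").mkdir(parents=True)
    (shared / "reports" / "readme.txt").write_text("demo")
    user = Path(tmp) / "home" / "sofa"
    first = ensure_user_files(shared, user)
    second = ensure_user_files(shared, user)
    content = (user / "reports" / "readme.txt").read_text()

print(first, second, content)  # True False demo
```

Copying only when the user folder is absent leaves the user's own edits alone on later runs; the trade-off is that files added in a later release will not reach existing users unless an upgrade step copies missing files individually.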
    • NSIS Windows Installer
      • Nullsoft Scriptable Install System (NSIS): free, and used by Firefox, OpenOffice etc.
      • Weird language – a cross between PHP and assembler
      • Plenty of documentation etc., but best to start small and then extend
      • Issue – file size, since the installer includes MySQLdb, NumPy, wxPython, SQLite and Python
      • Put the program in Program Files and user files in Documents and Settings\username\sofa...
    • Final Thoughts
      • Running an open source project can be very satisfying
      • Lots of new learning
      • A long-term commitment
      • Lots to do. Not just glamour coding: someone has to “take out the trash”
      • Phenomenal resources available in the open source world: Bazaar, Loggerhead, nosetests, etc.
      • Hands up if you have ever considered it (or are doing it)