This presentation reviews the major programmability features in SPSS 14 and 15 and introduces the new programmability features of SPSS 16.
Programmability in SPSS 14, 15 and 16
1. Programmability in SPSS 14, SPSS 15 and SPSS 16
The Revolution Continues
Jon Peck
Technical Advisor
SPSS
Copyright (c) SPSS Inc, 2007
2. Recap of SPSS 14 Python programmability
Developer Central
New features in SPSS 15 programmability
Writing first-class procedures
Updating the data
New features in SPSS 16 programmability
Interacting with the user
Q & A
Conclusion
Agenda
3. "Because of programmability, SPSS 14 is the most important
release since I started using SPSS fifteen years ago."
"I think I am going to like using Python."
"Python and SPSS 14 and later are, IMHO, GREAT!"
"By the way, Python is a great addition to SPSS."
From InfoWorld (April 19, 2007)
"Of all the tools fueling the dynamic-language trend in the enterprise,
general-purpose dynamic languages such as Python and Ruby present
the greatest upside for enhancing developer productivity."
Quotations from SPSS Users
4. SPSS provides a powerful engine for statistical
and graphical methods and for data
management.
Python® provides a powerful, elegant, and
easy-to-learn language for controlling and
responding to this engine.
Together they provide a comprehensive system
for serious applications of analytical methods to
data.
The Combination of SPSS and
Python
5. SPSS 14.0 provided
Programmability
Multiple datasets
Variable and File Attributes
Programmability read-access to case data
Ability to control SPSS from a Python program
SPSS 15 adds
Read and write case data
Create new variables directly rather than generating syntax
Create pivot tables and text blocks via backend APIs
Easier setup
SPSS 16 will add
EXTENSION command for user procedures with SPSS syntax
Dataset features for complex data management
Ability to use R procedures within SPSS through R Plug-In
Programmability Features in
SPSS 14, 15, and 16
6. Makes possible easy jobs that respond to datasets, output,
environment
Allows greater generality, more automation
Makes jobs more robust
Allows extending the capabilities of SPSS
Enables better organized and more maintainable code
Facilitates staff specialization
Increases productivity
More fun
Programmability Advantages
7. Python extends SPSS via
General programming language
Access to variable dictionary, case data, and output
Access to standard and third-party modules
SPSS Developer Central modules
Module structure for building libraries of code
Runs in "back-end" syntax context (like macro)
SaxBasic scripting runs in "front-end" context
Two modes
Traditional SPSS syntax window
Drive SPSS from Python (external mode)
Optional install (licensed with SPSS Base)
Programmability Overview
8. SPSS is not the owner or licensor of the Python
software. Any user of Python must agree to the
terms of the Python license agreement located
on the Python web site. SPSS is not making any
statement about the quality of the Python
program. SPSS fully disclaims all liability
associated with your use of the Python program.
Legal Notice
9. Supports implementing various programming
languages
Requires a programmer to implement a new language
VB.NET Plug-In available on Developer Central
Works only in external mode
The SPSS Programmability
Software Development Kit
10. Python interpreter embedded within SPSS
SPSS runs in traditional way until BEGIN PROGRAM
command is found
Python collects commands until END PROGRAM
command is found; then runs the program
Python can communicate with SPSS through APIs (calls to functions)
Includes running SPSS syntax inside Python program
Includes creating macro values for later use in syntax
Python can access SPSS output and data
OMS is a key tool
How Programmability Works
11. BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# find categorical variables
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
# create a macro listing categorical variables
spss.SetMacroValue("!catVars", " ".join(catVars.variables))
END PROGRAM.
DESC !catVars.
Example:
Summarize Categorical Variables
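The selection logic in this example can be tried without SPSS installed. The sketch below is written for a current Python 3 interpreter and replaces spssaux.VariableDict with a plain dictionary of hypothetical variable names and measurement levels. It only builds the strings that the program would hand to spss.Submit and spss.SetMacroValue; nothing is submitted to SPSS.

```python
# Hypothetical variable dictionary: name -> measurement level.
# Inside SPSS this information would come from spssaux.VariableDict.
var_levels = {
    "gender": "nominal",
    "educ": "ordinal",
    "jobcat": "nominal",
    "salary": "scale",
    "salbegin": "scale",
}


def select_by_level(levels, wanted):
    """Return the names whose measurement level is in `wanted`, in dictionary order."""
    return [name for name, level in levels.items() if level in wanted]


cat_vars = select_by_level(var_levels, {"nominal", "ordinal"})

# Build the FREQ command only when there is something to summarize.
freq_cmd = ("FREQ " + " ".join(cat_vars)) if cat_vars else None

# Text that would become the value of the !catVars macro.
macro_value = " ".join(cat_vars)

print(freq_cmd)
print(macro_value)
```

Keeping the string building separate from the SPSS calls makes this part of a job easy to check in an IDE before it is run inside BEGIN PROGRAM.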
12. Two modes of operation
SPSS Drives mode (inside): traditional syntax context
BEGIN PROGRAM …program… END PROGRAM
Program in 14, 15, or 16 is in Python or, new in 16, in R
X Drives mode (outside): eXternal program drives SPSS
Python interpreter (or VB.NET)
No SPSS Viewer, Data Editor, or SPSS user interface
Output sent as text to the application – can be suppressed
Has performance advantages
Build programs with an IDE
Even if to be run in traditional mode
Programmability Inside or Outside
SPSS
14. Be productive quickly
Get more return as you learn more
Python.org
Python Tutorial
Cheeseshop
over 2200 packages as of April 11, 2007
SPSS Developer Central
SPSS Programming and Data Management, 4th ed, 2006.
Python Resources
15. Dive Into Python book or PDF
Practical Python by Magnus Lie Hetland
Extensive examples and discussion of Python
Python Cookbook, 2nd ed., by Martelli, Ravenscroft, & Ascher
Python in a Nutshell, 2nd ed., by Martelli, O'Reilly
Very clear, comprehensive reference material
wxPython in Action by Rappin and Dunn
Explains user interface building with wxPython
Python Books
16. scipy 0.5.2 Scientific Algorithms Library for Python
Scipy.org
scipy is an open source library of scientific tools for
Python. scipy gathers a variety of high level science and
engineering modules together as a single package. scipy
provides modules for statistics, optimization, integration,
linear algebra, Fourier transforms, signal and image
processing, genetic algorithms, ODE solvers, special
functions, and more. scipy requires and supplements
NumPy, which provides a multidimensional array object
and other basic functionality.
Python is becoming a major language for scientific
computing
Cheeseshop: scipy
17. SPSS Developer Central is the web home for
developing SPSS applications
Python, .NET, R Integration Plug-Ins
Supplementary modules by SPSS and others
Articles on programmability and graphics
Forums for asking questions and exchanging
information
Programmability Extension SDK
Get Python itself from Python.org or CD
SPSS 14 and 15 use Python 2.4 (2.4.3)
SPSS 16 will use Python 2.5
Not limited to programmability
GPL graphics
User-contributed code
Key Supplementary Modules
spssaux
spssdata
New for SPSS 15
trans
extendedTransforms
rake
pls
enhanced tables.py
SPSS Developer Central
18. tables.py module on Developer Central can merge two
tables into one.
E.g., Ctables significance tests into main tables
Merge or replace cells with cells from a different table
Flexibly define the join
tables.py can also censor cells, e.g., blank statistics
based on small counts.
Merge example: data on importance of education
qualifications for immigration by region of Europe
CTABLES /TABLE qfimeduBin BY Region
/TITLES TITLE='Qualifications for Immigration'
/COMPARETEST TYPE=PROP
Example: Manipulating Output:
Merging Tables
20. BEGIN PROGRAM.
import spss, tables
cmd=r"""CTABLES /TABLE qfimeduBin BY Region
/TITLES TITLE='Qualifications for Immigration'
/COMPARETEST TYPE=PROP"""
tables.mergeLatest(cmd, autofit=False)
END PROGRAM.
Runs Ctables and merges test table into main table
Using default merge behavior
"If it really is this simple this will generate a lot of
excitement for us."
"This is really fantastic."
Program to Merge
21. [Table: Qualifications for Immigration, merged with Comparisons of Column Proportions. Rows: "Qualification for immigration: good educational qualifications", categories 0 to 5. Columns: Region of Europe, with Count for Western (A), Eastern (B), Northern (C), and Southern (D). Each cell holds a count together with the keys of the columns it differs from significantly. Cell values are not recoverable from this transcript.]
Results are based on two-sided tests with significance level 0.05. For each
significant pair, the key of the category with the smaller column proportion
appears under the category with the larger column proportion.
Merged Output
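The idea behind the merge can be sketched in plain Python. The example below is not the tables.py API: it models each table as a dictionary keyed by (row label, column label), joins the two on those shared labels, and separately shows censoring of small counts. The counts and significance keys are hypothetical, not the values from the slide.

```python
# Hypothetical cell contents keyed by (row label, column label).
counts = {
    ("0", "Western"): 120, ("0", "Eastern"): 45,
    ("1", "Western"): 300, ("1", "Eastern"): 4,
}
sig_keys = {
    ("0", "Western"): "B", ("0", "Eastern"): "",
    ("1", "Western"): "", ("1", "Eastern"): "A",
}


def merge_tables(main, other):
    """Append the matching cell of `other` to each cell of `main`."""
    merged = {}
    for key, value in main.items():
        extra = other.get(key, "")
        merged[key] = ("%s %s" % (value, extra)).strip()
    return merged


def censor_small(table, threshold):
    """Blank every count that falls below `threshold`."""
    return {key: (value if value >= threshold else "")
            for key, value in table.items()}


merged = merge_tables(counts, sig_keys)
censored = censor_small(counts, 5)

for key in sorted(merged):
    print(key, repr(merged[key]), repr(censored[key]))
```

The join works here because both tables carry the same row and column labels, which is also what makes the CTABLES main table and its test table straightforward to merge.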
22. You can extend SPSS capabilities by building new procedures
Or use ones that others have built
Combine SPSS procedures and transformations with Python
logic
Poisson regression (SPSS 14) example using iterated CNLR (GENLIN in SPSS 15)
New raking procedure built over GENLOG
Calculate data aggregates in SPSS and pass to algorithm
coded in Python
Raking procedure starts with AGGREGATE; uses GENLOG
Acquire case data and compute in Python
Use Python standard modules and third-party additions
Partial Least Squares Regression (pls module)
Approaches to Creating
New Procedures
23. Common to adapt existing libraries or code for
use as Python extension modules
C, C++, VB, Fortran,...
Python tools and APIs to assist
Chap 25 in Python in a Nutshell
Tutorial on extending and embedding the Python
interpreter
Call R programs with SPSS 16
Adapt Existing Code Libraries
24. Regression with large number of predictors (even k > N)
Similar to Principal Components but considers dependent
variable simultaneously
Calculates principal components of (y, X), then uses regression
on the scores instead of the original data
Equivalent to ordinary regression when number of factors
equals number of predictors and one y variable
For more information see An Optimization Perspective on
Kernel Partial Least Squares Regression.pdf.
Partial Least Squares Regression
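The equivalence with ordinary regression can be checked numerically. The sketch below is a minimal pure-Python version of the common NIPALS algorithm for a single dependent variable; it is an illustration under that assumption, not the code of the pls module, which uses scipy. The data values and function names are hypothetical. With two predictors and two factors, the fitted values should agree with ordinary least squares.

```python
def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fitted(x_columns, y, n_factors):
    """Fitted values from PLS regression with one dependent variable."""
    n = len(y)
    cols = [center(c) for c in x_columns]
    resid = center(y)
    fitted = [sum(y) / n] * n
    for _ in range(n_factors):
        # Weights: covariance of each deflated predictor with the residual.
        w = [dot(c, resid) for c in cols]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # nothing left for another factor to explain
        w = [v / norm for v in w]
        # Scores: one latent factor value per case.
        t = [sum(wj * c[i] for wj, c in zip(w, cols)) for i in range(n)]
        tt = dot(t, t)
        q = dot(resid, t) / tt
        fitted = [f + q * ti for f, ti in zip(fitted, t)]
        resid = [r - q * ti for r, ti in zip(resid, t)]
        # Deflate predictors so the next factor is orthogonal to this one.
        loadings = [dot(c, t) / tt for c in cols]
        cols = [[ci - p * ti for ci, ti in zip(c, t)]
                for c, p in zip(cols, loadings)]
    return fitted


def ols_fitted_two_predictors(x1, x2, y):
    """Ordinary least squares fit with an intercept and two predictors."""
    c1, c2, cy = center(x1), center(x2), center(y)
    s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    y_mean = sum(y) / len(y)
    return [y_mean + b1 * a + b2 * b for a, b in zip(c1, c2)]


# Hypothetical data: six cases, two predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.5]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]

pls_one = pls1_fitted([x1, x2], y, n_factors=1)
pls_two = pls1_fitted([x1, x2], y, n_factors=2)
ols = ols_fitted_two_predictors(x1, x2, y)

max_gap = max(abs(a - b) for a, b in zip(pls_two, ols))
print("largest difference between 2-factor PLS and OLS:", max_gap)
```

Using fewer factors than predictors gives a fit that is no better than ordinary regression on the same sample, which is the trade made for stability when predictors are many or collinear.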
25. Strategy
Fetches data from SPSS
Uses scipy matrix operations to compute results
Third-party module from Cheeseshop
Writes pivot tables to SPSS Viewer
Subject to OMS
SPSS 14 viewer module created pivot table using OLE
automation
SPSS 15 has direct pivot table APIs
Saves predicted values to active dataset
The pls Module for SPSS 15
26. GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
REGRESSION /STATISTICS COEFF R /DEPENDENT sales
/METHOD=ENTER curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width .
begin program.
import spss, pls
pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width""",
yhat="predsales")
end program.
plsproc defaults to five factors
pls Example: REGRESSION vs
PLS
27. PLS with 5 factors
almost equals
regression with 11
variables
Results
28. User procedures can be written in Python but specified using SPSS
traditional syntax
User never writes or sees Python code
Used as if a built-in SPSS command
EXTENSION command defines command to SPSS via simple XML file
Python module called with syntax already checked and processed by
SPSS
More general PLS module
PLS y1 y2 y3 BY fac1 fac2 WITH z1 z2 z3
/CRITERIA LATENTFACTORS=2.
Dialog box interface tools in SPSS 17
In the meantime, use wxPython or
something similar
SPSS 16 User Procedures
29. "Raking" adjusts sample weights to control totals in n
dimensions
Example: data classified by age and sex with known
population totals or proportions
Calculated by fitting a main effects loglinear model
Various adjustments required
Not a complete solution to reweighting
Not directly available in SPSS
Raking Sample Weights
30. Strategy: combine SPSS procedures with Python logic
rake.py (from SPSS Developer Central)
Aggregates data via AGGREGATE to new dataset
Creates new variable with control totals
Applies GENLOG, saving predicted counts
Adjusts predicted counts
Matches back into original dataset
Does not use MATCH FILES or require a SORT command
Written in one (long) day
rake.rake("age sex",
[{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}],
finalweight="finalwt")
Raking Module
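What raking does to the weights can be illustrated without SPSS. The sketch below uses iterative proportional fitting, a standard way to compute raked totals, in place of the AGGREGATE and GENLOG steps; it is not the rake.py implementation. The control totals are the ones in the rake.rake call above, while the sample counts and the function name rake_cells are hypothetical.

```python
def rake_cells(cells, row_totals, col_totals, iterations=50):
    """Scale cell counts until both sets of margins match the control totals.

    cells maps (row category, column category) to a sample count.
    """
    cells = dict(cells)
    for _ in range(iterations):
        # Match the row margins (first dimension).
        for r, target in row_totals.items():
            current = sum(v for (ri, ci), v in cells.items() if ri == r)
            for key in cells:
                if key[0] == r:
                    cells[key] *= target / current
        # Match the column margins (second dimension).
        for c, target in col_totals.items():
            current = sum(v for (ri, ci), v in cells.items() if ci == c)
            for key in cells:
                if key[1] == c:
                    cells[key] *= target / current
    return cells


# Hypothetical sample counts by (age, sex).
sample = {(0, 0): 30.0, (0, 1): 420.0, (1, 0): 25.0, (1, 1): 525.0}

# Control totals taken from the rake.rake call on the slide.
age_totals = {0: 1140, 1: 1140}
sex_totals = {0: 104.6, 1: 2175.4}

raked = rake_cells(sample, age_totals, sex_totals)

# Final weight for every case in a cell: raked total / sample count.
final_weight = {key: raked[key] / sample[key] for key in sample}

for key in sorted(final_weight):
    print(key, round(raked[key], 2), round(final_weight[key], 4))
```

Each case in a cell receives that cell's weight, which is the role the finalweight variable plays in the module call.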
31. SPSS 14 programmability can wrap SPSS syntax in
Python logic, e.g., generate COMPUTE commands
on the fly
Useful when definitions can be expressed in SPSS syntax
SPSS 15 programmability can
Generate new variables directly
Add new cases directly
Create new datasets from scratch
SPSS 16 has additional dataset capabilities
Extending SPSS Transformations
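Generating COMPUTE commands on the fly amounts to building syntax strings in a loop. The sketch below is a minimal illustration with hypothetical variable names and a hypothetical helper, log_transform_syntax. It stops at producing the text; inside SPSS the result would be passed to spss.Submit.

```python
def log_transform_syntax(variables, prefix="ln_"):
    """Return one COMPUTE command per variable that takes its natural log."""
    return ["COMPUTE %s%s = LN(%s)." % (prefix, name, name)
            for name in variables]


# Hypothetical list of scale variables to transform.
commands = log_transform_syntax(["salary", "salbegin"])

# One block of syntax, ready to hand to spss.Submit inside SPSS.
syntax = "\n".join(commands + ["EXECUTE."])
print(syntax)
```

This approach suits transformations that SPSS syntax can already express. Calculations that it cannot express are the case for creating variables directly from Python.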
32. trans module facilitates plugging in Python code to
iterate over cases
Runs as an SPSS procedure
Passes the data
Adds variables to the SPSS variable dictionary
Can apply any calculation casewise
Use with
Standard Python functions (e.g., math module)
Any user-written functions or appropriate classes
Functions in extendedTransforms module
trans and extendedTransforms
Modules
33. trans strategy
Pass case data through Python code writing
result back to SPSS in new variables
extendedTransforms collection of 12 functions to
apply to SPSS variables, including
Regular expression search/replace
soundex and nysiis functions for phonetic equivalence
Date/time conversions based on patterns
trans and extendedTransforms
Modules
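The casewise strategy can be sketched in plain Python: a function is applied to one value per case and the results form a new variable. The soundex function below follows the common American Soundex rules and is written only for illustration; it is not the extendedTransforms code, and the list of names stands in for a string variable read from SPSS.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Skip vowels and repeats of the previous consonant group.
        if code and code != prev:
            result += code
        # H and W do not separate letters that share a code.
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]


# Stand-in for case data: one string value per case.
names = ["Robert", "Rupert", "Ashcraft", "Tymczak"]

# The new variable: one result per case, in case order.
name_code = [soundex(n) for n in names]
print(list(zip(names, name_code)))
```

Names that sound alike, such as Robert and Rupert, receive the same code, which is what makes the result useful for matching records with inconsistent spelling.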
34. Pattern matching in text strings
If you use SPSS index or replace, you need these
Standardize string data (Mr, Mr., Herr, Senor,...)
Extract data from loosely structured text
"simvastatin-- PO 80mg TAB" -> "simvastatin", "80"
Patterns can be simple strings (as with SPSS index) or
complex patterns
Pick out variable names with common parts
Can greatly simplify code
Python Regular Expressions
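The extraction and standardization cases on this slide can be sketched with the standard re module. The patterns below are assumptions chosen to fit the slide's examples; real data would likely need broader patterns. The variable names in the last step are hypothetical.

```python
import re

# Drug name at the start, then the first number followed by "mg".
DRUG_PATTERN = re.compile(r"^\s*([A-Za-z]+)\W.*?(\d+(?:\.\d+)?)\s*mg",
                          re.IGNORECASE)

# Several spellings of the same title at the start of a name.
TITLE_PATTERN = re.compile(r"^(mr\.?|herr|senor)\s+", re.IGNORECASE)


def extract_drug(text):
    """Return (name, dose) from loosely structured text, or None if no match."""
    match = DRUG_PATTERN.search(text)
    return match.groups() if match else None


def standardize_title(name):
    """Replace the recognized title variants with a single form."""
    return TITLE_PATTERN.sub("Mr ", name)


print(extract_drug("simvastatin-- PO 80mg TAB"))
print([standardize_title(n) for n in ["Mr. Smith", "Herr Schmidt", "Senor Lopez"]])

# Pick out variable names with a common part (hypothetical names).
variables = ["q1_score", "q2_score", "age", "q10_score", "q3_text"]
score_vars = [v for v in variables if re.fullmatch(r"q\d+_score", v)]
print(score_vars)
```

A single pattern replaces what would otherwise be a chain of index and substring calls, which is where most of the simplification comes from.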
Other new SPSS 14 features enhance programmability:
multiple concurrent datasets
variable and file attributes
XML workspace and OMS enhancements
The PythonWin IDE is available from http://starship.python.net/crew/mhammond/win32/Downloads.html. There are many other choices for a Python IDE.
The graphic shows the requested custom table along with the associated table of comparisons of column proportions. Each table has the same set of row and column labels, so the tables can be easily merged.
The graphic shows that the two tables from the original CTABLES output have been merged into one table. Each cell of the merged table contains the cell contents from the original custom table as well as the cell contents from the table of comparisons of column proportions.