Programmability in SPSS 14, SPSS 15 and SPSS 16
The Revolution Continues
Jon Peck
Technical Advisor
SPSS
Copyright (c) SPSS Inc, 2007
 Recap of SPSS 14 Python programmability
 Developer Central
 New features in SPSS 15 programmability
 Writing first-class procedures
 Updating the data
 New features in SPSS 16 programmability
 Interacting with the user
 Q & A
 Conclusion
Agenda
 "Because of programmability, SPSS 14 is the most important
release since I started using SPSS fifteen years ago."
 "I think I am going to like using Python."
 "Python and SPSS 14 and later are, IMHO, GREAT!"
 "By the way, Python is a great addition to SPSS."
 From InfoWorld (April 19, 2007)
 "Of all the tools fueling the dynamic-language trend in the enterprise,
general-purpose dynamic languages such as Python and Ruby present
the greatest upside for enhancing developer productivity."
Quotations from SPSS Users
 SPSS provides a powerful engine for statistical
and graphical methods and for data
management.
 Python® provides a powerful, elegant, and
easy-to-learn language for controlling and
responding to this engine.
 Together they provide a comprehensive system
for serious applications of analytical methods to
data.
The Combination of SPSS and
Python
 SPSS 14.0 provided
 Programmability
 Multiple datasets
 Variable and File Attributes
 Programmability read-access to case data
 Ability to control SPSS from a Python program
 SPSS 15 adds
 Read and write case data
 Create new variables directly rather than generating syntax
 Create pivot tables and text blocks via backend API's
 Easier setup
 SPSS 16 will add
 EXTENSION command for user procedures with SPSS syntax
 Dataset features for complex data management
 Ability to use R procedures within SPSS through R Plug-In
Programmability Features in
SPSS 14, 15, and 16
 Makes possible easy jobs that respond to datasets, output,
environment
 Allows greater generality, more automation
 Makes jobs more robust
 Allows extending the capabilities of SPSS
 Enables better organized and more maintainable code
 Facilitates staff specialization
 Increases productivity
 More fun
Programmability Advantages
 Python extends SPSS via
 General programming language
 Access to variable dictionary, case data, and output
 Access to standard and third-party modules
 SPSS Developer Central modules
 Module structure for building libraries of code
 Runs in "back-end" syntax context (like macro)
 SaxBasic scripting runs in "front-end" context
 Two modes
 Traditional SPSS syntax window
 Drive SPSS from Python (external mode)
 Optional install (licensed with SPSS Base)
Programmability Overview
 SPSS is not the owner or licensor of the Python
software. Any user of Python must agree to the
terms of the Python license agreement located
on the Python web site. SPSS is not making any
statement about the quality of the Python
program. SPSS fully disclaims all liability
associated with your use of the Python program.
Legal Notice
 Supports implementing various programming
languages
 Requires a programmer to implement a new language
 VB.NET Plug-In available on Developer Central
 Works only in external mode
The SPSS Programmability
Software Development Kit
 Python interpreter embedded within SPSS
 SPSS runs in traditional way until BEGIN PROGRAM
command is found
 Python collects commands until END PROGRAM
command is found; then runs the program
 Python can communicate with SPSS through API's (calls to
functions)
 Includes running SPSS syntax inside Python program
 Includes creating macro values for later use in syntax
 Python can access SPSS output and data
 OMS is a key tool
How Programmability Works
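The collect-then-run behavior described above can be pictured with a toy dispatcher. This is only a sketch of the idea, not how SPSS is implemented: `run_job`, the `run_syntax` callback, and the sample job lines are hypothetical names made up for illustration, and the shared namespace between blocks is an assumption of the sketch.

```python
def run_job(lines, run_syntax):
    """Toy model: pass ordinary lines to run_syntax, collect and run Python blocks."""
    namespace = {}   # assumption of this sketch: one namespace shared by all blocks
    block = None     # None means "not inside a program block"
    for line in lines:
        marker = line.strip().rstrip(".").upper()
        if block is None:
            if marker == "BEGIN PROGRAM":
                block = []            # start collecting program lines
            else:
                run_syntax(line)      # traditional syntax runs as usual
        elif marker == "END PROGRAM":
            exec("\n".join(block), namespace)   # run the whole block at once
            block = None
        else:
            block.append(line)
    return namespace


submitted = []
job = [
    "GET FILE='demo.sav'.",
    "BEGIN PROGRAM.",
    "names = ['jobcat', 'gender']",
    "command = 'FREQ ' + ' '.join(names) + '.'",
    "END PROGRAM.",
    "DESC salary.",
]
state = run_job(job, submitted.append)
print(submitted)
print(state["command"])
```

Nothing inside the block runs until END PROGRAM is reached, which is why a program can span many lines of ordinary Python.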
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# find categorical variables
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
# create a macro listing categorical variables
spss.SetMacroValue("!catVars", " ".join(catVars.variables))
END PROGRAM.
DESC !catVars.
Example:
Summarize Categorical Variables
 Two modes of operation
 SPSS Drives mode (inside): traditional syntax context
 BEGIN PROGRAM …program… END PROGRAM
 Program in 14, 15, or 16 is in Python or, new in 16, in R
 X Drives mode (outside): eXternal program drives SPSS
 Python interpreter (or VB.NET)
 No SPSS Viewer, Data Editor, or SPSS user interface
 Output sent as text to the application – can be suppressed
 Has performance advantages
 Build programs with an IDE
 Even if to be run in traditional mode
Programmability Inside or Outside
SPSS
PythonWin IDE Controlling SPSS
(eXternal Mode)
 Be productive quickly
 Get more return as you learn more
 Python.org
 Python Tutorial
 Cheeseshop
 over 2200 packages as of April 11, 2007
 SPSS Developer Central
 SPSS Programming and Data Management, 4th ed, 2006.
Python Resources
 Dive Into Python book or PDF
 Practical Python by Magnus Lie Hetland
 Extensive examples and discussion of Python
 Python Cookbook, 2nd ed. by Martelli, Ravenscroft, & Ascher
 Python in a Nutshell, 2nd ed. by Martelli, O'Reilly
 Very clear, comprehensive reference material
 wxPython in Action by Rappin and Dunn
 Explains user interface building with wxPython
Python Books
 scipy 0.5.2 Scientific Algorithms Library for Python
 Scipy.org
 scipy is an open source library of scientific tools for
Python. scipy gathers a variety of high level science and
engineering modules together as a single package. scipy
provides modules for statistics, optimization, integration,
linear algebra, Fourier transforms, signal and image
processing, genetic algorithms, ODE solvers, special
functions, and more. scipy requires and supplements
NumPy, which provides a multidimensional array object
and other basic functionality.
 Python is becoming a major language for scientific
computing
Cheeseshop: scipy
 SPSS Developer Central is the web home for
developing SPSS applications
 Python, .NET, R Integration Plug-Ins
 Supplementary modules by SPSS and others
 Articles on programmability and graphics
 Forums for asking questions and exchanging
information
 Programmability Extension SDK
 Get Python itself from Python.org or CD
 SPSS 14 and 15 use Python 2.4 (2.4.3)
 SPSS 16 will use 2.5
 Not limited to programmability
 GPL graphics
 User-contributed code
 Key Supplementary Modules
 spssaux
 spssdata
 New for SPSS 15: trans, extendedTransforms, rake, pls, enhanced tables.py
SPSS Developer Central
 tables.py module on Developer Central can merge two
tables into one.
 E.g., Ctables significance tests into main tables
 Merge or replace cells with cells from a different table
 Flexibly define the join
 tables.py can also censor cells, e.g., blank statistics
based on small counts.
 Merge example: data on importance of education
qualifications for immigration by region of Europe
 CTABLES /TABLE qfimeduBin BY Region
/TITLES TITLE='Qualifications for Immigration'
/COMPARETEST TYPE=PROP
Example: Manipulating Output:
Merging Tables
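The merge and censor ideas above can be sketched in plain Python, with a table modeled as a dictionary keyed by (row, column). This is not the tables.py implementation: `merge_tables`, `censor`, and all cell values here are hypothetical, chosen only to illustrate the two operations.

```python
def merge_tables(main, other, sep="\n"):
    """Return a copy of main with each matching cell of other appended to it."""
    merged = dict(main)
    for key, value in other.items():
        if key in merged and value:
            merged[key] = "%s%s%s" % (merged[key], sep, value)
    return merged


def censor(table, threshold, blank=""):
    """Return a copy of table with counts below threshold blanked out."""
    return {key: (blank if count < threshold else count)
            for key, count in table.items()}


# Hypothetical main table of counts and a table of significance letters.
counts = {
    ("Agree", "Western"): 120, ("Agree", "Eastern"): 45,
    ("Disagree", "Western"): 4, ("Disagree", "Eastern"): 80,
}
tests = {("Agree", "Western"): "B", ("Disagree", "Eastern"): "A"}

merged = merge_tables(counts, tests)
censored = censor(counts, threshold=5)
print(merged)
print(censored)
```

Here the join is an exact match on the (row, column) key; a more flexible join would map keys of one table onto the other before merging.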
Ctables Output
BEGIN PROGRAM.
import spss, tables
cmd=r"""CTABLES /TABLE qfimeduBin BY Region
/TITLES TITLE='Qualifications for Immigration'
/COMPARETEST TYPE=PROP"""
tables.mergeLatest(cmd, autofit=False)
END PROGRAM.
 Runs Ctables and merges test table into main table
 Using default merge behavior
 "If it really is this simple this will generate a lot of
excitement for us."
 "This is really fantastic."
Program to Merge
Qualifications for Immigration
Comparisons of Column Proportions
[Pivot table: rows are categories 0 to 5 of "Qualification for immigration: good educational qualifications"; columns are counts by Region of Europe: (A) Western, (B) Eastern, (C) Northern, (D) Southern. Significance letters appear beneath the counts.]
Results are based on two-sided tests with significance level 0.05. For each significant pair, the key of the category with the smaller column proportion appears under the category with the larger column proportion.
Merged Output
 You can extend SPSS capabilities by building new procedures
 Or use ones that others have built
 Combine SPSS procedures and transformations with Python
logic
 Poisson regression (SPSS 14) example using iterated CNLR (GENLIN in SPSS 15)
 New raking procedure built over GENLOG
 Calculate data aggregates in SPSS and pass to algorithm
coded in Python
 Raking procedure starts with AGGREGATE; uses GENLOG
 Acquire case data and compute in Python
 Use Python standard modules and third-party additions
 Partial Least Squares Regression (pls module)
Approaches to Creating
New Procedures
 Common to adapt existing libraries or code for
use as Python extension modules
 C, C++, VB, Fortran,...
 Python tools and API's to assist
 Chap 25 in Python in a Nutshell
 Tutorial on extending and embedding the Python
interpreter
 Call R programs with SPSS 16
Adapt Existing Code Libraries
 Regression with large number of predictors (even k > N)
 Similar to Principal Components but considers dependent
variable simultaneously
 Calculates principal components of (y, X), then uses regression
on the scores instead of the original data
 Equivalent to ordinary regression when number of factors
equals number of predictors and one y variable
 For more information see An Optimization Perspective on
Kernel Partial Least Squares Regression.pdf.
Partial Least Squares Regression
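The equivalence with ordinary regression when the number of factors equals the number of predictors can be checked on a tiny example. The sketch below uses a textbook NIPALS-style PLS1 loop in pure Python; it is not the pls module's scipy code, and `pls1_fit` and the data are made up for illustration. Because y is constructed as an exact linear function of two predictors, ordinary regression would fit it perfectly, so two-factor PLS should as well.

```python
def pls1_fit(X, y, n_factors):
    """Return fitted values from a PLS1 regression with n_factors latent factors."""
    n, k = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(k)] for row in X]  # centered X residual
    f = [v - y_mean for v in y]                                 # centered y residual
    fitted = [y_mean] * n
    for _ in range(n_factors):
        # Weights: direction in X most covariant with the current y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left for X to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(k)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # regress y residual on scores
        for i in range(n):
            fitted[i] += q * t[i]
            f[i] -= q * t[i]              # deflate y
            for j in range(k):
                E[i][j] -= t[i] * p[j]    # deflate X
    return fitted


def sse(fitted, observed):
    """Sum of squared residuals."""
    return sum((a - b) ** 2 for a, b in zip(fitted, observed))


X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [1 + 2 * a - 3 * b for a, b in X]   # exact linear relation, no noise

one_factor = pls1_fit(X, y, 1)
two_factors = pls1_fit(X, y, 2)
print(sse(one_factor, y), sse(two_factors, y))
```

With one factor some residual remains; with two factors (as many as there are predictors) the residual sum of squares drops to zero up to rounding error.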
 Strategy
 Fetches data from SPSS
 Uses scipy matrix operations to compute results
 Third-party module from Cheeseshop
 Writes pivot tables to SPSS Viewer
 Subject to OMS
 SPSS 14 viewer module created pivot table using OLE
automation
 SPSS 15 has direct pivot table API's
 Saves predicted values to active dataset
The pls Module for SPSS 15
GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
REGRESSION /STATISTICS COEFF R /DEPENDENT sales
/METHOD=ENTER curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width .
begin program.
import spss, pls
pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width""",
yhat="predsales")
end program.
 plsproc defaults to five factors
pls Example: REGRESSION vs
PLS
 PLS with 5 factors almost equals regression with 11 variables
Results
 User procedures can be written in Python but specified using SPSS
traditional syntax
 User never writes or sees Python code
 Used as if a built-in SPSS command
 EXTENSION command defines command to SPSS via simple XML file
 Python module called with syntax already checked and processed by
SPSS
 More general PLS module
 PLS y1 y2 y3 BY fac1 fac2 WITH z1 z2 z3
/CRITERIA LATENTFACTORS=2.
 Dialog box interface tools in SPSS 17
 In the meantime, use wxPython or
something similar
SPSS 16 User Procedures
 "Raking" adjusts sample weights to control totals in n
dimensions
 Example: data classified by age and sex with known
population totals or proportions
 Calculated by fitting a main effects loglinear model
 Various adjustments required
 Not a complete solution to reweighting
 Not directly available in SPSS
Raking Sample Weights
 Strategy: combine SPSS procedures with Python logic
 rake.py (from SPSS Developer Central)
 Aggregates data via AGGREGATE to new dataset
 Creates new variable with control totals
 Applies GENLOG, saving predicted counts
 Adjusts predicted counts
 Matches back into original dataset
 Does not use MATCH FILES or require a SORT command
 Written in one (long) day
rake.rake("age sex",
[{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}],
finalweight="finalwt")
Raking Module
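To show what raking to control totals does, here is a minimal sketch that uses classical iterative proportional fitting in place of the AGGREGATE and GENLOG steps. It is not the rake.py code: `rake_cells`, `margin`, and the sample cell counts are hypothetical, while the control totals are the ones passed to rake.rake above.

```python
def margin(weights, dim):
    """Sum the cell weights over one dimension of the (age, sex) key."""
    totals = {}
    for key, value in weights.items():
        totals[key[dim]] = totals.get(key[dim], 0.0) + value
    return totals


def rake_cells(cells, controls, iterations=50, tol=1e-12):
    """Scale cell weights until every margin matches its control totals.

    cells: {(age, sex): weighted count}; controls: one {category: total}
    dictionary per dimension, in the same order as the key.
    """
    weights = dict(cells)
    for _ in range(iterations):
        worst = 0.0
        for dim, totals in enumerate(controls):
            current = margin(weights, dim)
            for key in weights:
                factor = totals[key[dim]] / current[key[dim]]
                weights[key] *= factor
                worst = max(worst, abs(factor - 1.0))
        if worst < tol:
            break   # every margin already matched its control total
    return weights


# Hypothetical sample counts by (age, sex); control totals as in the call above.
sample = {(0, 0): 30.0, (0, 1): 470.0, (1, 0): 20.0, (1, 1): 480.0}
controls = [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}]

raked = rake_cells(sample, controls)
print(margin(raked, 0))
print(margin(raked, 1))
```

Dividing each raked cell by its original count gives the adjustment factor for cases in that cell, which plays the role of the final weight variable.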
 SPSS 14 programmability can wrap SPSS syntax in
Python logic, e.g., generate COMPUTE commands
on the fly
 Useful when definitions can be expressed in SPSS syntax
 SPSS 15 programmability can
 Generate new variables directly
 Add new cases directly
 Create new datasets from scratch
 SPSS 16 has additional dataset capabilities
Extending SPSS Transformations
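Generating COMPUTE commands on the fly amounts to building syntax strings in Python. The sketch below shows only the string-building step and runs without SPSS; `log_transform_syntax` is a hypothetical helper and the variable names are illustrative. Inside a BEGIN PROGRAM block the resulting text could be passed to spss.Submit, as in the earlier example.

```python
def log_transform_syntax(variables, prefix="ln_"):
    """Build one COMPUTE command per variable, followed by EXECUTE."""
    lines = ["COMPUTE %s%s = LN(%s)." % (prefix, name, name)
             for name in variables]
    lines.append("EXECUTE.")
    return "\n".join(lines)


# Illustrative variable names; any list of numeric variables would do.
syntax = log_transform_syntax(["salary", "salbegin"])
print(syntax)
```

The same pattern works whenever the definition of the new variable can be written as SPSS syntax; only the template string changes.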
 trans module facilitates plugging in Python code to
iterate over cases
 Runs as an SPSS procedure
 Passes the data
 Adds variables to the SPSS variable dictionary
 Can apply any calculation casewise
 Use with
 Standard Python functions (e.g., math module)
 Any user-written functions or appropriate classes
 Functions in extendedTransforms module
trans and extendedTransforms
Modules
 trans strategy
 Pass case data through Python code writing
result back to SPSS in new variables
 extendedTransforms collection of 12 functions to
apply to SPSS variables, including
 Regular expression search/replace
 soundex and nysiis functions for phonetic equivalence
 Date/time conversions based on patterns
trans and extendedTransforms
Modules
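As an illustration of the kind of casewise function that can be plugged in, here is a minimal sketch of the standard American Soundex code for phonetic equivalence. It is written from the published coding rules and is not the extendedTransforms implementation; the function name and test names are chosen for illustration.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code of name, or '' if it has no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":      # h and w do not separate letters with the same code
            prev = code
    return (result + "000")[:4]  # pad or truncate to four characters


for name in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
    print(name, soundex(name))
```

Names that sound alike, such as Robert and Rupert, receive the same code, so the result can be stored in a new string variable and used as a matching key.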
 Pattern matching in text strings
 If you use SPSS index or replace, you need these
 Standardize string data (Mr, Mr., Herr, Senor,...)
 Extract data from loosely structured text
 "simvastatin-- PO 80mg TAB" -> "simvastatin", "80"
 Patterns can be simple strings (as with SPSS index) or
complex patterns
 Pick out variable names with common parts
 Can greatly simplify code
Python Regular Expressions
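Two of the uses listed above can be sketched with Python's standard re module. The patterns and function names below are illustrative assumptions, not taken from the extendedTransforms module; real data would likely need broader patterns.

```python
import re

# Drug name at the start, then anything, then a dose in mg.
ORDER_PATTERN = re.compile(r"^\s*([A-Za-z]+)\W+.*?(\d+)\s*mg", re.IGNORECASE)

# A few male courtesy titles followed by whitespace.
TITLE_PATTERN = re.compile(r"^(mr\.?|herr|senor)\s+", re.IGNORECASE)


def parse_order(text):
    """Return (drug, dose) extracted from loosely structured text, or None."""
    match = ORDER_PATTERN.search(text)
    return match.groups() if match else None


def standardize_title(name):
    """Replace Mr., Herr, or Senor at the start of name with 'Mr'."""
    return TITLE_PATTERN.sub("Mr ", name)


print(parse_order("simvastatin-- PO 80mg TAB"))
for name in ("Mr. Smith", "Herr Schmidt", "Senor Lopez", "Mrs Jones"):
    print(standardize_title(name))
```

The title pattern requires whitespace after the match, so "Mrs Jones" is left unchanged.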
Write to Me!

 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 

Programmability in SPSS 14, 15 and 16

  • 1. Programmability in SPSS 14, SPSS 15 and SPSS 16: The Revolution Continues. Jon Peck, Technical Advisor, SPSS. Copyright (c) SPSS Inc, 2007
  • 2. Agenda
    - Recap of SPSS 14 Python programmability
    - Developer Central
    - New features in SPSS 15 programmability
    - Writing first-class procedures
    - Updating the data
    - New features in SPSS 16 programmability
    - Interacting with the user
    - Q & A
    - Conclusion
  • 3. Quotations from SPSS Users
    - "Because of programmability, SPSS 14 is the most important release since I started using SPSS fifteen years ago."
    - "I think I am going to like using Python."
    - "Python and SPSS 14 and later are, IMHO, GREAT!"
    - "By the way, Python is a great addition to SPSS."
    - From InfoWorld (April 19, 2007)
      - "Of all the tools fueling the dynamic-language trend in the enterprise, general-purpose dynamic languages such as Python and Ruby present the greatest upside for enhancing developer productivity."
  • 4. The Combination of SPSS and Python
    - SPSS provides a powerful engine for statistical and graphical methods and for data management.
    - Python® provides a powerful, elegant, and easy-to-learn language for controlling and responding to this engine.
    - Together they provide a comprehensive system for serious applications of analytical methods to data.
  • 5. Programmability Features in SPSS 14, 15, and 16
    - SPSS 14.0 provided
      - Programmability
      - Multiple datasets
      - Variable and File Attributes
      - Programmability read-access to case data
      - Ability to control SPSS from a Python program
    - SPSS 15 adds
      - Read and write case data
      - Create new variables directly rather than generating syntax
      - Create pivot tables and text blocks via backend APIs
      - Easier setup
    - SPSS 16 will add
      - EXTENSION command for user procedures with SPSS syntax
      - Dataset features for complex data management
      - Ability to use R procedures within SPSS through R Plug-In
  • 6. Programmability Advantages
    - Makes possible easy jobs that respond to datasets, output, environment
    - Allows greater generality, more automation
    - Makes jobs more robust
    - Allows extending the capabilities of SPSS
    - Enables better organized and more maintainable code
    - Facilitates staff specialization
    - Increases productivity
    - More fun
  • 7. Programmability Overview
    - Python extends SPSS via
      - General programming language
      - Access to variable dictionary, case data, and output
      - Access to standard and third-party modules
      - SPSS Developer Central modules
      - Module structure for building libraries of code
    - Runs in "back-end" syntax context (like macro)
      - SaxBasic scripting runs in "front-end" context
    - Two modes
      - Traditional SPSS syntax window
      - Drive SPSS from Python (external mode)
    - Optional install (licensed with SPSS Base)
  • 8. Legal Notice
    - SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.
  • 9. The SPSS Programmability Software Development Kit
    - Supports implementing various programming languages
    - Requires a programmer to implement a new language
    - VB.NET Plug-In available on Developer Central
    - Works only in external mode
  • 10. How Programmability Works
    - Python interpreter embedded within SPSS
    - SPSS runs in the traditional way until a BEGIN PROGRAM command is found
    - Python collects commands until an END PROGRAM command is found; then runs the program
    - Python can communicate with SPSS through APIs (calls to functions)
      - Includes running SPSS syntax inside a Python program
      - Includes creating macro values for later use in syntax
    - Python can access SPSS output and data
      - OMS is a key tool
  • 11. Example: Summarize Categorical Variables
        BEGIN PROGRAM.
        import spss, spssaux
        spssaux.GetSPSSInstallDir("SPSSDIR")
        spssaux.OpenDataFile("SPSSDIR/employee data.sav")
        # find categorical variables
        catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
        if catVars:
            spss.Submit("FREQ " + " ".join(catVars.variables))
        # create a macro listing categorical variables
        spss.SetMacroValue("!catVars", " ".join(catVars.variables))
        END PROGRAM.
        DESC !catVars.
  • 12. Programmability Inside or Outside SPSS
    - Two modes of operation
      - SPSS Drives mode (inside): traditional syntax context
        - BEGIN PROGRAM …program… END PROGRAM
        - Program in 14, 15, or 16 is in Python or, new in 16, in R
      - X Drives mode (outside): eXternal program drives SPSS
        - Python interpreter (or VB.NET)
        - No SPSS Viewer, Data Editor, or SPSS user interface
        - Output sent as text to the application – can be suppressed
        - Has performance advantages
    - Build programs with an IDE
      - Even if to be run in traditional mode
  • 14. Python Resources
    - Be productive quickly
    - Get more return as you learn more
    - Python.org
    - Python Tutorial
    - Cheeseshop: over 2200 packages as of April 11, 2007
    - SPSS Developer Central
    - SPSS Programming and Data Management, 4th ed, 2006.
  • 15. Python Books
    - Dive Into Python book or PDF
    - Practical Python by Magnus Lie Hetland
      - Extensive examples and discussion of Python
    - Python Cookbook, 2nd ed by Martelli, Ravenscroft, & Ascher
    - Python in a Nutshell, 2nd ed by Martelli, O'Reilly
      - Very clear, comprehensive reference material
    - wxPython in Action by Rappin and Dunn
      - Explains user interface building with wxPython
  • 16. Cheeseshop: scipy
    - scipy 0.5.2 Scientific Algorithms Library for Python
    - Scipy.org
    - scipy is an open source library of scientific tools for Python. scipy gathers a variety of high level science and engineering modules together as a single package. scipy provides modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, genetic algorithms, ODE solvers, special functions, and more. scipy requires and supplements NumPy, which provides a multidimensional array object and other basic functionality.
    - Python is becoming a major language for scientific computing
  • 17. SPSS Developer Central
    - SPSS Developer Central is the web home for developing SPSS applications
      - Python, .NET, R Integration Plug-Ins
      - Supplementary modules by SPSS and others
      - Articles on programmability and graphics
      - Forums for asking questions and exchanging information
      - Programmability Extension SDK
    - Get Python itself from Python.org or CD
      - SPSS 14, 15 use 2.4 (2.4.3)
      - SPSS 16 will use 2.5
    - Not limited to programmability
      - GPL graphics
      - User-contributed code
    - Key Supplementary Modules
      - spssaux
      - spssdata
      - New for SPSS 15: trans, extendedTransforms, rake, pls, enhanced tables.py
  • 18. Example: Manipulating Output: Merging Tables
    - tables.py module on Developer Central can merge two tables into one.
      - E.g., CTABLES significance tests into main tables
      - Merge or replace cells with cells from a different table
      - Flexibly define the join
    - tables.py can also censor cells, e.g., blank statistics based on small counts.
    - Merge example: data on importance of education qualifications for immigration by region of Europe
        CTABLES /TABLE qfimeduBin BY Region
          /TITLES TITLE='Qualifications for Immigration'
          /COMPARETEST TYPE=PROP
  • 20. Program to Merge
        BEGIN PROGRAM.
        import spss, tables
        cmd=r"""CTABLES /TABLE qfimeduBin BY Region
          /TITLES TITLE='Qualifications for Immigration'
          /COMPARETEST TYPE=PROP"""
        tables.mergeLatest(cmd, autofit=False)
        END PROGRAM.
    - Runs CTABLES and merges test table into main table
      - Using default merge behavior
    - "If it really is this simple this will generate a lot of excitement for us."
    - "This is really fantastic."
  • 21. Merged Output
    - [Table: Qualifications for Immigration, merged with Comparisons of Column Proportions. Rows: Qualification for immigration: good educational qualifications (0 to 5). Columns: Region of Europe, with Count for (A) Western, (B) Eastern, (C) Northern, (D) Southern.]
    - Results are based on two-sided tests with significance level 0.05. For each significant pair, the key of the category with the smaller column proportion appears under the category with the larger column proportion.
  • 22. Approaches to Creating New Procedures
    - You can extend SPSS capabilities by building new procedures
      - Or use ones that others have built
    - Combine SPSS procedures and transformations with Python logic
      - Poisson regression (SPSS 14) example using iterated CNLR
      - New raking procedure built over GENLOG GENLIN in SPSS 15
    - Calculate data aggregates in SPSS and pass to algorithm coded in Python
      - Raking procedure starts with AGGREGATE; uses GENLOG
    - Acquire case data and compute in Python
      - Use Python standard modules and third-party additions
      - Partial Least Squares Regression (pls module)
  • 23. Adapt Existing Code Libraries
    - Common to adapt existing libraries or code for use as Python extension modules
      - C, C++, VB, Fortran,...
    - Python tools and APIs to assist
      - Chap 25 in Python in a Nutshell
      - Tutorial on extending and embedding the Python interpreter
    - Call R programs with SPSS 16
  • 24. Partial Least Squares Regression
    - Regression with large number of predictors (even k > N)
    - Similar to Principal Components but considers the dependent variable simultaneously
    - Calculates principal components of (y, X), then uses regression on the scores instead of the original data
    - Equivalent to ordinary regression when the number of factors equals the number of predictors and there is one y variable
    - For more information see An Optimization Perspective on Kernel Partial Least Squares Regression.pdf.
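The equivalence claim on slide 24 can be checked on a toy problem. What follows is a minimal plain-Python sketch of single-response PLS using NIPALS-style deflation; it is not the pls module's implementation (which, per the slides, uses scipy matrix operations), and the function name `pls1_fit` and the data are made up for illustration.

```python
def pls1_fit(X, y, n_factors):
    """Return fitted values from a one-response PLS regression (illustrative sketch)."""
    n, k = len(X), len(X[0])
    # Center predictors and response.
    x_means = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(k)] for row in X]
    yc = [v - y_mean for v in y]
    fitted = [y_mean] * n
    for _ in range(n_factors):
        # Weight vector: covariance of each (deflated) predictor with y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        # Scores for this factor.
        t = [sum(Xc[i][j] * w[j] for j in range(k)) for i in range(n)]
        tt = sum(v * v for v in t)
        # X loadings and the regression coefficient of y on the scores.
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        fitted = [fitted[i] + q * t[i] for i in range(n)]
        # Deflate X so the next factor is orthogonal to this one.
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
    return fitted


# Made-up data in which y is an exact linear function of two predictors.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 7]]
y = [2 * a - b + 1 for a, b in X]

one_factor = pls1_fit(X, y, 1)   # approximate fit
two_factors = pls1_fit(X, y, 2)  # as many factors as predictors
print(max(abs(f - v) for f, v in zip(two_factors, y)))
```

With two factors for two predictors the fitted values reproduce y up to rounding, as ordinary regression would on this data; with one factor the fit is only approximate.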
  • 25. The pls Module for SPSS 15
    - Strategy
      - Fetches data from SPSS
      - Uses scipy matrix operations to compute results
        - Third-party module from Cheeseshop
      - Writes pivot tables to SPSS Viewer
        - Subject to OMS
        - SPSS 14 viewer module created pivot table using OLE automation
        - SPSS 15 has direct pivot table APIs
      - Saves predicted values to active dataset
  • 26. pls Example: REGRESSION vs PLS
        GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
        REGRESSION
          /STATISTICS COEFF R
          /DEPENDENT sales
          /METHOD=ENTER curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width .
        begin program.
        import spss, pls
        pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width""", yhat="predsales")
        end program.
    - plsproc defaults to five factors
  • 27. Results
    - PLS with 5 factors almost equals regression with 11 variables
  • 28. SPSS 16 User Procedures
    - User procedures can be written in Python but specified using SPSS traditional syntax
      - User never writes or sees Python code
      - Used as if a built-in SPSS command
    - EXTENSION command defines command to SPSS via simple XML file
    - Python module called with syntax already checked and processed by SPSS
    - More general PLS module
        PLS y1 y2 y3 BY fac1 fac2 WITH z1 z2 z3 /CRITERIA LATENTFACTORS=2.
    - Dialog box interface tools in SPSS 17
      - In the meantime, use wxPython or something similar
  • 29. Raking Sample Weights
    - "Raking" adjusts sample weights to control totals in n dimensions
    - Example: data classified by age and sex with known population totals or proportions
    - Calculated by fitting a main effects loglinear model
    - Various adjustments required
    - Not a complete solution to reweighting
    - Not directly available in SPSS
  • 30. Raking Module
    - Strategy: combine SPSS procedures with Python logic
    - rake.py (from SPSS Developer Central)
      - Aggregates data via AGGREGATE to new dataset
      - Creates new variable with control totals
      - Applies GENLOG, saving predicted counts
      - Adjusts predicted counts
      - Matches back into original dataset
        - Does not use MATCH FILES or require a SORT command
    - Written in one (long) day
        rake.rake("age sex", [{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}], finalweight="finalwt")
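To make the idea of adjusting weights to control totals concrete, here is a minimal sketch that swaps in iterative proportional fitting, the classic raking algorithm, in place of the GENLOG-based route that rake.py takes. It is plain Python and does not call SPSS. The function name `rake_ipf` and the sample cell counts are hypothetical; the control totals are the ones shown in the `rake.rake` call on slide 30.

```python
def rake_ipf(cells, controls, tol=1e-9, max_iter=100):
    """Scale cell totals until every margin matches its control total.

    cells: dict mapping a tuple of category codes, e.g. (age, sex), to a sample count.
    controls: one dict per dimension mapping category code -> population total.
    """
    fitted = dict((key, float(value)) for key, value in cells.items())
    for _ in range(max_iter):
        max_change = 0.0
        for dim, targets in enumerate(controls):
            # Current marginal totals along this dimension.
            margins = {}
            for key, value in fitted.items():
                margins[key[dim]] = margins.get(key[dim], 0.0) + value
            # Rescale each cell so this margin hits its target.
            for key in fitted:
                factor = targets[key[dim]] / margins[key[dim]]
                fitted[key] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return fitted


# Hypothetical sample counts classified by (age, sex).
cells = {(0, 0): 20, (0, 1): 480, (1, 0): 30, (1, 1): 470}
# Control totals for age, then sex, as in the slide's rake.rake call.
controls = [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}]

fitted = rake_ipf(cells, controls)
# Final weight for each case in a cell: adjusted total divided by sample count.
weights = dict((key, fitted[key] / cells[key]) for key in cells)
print(weights)
```

Each pass rescales the cells along one dimension at a time, so the loop stops once no margin needs more than a negligible correction.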
  • 31. Extending SPSS Transformations
    - SPSS 14 programmability can wrap SPSS syntax in Python logic, e.g., generate COMPUTE commands on the fly
      - Useful when definitions can be expressed in SPSS syntax
    - SPSS 15 programmability can
      - Generate new variables directly
      - Add new cases directly
      - Create new datasets from scratch
    - SPSS 16 has additional dataset capabilities
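As a sketch of the SPSS 14 approach, the following builds COMPUTE commands as text from Python data. The helper name `build_compute_syntax`, the variable names, and the means and standard deviations are all made up for illustration. Inside a BEGIN PROGRAM block the resulting string would be handed to `spss.Submit`; that call is left as a comment so the sketch runs without SPSS.

```python
def build_compute_syntax(stats):
    """Return SPSS syntax that standardizes each variable.

    stats maps a variable name to a (mean, sd) pair; the values are illustrative.
    """
    lines = []
    for name in sorted(stats):
        mean, sd = stats[name]
        lines.append("COMPUTE z_%s = (%s - %s) / %s." % (name, name, mean, sd))
    lines.append("EXECUTE.")
    return "\n".join(lines)


syntax = build_compute_syntax({"salary": (30000.0, 15000.0), "educ": (13.5, 2.5)})
print(syntax)
# In an SPSS program block: spss.Submit(syntax)
```

Because the commands are ordinary strings, the same loop works for any number of variables, which is what makes this pattern useful when the variable list is only known at run time.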
  • 32. trans and extendedTransforms Modules
    - trans module facilitates plugging in Python code to iterate over cases
      - Runs as an SPSS procedure
      - Passes the data
      - Adds variables to the SPSS variable dictionary
      - Can apply any calculation casewise
    - Use with
      - Standard Python functions (e.g., math module)
      - Any user-written functions or appropriate classes
      - Functions in extendedTransforms module
  • 33. trans and extendedTransforms Modules
    - trans strategy
      - Pass case data through Python code writing result back to SPSS in new variables
    - extendedTransforms collection of 12 functions to apply to SPSS variables, including
      - Regular expression search/replace
      - soundex and nysiis functions for phonetic equivalence
      - Date/time conversions based on patterns
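The casewise strategy can be pictured without SPSS at all. This is a minimal sketch, assuming cases are held as a list of dictionaries; `apply_casewise` is a hypothetical helper and not the trans module's API, which reads from and writes to the active dataset instead.

```python
import math


def apply_casewise(cases, new_name, func, *arg_names):
    """Return copies of the cases with new_name = func(values of arg_names)."""
    result = []
    for case in cases:
        args = [case[name] for name in arg_names]
        new_case = dict(case)
        # Propagate missing values (None) rather than calling the function.
        new_case[new_name] = None if None in args else func(*args)
        result.append(new_case)
    return result


# Made-up cases; the second has a missing x.
cases = [{"x": 3.0, "y": 4.0}, {"x": None, "y": 1.0}]
out = apply_casewise(cases, "dist", math.hypot, "x", "y")
print(out)
```

Any standard-library or user-written function with matching arguments can be passed as `func`, which mirrors the "use with" list on slide 32.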
  • 34. Python Regular Expressions
    - Pattern matching in text strings
    - If you use SPSS index or replace, you need these
    - Standardize string data (Mr, Mr., Herr, Senor,...)
    - Extract data from loosely structured text
      - "simvastatin-- PO 80mg TAB" -> "simvastatin", "80"
    - Patterns can be simple strings (as with SPSS index) or complex patterns
    - Pick out variable names with common parts
    - Can greatly simplify code
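The two string tasks on slide 34 can be sketched with Python's standard `re` module. The patterns and function names below are illustrative assumptions, not taken from extendedTransforms, and would need adjusting for real data.

```python
import re

# A leading drug name, then any text, then a number directly before "mg".
DOSE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\W.*?(\d+)\s*mg", re.IGNORECASE)

# A few equivalent titles at the start of a name, followed by whitespace.
TITLE_PATTERN = re.compile(r"^(mr\.?|herr|senor)\s+", re.IGNORECASE)


def extract_drug_and_dose(text):
    """Return (drug, dose) as strings, or None when the text does not match."""
    match = DOSE_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def standardize_title(name):
    """Rewrite Mr, Mr., Herr, or Senor at the start of a name as 'Mr'."""
    return TITLE_PATTERN.sub("Mr ", name)


print(extract_drug_and_dose("simvastatin-- PO 80mg TAB"))
print(standardize_title("Herr Schmidt"))
```

Names that do not start with one of the listed titles, such as "Mrs Smith", pass through unchanged because the pattern requires whitespace right after the title.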

Editor's Notes

  1. Other new SPSS 14 features enhance programmability: multiple concurrent datasets; variable and file attributes; XML workspace and OMS enhancements.
  2. The PythonWin IDE is available from http://starship.python.net/crew/mhammond/win32/Downloads.html. There are many other choices for a Python IDE.
  3. Graphic shows the requested custom table along with the associated table of comparisons of column proportions. Each table has the same set of row and column labels, so the tables can be easily merged.
  4. The graphic shows that the two tables from the original CTABLES output have been merged into one table. Each cell of the merged table contains the cell contents from the original custom table as well as the cell contents from the table of comparisons of column proportions.
  5. Jon Peck can now be reached at peck@us.ibm.com.