© 2010 IBM Corporation
Business Analytics software
Extending and Customizing IBM SPSS Statistics with R,
Python, and .NET
Jon Peck
Senior Software Engineer, IBM
peck@us.ibm.com
November, 2010
IBM SPSS Statistics
• IBM® SPSS® Statistics has an extensive command language (syntax) for data acquisition, manipulation, and statistical and graphical procedures
• Programmability and scripting dramatically extend these built-in capabilities
• Allow custom user interfaces and output to be produced
• Converting large SAS applications is likely to require the use of programmability
Agenda
• Programmability introduction
• Four examples
– Automating repetitive work: applySyntaxToFiles
– Integrating programs and scripting: SPSSINC MODIFY TABLES
– Adding a procedure from R: SPSSINC QUANTILE REGRESSION
– Adding a procedure in Python: SPSSINC TURF
Programmability increases your power, flexibility, and
productivity
Generalization
–React flexibly to metadata, results, and the environment
–Benefit: Write fewer similar jobs
Automation
–Embed program logic in jobs
–Benefit: Less manual work
Extension
–Tap existing R or Python statistical modules
–Add your own or extend standard procedures and transformations
– Benefit: More capabilities
Integration
–Connect IBM SPSS Statistics inputs and outputs to other agents
– Benefit: Make IBM SPSS Statistics part of a larger production process
More productivity and more fun
IBM SPSS Statistics embeds three programming languages
Plug-ins let you extend capabilities using
–Python
–R
–.NET languages (Windows only)
Free plug-in downloads
SPSS Developer Central web site provides articles, SPSS-written modules, plug-ins and user contributions
–New SPSS Community on IBM myDeveloperWorks
GET FILE="c:/data/important.sav".
BEGIN PROGRAM PYTHON.
import spss
print("Hello, IBM")
END PROGRAM.
DESCRIPTIVES ....
Python or R program code goes in the normal
Statistics syntax window
My first Python program
A program in the input stream can communicate with IBM SPSS Statistics and control it and use Python or R facilities and modules (internal mode)
spss.Submit("GET FILE='c:/data/cars.sav'.")
A Python or .NET application can embed IBM SPSS Statistics inside itself (external mode)
–User interface does not appear
There is a lower level C API available in an SDK
Programmability combines SPSS Statistics with Python, R, or .NET
Programmability functionality is fully integrated into IBM SPSS Statistics
Programs run in the regular syntax stream
Users can define IBM SPSS Statistics syntax for programs and scripts via the Extension mechanism.
Users can create dialog boxes and menus using the Custom Dialog Builder.
–Not just for extensions or programs
Python and R output appears in the Viewer
–plain text
–pivot tables
–charts
Python and R Programmability APIs cover these areas
• State information of Statistics
• Get/Set variable dictionary information
• Get/Set data
• Get Viewer output (via xmlworkspace)
• Create tables/charts/text objects in Viewer
• Run Statistics commands (Python only)
Python and VB scripting APIs cover user interface and output
• Programmability is a backend (SPSS Processor) domain
• Scripting is mainly a frontend (user interface, including output) domain
• Managing output Viewer and objects
– tables: formatting, pivoting, editing, …
– objects: visibility, order, titles, outline text, …
• General user interface control
• Almost anything you can do via the user interface
• Not available for R
• Statistics, graphs, and data management via Statistics
• Two pages of VB.NET code
.NET plug-in embeds Statistics inside another program
Example: Statistical Explorer
Python and R are open source software
• Programmability plug-ins are an optional installation
– They are free (but require a Statistics license)
– They make it possible to tap the work of the Python and R communities
– Python and R have license agreements
– IBM Non-warranty license agreement
– For R, GPL license
Extension commands eliminate the need for users to learn Python or R
Extension mechanism lets you define IBM SPSS Statistics-style syntax for programs
IBM SPSS Statistics takes care of validation and parsing
Passes user input to a program in an easy-to-digest form
Automatically loaded when IBM SPSS Statistics starts
–Look to the user like built-in commands
Easy to distribute to others
Extension Name                    Description
PLS                               Partial least squares (P)
PROPOR                            Confidence intervals for proportions (P)
SPSSINC APRIORI                   Association rules (R)
SPSSINC BREUSCH PAGAN             Residual heteroscedasticity tests (R)
SPSSINC HETCOR                    Polychoric and polyserial correlation (P+R)
SPSSINC MFP GLM                   Fractional polynomial generalized linear models (R)
SPSSINC QQPLOT2                   Empirical Q-Q plots (R)
SPSSINC QUANTREG                  Quantile regression (R)
SPSSINC RAKE                      Adjust weights to control totals (P)
SPSSINC RANFOR & SPSSINC RANPRED  Random forests (R)
SPSSINC RASCH                     Rasch models (R)
SPSSINC ROBUST REGR               Robust regression (R)
SPSSINC TOBIT REGR                Tobit regression (R)
SPSSINC TURF                      TURF analysis (P)
Some statistical extensions on Dev Central
Extension Name            Description
FUZZY                     Case-control exact and approximate matching (P)
GATHERMD                  Gather data file metadata (P)
HIDECOLS                  Hide pivot table columns (P)
SCRIPTEX                  SCRIPT commands with parameters (P)
SETSMACRO                 Syntax for using variable sets (P)
SPSSINC ANON              Anonymize data (P)
SPSSINC COMPARE DATASETS  Compare two sav files (P)
SPSSINC CREATE DUMMIES    Create dummy variables for categories (P)
SPSSINC GETURI DATA       Read data from the Internet (P)
SPSSINC MERGE TABLES      Merge two pivot tables (P)
SPSSINC MODIFY OUTPUT     Set Viewer outline titling and styling (P)
SPSSINC MODIFY TABLES     Set pivot table cell and label styling (P)
SPSSINC TRANS             Apply Python functions to cases (P)
SPSSINC TRANSLATE         Translate Viewer output (P)
TEXT                      Create block of text in Viewer (P)
Some non-statistical extensions on Dev Central
–Write Python or R functions to implement the functionality or tap existing packages
• Use input APIs to get data to Python or R
• Use output APIs to create pivot tables
(Each can be a single line of code)
–For extensions,
• Define the syntax in an xml file
• Use tools in extension.py (Python) or spsspkg (R) to receive parsed output and pass to implementing function
• New in v18: R version of extension.py
–Use the Custom Dialog Builder to create the interface
• The CDB is not just for extensions
–Test and document!
–Package and distribute
–Contributions to Developer Central are welcome
Documentation is at SPSS Developer Central
You can create and share your own additions to IBM SPSS Statistics
• Example: SPSSINC BREUSCH PAGAN
– implemented using an R package
• SPSSINC_BREUSCH_PAGAN.xml specifies the syntax to the Statistics parser
• The R mapping code in SPSSINC_BREUSCH_PAGAN.R respecifies the syntax and invokes the executing routine with parsed parameters
– overlaps with xml syntax definition but provides additional features
SPSSINC BREUSCH PAGAN
DEPENDENT = salary ENTER = educ jobcat
/OPTIONS MISSING=LISTWISE
/SAVE RESIDUALSDATASET=resids COEFSDATASET=coefs.
Extension commands: validation and mapping from syntax to Python or R function parameters is handled for you
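As a rough illustration of the mapping idea, the sketch below turns the keywords of the command above into Python keyword arguments. Everything in it is hypothetical: the shape of `parsed`, the `to_kwargs` helper, and the `breusch_pagan` function are invented for illustration and are not the extension.py or spsspkg API.

```python
# Illustration only: none of these names come from extension.py or spsspkg.

def breusch_pagan(dependent, enter, missing="LISTWISE",
                  residualsdataset=None, coefsdataset=None):
    """Hypothetical implementing function; it only echoes what it received."""
    return {
        "dependent": dependent,
        "predictors": list(enter),
        "missing": missing,
        "save": {"residuals": residualsdataset, "coefs": coefsdataset},
    }

# What a parser might hand over for the command shown above (assumed shape).
parsed = {
    "DEPENDENT": ["salary"],
    "ENTER": ["educ", "jobcat"],
    "MISSING": ["LISTWISE"],
    "RESIDUALSDATASET": ["resids"],
    "COEFSDATASET": ["coefs"],
}

# Keywords that take exactly one value are unwrapped; the rest stay lists.
SINGLE_VALUED = {"DEPENDENT", "MISSING", "RESIDUALSDATASET", "COEFSDATASET"}

def to_kwargs(parsed_syntax):
    """Turn keyword -> values into lowercase Python keyword arguments."""
    kwargs = {}
    for keyword, values in parsed_syntax.items():
        if keyword in SINGLE_VALUED:
            if len(values) != 1:
                raise ValueError(f"{keyword} takes exactly one value")
            kwargs[keyword.lower()] = values[0]
        else:
            kwargs[keyword.lower()] = values
    return kwargs

result = breusch_pagan(**to_kwargs(parsed))
print(result)
```

The point of the design is that the implementing function sees ordinary, already validated arguments and never has to parse command text itself.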
An XML file defines the syntax to the SPSS Universal Parser
Python or, in this case, R code gets the parsed syntax, which is turned into function arguments
Expand the audience by creating IBM SPSS Statistics syntax and dialog boxes
Example I
Generalize and automate work
You have syntax files and need to process
datasets not known in advance every day
applySyntaxToFiles function applies a syntax
file to each file in input specification
Apply standard processing to an unknown set of files
Produce processed data and reports
Use programmability to automate routine processes
begin program.
import spss, spssaux3
spssaux3.applySyntaxToFiles(inputspec="c:/temp/parts/*.sav",
syntax = "c:/myjobs/dailychecks.sps",
outputdatadir = "c:/temp/processed",
outputfiledir = "c:/temp/processed",
logfile ="c:/temp/processed/report.txt")
end program.
dailychecks.sps could apply data cleaning rules, modify data,
and create reports
Could be run daily through Production Mode or C&DS job
scheduler or used interactively
Extended version available as SPSSINC PROCESS FILES
Use a program to drive processing
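To make the "unknown set of files" idea concrete, here is a minimal sketch of how a driver could expand a wildcard input specification and decide where each result goes. It does not call SPSS at all. The `plan_jobs` helper, the rule of reusing the input file name, and the `.spv` Viewer extension are assumptions for illustration; the slides do not show how spssaux3.applySyntaxToFiles names its outputs.

```python
import glob
import os
import tempfile

def plan_jobs(inputspec, outputdatadir, outputfiledir):
    """Pair every file matching inputspec with assumed output locations."""
    jobs = []
    for path in sorted(glob.glob(inputspec)):
        stem = os.path.splitext(os.path.basename(path))[0]
        jobs.append({
            "input": path,
            # Assumption: outputs reuse the input file name.
            "data": os.path.join(outputdatadir, stem + ".sav"),
            "viewer": os.path.join(outputfiledir, stem + ".spv"),
        })
    return jobs

# Demo with empty placeholder files standing in for the daily data drop.
with tempfile.TemporaryDirectory() as parts:
    for name in ("east.sav", "west.sav", "notes.txt"):
        open(os.path.join(parts, name), "w").close()
    jobs = plan_jobs(os.path.join(parts, "*.sav"), "processed", "processed")

for job in jobs:
    print(job["input"], "->", job["data"], job["viewer"])
```

Only the two `.sav` files are picked up; the job list is rebuilt on every run, so the program adapts to whatever files arrive that day.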
Example II
Automate dynamic or static formatting of tables
Use integrated scripting for better table
presentation
• TableLooks provide static formatting for entire areas of a table
– data cells
– row and column layers
• You want tables with formatting beyond TableLooks
• Many users copy tables to Excel and manually format them
• Basic and Python Scripting provide a programmatic way to do formatting
• SPSSINC MODIFY TABLES provides syntax for extensive formatting
– Eliminates need to know scripting
– Uses Extension mechanism for programs and Python scripting
SPSSINC MODIFY TABLES extension command manipulates table formatting and structure
SPSSINC MODIFY TABLES SUBTYPE='Crosstabulation'
DIMENSION=ROWS SELECT='Std. Residual'
/STYLES TEXTSTYLE=BOLD BACKGROUNDCOLOR=255 0 0
APPLYTO='abs(x) >2'.
Use dynamic highlighting to make crosstab table
easier to read
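The APPLYTO condition above can be read as a per-cell test. The sketch below mimics only that rule in plain Python, using made-up residuals; how SPSSINC MODIFY TABLES evaluates the expression internally is not shown in the slides, and `cells_to_style` is a hypothetical helper.

```python
def cells_to_style(values, condition):
    """Return the positions whose value satisfies the condition."""
    return [pos for pos, x in enumerate(values)
            if x is not None and condition(x)]

# Made-up standardized residuals; None stands for an empty cell.
std_residuals = [0.4, -2.6, 1.9, 2.3, None, -0.1]

# Same rule as APPLYTO='abs(x) >2': flag large residuals of either sign.
flagged = cells_to_style(std_residuals, lambda x: abs(x) > 2)
print(flagged)  # -> [1, 3]
```

Because the test runs against the values actually in the table, the highlighting follows the data each time the job runs, which is what makes it dynamic rather than static formatting.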
• Dialog created with Custom Dialog Builder
• Generates extension command syntax
• Easy to distribute
Custom dialog boxes are easy to create
SPSSINC MODIFY TABLES subtype='variables in the equation'
SELECT="B" "Sig."
/STYLES TEXTCOLOR = 0 0 255
BACKGROUNDCOLOR=0 255 0.
Use static formatting to call out parts of a table
SPSSINC MODIFY TABLES SUBTYPE="Custom Table"
SELECT = "Total" DIMENSION=ROWS
/STYLES BACKGROUNDCOLOR=255 255 88
TEXTSTYLE = BOLD.
Format CTABLES totals to call them out
SPSSINC MODIFY TABLES SUBTYPE='Report' SELECT="<<ALL>>"
/STYLES APPLYTO=DATACELLS TEXTCOLOR=255 255 255
TEXTSTYLE=BOLD
CUSTOMFUNCTION="customstylefunctions.washColumnsBlue".
def washColumnsBlue(obj, i, j, numrows, numcols, section, more):
    # Blue component rises from mincolor (first column) to maxcolor (last column)
    mincolor=150.
    maxcolor=255.
    increment = (maxcolor - mincolor)/(numcols-1)
    colorvalue = round(mincolor + increment * j)
    obj.SetBackgroundColorAt(i,j, RGB((mincolor, mincolor, colorvalue)))
Use custom functions for special effects
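The arithmetic inside washColumnsBlue can be checked without a pivot table. The sketch below reproduces only the color calculation; `column_colors` is a hypothetical stand-alone helper, and the guard for a single-column table is an addition not present in the slide's function, which would divide by zero in that case.

```python
def column_colors(numcols, mincolor=150.0, maxcolor=255.0):
    """Return one (red, green, blue) triple per column, left to right."""
    if numcols < 1:
        raise ValueError("numcols must be at least 1")
    # Added guard: with one column there is no gradient to spread.
    increment = (maxcolor - mincolor) / (numcols - 1) if numcols > 1 else 0.0
    colors = []
    for j in range(numcols):
        blue = round(mincolor + increment * j)
        colors.append((int(mincolor), int(mincolor), blue))
    return colors

colors = column_colors(4)
print(colors)
# -> [(150, 150, 150), (150, 150, 185), (150, 150, 220), (150, 150, 255)]
```

Red and green stay fixed while blue climbs, so the first column is neutral gray and the last is the strongest blue.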
It is possible to get carried away with this
Example III
Extend IBM SPSS Statistics by tapping the work of the R and Python communities
Add R procedures seamlessly to IBM SPSS
Statistics
R
R is a programming language for statistics
–leading edge statistics
–many contributed statistics and graphics packages
–free
R is not so easy to learn
–Documentation by experts for experts
–Feels like a complex programming language – because it is
–Syntax is a lot like C
–Error in optim(rho, f, control = control, hessian = TRUE, method = "BFGS") : initial value in 'vmmin' is not finite
• Good for programmers(?); bad for users
R holds data in memory
R for SAS and SPSS Users, Bob Muenchen, Springer, 2008
R procedures can be accessed from IBM SPSS Statistics using the R plug-in
The R plug-in makes it easy to use R packages
–IBM SPSS Statistics datasets and Viewer output can be processed by R using the plug-in
–Graphical, text, and table output appear in the Viewer
• Pivot tables can be created with R code
–New IBM SPSS Statistics datasets can be created from R
–R communicates with IBM SPSS Statistics via APIs in the plug-in
–Integration requires writing a little R wrapper code
–IBM SPSS Statistics can provide
• dialog box interface
• IBM SPSS Statistics-style syntax
• pivot table output
Plug-in is downloadable from Developer Central
Quantile regression models conditional quantiles
Ordinary regression models the conditional mean
Median regression is the 0.5 quantile (50th percentile) case
Estimating quantiles is useful with varying spread, asymmetries, outliers
Areas of application include
–empirical finance
• value at risk
• mutual fund investment styles
• credit scoring
–school quality
–demand analysis
–others
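The difference between modeling a mean and modeling a quantile can be seen on a tiny made-up sample. Quantile regression minimizes the check (pinball) loss; the sketch below applies that loss to a constant-only model, where the minimizer is simply a sample quantile. It is a teaching sketch with hypothetical helper names, not the quantreg package's algorithm.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def best_constant(values, tau):
    """Return the observed value that minimizes total pinball loss."""
    return min(sorted(values),
               key=lambda c: sum(pinball_loss(v - c, tau) for v in values))

data = [1, 2, 3, 4, 100]          # made-up sample with one outlier

median_fit = best_constant(data, 0.5)   # 0.5 quantile
upper_fit = best_constant(data, 0.9)    # 0.9 quantile
mean = sum(data) / len(data)

print(median_fit, upper_fit, mean)  # -> 3 100 22.0
```

The outlier drags the mean to 22 while the median fit stays at 3, and the 0.9 fit deliberately follows the upper tail. That is the sense in which quantiles help with outliers and asymmetric or varying spread.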
SPSSINC QUANTILE REGRESSION extension embeds R quantreg package
Pivot tables and plots appear in the Viewer
New datasets appear in Data Editor windows
Example IV
Extend IBM SPSS Statistics by adding procedures in Python
• TURF analysis
TURF Analysis is popular in market research
Total Unduplicated Reach and Frequency (TURF)
Find the highest coverage of positive responses for a small number of questions
Example: How do you reach the largest audience by advertising on a few kinds of sports?
• football, cricket, basketball, cycling, ...
Example: What ice cream flavors should you offer in your shops that have three dispensing machines?
Example: What phone features should you promote?
–multi-line, voicemail, paging, internet ...
Simple FREQUENCIES does not account for overlap
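A small brute-force sketch shows why overlap matters. It computes reach only (not frequency) on made-up data, and `best_reach` is a hypothetical helper, not the SPSSINC TURF implementation.

```python
from itertools import combinations

def best_reach(responses, size):
    """Return (reach, combo) for the `size` items covering the most cases."""
    best = (-1, ())
    for combo in combinations(sorted(responses), size):
        # Union of case IDs: a case counts once however many items it likes.
        reached = set().union(*(responses[item] for item in combo))
        if len(reached) > best[0]:
            best = (len(reached), combo)
    return best

# Made-up data: item -> set of case IDs with a positive response.
responses = {
    "football": {1, 2, 3, 4, 5, 6},
    "cricket": {1, 2, 3, 4, 5},
    "basketball": {7, 8},
    "cycling": {9},
}

reach, combo = best_reach(responses, 2)
print(reach, combo)  # -> 8 ('basketball', 'football')
```

Football and cricket have the two highest individual counts, but their audiences overlap almost completely, so together they reach only 6 cases. Pairing football with basketball reaches 8.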
Must compute all possible set unions of positive responses (up to a maximum number of variables).
Each set is a list of case IDs with positive response on a question.
This problem is computationally explosive
Calculations for best 10 combinations of variables
Variables  Set Union Calculations
3          4
6          57
12         4,070
24         4,540,361
48         8,682,997,422
Is a scripting language like Python too slow?
TURF calculations are demanding
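The figures in the table are consistent with counting every combination of 2 up to 10 variables. That formula is an inference from the numbers, not something stated on the slide; `union_calculations` is a hypothetical helper.

```python
from math import comb

def union_calculations(n_vars, max_size=10):
    """Count variable combinations of size 2 up to max_size (capped at n_vars)."""
    return sum(comb(n_vars, k) for k in range(2, min(n_vars, max_size) + 1))

counts = {n: union_calculations(n) for n in (3, 6, 12, 24, 48)}
for n, total in counts.items():
    print(n, f"{total:,}")
```

Doubling the number of variables from 24 to 48 multiplies the work by roughly 1,900, which is the explosive growth the slide refers to.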
Extension command SPSSINC TURF is implemented in Python
Provides
–Dialog box interface
–IBM SPSS Statistics style syntax
–The computations
–Pivot table output
Fewer than 300 lines of Python code
–Plus dialog box definition
–Plus extension command syntax definition
Executes requests involving a few million set comparisons in a few minutes
Initial version written in two days
Telco survey (9 variables, 1000 cases)
Dialog created with Custom Dialog Builder
Analysis of phone data
Pivot table created from Python code
Best singles are conference calling, call forwarding, and call waiting
Results show the combination of features – best reach
Calculations completed in a few seconds
The best three are not the top three one at a time
Python and R integration
Unification of programs and scripts
Custom Dialog Builder
Extensions
SPSS Developer Central is your friend
Where we have been today
Questions?
Programmability increases your power, flexibility, and
productivity with IBM SPSS Statistics
Generalization and automation
–applySyntaxToFiles
–SPSSINC MODIFY TABLES
Extension
–SPSSINC QUANTREG using R
–SPSSINC TURF using Python
–Many new extension commands available
Integration
–applySyntaxToFiles as part of a process
And it's still more fun
Jon K Peck, Ph. D.
Senior Software Engineer
IBM SPSS
peck@us.ibm.com
blog: insideout.spss.com
Contact

More Related Content

What's hot

Sample collection, Preservation and its Estimation
Sample collection, Preservation and its EstimationSample collection, Preservation and its Estimation
Sample collection, Preservation and its Estimation
MD Abdul Haleem
 
Physical examination of urine
Physical examination of urinePhysical examination of urine
Physical examination of urine
SUNIL KUMAR PEDDANA
 
1.4 Laboratory Equipment: Names & Uses
1.4 Laboratory Equipment:  Names & Uses1.4 Laboratory Equipment:  Names & Uses
1.4 Laboratory Equipment: Names & Uses
Cheryl Bausman
 
Waste Management at Medical Laboratories
Waste Management at Medical LaboratoriesWaste Management at Medical Laboratories
Waste Management at Medical LaboratoriesRavi Kumudesh
 
Water bath
Water bathWater bath
Blood cell counters
Blood cell countersBlood cell counters
Blood cell counters
Mohamed M. Elsaied
 
Laboratory Equipment - Use of Equipment
Laboratory Equipment - Use of EquipmentLaboratory Equipment - Use of Equipment
Laboratory Equipment - Use of Equipmentcjhiggs
 
Phlebotomy
PhlebotomyPhlebotomy
Phlebotomy
India™
 
Pipetting techniques
Pipetting techniquesPipetting techniques
Pipetting techniques
Dr. Md Ashraf Ali Namaji
 
Iron deficiency anemia medicaldump.com
Iron deficiency anemia   medicaldump.comIron deficiency anemia   medicaldump.com
Iron deficiency anemia medicaldump.commedicaldump
 
Common Glassware used in Lab
Common Glassware used in LabCommon Glassware used in Lab
Common Glassware used in Lab
FazalAbbas45
 
Lab sample collection techniques pathology
Lab sample collection techniques   pathologyLab sample collection techniques   pathology
Lab sample collection techniques pathology
KIRAN KUMAR EPARI
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
Mithilesh Trivedi
 

What's hot (14)

Sample collection, Preservation and its Estimation
Sample collection, Preservation and its EstimationSample collection, Preservation and its Estimation
Sample collection, Preservation and its Estimation
 
Physical examination of urine
Physical examination of urinePhysical examination of urine
Physical examination of urine
 
1.4 Laboratory Equipment: Names & Uses
1.4 Laboratory Equipment:  Names & Uses1.4 Laboratory Equipment:  Names & Uses
1.4 Laboratory Equipment: Names & Uses
 
Waste Management at Medical Laboratories
Waste Management at Medical LaboratoriesWaste Management at Medical Laboratories
Waste Management at Medical Laboratories
 
Compatibility testing
Compatibility testingCompatibility testing
Compatibility testing
 
Water bath
Water bathWater bath
Water bath
 
Blood cell counters
Blood cell countersBlood cell counters
Blood cell counters
 
Laboratory Equipment - Use of Equipment
Laboratory Equipment - Use of EquipmentLaboratory Equipment - Use of Equipment
Laboratory Equipment - Use of Equipment
 
Phlebotomy
PhlebotomyPhlebotomy
Phlebotomy
 
Pipetting techniques
Pipetting techniquesPipetting techniques
Pipetting techniques
 
Iron deficiency anemia medicaldump.com
Iron deficiency anemia   medicaldump.comIron deficiency anemia   medicaldump.com
Iron deficiency anemia medicaldump.com
 
Common Glassware used in Lab
Common Glassware used in LabCommon Glassware used in Lab
Common Glassware used in Lab
 
Lab sample collection techniques pathology
Lab sample collection techniques   pathologyLab sample collection techniques   pathology
Lab sample collection techniques pathology
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
 

Viewers also liked

Programmability in spss statistics 17
Programmability in spss statistics 17Programmability in spss statistics 17
Programmability in spss statistics 17
Armand Ruis
 
Programmability in spss 15
Programmability in spss 15Programmability in spss 15
Programmability in spss 15
Armand Ruis
 
Programmability in spss 14, 15 and 16
Programmability in spss 14, 15 and 16Programmability in spss 14, 15 and 16
Programmability in spss 14, 15 and 16
Armand Ruis
 
Programmability in spss 14
Programmability in spss 14Programmability in spss 14
Programmability in spss 14
Armand Ruis
 
Submitting a SPSS Extension To the Community
Submitting a SPSS Extension To the CommunitySubmitting a SPSS Extension To the Community
Submitting a SPSS Extension To the Community
Greg Filla
 
Two-factor Mixed MANOVA with SPSS
Two-factor Mixed MANOVA with SPSSTwo-factor Mixed MANOVA with SPSS
Two-factor Mixed MANOVA with SPSS
J P Verma
 
Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R,  user2014Docopt, beautiful command-line options for R,  user2014
Docopt, beautiful command-line options for R, user2014
Edwin de Jonge
 
R Statistics
R StatisticsR Statistics
R Statistics
r content
 
Statistics with R
Statistics with RStatistics with R
Statistics with R
Johnson Hsieh
 
Getting Up to Speed with R: Certificate Program in R for Statistical Analysis...
Getting Up to Speed with R: Certificate Program in R for Statistical Analysis...Getting Up to Speed with R: Certificate Program in R for Statistical Analysis...
Getting Up to Speed with R: Certificate Program in R for Statistical Analysis...
Revolution Analytics
 
3 descriptive statistics with R
3 descriptive statistics with R3 descriptive statistics with R
3 descriptive statistics with R
naroranisha
 
Presentation on use of r statistics
Presentation on use of r statisticsPresentation on use of r statistics
Presentation on use of r statistics
Krishna Dhakal
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with RKazuki Yoshida
 
Using R For Statistics
Using R For StatisticsUsing R For Statistics
Introduction to basic statistics
Introduction to basic statisticsIntroduction to basic statistics
Introduction to basic statistics
IBM
 
How to use Logistic Regression in GIS using ArcGIS and R statistics
How to use Logistic Regression in GIS using ArcGIS and R statisticsHow to use Logistic Regression in GIS using ArcGIS and R statistics
How to use Logistic Regression in GIS using ArcGIS and R statistics
Omar F. Althuwaynee
 
Why R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics PlatformWhy R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics Platform
Syracuse University
 
عرض محاضرة كيفية انشاء المطاعم لرؤوس الاموال الناشئة والمبتدئة
عرض محاضرة كيفية انشاء المطاعم لرؤوس الاموال الناشئة والمبتدئةعرض محاضرة كيفية انشاء المطاعم لرؤوس الاموال الناشئة والمبتدئة
عرض محاضرة كيفية انشاء المطاعم لرؤوس الاموال الناشئة والمبتدئة
Mazen AlDarrab
 

Viewers also liked (20)

Programmability in spss statistics 17
Programmability in spss statistics 17Programmability in spss statistics 17
Programmability in spss statistics 17
 
Programmability in spss 15
Programmability in spss 15Programmability in spss 15
Programmability in spss 15
 
Programmability in spss 14, 15 and 16
Programmability in spss 14, 15 and 16Programmability in spss 14, 15 and 16
Programmability in spss 14, 15 and 16
 
Programmability in spss 14
Programmability in spss 14Programmability in spss 14
Programmability in spss 14
 
Submitting a SPSS Extension To the Community
Submitting a SPSS Extension To the CommunitySubmitting a SPSS Extension To the Community
Submitting a SPSS Extension To the Community
 
spss Help
spss Helpspss Help
spss Help
 
Two-factor Mixed MANOVA with SPSS
Two-factor Mixed MANOVA with SPSSTwo-factor Mixed MANOVA with SPSS
Two-factor Mixed MANOVA with SPSS
 
Seefeld stats r_bio
Seefeld stats r_bioSeefeld stats r_bio
Seefeld stats r_bio
 
Docopt, beautiful command-line options for R, user2014
Docopt, beautiful command-line options for R,  user2014Docopt, beautiful command-line options for R,  user2014
Docopt, beautiful command-line options for R, user2014
 
R Statistics
R StatisticsR Statistics
R Statistics
 
Statistics with R
Statistics with RStatistics with R
Statistics with R
 
Getting Up to Speed with R: Certificate Program in R for Statistical Analysis...
Getting Up to Speed with R: Certificate Program in R for Statistical Analysis...Getting Up to Speed with R: Certificate Program in R for Statistical Analysis...
Getting Up to Speed with R: Certificate Program in R for Statistical Analysis...
 
3 descriptive statistics with R
3 descriptive statistics with R3 descriptive statistics with R
3 descriptive statistics with R
 
Presentation on use of r statistics
Presentation on use of r statisticsPresentation on use of r statistics
Presentation on use of r statistics
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with R
 
Using R For Statistics
Using R For StatisticsUsing R For Statistics
Using R For Statistics
 
Introduction to basic statistics
Introduction to basic statisticsIntroduction to basic statistics
Introduction to basic statistics
 
How to use Logistic Regression in GIS using ArcGIS and R statistics
How to use Logistic Regression in GIS using ArcGIS and R statisticsHow to use Logistic Regression in GIS using ArcGIS and R statistics
How to use Logistic Regression in GIS using ArcGIS and R statistics
 
Why R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics PlatformWhy R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics Platform
 
عرض محاضرة كيفية انشاء المطاعم لرؤوس الاموال الناشئة والمبتدئة
عرض محاضرة كيفية انشاء المطاعم لرؤوس الاموال الناشئة والمبتدئةعرض محاضرة كيفية انشاء المطاعم لرؤوس الاموال الناشئة والمبتدئة
عرض محاضرة كيفية انشاء المطاعم لرؤوس الاموال الناشئة والمبتدئة
 

Similar to Extending and customizing ibm spss statistics with python, r, and .net (2)

Use of Open Source Software Enhancing Curriculum | Developing Opportunities
Use of Open Source Software Enhancing Curriculum | Developing OpportunitiesUse of Open Source Software Enhancing Curriculum | Developing Opportunities
Use of Open Source Software Enhancing Curriculum | Developing Opportunities
Maurice Dawson
 
Ibm Cognos B Iund Pmfj
Ibm Cognos B Iund PmfjIbm Cognos B Iund Pmfj
Ibm Cognos B Iund Pmfj
Friedel Jonker
 
Highlights of the Telecommunications Event Data Analytics toolkit
Highlights of the Telecommunications Event Data Analytics toolkitHighlights of the Telecommunications Event Data Analytics toolkit
Highlights of the Telecommunications Event Data Analytics toolkit
lisanl
 
Software Archaeology with RDz and RAA
Software Archaeology with RDz and RAASoftware Archaeology with RDz and RAA
Software Archaeology with RDz and RAA
Strongback Consulting
 
Cognos CIO CEE 2010 Prague CZE
Cognos CIO CEE 2010 Prague CZECognos CIO CEE 2010 Prague CZE
Cognos CIO CEE 2010 Prague CZEStepan Kutaj
 
Large Scale Production DITA landscape @SAP
Large Scale Production DITA landscape @SAPLarge Scale Production DITA landscape @SAP
Large Scale Production DITA landscape @SAP
Youssef Bennani
 
IBM Performance Optimizaiton Toolkit for Rational Application Developer
IBM Performance Optimizaiton Toolkit for Rational Application DeveloperIBM Performance Optimizaiton Toolkit for Rational Application Developer
IBM Performance Optimizaiton Toolkit for Rational Application Developer
Ashish Patel
 
Fist Global Initiative Presentation
Fist Global Initiative PresentationFist Global Initiative Presentation
Fist Global Initiative PresentationShan Kane
 
Webinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence IntroWebinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence Intro
SpagoWorld
 
InduSoft Web Studio Driver Overview – SITIA and ABCIP
InduSoft Web Studio Driver Overview – SITIA and ABCIPInduSoft Web Studio Driver Overview – SITIA and ABCIP
InduSoft Web Studio Driver Overview – SITIA and ABCIP
AVEVA
 
3158 - Cloud Infrastructure & It Optimization - Application Performance Manag...
3158 - Cloud Infrastructure & It Optimization - Application Performance Manag...3158 - Cloud Infrastructure & It Optimization - Application Performance Manag...
3158 - Cloud Infrastructure & It Optimization - Application Performance Manag...
Sandeep Chellingi
 
P6 upgrade paths - Oracle Primavera P6 Collaborate 14
P6 upgrade paths  - Oracle Primavera P6 Collaborate 14P6 upgrade paths  - Oracle Primavera P6 Collaborate 14
P6 upgrade paths - Oracle Primavera P6 Collaborate 14
p6academy
 
Oracle Primavera P6 partner programs
Oracle Primavera P6 partner programsOracle Primavera P6 partner programs
Oracle Primavera P6 partner programs
Mark Kromer
 
Fox formula in sap bi integrated planning
Fox formula in sap bi integrated planningFox formula in sap bi integrated planning
Fox formula in sap bi integrated planning
Venkatesh Yellamelli
 
Creating attachments to work items or to user decisions in workflows
Creating attachments to work items or to user decisions in workflowsCreating attachments to work items or to user decisions in workflows
Creating attachments to work items or to user decisions in workflows
Hicham Khallouki
 
SAP performance testing & engineering courseware v01
SAP performance testing & engineering courseware v01SAP performance testing & engineering courseware v01
SAP performance testing & engineering courseware v01
Argos
 
ITCamp 2018 - Andrea Martorana Tusa - Failure prediction for manufacturing in...
ITCamp 2018 - Andrea Martorana Tusa - Failure prediction for manufacturing in...ITCamp 2018 - Andrea Martorana Tusa - Failure prediction for manufacturing in...
ITCamp 2018 - Andrea Martorana Tusa - Failure prediction for manufacturing in...
ITCamp
 
Webinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence IntroWebinar: Open Source Business Intelligence Intro
Webinar: Open Source Business Intelligence Intro
SpagoWorld
 

Extending and Customizing IBM SPSS Statistics with Python, R, and .NET

  • 1. Extending and Customizing IBM SPSS Statistics with R, Python, and .NET. Jon Peck, Senior Software Engineer, IBM (peck@us.ibm.com). November 2010. © 2010 IBM Corporation, Business Analytics software.
  • 2. IBM SPSS Statistics. IBM® SPSS® Statistics has an extensive command language (syntax) for data acquisition, manipulation, and statistical and graphical procedures. Programmability and scripting dramatically extend these built-in capabilities. They allow custom user interfaces and output to be produced. Converting large SAS applications is likely to require the use of programmability.
  • 3. Agenda. Programmability introduction. Four examples: (1) automating repetitive work: applySyntaxToFiles; (2) integrating programs and scripting: SPSSINC MODIFY TABLES; (3) adding a procedure from R: SPSSINC QUANTILE REGRESSION; (4) adding a procedure in Python: SPSSINC TURF.
  • 4. Programmability increases your power, flexibility, and productivity. Generalization: react flexibly to metadata, results, and the environment (benefit: write fewer similar jobs). Automation: embed program logic in jobs (benefit: less manual work). Extension: tap existing R or Python statistical modules; add your own or extend standard procedures and transformations (benefit: more capabilities). Integration: connect IBM SPSS Statistics inputs and outputs to other agents (benefit: make IBM SPSS Statistics part of a larger production process). More productivity and more fun.
  • 5. IBM SPSS Statistics embeds three programming languages. Plug-ins let you extend capabilities using Python, R, and .NET languages (Windows only). Plug-in downloads are free. The SPSS Developer Central web site provides articles, SPSS-written modules, plug-ins, and user contributions. New: SPSS Community on IBM myDeveloperWorks.
  • 6. My first Python program. Python or R program code goes in the normal Statistics syntax window: GET FILE="c:/data/important.sav". BEGIN PROGRAM PYTHON. import spss; print "Hello, IBM" END PROGRAM. DESCRIPTIVES ....
  • 7. Programmability combines SPSS Statistics with Python, R, or .NET. A program in the input stream can communicate with IBM SPSS Statistics, control it, and use Python or R facilities and modules (internal mode), for example: spss.Submit("GET FILE='c:/data/cars.sav'."). A Python or .NET application can embed IBM SPSS Statistics inside itself (external mode); the user interface does not appear. A lower-level C API is available in an SDK.
  • 8. Programmability functionality is fully integrated into IBM SPSS Statistics. Programs run in the regular syntax stream. Users can define IBM SPSS Statistics syntax for programs and scripts via the Extension mechanism. Users can create dialog boxes and menus using the Custom Dialog Builder (not just for extensions or programs). Python and R output appears in the Viewer as plain text, pivot tables, and charts.
  • 9. Python and R programmability APIs cover these areas: state information of Statistics; get/set variable dictionary information; get/set data; get Viewer output (via xmlworkspace); create tables, charts, and text objects in the Viewer; run Statistics commands (Python only).
  • 10. Python and VB scripting APIs cover user interface and output. Programmability is a backend (SPSS Processor) domain; scripting is mainly a frontend (user interface, including output) domain. Managing the output Viewer and objects: tables (formatting, pivoting, editing, …) and objects (visibility, order, titles, outline text, …). General user interface control: almost anything you can do via the user interface. Not available for R.
  • 11. .NET plug-in embeds Statistics inside another program. Example: Statistical Explorer. Statistics, graphs, and data management via Statistics. Two pages of VB.NET code.
  • 12. Python and R are open source software. Programmability plug-ins are an optional installation. They are free (but require a Statistics license). They make it possible to tap the work of the Python and R communities. Python and R have license agreements; IBM non-warranty license agreement; for R, GPL license.
  • 13. Extension commands eliminate the need for the user to learn Python or R. The Extension mechanism lets you define IBM SPSS Statistics-style syntax for programs. IBM SPSS Statistics takes care of validation and parsing, and passes user input to a program in an easy-to-digest form. Extension commands are loaded automatically when IBM SPSS Statistics starts and look to the user like built-in commands. They are easy to distribute to others.
  • 14. Some statistical extensions on Dev Central (extension name: description). PLS: Partial least squares (P); PROPOR: Confidence intervals for proportions (P); SPSSINC APRIORI: Association rules (R); SPSSINC BREUSCH PAGAN: Residual heteroscedasticity tests (R); SPSSINC HETCOR: Polychoric and polyserial correlation (P+R); SPSSINC MFP GLM: Fractional polynomial generalized linear models (R); SPSSINC QQPLOT2: Empirical Q-Q plots (R); SPSSINC QUANTREG: Quantile regression (R); SPSSINC RAKE: Adjust weights to control totals (P); SPSSINC RANFOR & SPSSINC RANPRED: Random forests (R); SPSSINC RASCH: Rasch models (R); SPSSINC ROBUST REGR: Robust regression (R); SPSSINC TOBIT REGR: Tobit regression (R); SPSSINC TURF: TURF analysis (P).
  • 15. Some non-statistical extensions on Dev Central (extension name: description). FUZZY: Case-control exact and approximate matching (P); GATHERMD: Gather data file metadata (P); HIDECOLS: Hide pivot table columns (P); SCRIPTEX: SCRIPT commands with parameters (P); SETSMACRO: Syntax for using variable sets (P); SPSSINC ANON: Anonymize data (P); SPSSINC COMPARE DATASETS: Compare two sav files (P); SPSSINC CREATE DUMMIES: Create dummy variables for categories (P); SPSSINC GETURI DATA: Read data from the Internet (P); SPSSINC MERGE TABLES: Merge two pivot tables (P); SPSSINC MODIFY OUTPUT: Set Viewer outline titling and styling (P); SPSSINC MODIFY TABLES: Set pivot table cell and label styling (P); SPSSINC TRANS: Apply Python functions to cases (P); SPSSINC TRANSLATE: Translate Viewer output (P); TEXT: Create block of text in Viewer (P).
  • 16. You can create and share your own additions to IBM SPSS Statistics. Write Python or R functions to implement the functionality or tap existing packages: use input APIs to get data to Python or R, and use output APIs to create pivot tables (each can be a single line of code). For extensions: define the syntax in an XML file; use tools in extension.py (Python) or spsspkg (R) to receive parsed output and pass it to the implementing function (new in v18: R version of extension.py). Use the Custom Dialog Builder to create the interface (the CDB is not just for extensions). Test and document! Package and distribute. Contributions to Developer Central are welcome. Documentation is at SPSS Developer Central.
  • 17. Extension commands: validation and mapping from syntax to Python or R function parameters are handled for you. Example: SPSSINC BREUSCH PAGAN, implemented using an R package. SPSSINC_BREUSCH_PAGAN.xml specifies the syntax to the Statistics parser. The R mapping code in SPSSINC_BREUSCH_PAGAN.R respecifies the syntax and invokes the executing routine with parsed parameters; it overlaps with the XML syntax definition but provides additional features. Example syntax: SPSSINC BREUSCH PAGAN DEPENDENT = salary ENTER = educ jobcat /OPTIONS MISSING=LISTWISE /SAVE RESIDUALSDATASET=resids COEFSDATASET=coefs.
  • 18. An XML file defines the syntax to the SPSS Universal Parser.
  • 19. Python or, in this case, R code gets the parsed syntax, which is turned into function arguments.
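The hand-off described on slides 17 to 19 can be pictured with a small sketch. It assumes a hypothetical parsed form (a dictionary of subcommands, each holding keyword/value pairs) and a hypothetical implementing function, describe_request; neither is the real extension.py or spsspkg interface, and they only illustrate how parsed keywords could become function arguments.

```python
def describe_request(dependent, enter, missing="LISTWISE",
                     residualsdataset=None, coefsdataset=None):
    """Hypothetical implementing function: it only echoes what it received."""
    return {
        "dependent": dependent,
        "enter": enter,
        "missing": missing,
        "residualsdataset": residualsdataset,
        "coefsdataset": coefsdataset,
    }


def dispatch(parsed, func):
    """Flatten {subcommand: {KEYWORD: value}} into lowercase keyword arguments."""
    kwargs = {}
    for keywords in parsed.values():
        for keyword, value in keywords.items():
            kwargs[keyword.lower()] = value
    return func(**kwargs)


# Assumed parsed form of the SPSSINC BREUSCH PAGAN example on slide 17.
parsed = {
    "": {"DEPENDENT": "salary", "ENTER": ["educ", "jobcat"]},
    "OPTIONS": {"MISSING": "LISTWISE"},
    "SAVE": {"RESIDUALSDATASET": "resids", "COEFSDATASET": "coefs"},
}
result = dispatch(parsed, describe_request)
```

The implementing function never sees raw syntax text in this sketch, which is the point the slides make: parsing and validation happen before the Python or R code runs.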
  • 20. Expand the audience by creating IBM SPSS Statistics syntax and dialog boxes.
  • 21. Example I: Generalize and automate work. You have syntax files and need to process, every day, datasets that are not known in advance. The applySyntaxToFiles function applies a syntax file to each file in an input specification.
  • 22. Use programmability to automate routine processes. Apply standard processing to an unknown set of files. Produce processed data and reports.
  • 23. Use a program to drive processing: begin program. import spss, spssaux3; spssaux3.applySyntaxToFiles(inputspec="c:/temp/parts/*.sav", syntax = "c:/myjobs/dailychecks.sps", outputdatadir = "c:/temp/processed", outputfiledir = "c:/temp/processed", logfile ="c:/temp/processed/report.txt") end program. dailychecks.sps could apply data cleaning rules, modify data, and create reports. It could be run daily through Production Mode or the C&DS job scheduler, or used interactively. An extended version is available as SPSSINC PROCESS FILES.
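To make the idea on slide 23 concrete, here is a minimal sketch of the pattern: expand a wildcard, then generate one block of syntax per data file. This is not the spssaux3 implementation; build_jobs is a hypothetical helper, and the choice of GET FILE, INSERT FILE, and SAVE OUTFILE as the generated commands is an assumption made for illustration.

```python
import glob
import os


def build_jobs(inputspec, syntax, outputdatadir):
    """Return one block of SPSS Statistics syntax per file matching inputspec."""
    jobs = []
    for path in sorted(glob.glob(inputspec)):
        name = os.path.basename(path)
        # Use forward slashes so the generated syntax looks the same on any OS.
        source = path.replace("\\", "/")
        target = os.path.join(outputdatadir, name).replace("\\", "/")
        jobs.append("\n".join([
            'GET FILE="%s".' % source,       # open the data file
            'INSERT FILE="%s".' % syntax,    # run the standard syntax file
            'SAVE OUTFILE="%s".' % target,   # write the processed copy
        ]))
    return jobs
```

Inside a BEGIN PROGRAM block, each generated string could then be passed to spss.Submit, the call shown on slide 7. Logging and Viewer output handling, which the real function covers through its outputfiledir and logfile parameters, are left out here.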
  • 24. Example II: Use integrated scripting for better table presentation. Automate dynamic or static formatting of tables.
  • 25. The SPSSINC MODIFY TABLES extension command manipulates table formatting and structure. TableLooks provide static formatting for entire areas of a table (data cells; row and column layers). You want tables with formatting beyond TableLooks. Many users copy tables to Excel and format them manually. Basic and Python scripting provide a programmatic way to do formatting. SPSSINC MODIFY TABLES provides syntax for extensive formatting: it eliminates the need to know scripting, and it uses the Extension mechanism for programs and Python scripting.
  • 26. Use dynamic highlighting to make a crosstab table easier to read: SPSSINC MODIFY TABLES SUBTYPE='Crosstabulation' DIMENSION=ROWS SELECT='Std. Residual' /STYLES TEXTSTYLE=BOLD BACKGROUNDCOLOR=255 0 0 APPLYTO='abs(x) >2'.
  • 27. Custom dialog boxes are easy to create. Dialog created with the Custom Dialog Builder. Generates extension command syntax. Easy to distribute.
  • 28. Use static formatting to call out parts of a table: SPSSINC MODIFY TABLES subtype='variables in the equation' SELECT="B" "Sig." /STYLES TEXTCOLOR = 0 0 255 BACKGROUNDCOLOR=0 255 0.
  • 29. Format CTABLES totals to call them out: SPSSINC MODIFY TABLES SUBTYPE="Custom Table" SELECT = "Total" DIMENSION=ROWS /STYLES BACKGROUNDCOLOR=255 255 88 TEXTSTYLE = BOLD.
  • 30. Use custom functions for special effects: SPSSINC MODIFY TABLES SUBTYPE='Report' SELECT="<<ALL>>" /STYLES APPLYTO=DATACELLS TEXTCOLOR=255 255 255 TEXTSTYLE=BOLD CUSTOMFUNCTION="customstylefunctions.washColumnsBlue". The custom function: def washColumnsBlue(obj, i, j, numrows, numcols, section, more): mincolor = 150.0; maxcolor = 255.0; increment = (maxcolor - mincolor)/(numcols-1); colorvalue = round(mincolor + increment * j); obj.SetBackgroundColorAt(i,j, RGB((mincolor, mincolor, colorvalue)))
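The color arithmetic inside washColumnsBlue can be tried outside Statistics. The sketch below reproduces only the interpolation step; column_wash is a hypothetical stand-alone helper, and the guard for single-column tables is an added assumption (the slide's formula divides by numcols-1, which is zero in that case).

```python
def column_wash(j, numcols, mincolor=150, maxcolor=255):
    """Return an (r, g, b) tuple whose blue channel grows with column index j."""
    if numcols < 2:
        # Assumption: with one column there is nothing to interpolate,
        # so use the strongest blue rather than dividing by zero.
        return (mincolor, mincolor, maxcolor)
    increment = (maxcolor - mincolor) / float(numcols - 1)
    blue = int(round(mincolor + increment * j))
    return (mincolor, mincolor, blue)


# Blue channel for a four-column table, left to right.
shades = [column_wash(j, 4) for j in range(4)]
```

For four columns the blue channel steps evenly from 150 to 255 while red and green stay at 150, which gives the left-to-right wash the slide describes.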
  • 31. It is possible to get carried away with this.
  • 32. Example III: Add R procedures seamlessly to IBM SPSS Statistics. Extend IBM SPSS Statistics by tapping the work of the R and Python communities.
  • 33. R. R is a programming language for statistics: leading-edge statistics; many contributed statistics and graphics packages; free. R is not so easy to learn: documentation by experts for experts; feels like a complex programming language, because it is; syntax is a lot like C; a typical error message: Error in optim(rho, f, control = control, hessian = TRUE, method = "BFGS") : initial value in 'vmmin' is not finite. Good for programmers(?); bad for users. R holds data in memory. Reference: R for SAS and SPSS Users, Bob Muenchen, Springer, 2008.
  • 34. R procedures can be accessed from IBM SPSS Statistics using the R plug-in. The R plug-in makes it easy to use R packages. IBM SPSS Statistics datasets and Viewer output can be processed by R using the plug-in. Graphical, text, and table output appear in the Viewer (pivot tables can be created with R code). New IBM SPSS Statistics datasets can be created from R. R communicates with IBM SPSS Statistics via APIs in the plug-in. Integration requires writing a little R wrapper code. IBM SPSS Statistics can provide a dialog box interface, IBM SPSS Statistics-style syntax, and pivot table output. The plug-in is downloadable from Developer Central.
  • 35. Quantile regression models conditional quantiles. Ordinary regression models the conditional mean. Median regression is the 0.5 quantile (50th percentile). Estimating quantiles is useful with varying spread, asymmetries, and outliers. Areas of application include empirical finance (value at risk, mutual fund investment styles, credit scoring), school quality, demand analysis, and others.
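A small worked example may help with the mean-versus-quantile distinction on slide 35. A quantile at level tau can be characterized as the value that minimizes the asymmetric "check" (pinball) loss; the sketch below applies that to a plain sample with no regressors. The function names and the data are made up for illustration, and this is not how the quantreg package computes its estimates.

```python
def pinball_loss(q, values, tau):
    """Total check-function loss of candidate q at quantile level tau."""
    total = 0.0
    for y in values:
        diff = y - q
        # Points above q are weighted by tau, points below by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total


def sample_quantile(values, tau):
    """Pick the observed value with the smallest pinball loss."""
    return min(sorted(values), key=lambda q: pinball_loss(q, values, tau))


data = [1, 2, 3, 4, 100]            # one large outlier
mean = sum(data) / float(len(data))  # 22.0, pulled up by the outlier
median = sample_quantile(data, 0.5)  # 3, unaffected by the outlier
upper = sample_quantile(data, 0.9)   # 100, the upper tail
```

Quantile regression applies the same loss to the residuals of a linear model, so different values of tau describe different parts of the conditional distribution rather than only its center.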
  • 36. The SPSSINC QUANTILE REGRESSION extension embeds the R quantreg package.
  • 37. Pivot tables and plots appear in the Viewer.
  • 38. New datasets appear in Data Editor windows.
  • 39. Example IV: Extend IBM SPSS Statistics by adding procedures in Python. TURF analysis.
  • 40. TURF analysis is popular in market research. TURF stands for Total Unduplicated Reach and Frequency: find the highest coverage of positive responses for a small number of questions. Example: How do you reach the largest audience by advertising on a few kinds of sports (football, cricket, basketball, cycling, ...)? Example: What ice cream flavors should you offer in your shops that have three dispensing machines? Example: What phone features should you promote (multi-line, voicemail, paging, internet, ...)? Simple FREQUENCIES does not account for overlap.
  • 41. TURF calculations are demanding. Must compute all possible set unions of positive responses (up to a maximum number of variables). Each set is a list of case IDs with a positive response on a question. This problem is computationally explosive. Calculations for best 10 combinations of variables (variables: set union calculations): 3: 4; 6: 57; 12: 4,070; 24: 4,540,361; 48: 8,682,997,422. Is a scripting language like Python too slow?
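The counts on slide 41 can be reproduced under one assumption that the slide does not state outright: each figure is the number of variable combinations of size 2 up to 10 (or up to the number of variables, if smaller). The helper name union_count is hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb


def union_count(nvars, maxsize=10):
    """Number of combinations of 2..maxsize variables drawn from nvars."""
    largest = min(nvars, maxsize)
    return sum(comb(nvars, k) for k in range(2, largest + 1))


# Same numbers of variables as the table on the slide.
counts = {n: union_count(n) for n in (3, 6, 12, 24, 48)}
```

Under that assumption the results match the slide's table for all five rows, and they show why the problem explodes: doubling the variables from 24 to 48 multiplies the work by roughly 1,900.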
  • 42. Extension command SPSSINC TURF is implemented in Python. It provides a dialog box interface, IBM SPSS Statistics-style syntax, the computations, and pivot table output. Fewer than 300 lines of Python code, plus the dialog box definition and the extension command syntax definition. Executes requests involving a few million set comparisons in a few minutes. Initial version written in two days.
  • 43. Analysis of phone data. Telco survey (9 variables, 1000 cases). Dialog created with the Custom Dialog Builder.
  • 44. Results show the combination of features with the best reach. Pivot table created from Python code. Best singles are conference calling, call forwarding, and call waiting.
  • 45. The best three are not the top three one at a time. Calculations completed in a few seconds.
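The observation on slide 45 follows from overlap between respondents, and a brute-force sketch shows it on a tiny scale. This is not the SPSSINC TURF code: best_reach is a hypothetical function, the response data are invented for illustration, and pairs are used instead of triples to keep the example short.

```python
from itertools import combinations


def reach(responses, combo):
    """Number of distinct cases with a positive response to any question in combo."""
    return len(set().union(*(responses[q] for q in combo)))


def best_reach(responses, size):
    """Try every combination of the given size and keep the widest reach."""
    best = max(combinations(sorted(responses), size),
               key=lambda combo: reach(responses, combo))
    return best, reach(responses, best)


# Made-up data: question name -> set of case IDs with a positive response.
responses = {
    "call_waiting": {1, 2, 3, 4},
    "call_forwarding": {1, 2, 3},
    "voicemail": {5, 6},
}

# The two most popular features taken one at a time.
singles = sorted(responses, key=lambda q: len(responses[q]), reverse=True)[:2]
pair, pair_reach = best_reach(responses, 2)
```

Here the two most popular features overlap almost completely and together reach only 4 cases, while call_waiting plus the less popular voicemail reaches all 6.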
  • 46. Where we have been today: Python and R integration; unification of programs and scripts; Custom Dialog Builder; extensions. SPSS Developer Central is your friend.
  • 47. Questions?
  • 48. Programmability increases your power, flexibility, and productivity with IBM SPSS Statistics. Generalization and automation: applySyntaxToFiles; SPSSINC MODIFY TABLES. Extension: SPSSINC QUANTREG using R; SPSSINC TURF using Python; many new extension commands available. Integration: applySyntaxToFiles as part of a process. And it's still more fun.
  • 49. Contact: Jon K Peck, Ph.D., Senior Software Engineer, IBM SPSS. peck@us.ibm.com. Blog: insideout.spss.com