SPSS Modeler 16: What's New?
IBM SPSS Modeler is a comprehensive predictive analytics platform, designed to bring predictive intelligence to everyday business problems, enabling front-line employees or systems to make more effective decisions and improve outcomes. Modeler scales from desktop installations through to larger deployments that are integrated within operational systems and provides a range of advanced analytics including text analytics, entity analytics, social network analysis, automated modeling and data preparation in addition to decision management and optimization.
This enables organizations to improve business processes and help people or systems consistently make the right decisions by delivering recommended actions at the point of impact. The result is a rapid ROI and the ability to proactively and repeatedly reduce costs and increase productivity.
With the addition of Modeler Gold, there are now three editions of Modeler available.
There has also been an intentional change to the order in which you will see the editions listed. Rather than adding capabilities and building up from Professional to Premium to Gold, capabilities are removed going from Gold down to Premium and then Professional.
This is to assist with setting the vision of building and deploying predictive analytics and also to make Gold the starting point for prospects.
The themes for this release of Modeler are Broader Analytics, Faster Results and Flexible Deployment.
If you want to do a job properly, you use the appropriate tool for it. For example, a power tool with a range of attachments and fittings is better equipped for whatever you need to do, whether that is cutting a pipe, making a hole in a wall, or sanding a rough edge.
Not all analytic requirements are the same. The Broader Analytics theme covers additional analytic capabilities, new in this release, that are designed to address a wider range of problems you may be facing.
R is growing in popularity.
Results for the most recent Rexer Analytics Annual Data Mining Survey had R usage at approximately 70% of respondents. So if you haven’t come across it at an opportunity yet, you are either very new, or not trying hard enough.
Despite its popularity, R is not the only tool respondents are using, and in most cases it is not their primary tool either!
For those that aren’t familiar with R, it is an open source programming language used for analytics.
As it is programming based, it doesn't have the sexy GUI of Modeler, but what it does have is a large, dedicated user base that is constantly creating and publishing R procedures, or packages. Version 16 of Modeler allows users to build and score R models through the GUI, which means that the potential number of algorithms in Modeler just got a whole lot larger.
And with vendor-provided R extensions, R can be scaled inside the database, which improves performance.
Finally, if specific R outputs are required, they can also be generated from Modeler.
Programming R is not going to appeal to everyone. They may not have the skill, they may prefer to use a GUI, or they may have used it so much at university or college that they just don't like it.
So the custom dialog builder will be right up their alley.
Similar to the capability added in Statistics many versions ago, you can now create a dialog box and use it to run R.
This means that not only are we appealing to users who want to use R in addition to Modeler; Modeler also becomes appealing to those who could or should use R but can't.
This means that when you encounter R in an account, rather than viewing it as your competition and trying to displace it - embrace it.
We are a leader in deploying predictive analytics, and adding R just strengthens the breadth of analytics we can use: we make R very easy to use and very scalable. On top of that, our other features enable you to do more with R, such as easier data preparation and the ability to leverage ensembles, and we have 40+ years of proven technology. The argument that we compete with R could really be made about a lot of what we do; you could say we compete with Netezza, etc. But our focus is analytics and allowing customers to build and deploy the best analytics. This lets them pick and choose between our algorithms, Netezza, R and so on, and when you add Analytic Server to the mix, you can scale R to big data to add the most business value.
The reason Simulation is important is that models require input data – and in many cases, that data doesn’t exist – or is uncertain.
This allows you to run scenarios against simulated data. We provide ways to generate the simulated data, or to fit distributions from an existing data source, and then to evaluate what the simulation is telling you.
Example Use Case:
We don't know how many sales we are likely to make next year, so by analysing historical data we can simulate potential orders for next year. That lets us compare against our revenue target and see that, given those 12 months' worth of simulated orders, we are 99% likely to miss our target. We can then take corrective action now rather than reacting as the year progresses.
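To make that concrete, here is a rough Monte Carlo sketch in plain Python (illustrative only, not Modeler's Simulation syntax; the fitted distribution, order value and target figures are invented for the example):

```python
import random

# Invented illustration values: monthly order volume "fitted" from history
MEAN_ORDERS, SD_ORDERS = 1200, 150    # assumed normal distribution of monthly orders
AVG_ORDER_VALUE = 85.0                # assumed average revenue per order
REVENUE_TARGET = 1_300_000            # assumed annual revenue target
N_RUNS = 10_000                       # number of simulated years

misses = 0
for _ in range(N_RUNS):
    # Simulate 12 months of orders and total the year's revenue
    annual_revenue = sum(
        max(0.0, random.gauss(MEAN_ORDERS, SD_ORDERS)) * AVG_ORDER_VALUE
        for _ in range(12)
    )
    if annual_revenue < REVENUE_TARGET:
        misses += 1

print(f"Probability of missing the target: {misses / N_RUNS:.1%}")
```

The Simulation capability handles the distribution fitting and evaluation for you; the sketch just shows the underlying logic of generating many plausible years and counting how often the target is missed.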
As well as resolving individual entities, Entity Analytics can now identify n-degree relationships between entities.
STBs (Space Time Boxes) are basically bins of location and timestamp data that let you monitor the times and places where entities dwell; they could be used, for example, to track shipments in time and space. This is one of a series of new things we will be doing in relation to geospatial and temporal-spatial analysis.
There are some more improvements to Modeler listed here, but to keep the focus on the key components I am not going to go through them in detail.
Essentially, there are some data prep enhancements to support aggregations of various types (UDAs, user-defined aggregations, and windowed user-defined aggregations) for summarising data from a database source; a rough sketch of the idea follows below.
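As an illustration of what a windowed aggregation adds over a plain aggregation, here is a small pandas sketch (conceptual only, not Modeler's in-database implementation; the data and column names are invented):

```python
import pandas as pd

# Invented sample data: transaction amounts per customer over time
df = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2013-01-05", "2013-02-10", "2013-03-02",
                            "2013-01-20", "2013-02-15"]),
    "amount": [100.0, 250.0, 80.0, 300.0, 120.0],
})

# Plain aggregation: one summary row per customer
summary = df.groupby("customer")["amount"].agg(["sum", "mean", "count"])

# Windowed aggregation: rolling total over the last 2 transactions per customer
df = df.sort_values(["customer", "date"])
df["rolling_sum_2"] = (
    df.groupby("customer")["amount"]
      .rolling(window=2, min_periods=1).sum()
      .reset_index(level=0, drop=True)
)

print(summary)
print(df)
```

The Modeler enhancement is about pushing this kind of summarisation back to the database source rather than doing it client-side.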
From an Entity Analytics perspective, I mentioned Unleashed earlier, so it is worth reiterating that with Unleashed you can look at n-degree relationships between entities. Even without Unleashed, you can anonymise sensitive data within the repository, and persisted searches allow streaming records to be written to the repository.
That also comes into play with the Distinct node enhancements, which now provide more options for how to combine records when duplicates are encountered.
Modeler Advantage, the modeling application available through ADM, adds some additional modeling options as well as a tree viewer. This gives those using the three-click modeling some additional insight into how a model is working, rather than just "ta-dah, it's done, here is your model".
The last point I want to bring to your attention is Space Time Boxes. With the increase in devices tracking our location and where we have (or haven't) been, there is a lot more interest in geospatial analytics. This allows Modeler to use latitude and longitude coordinates along with a timestamp to determine where an entity is moving and how fast it is getting there, as well as to identify hangouts, or areas in which something dwells for a period of time.
As an example, a taxi company may monitor where its fleet is travelling and identify hangouts where cabs are idle and not earning a fare. By combining this with mobile phone data from people who share their location via an app while waiting for a taxi, the company can send alerts to those idle drivers to direct them to where they need to be.
Another example could be identifying where multiple phone users are located at the same address in order to bundle services, or perhaps making targeted offers to an individual whose commute home takes them past a restaurant chain.
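To show the underlying idea, here is a conceptual Python sketch of binning location pings into space-time boxes and flagging hangouts (the grid sizes, threshold and data are invented; Modeler's actual STB encoding uses hashed box identifiers and is not reproduced here):

```python
from collections import Counter

def space_time_box(lat, lon, timestamp, cell_deg=0.01, window_secs=3600):
    """Bucket a latitude/longitude/timestamp observation into a coarse
    space-time bin. cell_deg and window_secs are illustrative sizes only."""
    return (round(lat / cell_deg), round(lon / cell_deg), timestamp // window_secs)

# Invented GPS pings for one taxi: (lat, lon, unix timestamp)
pings = [
    (40.7128, -74.0060, 1_388_500_000),
    (40.7129, -74.0061, 1_388_500_600),
    (40.7130, -74.0059, 1_388_501_200),
    (40.7700, -73.9800, 1_388_510_000),
]

counts = Counter(space_time_box(lat, lon, ts) for lat, lon, ts in pings)

# A box with many pings suggests a hangout: somewhere the vehicle is dwelling
hangouts = [box for box, n in counts.items() if n >= 3]
print(hangouts)
```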
Faster Results is the next theme, which deals with improvements primarily related to performance. It's a good thing they are faster, because I don't know how we'd go trying to sell the idea that we are slowing things down...
Analytics is an iterative task. Often an analyst would like to produce the same piece of output, say a chart, and explore the relationship between a number of variables, for example how customer attrition varies according to gender, marital status, age group, income group and so on. That presents two options:
1- Add a node for each task, which, depending on the number of options, can be quite time consuming. Or
2- Write a script to generate the required output.
To address this, version 16 adds the ability to do this from the UI, which will appeal to the many users who aren't programmers. In addition to the looping capabilities, there is greater control and flexibility around execution.
While this is a big step forward in automation from the user interface, it's not going to address every requirement, so scripting may still be needed in some cases. The good news there is that we are standardising on Python as the scripting language across the portfolio. There is even a Paste button in the looping and conditional execution dialog box, so you can work with the generated code as a starting point.
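To give a feel for what such a script looks like, here is a minimal looping sketch. It assumes Modeler's Python (Jython) scripting environment, and the node type and property names used here ("distribution", "x_field", "color_field") are illustrative; check the version 16 scripting guide for the exact names:

```python
# Runs inside Modeler's scripting environment, where modeler.script is available.
stream = modeler.script.stream()                    # the currently open stream
source = stream.findByType("variablefile", None)    # assumes a single source node

# Loop over the candidate fields and build one chart node per field
fields = ["gender", "marital_status", "age_group", "income_group"]
for i, field in enumerate(fields):
    node = stream.createAt("distribution", "Churn by " + field, 200, 100 + i * 80)
    node.setPropertyValue("x_field", field)
    node.setPropertyValue("color_field", "churn")   # overlay attrition on each chart
    stream.link(source, node)
    results = []
    node.run(results)                               # execute and collect the output
```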
Shifting gears a little to look at the deployment side within Decision Management. By evaluating business scenarios, you can understand the impact of combining rules and models on overall results. This allows fine tuning to take place: you can compare different options side by side and visualise the effect before you put a scenario into production.
I mentioned the Python scripting a few slides ago, so I won't cover it again. We released Scoring Adaptors last year for Teradata, Netezza and DB2 on Z. Text analytics is now added, as well as an adaptor for DB2 on Linux, Unix and Windows. For those not familiar with Scoring Adaptors, they essentially allow predictive models to be pushed back to the database, similar to SQL pushback, which enables them to be built more quickly and also scored in-database, reducing the movement of data.
Netezza, a.k.a. PureData System for Analytics, adds some model management capabilities and a new TwoStep algorithm.
And the last thing to note is the Streaming Time Series node. This simplifies the process of building and scoring a time series model into a single step, which allows for efficiencies in real-time deployment. So think deployment within InfoSphere Streams, within a C&DS scoring service, or through Solution Publisher.
The last of the three themes is Flexible Deployment: integration with other IBM products, new platform support and new Decision Management applications.
Synergy with other IBM brands, and particularly within the BA portfolio, is high on the agenda for Modeler. We have had integration with Cognos BI for a while now, and there have been numerous requests to integrate with TM1.
Well, pop the champagne, it's here!
Why do you care? TM1 is used for planning, and that can involve forecasting. I don't want to go in depth here or get stuck on semantics over forecasting versus projecting and so on.
So I'll keep it short. We can access a TM1 cube as a data source, do some work on the data, and write it back into a TM1 cube.
This means that as a seller, you now have a bunch of prospects who use TM1 and could potentially benefit from adding analytics directly into what they are already doing, to make it better. For new accounts, partner with your PM sellers, or if you are a portfolio rep, sell a complete story that will differentiate you against traditional competitors by adding Predictive Analytics to the deal.
Decision Management includes a number of applications, which can be described as templates or frameworks for getting a project underway to address a particular problem.
Previously these were Customer Interactions, Campaign Optimization and Claims Management.
There are now two new applications.
Predictive Maintenance is designed to make the best use of resources by predicting failures and optimizing the scheduling of service, minimizing downtime and maximizing the use of assets and resources given constraints.
And Demand Optimization, where by understanding demand at an outlet level, supplies can be predicted and a quantity allocated to each outlet that minimizes sellouts and restocking and maximizes profit.
I won't go over the additional items listed here, as they are more technical in nature and will be covered on the technical side of the enablement, and I want to keep this at the right level.
So to summarize the themes I have gone through…
This slide will give you a nice summary of the take-aways for the version 16 release.
Broader Analytics – with the R integration, custom dialog boxes and simulation
Faster Results – with the extended programmability and business scenario analysis
Flexible Deployment – TM1 integration and new DM applications for predictive maintenance and demand optimization