Gain Insight and Improve Performance with Data Mining

Clementine® 11.0 – Specifications

Data mining provides organizations with a clearer view of current conditions and deeper insight into future events. With the Clementine data mining workbench from SPSS Inc., your organization can conduct data mining that incorporates many types of data, resulting in outstanding insight into your customers and every aspect of your operations.

Clementine enables your organization to strengthen performance in a number of areas. For example, you can improve customer acquisition and retention, increase customer lifetime value, detect and minimize risk and fraud, reduce cycle time while maintaining quality in product development, and support scientific research.

Use the predictive insight gained through Clementine to guide customer interactions in real time, and share that insight throughout your organization. Solve business problems faster, thanks to Clementine’s powerful new data preparation and predictive modeling capabilities. And leverage high-performance hardware to get answers more quickly.

Clementine is the leading data mining workbench, popular worldwide with data miners and business users alike. By using Clementine, you can:

■ Easily access, prepare, and integrate numeric data as well as text, Web, and survey data
■ Rapidly build and validate models, using the most advanced statistical and machine-learning techniques available
■ Efficiently deploy insight and predictive models on a scheduled basis or in real time to the people who make decisions and recommendations, and the systems that support them
■ Quickly achieve better ROI and time-to-solution by leveraging advanced performance and scalability features
■ Securely transmit sensitive data for data mining applications where security is critical
Extend the benefits of data mining throughout your entire enterprise by using Clementine with SPSS Predictive Enterprise Services. SPSS Predictive Enterprise Services enables you to centralize the storage and management of data mining models and all associated processes. You can control the versioning of your predictive models, audit who uses and modifies them, provide full user authentication, automate the process of updating your models, and schedule model execution. As a result, your predictive models become real business assets—and your organization gains the highest possible return on your data mining investment.

Clementine offers a number of unique capabilities that make it an ideal choice for today’s data-rich organizations.

Streamline the data mining process

Clementine’s intuitive graphical interface enables analysts to visualize every step of the data mining process as part of a “stream.” By interacting with streams, analysts and business users can collaborate in adding business knowledge to the data mining process. Because data miners can focus on knowledge discovery rather than on technical tasks like writing code, they can pursue “train-of-thought” analysis, explore the data more deeply, and uncover additional hidden relationships.

From this visual interface, you can easily access and integrate data from textual sources, data from Web logs, and data from Dimensions™ survey research products. No other data mining solution offers this versatility.

Choose from an unparalleled breadth of techniques

Clementine offers a broad range of data mining techniques designed to meet the needs of every data mining application, making it the world’s most popular data mining workbench. You can choose from a number of algorithms for clustering, classification, association, and prediction, as well as new algorithms for automated multiple modeling, time series forecasting, and interactive rule building.

Optimize your current information technologies

Clementine is an open, standards-based solution. It integrates with your organization’s existing information systems, both when accessing data and when deploying results. You don’t need to move data into and out of a proprietary format. This helps you conserve staff and network resources, deliver results faster, and reduce infrastructure costs.

With Clementine, you can access data in virtually any type of database, spreadsheet, or flat file. In addition, Clementine performs in-database modeling and scoring in major databases offered by IBM®, Oracle®, and Microsoft®, which improves the speed and scalability of these operations.

Clementine can deploy results to any of SPSS’ predictive applications, as well as to solutions provided by other companies. This interoperability helps your IT department meet internal customer needs while ensuring that your organization gains even greater value from your information technology investments.

Leverage all your data for improved models

Only with Clementine can you directly and easily access text, Web, and survey data, and integrate these additional types of data in your predictive models. SPSS customers have found that using additional types of data increases the “lift,” or accuracy, of predictive models, leading to more useful recommendations and improved outcomes.

With the fully integrated Text Mining for Clementine® module, you can extract concepts and opinions from any type of text—such as internal reports, call center notes, customer e-mails, media or journal articles, blogs, and more. And with Web Mining for Clementine®, you can discover patterns in the behavior of visitors to your Web site.
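The “stream” idea described above — data flowing through a chain of connected processing nodes — can be sketched in plain code. This is an illustrative Python analogy only, not a Clementine API: Clementine streams are built visually, and all node names below are hypothetical.

```python
# Hypothetical sketch of the "stream" concept: data flows through a chain of
# nodes, each performing one step (source, selection, field derivation).
def source_node(records):
    """Yield raw input records (stands in for a database or file node)."""
    yield from records

def select_node(records, predicate):
    """Filter records, like a record-selection node."""
    return (r for r in records if predicate(r))

def derive_node(records, name, fn):
    """Add a derived field to each record, like a derive node."""
    for r in records:
        yield dict(r, **{name: fn(r)})

# A tiny "stream": source -> select adults -> derive a high_value flag
raw = [{"age": 34, "spend": 120.0}, {"age": 17, "spend": 40.0}]
stream = derive_node(
    select_node(source_node(raw), lambda r: r["age"] >= 18),
    "high_value",
    lambda r: r["spend"] > 100,
)
print(list(stream))
```

Chaining generators this way mirrors how a visual stream lets each node see only the records the previous node passed along.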
[Diagram: Clementine provides data mining scalability by using a three-tiered architecture. The Clementine Client tier (shown at the bottom) passes stream description language (SDL) to Clementine Server. Clementine Server then analyzes particular tasks to determine which it can execute in the database. After the database runs the tasks that it can process, it passes only the relevant aggregated tables to Clementine Server.]

Direct access to survey data in Dimensions products enables you to include demographic, attitudinal, and behavioral information in your models—rounding out your understanding of the people or organizations you serve.

Deploy data mining results efficiently

Clementine is scalable—capable of cost-effectively analyzing as few as several thousand records or as many as hundreds of millions. It uses a three-tiered architecture to manage modeling and scoring with maximum efficiency.

Clementine’s deployment options enhance its scalability by enabling your organization to deliver results to suit your particular requirements. For example, through Clementine’s batch mode feature or the Clementine Solution Publisher Runtime™ component, you can deploy data mining streams to your database to run in the background of other operational processes. Scores or recommendations are created without interrupting daily business tasks. You can also export information about the steps in the process, such as data access, modeling, and post-processing. This option gives you greater flexibility in how you process models and integrate predictive insight with other systems. Or you may choose to export Clementine models to other applications in industry-standard Predictive Model Markup Language (PMML).

Through its integration with all of SPSS’ predictive applications and its openness to other systems, Clementine supports the delivery of recommendations to managers, customer-contact staff, and the systems supporting your operations.
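Because PMML is an XML standard, a model exported from one tool can be consumed by any application with an XML parser. As a minimal sketch, the snippet below scores one record against a hand-written, simplified PMML regression fragment — it is illustrative only (not actual Clementine output) and omits the namespaces and header detail a real PMML document carries.

```python
# Minimal sketch: applying a simplified PMML regression model with the
# standard library. The document below is hand-written for illustration.
import xml.etree.ElementTree as ET

PMML = """
<PMML version="3.1">
  <DataDictionary numberOfFields="2">
    <DataField name="spend" optype="continuous" dataType="double"/>
    <DataField name="response" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" targetFieldName="response">
    <RegressionTable intercept="0.5">
      <NumericPredictor name="spend" coefficient="0.02"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

def score(xml_text, record):
    """Evaluate a single-table regression: intercept + sum(coef * value)."""
    table = ET.fromstring(xml_text).find(".//RegressionTable")
    y = float(table.get("intercept"))
    for pred in table.findall("NumericPredictor"):
        y += float(pred.get("coefficient")) * record[pred.get("name")]
    return y

print(score(PMML, {"spend": 100.0}))  # 0.5 + 0.02 * 100 = 2.5
```

The point is interoperability: the scoring side needs no knowledge of the tool that built the model, only of the PMML structure.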
Follow a proven, repeatable process

During every phase of the data mining process, Clementine supports the de facto industry standard, the CRoss-Industry Standard Process for Data Mining (CRISP-DM). This means your company can focus on solving business problems through data mining, rather than on reinventing a new process for every project. Individual Clementine projects can be efficiently organized using the CRISP-DM project manager. Or, with SPSS Predictive Enterprise Services, you can support the CRISP-DM process enterprise-wide.

[Diagram: The CRISP-DM process (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment) enables data miners to implement efficient data mining projects that yield measurable business results.]

What’s new in Clementine 11.0

With this release, SPSS continues its commitment to delivering a data mining solution that offers the greatest possible efficiency and flexibility in the development and deployment of predictive models. Clementine 11.0 features extensive enhancements in four key areas to help your organization improve performance, productivity, and return on your data mining investment.

Improved reporting features

Clementine 11.0 includes enhanced reporting features that enable your organization’s analysts to illustrate results graphically and communicate them quickly and clearly to executive management:

■ Generate report-quality graphics via a new graphics engine
■ Control the appearance of graphs using new graph editing tools
■ Leverage SPSS output from within Clementine, including extensive reports and high-quality graphics

Enhanced data preparation tools

Extensive enhancements to Clementine 11.0’s data preparation capabilities help your organization ensure better results by providing more efficient approaches to cleaning and transforming data for statistical analysis:

■ Correct data quality issues such as outliers and missing values more easily via a unified view of data
■ Streamline the transformation process by visualizing and selecting from multiple transformations
■ Access SPSS scripting capabilities and syntax for greater flexibility in data management
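The two cleaning steps named above — imputing missing values and treating outliers — can be sketched in a few lines of plain Python. This is a hedged illustration of the general techniques, not Clementine’s implementation; the mean-imputation strategy and the z-score threshold are illustrative choices.

```python
# Illustrative sketch of two common cleaning steps: mean imputation of
# missing values, and clamping outliers beyond z standard deviations.
from statistics import mean, stdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def clip_outliers(values, z=3.0):
    """Clamp values more than z standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    lo, hi = m - z * s, m + z * s
    return [min(max(v, lo), hi) for v in values]

data = [10.0, 12.0, None, 11.0, 500.0]   # one missing value, one extreme
cleaned = clip_outliers(impute_mean(data), z=1.0)
```

In practice the order matters: imputing first lets the outlier step see a complete series, but it also lets an extreme value distort the imputed mean — exactly the kind of trade-off a unified view of data quality is meant to surface.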
New data mining algorithms

More than a dozen new algorithms allow your analysts to perform processes such as modeling, regression, and forecasting faster and with greater precision, permitting more sophisticated statistical analysis and more accurate results.

■ Easily automate the building and evaluation of multiple models, eliminating repetitive tasks
■ Create rule-based predictive applications interactively that include your business knowledge
■ Employ many new in-database algorithms for SQL Server and Oracle to leverage your investments in these database systems

High-performance architecture

Gain significant improvements in security, scalability, integration, and performance as a result of major enhancements to the Clementine 11.0 architecture.

■ Take advantage of secure socket layer (SSL) encryption for applications involving sensitive data
■ Enjoy greater flexibility and interoperability through improved integration with other systems and architectures, and extended support for industry standards such as PMML 3.1
■ Use parallel processing to leverage high-performance hardware and achieve faster time-to-solution and higher ROI
■ Employ more of SPSS’ powerful functionality via tighter integration with SPSS for Windows

Features

Clementine’s main features are described below in terms of the CRISP-DM process.

Business understanding

Clementine’s visual interface makes it easy for your organization to apply business knowledge to data mining projects. In addition, optional business-specific Clementine Application Templates (CATs) are available to help you get results faster. CATs ship with sample data that can be installed as flat files or as tables in a relational database schema.

■ CRM CAT*
■ Telco CAT*
■ Fraud CAT*
■ Microarray CAT*
■ Web Mining CAT* (requires the purchase of Web Mining for Clementine)

Data understanding

■ Obtain a comprehensive first look at your data using Clementine’s data audit node
■ View data quickly through graphs, summary statistics, or an assessment of data quality
■ Create basic graph types, such as histograms, distributions, line plots, and point plots
■ Edit your graphs to communicate results more clearly
■ Employ additional graph types, such as box plots, heat maps, scatterplot matrices, linkage analysis plots, and more, through Advanced Visualization for Clementine
■ Use association detection when analyzing Web data
■ Interact with data by selecting a region of a graph and seeing the selected information in a table, or use the information in a later phase of your analysis

Data preparation

■ Access data
  – Structured (tabular) data
    ■ Access ODBC-compliant data sources with the included SPSS Data Access Pack. Drivers in this middleware pack support IBM DB2®, Oracle, Microsoft SQL Server™, Informix®, and Sybase® databases.
    ■ Import delimited and fixed-width text files, any SPSS® file, and SAS® 6, 7, 8, and 9 files
    ■ Specify worksheets and data ranges when accessing data in Excel
  – Unstructured (textual) data
    ■ Automatically extract concepts from any type of text by using Text Mining for Clementine*
  – Web site data
    ■ Automatically extract Web site events from Web logs using Web Mining for Clementine*

Features subject to change based on final product release.
* Separately priced modules
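As an illustration of the kind of first-look summary a data audit produces — per-field counts, missing values, and basic statistics — here is a hedged stand-in in plain Python. It mimics the idea, not Clementine’s data audit node itself.

```python
# Illustrative stand-in for a data-audit summary: count, missing count,
# and min/max/mean for one field across a set of records.
from statistics import mean

def data_audit(records, field):
    """Summarize one field over a list of record dicts."""
    values = [r.get(field) for r in records]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
    }

rows = [{"age": 30}, {"age": 40}, {"age": None}, {"age": 50}]
print(data_audit(rows, "age"))
```

A summary like this is typically the first quality gate: a high missing count or an implausible min/max flags a field for the data preparation steps that follow.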
Features (continued)

Data preparation (continued)

  – Survey data
    ■ Directly access data stored in the Dimensions Data Model or in the data files of Dimensions* products
  – Data export
    ■ Work with delimited and fixed-width text files, ODBC, Microsoft Excel, SPSS, and SAS 6, 7, 8, and 9 files
    ■ Export in XLS format through the Excel Output Node
■ Choose from various data-cleaning options
  – Remove or replace invalid data
  – Use predictive modeling to impute missing values
  – Automatically generate operations for the detection and treatment of outliers and extremes
■ Manipulate data
  – Work with complete record and field operations, including:
    ■ Field filtering, naming, derivation, binning, re-categorization, value replacement, and field reordering
    ■ Record selection, sampling, merging (through inner joins, full outer joins, partial outer joins, and anti-joins), and concatenation; sorting, aggregation, and balancing
    ■ Data restructuring, including transposition
    ■ Splitting numerical records into sub-ranges optimized for prediction
    ■ Extensive string functions: string creation, substitution, search and matching, whitespace removal, and truncation
    ■ Preparing data for time-series analysis with the Time Plot node
  – Partition data into training, test, and validation datasets
  – Transform data automatically for multiple variables
  – Visualization of standard transformations
■ Access data management and transformations performed in SPSS directly from Clementine

Modeling

■ Mine data in the database where it resides with in-database modeling. Support:
  – IBM DB2 Enterprise Edition 8.2: decision tree, association, regression, and demographic clustering techniques
  – Oracle 10g: decision tree, O-cluster, KMeans, NMF, Apriori, and MDL, in addition to Naïve Bayes, Adaptive Bayes, and Support Vector Machines (SVM)
  – Microsoft SQL Server Analysis Services: decision tree, clustering, association rules, Naïve Bayes, linear regression, neural network, and logistic regression
■ Use predictive and classification techniques
  – Neural networks (multi-layer perceptrons using error back-propagation, and radial basis function networks)
    ■ Browse the importance of the predictors
  – Decision trees and rule induction techniques, including CHAID, exhaustive CHAID, QUEST, and C&RT
    ■ New decision list algorithm for interactive rule building
    ■ Browse and create splits in decision trees interactively
  – Rule induction techniques in C5.0
  – Linear regression, logistic regression, and multinomial logistic regression
    ■ View model equations and advanced statistical output
    ■ Use binomial logistic regression models and diagnostics for optimal modeling
    ■ Identify variables separating categories using discriminant analysis
  – Anomaly Detection node for directly detecting unusual cases
  – Feature Selection node for identifying the fields that are most and least important to an analysis
  – Additional statistics available at the Means and Matrix nodes
■ Use clustering and segmentation techniques
  – Analyze the impact of variables and build models around them via generalized linear model (GLM) techniques
  – Kohonen networks, K-means, and TwoStep
    ■ View cluster characteristics with a graphical viewer
■ Choose from several association detection algorithms
  – GRI, Apriori, sequence, and CARMA algorithms
    ■ Score data using models generated by association detection algorithms
    ■ Filter, sort, and create subsets of association models using the association model viewer
■ Employ data reduction techniques
  – Factor analysis and principal components analysis
    ■ View model equation and advanced statistical output
■ Combine models through meta-modeling
  – Multiple models can be combined, or one model can be used to build a second model
  – Automate the creation and evaluation of multiple models using the binary classifier algorithm
■ Generate and automatically select time-series forecasting models
■ Import PMML-generated models created in other tools, such as AnswerTree® and SPSS for Windows®
■ Use Clementine External Module Interfaces (CEMI) for custom algorithms
  – Purchase add-on tools from the Clementine Partner Plus Program

* Separately priced modules
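The modeling list names several association algorithms (GRI, Apriori, sequence, CARMA). As a hedged illustration of what association detection fundamentally computes, the sketch below evaluates support and confidence for one candidate rule over toy transactions — it is the core measure behind those algorithms, not any of them in full.

```python
# Illustrative sketch: support and confidence for one association rule
# (antecedent -> consequent) over a list of transaction sets.
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(transactions)
    has_antecedent = sum(1 for t in transactions if antecedent <= t)
    has_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = has_both / n
    confidence = has_both / has_antecedent if has_antecedent else 0.0
    return support, confidence

baskets = [
    {"bread", "butter"}, {"bread", "butter", "jam"},
    {"bread"}, {"jam"}, {"bread", "butter"},
]
support, confidence = rule_stats(baskets, {"bread"}, {"butter"})
```

Algorithms like Apriori differ mainly in how they prune the search over candidate rules; the per-rule arithmetic is the part shown here.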
Evaluation

■ Easily evaluate models using lift, gains, profit, and response graphs
  – Use a one-step process that shortens project time when evaluating multiple models
  – Define hit conditions and scoring expressions to interpret model performance
■ Analyze overall model accuracy with coincidence matrices and other automatic evaluation tools
■ Control the appearance of graphs using the built-in graph editing tool
■ Access SPSS statistics and reporting tools from Clementine, including reports and graphics

Deployment

Clementine offers a choice of deployment capabilities to meet your organization’s needs.

■ Clementine Solution Publisher (optional*)
  – Automate the export of all operations, including data access, data manipulation, text mining, model scoring—including combinations of models—and post-processing
  – Use a runtime environment for executing image files on target platforms
■ SPSS predictive analytics applications (optional*)
  – Combine exported Clementine streams with predictive models, business rules, and exclusions to optimize customer interactions
■ Cleo (optional*)
  – Implement a Web-based solution for rapid model deployment
  – Enable multiple users to simultaneously access and immediately score single records, multiple records, or an entire database, through a customizable browser-based interface
■ Clementine Batch
  – Automate production tasks while working outside the user interface
  – Automate Clementine processes from other applications or scheduling systems
  – Generate encoded passwords
  – Call Clementine processes via the command line
■ Scripting
  – Use command-line scripts or scripts associated with Clementine streams to automate repetitive tasks
■ Export generated models as PMML 3.1
  – Perform in-database scoring, which eliminates the need for—and costs associated with—transferring data to client machines or performing calculations there
  – Deploy Clementine PMML models to IBM DB2 Intelligent Miner™ Visualization and Intelligent Miner Scoring
■ Automatically export Clementine streams to SPSS Predictive Enterprise Services**
■ Centralize data mining projects to leverage organizational knowledge
  – Save streams, models, and other objects in a central, searchable repository
  – Save entire Clementine projects, including all process steps
  – Group streams in folders, and secure folders and streams by user or user group
  – Provide permission-based access to protect the privacy of sensitive information
  – Reuse the most effective streams and models to improve processes and increase the accuracy of results
  – Search on input variables, target variables, model types, notes, keywords, authors, and other types of metadata
  – Automate Clementine model refreshing from within SPSS Predictive Enterprise Services
■ Ensure reliable results by controlling versions of predictive models
  – Automatically assign versions to streams and other objects
  – Protect streams from being overwritten through automatic versioning

Scalability

■ Employ in-database mining to leverage high-performance database implementations
■ Use in-database modeling to build models in the database using leading database technologies
  – Additional vendor-supplied algorithms included for Microsoft SQL Server 2005, IBM DB2 DWE, and Oracle Data Mining
■ Support multi-threaded processing during sorting and model building in environments with multiple CPUs or multi-core CPUs
■ Use the bulk-loading capabilities of your database to improve performance
■ Secure client-server communication
  – Transmit sensitive data securely between Clementine Client and Clementine Server through SSL encryption
■ Minimize data transfer: Clementine pulls data only as needed from your data warehouse and passes only relevant results to the client
■ Leverage high-performance hardware, experience quicker time-to-solution, and achieve greater ROI through parallel execution of streams and multiple models
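The lift measure behind the evaluation graphs named above compares the response rate among the top-scored records with the response rate of the whole population. Here is a hedged sketch of that calculation in plain Python; the data and the 25% cutoff are illustrative.

```python
# Illustrative sketch of lift: how much more concentrated responders are
# in the top-scored fraction than in the population overall.
def lift_at(scored, fraction):
    """scored: list of (score, responded) pairs. Lift in the top fraction."""
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(responded for _, responded in ranked[:k]) / k
    base_rate = sum(responded for _, responded in ranked) / len(ranked)
    return top_rate / base_rate

data = [(0.9, 1), (0.8, 1), (0.7, 0), (0.4, 0),
        (0.3, 0), (0.2, 1), (0.1, 0), (0.05, 0)]
lift = lift_at(data, 0.25)  # lift within the top quarter of scores
```

A lift of 2.0 at the top quartile, for example, means contacting only those customers yields twice the response rate of contacting customers at random — the reasoning a gains chart presents visually.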
For more than 25 years, U.S.-based National Instruments has revolutionized the way engineers and scientists in industry, government, and academia approach measurement and automation. The company sells virtual instrumentation software and hardware in a wide range of industries, from automotive to consumer electronics. As National Instruments began to rapidly expand from its mid-sized roots, it recognized a need for more sophisticated data mining software to keep up with accompanying growth in sales leads and customers. Specifically, the company sought to identify the best prospects for its salespeople to target and to segment customers for more strategic marketing programs.

“Our customer base was so broad and diverse across geography, industry, and even from customer to customer, that we didn’t know if it could be segmented. By using Clementine, we now have a robust profile of each of our market segments, which allows us to use direct marketing to customize messages delivered to each one.”
– Amanda Jensen, Analytics Manager, Business Intelligence Department, National Instruments Corporation

The cornerstone of the Predictive Enterprise

Clementine makes data predictive and facilitates the delivery of predictive insight to the people in your organization who make decisions and the systems that support daily customer interactions.

If your organization has any data stored in text files or in Web logs—as many do—you can leverage it by using Text Mining for Clementine or Web Mining for Clementine. And you can understand the attitudes and beliefs that lie behind behavior—why customers make the choices they do—by incorporating survey data from any of SPSS’ Dimensions survey research products.

Thanks to its integration with SPSS predictive applications and other information systems, Clementine enables you to guide daily decisions and recommendations, as well as long-range planning, with insight into current and future conditions. You can accomplish this securely and efficiently, across your entire enterprise, with SPSS Predictive Enterprise Services.

Clementine’s extensive capabilities are supported by the most advanced statistical and machine learning techniques available. For greater value, it is open to operation and integration with your current information systems.

To learn more, please visit www.spss.com. For SPSS office locations and telephone numbers, go to www.spss.com/worldwide.

SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc. All other names are trademarks of their respective owners. © 2006 SPSS Inc. All rights reserved. CLM11SPC-1206