Once you have started learning about predictive algorithms, and the basic knowledge discovery in databases process, what is the next level of detail to learn for a consulting project?
* Give examples of the many model training parameters
* Track results in a "model notebook"
* Use a model metric that combines accuracy and generalization to rank models
* Strategically search over the model training parameters, e.g., with a gradient descent approach
* Describe an arbitrarily complex predictive system using sensitivity analysis
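The combined accuracy-and-generalization ranking above can be sketched in a few lines; the metric, the penalty weight, and the "model notebook" entries here are all hypothetical:

```python
# Rank candidate models by a combined metric: penalize the gap between
# train and validation accuracy (a proxy for poor generalization).
def combined_score(train_acc, valid_acc, penalty=1.0):
    """Higher is better: validation accuracy minus an overfit penalty."""
    return valid_acc - penalty * max(0.0, train_acc - valid_acc)

# A hypothetical "model notebook": one entry per training run.
notebook = [
    {"params": {"depth": 3},  "train_acc": 0.84, "valid_acc": 0.83},
    {"params": {"depth": 10}, "train_acc": 0.99, "valid_acc": 0.80},
    {"params": {"depth": 6},  "train_acc": 0.88, "valid_acc": 0.86},
]

ranked = sorted(notebook,
                key=lambda m: combined_score(m["train_acc"], m["valid_acc"]),
                reverse=True)
print(ranked[0]["params"])  # the depth-6 model wins on the combined metric
```

The depth-10 model has the best training accuracy but the worst combined score, which is exactly the overfitting case the notebook is meant to catch.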
Tales from an IP worker in consulting and software (Greg Makowski)
Discussion around intellectual property and leveraging consulting projects to build vertical application software. In my use case, data mining, artificial intelligence and intelligence augmentation are part of the value add. Also discussed: software frameworks, open source software, and clauses on prior inventions in hiring contracts.
Data Workflows for Machine Learning - Seattle DAML (Paco Nathan)
First public meetup at Twitter Seattle, for Seattle DAML:
http://www.meetup.com/Seattle-DAML/events/159043422/
We compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis develops several points for "best of breed" and what features would be great to see across the board for many frameworks... leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
If you are curious what ML is all about, this is a gentle introduction to Machine Learning and Deep Learning. It covers questions such as why ML, Data Analytics, and Deep Learning matter, gives an intuitive understanding of how they work, and looks at some models in detail. Finally, I share some useful resources to get started.
Explainable AI - making ML and DL models more interpretable (Aditya Bhattacharya)
Abstract –
Although industries have started to adopt AI and Machine Learning in almost every sector to solve complex business problems, are these models always trustworthy? Machine Learning models are not oracles; they are scientific methods and mathematical models that best describe the data. But science is all about explaining complex natural phenomena in the simplest way possible! So, can we make ML and DL models more interpretable, so that any business user can understand these models and trust their results?
To find out the answer, please join me in this session, in which I will talk about the concepts of Explainable AI and discuss the necessity and principles that help us demystify black-box AI models. I will cover popular approaches such as Feature Importance, Key Influencers, and Decomposition Trees used to make classical Machine Learning models interpretable. We will discuss various techniques used for Deep Learning model interpretation, such as Saliency Maps, Grad-CAMs, and Visual Attention Maps, and finally go through frameworks like LIME, SHAP, ELI5, SKATER, and TCAV, which help us make Machine Learning and Deep Learning models more interpretable, trustworthy, and useful!
Data Science, Machine Learning and Neural Networks (BICA Labs)
A lecture briefly overviewing the state of the art in Data Science, Machine Learning and Neural Networks. It covers the main Artificial Intelligence technologies, Data Science algorithms, neural network architectures, and the cloud computing facilities enabling the whole stack.
Innovations in technology have revolutionized financial services to such an extent that large financial institutions like Goldman Sachs are claiming to be technology companies! It is no secret that technological innovations like Data Science and AI are fundamentally changing how financial products are created, tested and delivered. While it is exciting to learn about the technologies themselves, there is very little guidance available on how companies and financial professionals should retool and gear themselves for the upcoming revolution.
In this master class, we will discuss key innovations in Data Science and AI and connect applications of these novel fields in forecasting and optimization. Through case studies and examples, we will demonstrate why now is the time you should invest to learn about the topics that will reshape the financial services industry of the future!
Topic
- Frontier topics in Optimization
Data Science in the Real World: Making a Difference (Srinath Perera)
We use the terms “Big Data” and “Data Science” for the use of data processing to make sense of the world around us. Spanning many fields, Big Data brings together technologies like Distributed Systems, Machine Learning, Statistics, and the Internet of Things. It is a multi-billion-dollar industry, with use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like the Internet of Things (IoT), these use cases are expanding to scenarios like Smart Cities, Smart Health, and Smart Agriculture.
These use cases rely on basic analytics, advanced statistical methods, and predictive technologies like Machine Learning. However, it is not just about crunching the data. Some use cases, like urban planning, can be slow, and there is enough time to process the data. With use cases like traffic, patient monitoring, and surveillance, however, the value of the results degrades much faster with time, and results are needed within milliseconds to seconds. Collecting data from many sources, cleaning it up, processing it using computation clusters, and doing all of this fast is a major challenge.
This talk will discuss the motivation behind big data and data science and how they can make a difference. It will then discuss the challenges, systems, and methodologies for implementing and sustaining a data science pipeline.
Machine Learning and Real-World Applications (MachinePulse)
This presentation was created by Ajay, Machine Learning Scientist at MachinePulse, to present at a Meetup on Jan. 30, 2015. These slides provide an overview of widely used machine learning algorithms. The slides conclude with examples of real world applications.
Ajay Ramaseshan is a Machine Learning Scientist at MachinePulse. He holds a Bachelor's degree in Computer Science from NITK Surathkal and a Master's in Machine Learning and Data Mining from Aalto University School of Science, Finland. He has extensive experience in the machine learning domain and has dealt with various real-world problems.
Credit scoring has been used to categorize customers based on various characteristics to evaluate their credit worthiness. Increasingly, machine learning techniques are being deployed for customer segmentation, classification and scoring.
In this talk, we will discuss various machine learning techniques that can be used for credit risk applications. Through a case study built in R, we will illustrate the nuances of working with practical datasets that include categorical and numerical data, different techniques for evaluating and exploring customer profiles, visualizing high-dimensional datasets, and machine learning techniques for customer segmentation.
This talk should be of interest to practicing quants and data scientists who are interested in applying machine learning techniques for credit risk and scoring applications.
Valencian Summer School 2015
Day 2
Lecture 11
The Future of Machine Learning
José David Martín-Guerrero (IDAL, UV)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2015
Scaling AI in production using PyTorch (geetachauhan)
Slides from my talk at MLOps World '21.
Deploying AI models in production and scaling ML services is still a big challenge. In this talk we will cover how to deploy your AI models, best practices for deployment scenarios, and techniques for performance optimization and scaling of ML services. Join us to learn how you can jumpstart the journey of taking your PyTorch models from research to production.
Production model lifecycle management 2016 09 (Greg Makowski)
This talk covers the various stages of building data mining models, putting them into production, and eventually replacing them. A common theme throughout is three attributes of predictive models: accuracy, generalization and description. I assert you can have it all, and having all three is important for managing the lifecycle. A subtle point is that this is a step toward developing embedded, automated data mining systems which can determine on their own when they need to be updated.
SFbayACM ACM Data Science Camp 2015 10 24 (Greg Makowski)
This is the slide deck for the 7th annual ACM Data Science Camp. It is an unconference, with content generated by the audience. For the primary event site, see http://www.sfbayacm.org/event/silicon-valley-data-science-camp-2015
This presentation is a summary of section 2 (of 6) of the book "The 360º Leader" by best-selling author John C Maxwell. Challenges and solutions include:
* Tension (the pressure of being caught in the middle),
* Frustration (following an ineffective leader),
* Multi-Hat (one person – demands and expectations from all quarters),
* Ego (being hidden in the middle),
* Fulfillment (stuck in the middle, when you would rather be in front),
* Vision (how to champion it when you did not create it),
* Influence (influencing others whom you do not manage).
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-... (Greg Makowski)
This talk covers 4 configurations of deep learning to solve different types of application needs. Also, strategies for speed up and real-time scoring are discussed.
LeanUX (lean user experience) experimentation has mostly focused on "A/B" testing. This presentation reviews how full and half factorial design of experiments might be used in Lean User Experience design.
This presentation covers material from John Maxwell's book, "The 360 Degree Leader." Specifically, the first of six sections is presented, including "The 7 Myths of Leading from the Middle of an Organization" and "5 Levels of Leadership Development."
Using Deep Learning to do Real-Time Scoring in Practical Applications (Greg Makowski)
http://www.meetup.com/SF-Bay-ACM/events/227480571/
(see also YouTube for a recording of the presentation)
The talk will cover a brief review of neural network basics and the following types of neural network deep learning:
* autocorrelational - unsupervised learning for extracting features. He will describe how additional layers build complexity in the feature extraction.
* convolutional - how to detect shift-invariant patterns in various data sources. Horizontal shift invariance applies to signals like speech recognition or IoT data; horizontal and vertical shift invariance applies to images or videos, for faces or self-driving cars
* discuss details of applying deep net systems for continuous or real time scoring
* reinforcement learning or Q Learning - such as learning how to play Atari video games
* continuous space word models - such as word2vec, skipgram training, NLP understanding and translation
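The shift-invariance idea behind the convolutional bullet above can be seen in a toy 1-D example; the kernel and signals below are invented for illustration:

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (really cross-correlation, as in CNNs)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

pattern = [1.0, -1.0]          # a simple edge-detecting kernel
signal  = [0, 0, 5, 5, 0, 0]   # a pulse
shifted = [0, 5, 5, 0, 0, 0]   # the same pulse, shifted left by one

print(conv1d(signal, pattern))
print(conv1d(shifted, pattern))  # same responses, just shifted: shift invariance
```

Shifting the input shifts the filter's responses by the same amount, which is why one learned kernel can detect a pattern wherever it occurs in a signal or image.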
Application of Design of Experiments (DOE) using Dr. Taguchi - Orthogonal Array... (Karthikeyan Kannappan)
The Taguchi method involves reducing the variation in a process through robust design of experiments. The experimental design proposed by Taguchi uses orthogonal arrays to organize the parameters affecting the process and the levels at which they should be varied. Instead of testing all possible combinations, as in a factorial design, the Taguchi method tests pairs of combinations. The Taguchi arrays can be derived or looked up: small arrays can be drawn out manually, large arrays can be derived from deterministic algorithms, and generally arrays can be found online. The arrays are selected by the number of parameters (variables) and the number of levels (states).
In this paper, the specific steps involved in the application of the Taguchi method are described with examples.
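As a rough illustration of why orthogonal arrays need so few runs, here is the classic L4 array for three two-level factors, with a check of the pairwise-balance property (a sketch of the idea, not the full Taguchi procedure):

```python
from itertools import combinations

# The L4 orthogonal array: 4 runs cover 3 two-level factors such that
# every PAIR of factors sees each of the 4 level combinations exactly once,
# versus 2**3 = 8 runs for a full factorial design.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

def is_orthogonal(array):
    """Check that every pair of two-level columns contains each level pair once."""
    n_cols = len(array[0])
    for i, j in combinations(range(n_cols), 2):
        pairs = {(row[i], row[j]) for row in array}
        if len(pairs) != 4:  # must see (0,0), (0,1), (1,0), (1,1)
            return False
    return True

print(is_orthogonal(L4))  # True
```

This pairwise balance is what lets main effects be estimated from far fewer runs than the full factorial.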
Three case studies deploying cluster analysis (Greg Makowski)
Three case studies are discussed, that include cluster analysis as a component.
1) Customer description for a credit card attrition model, to describe how to talk to customers.
2) Hotel price optimization. Use clusters to find subsets of similar behavior, and optimize prices within each cluster. Use a neural net as the objective function.
3) Retail supply chain, planning replenishment using 52 week demand curves using thousands of seasonal "profiles" or clusters.
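The clustering step shared by all three case studies can be sketched with plain k-means on made-up 2-d "behavior" points (the data and starting centroids are hypothetical):

```python
def kmeans(points, centroids, iters=10):
    """Plain k-means on 2-d points, starting from the given centroids."""
    k = len(centroids)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: (p[0] - centroids[c][0]) ** 2
                                      + (p[1] - centroids[c][1]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [
            (sum(p[0] for p in members) / len(members),
             sum(p[1] for p in members) / len(members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids

low  = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9)]   # e.g. low weekly demand
high = [(8.0, 8.2), (7.9, 8.1), (8.2, 7.8)]   # e.g. high weekly demand
centroids = kmeans(low + high, centroids=[low[0], high[0]])
```

Once subsets of similar behavior are found, a per-cluster model (a price optimizer, a seasonal profile) can be fit within each one, as the case studies describe.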
Document management for engineering design
Register of design objects
Experience automating companies that manage the design and construction of complex facilities or the development of territories
Powering Realtime Decision Engines in Finance and Healthcare using Open Sour... (Greg Makowski)
http://www.kdd.org/kdd2015/industry-gov-talks.html
Financial services and healthcare companies could be the biggest beneficiaries of big data. Their realtime decision engines can be vastly improved by leveraging the latest advances in big data analytics. However, these companies are challenged in leveraging open source software (OSS). This presentation covers how, in collaboration with financial services and healthcare institutions, we built an OSS project to deliver a realtime decisioning engine for their respective applications. I will address two key issues. First, I will describe the strategy behind our hiring process to attract millennial big data developers and the results of this endeavor. Second, I will recount the collaboration effort that we had with our large clients and the various milestones we achieved during that process. I will explain the goals regarding big data analysis that our large clients presented to us and how we accomplished those goals. In particular, I will discuss how we leveraged open source to deliver a realtime decisioning software product called Kamanja to these institutions. An advantage of developing applications in Kamanja is that it is already integrated with Hadoop, Kafka for realtime data streaming, and HBase and Cassandra for NoSQL data storage. I will talk about how these companies benefited from Kamanja and some of the challenges we had in the design of this software. I will provide quantifiable improvements in key metrics driven by Kamanja and interesting, unsolved problems/challenges that need to be addressed for faster and wider adoption of OSS by these companies.
Kamanja: Driving Business Value through Real-Time Decisioning Solutions (Greg Makowski)
This is a first presentation of Kamanja, a new open-source real-time software product, which integrates with other big-data systems. See also links: http://www.meetup.com/SF-Bay-ACM/events/223615901/ and http://Kamanja.org to download, for docs or community support. For the YouTube video, see https://www.youtube.com/watch?v=g9d87rvcSNk (you may want to start at minute 33).
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Predicting Moscow Real Estate Prices with Azure Machine Learning (Leo Salemann)
With only three months' instruction, a five-person team uses Azure Machine Learning Studio to predict Moscow real estate prices based on property descriptors, macroeconomic indicators, and geospatial data.
Generalized linear models (GLMs) and gradient boosting machines (GBMs) are two of the most widely used supervised learning approaches in all of commercial data science. GLMs have been the go-to predictive and inferential modeling tool for decades, but important mathematical and computational advances have been made in training GLMs in recent years. This talk will contrast H2O’s implementation of penalized GLM techniques with ordinary least squares and give specific hints for building regularized and accurate GLMs for both predictive and inferential purposes. As more organizations begin experimenting with and embracing algorithms from the machine learning tradition, GBMs have come to prominence due to their predictive accuracy, the ability to train on real-world data, and resistance to overfitting training data. This talk will give some background on the GBM approach, some insight into the H2O implementation, and some tips for tuning and interpreting GBMs in H2O.
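The effect of the penalty term in a regularized GLM can be seen in the simplest possible case: one feature, no intercept. This is a toy sketch of the regularization idea, not H2O's implementation; the data are invented:

```python
def ridge_1d(xs, ys, lam):
    """Penalized least squares for one feature with no intercept:
    w = sum(x*y) / (sum(x*x) + lambda).  lambda = 0 recovers OLS."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
print(ridge_1d(xs, ys, 0.0))   # OLS slope, about 2.04
print(ridge_1d(xs, ys, 10.0))  # the penalty shrinks the slope toward zero
```

The shrinkage toward zero is what trades a little bias for lower variance, which is the core reason penalized GLMs resist overfitting compared with ordinary least squares.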
Patrick's Bio:
Patrick Hall is a senior data scientist and product engineer at H2O.ai. Patrick works with H2O.ai customers to derive substantive business value from machine learning technologies. His product work at H2O.ai focuses on two important aspects of applied machine learning, model interpretability and model deployment. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning.
Prior to joining H2O.ai, Patrick held global customer-facing roles and R&D roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick is the 11th person worldwide to become a Cloudera certified data scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.
A Framework for Scene Recognition Using Convolutional Neural Network as Featu... (Tahmid Abtahi)
Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. The availability of large data sets like ImageNet and VGG has provided scope for applying machine learning classifiers to train models. However, high data dimensionality is an issue while training classifiers such as Support Vector Machines (SVM) and perceptrons. To reduce data dimensionality and take advantage of parallel and distributed processing, we propose a framework with a Convolutional Neural Network (CNN) as the feature extractor and an SVM and a perceptron as classifiers. MPI (Message Passing Interface) was used for programming clusters of CPUs. The SVM showed a 1.05x improvement over the perceptron in terms of run time, and the CNN reduced data dimensionality by 10x.
The Power of Auto ML and How Does it Work (Ivo Andreev)
Automated ML is an approach to minimize the need for data science effort by enabling domain experts to build ML models without deep knowledge of algorithms, mathematics or programming. The mechanism works by allowing end users to simply provide data, and the system automatically does the rest by determining the approach to perform a particular ML task. At first this may sound discouraging to those aiming for the "sexiest job of the 21st century" - the data scientists. However, Auto ML should be considered a democratization of ML, rather than automatic data science.
In this session we will talk about how Auto ML works, how it is implemented by Microsoft, and how it could improve the productivity of even professional data scientists.
Understanding Hallucinations in LLMs - 2023 09 29.pptx (Greg Makowski)
Hallucinations are a current fundamental problem for LLMs.
For one example, in June of this year in New York, attorneys did "research" on past cases with ChatGPT and turned it in to the judge as a brief. The opposing counsel reported to the judge that they could not find the cases. When the judge confronted the attorneys who had used GPT, they stood behind their brief. The judge fined the firm $5,000.
Could this happen to you? YES. What can be done to avoid this in the future? I will answer.
In this talk, I will explain some fundamental areas of LLMs to show how and why hallucinations occur. To understand that, an introduction to how words, concepts and dialogs are represented will help.
Words were first represented as points in an embedding space with Word2Vec in 2013. It could compress 10,000 words into a vector of 300 elements, with a word represented as a point in the 300-dimensional embedding space. Not just words can be represented: longer text, such as books, can also be compressed into a type of embedding. In that situation, areas of the embedding space relate to different genres, such as non-fiction, science fiction, children's fiction and so on. A new data point between training data points, when converted to text, would be a hallucination. In the area of "legal cases" in the embedding space, if there is not an exact match, the text generation will try to generate what is plausible.
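Closeness in an embedding space is usually measured with cosine similarity; the 3-element vectors below are invented stand-ins for real 300-dimensional embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 3-d "embeddings"; real models use 300 or more dimensions.
king   = [0.9, 0.8, 0.1]
queen  = [0.85, 0.82, 0.15]
banana = [0.1, 0.05, 0.9]

print(cosine(king, queen) > cosine(king, banana))  # related words sit closer
```

A point that falls between genuine training points scores plausibly by this measure even though no training text ever produced it, which is the geometric picture behind a hallucination.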
During an LLM conversation, the output of the previous text provides context for the next text, in the style of a recurrent neural network. The starting position of a conversation matters. Understanding that areas of the embedding space represent genres like "non-fiction" or other language aspects, and that the starting position of a discussion time series matters, helps explain why prompt engineering helps. The conversation is represented in the activations over the network's 7B or 500B weights, a much larger space. During a conversation, learning is not occurring, but the neural network activations are changing. The neural network is not a database. Even if you reach the exact set of weight activations from a training record, due to lossy compression, the exact text may not be regenerated.
ChatGPT does not use word embeddings. For implementation-efficiency reasons, it is practical to break down what is embedded to about 50,000 items in a lookup table. Also, if we wanted to support proper nouns, like names, and dozens of languages, the number of words would be in the millions. ChatGPT and other LLMs use "tokens" for embedding. Examples of Byte Pair Encoding (BPE) and its process are given. The ChatGPT embedding is a vector of 1,536 numbers for each token.
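A stripped-down sketch of the BPE merge loop (the training text is invented; real tokenizers learn merges from a large corpus and keep a vocabulary of roughly 50,000 tokens):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")  # start from individual characters
for _ in range(2):                 # two merge rounds: 'l'+'o', then 'lo'+'w'
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

After two merges the frequent substring "low" has become a single token, which is how BPE keeps the vocabulary small while still covering rare words and proper nouns as sequences of sub-word pieces.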
A solution for today is Retrieval Augmented Generation (RAG). As a brief introduction: you ask a question in English or another natural language, and it is matched against a large library or database of paragraphs from internal documents or websites.
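A toy sketch of the retrieval step in RAG, using bag-of-words counts as stand-in embeddings (a production system would use a neural encoder, such as the 1,536-dimensional token embeddings mentioned above; the library paragraphs here are invented):

```python
from collections import Counter
import math

def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda c: math.sqrt(sum(n * n for n in c.values()))
    return dot / (norm(u) * norm(v))

library = [
    "PageRank assigns importance scores to web pages",
    "Byte pair encoding merges frequent character pairs into tokens",
    "Gradient boosting combines many shallow trees",
]

question = "how does byte pair encoding build tokens"
best = max(library, key=lambda p: cosine(embed(question), embed(p)))

# The retrieved paragraph is prepended to the prompt, grounding the answer.
prompt = f"Answer using this context:\n{best}\n\nQuestion: {question}"
```

Because the generator is handed real retrieved text rather than relying on whatever lies near the question in its own embedding space, RAG sharply reduces the "plausible but invented" failure mode described earlier.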
Gives background on Data Science and Artificial Intelligence, to better understand the current state of the art (SOTA) for Large Language Models (LLMs) and Generative AI, and then starts a discussion of where things are heading in the future.
A Successful Hiring Process for Data Scientists (Greg Makowski)
Discusses one successful hiring process for data scientists. The current "best" algorithms are constantly changing, and it is not uncommon to need to learn about a new vertical market for a DS application. From my DS hiring experience over 2010-2022, I have focused on hiring people who are good at learning and adapting.
KDD 2019: Standardizing Data Science to Help Hiring (Greg Makowski)
Initiative for Analytics and Data Science Standards (IADSS) workshop presentation at the ACM KDD conference (Association for Computing Machinery, Knowledge Discovery in Databases).
Predictive Model and Record Description with Segmented Sensitivity Analysis (... (Greg Makowski)
Describing a predictive data mining model can provide a competitive advantage when solving business problems with a model. The SSA approach can also provide reasons for the forecast for each record. This can help drive investigations into fields and interactions during a data mining project, as well as identify "data drift" between the original training data and the current scoring data. I am working on an open source version of SSA, first in R.
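A simplified one-at-a-time sensitivity sketch (not the segmented SSA itself; the scoring model and record below are hypothetical):

```python
def sensitivity(model, record, delta=0.01):
    """Nudge each field by a small relative amount and report how much
    the model's score moves per unit of input change."""
    base = model(record)
    scores = {}
    for field, value in record.items():
        bumped = dict(record)
        bumped[field] = value * (1 + delta)
        scores[field] = (model(bumped) - base) / (value * delta)
    return scores

# A hypothetical linear scoring model that leans heavily on `income`.
model = lambda r: 0.8 * r["income"] + 0.1 * r["age"]
record = {"income": 50.0, "age": 40.0}
s = sensitivity(model, record)  # income dominates, as expected
```

Ranking fields by these scores, per record, is what yields "reasons for the forecast"; comparing the score distributions between training and scoring data is one way to surface data drift.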
How to Create 80% of a Big Data Pilot Project (Greg Makowski)
When evaluating Open Source Software, or other software of a certain size or complexity, organizations frequently want to conduct a Pilot project, or Proof of Concept (POC). This talk describes a process to reduce the length of the Pilot, by leveraging configurations from performance testing to POC starting configurations.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of many small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the PageRank algorithm usually fall into two categories: one tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
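For reference, the baseline all of these optimizations start from is plain power-iteration PageRank. The following is a minimal sketch with one simple loop-based handling of dead ends (their rank is redistributed uniformly), not the STICD or Levelwise implementation:

```python
def pagerank(links, d=0.85, iters=50):
    """Basic power-iteration PageRank. Dangling (dead-end) vertices
    redistribute their rank uniformly over all vertices each iteration."""
    nodes = sorted(set(links) | {v for vs in links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Mass held by dead ends, to be spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1 - d) / n + d * dangling / n for u in nodes}
        for u in nodes:
            for v in links.get(u, []):
                new[v] += d * rank[u] / len(links[u])
        rank = new
    return rank

# A -> B -> C, where C is a dead end.
r = pagerank({"A": ["B"], "B": ["C"], "C": []})
```

Even on this tiny chain, rank accumulates toward the sink: C outranks B, which outranks A, and the ranks still sum to 1 because the dead-end mass is recycled.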
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Heuristic design of experiments w meta gradient search
1. Heuristic Design of Experiments
with Meta-Gradient Search
of Model Training Parameters
SF Bay ACM, Data Mining SIG, Feb 28, 2011
http://www.sfbayacm.org/?p=2464
Greg_Makowski@yahoo.com
www.LinkedIn.com/in/GregMakowski
3. Key Questions Discussed
• You (a data miner) have many algorithms or libraries you can use, with many choices…
– How to stay organized among all the choices?
• Algorithm parameters
• Adjustments in Cost vs. Profit (Type I vs. II error bias)
• Metric selection (Lift if acting on top % vs. RMSE or ROC)
• Ensemble Modeling, boosting, bagging, stacking
• Data versions, preprocessing, trying new fields
– How to plan, and learn as you go?
– How simple should you stay, to keep descriptiveness vs. Occam's Razor?
4. Outline
Model Training Parameters in SAS Enterprise Miner
Tracking Conservative Results in a “Model Notebook”
How to Measure Progress
Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt
How to Describe Any Complex System – Sensitivity
5. Enterprise Miner
Sample Data Flow for a Project
(Diagram of a sample project data flow; boxes are expanded in later slides. Stratified sampling splits the data into Learning, Tuning, and Validation sets.)
6. Type I vs. II Error Weights
Profit-Loss Ratios
Set these in the Data Source, NOT in the model engines. Other software may use a weight field instead; you need to stay organized regardless.
7. Regression
• It is always good to find the best linear solution early on
– Like testing a null hypothesis: is the problem linear or non-linear?
• Can feed the "score" or "residual error" as a source field into non-linear models
8. Neural Net Architecture and Parameters
(Scatter-plot diagram over field 1 and field 2, with "$" and "c" class markers: a neural net solution is "non-linear", carving out several regions which are not adjacent. MLP and RBF architectures are contrasted.)
9. A Comparison of a Neural Net and Regression
A logistic regression formula:
Y = f( a0 + a1*X1 + a2*X2 + a3*X3 )
a* are coefficients
Backpropagation, cast in a similar form:
H1 = f(w0 + w1*I1 + w2*I2 + w3*I3)
H2 = f(w4 + w5*I1 + w6*I2 + w7*I3)
:
Hn = f(w8 + w9*I1 + w10*I2 + w11*I3)
O1 = f(w12 + w13*H1 + .... + w15*Hn)
On = ....
w* are weights, AKA coefficients
I1..In are input nodes or input variables.
H1..Hn are hidden nodes, which extract features of the data.
O1..On are the outputs, which group disjoint categories.
f() is the SIGMOID function, a non-linear "S" curve
(Diagram: the regression drawn as a network with inputs X1..X3, coefficients a1..a3, and output Y; the neural net drawn with a bias node, inputs I1..I3, hidden nodes H1..Hn, weights w1..w3 on the arcs, and a "direct connect" arc from inputs to output. Speaker note: it is very noisy in the brain, due to chemical depletion of neurotransmitters.)
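The parallel between the two formulas can be made concrete: each hidden node is itself a logistic regression over the inputs, and the output is a logistic regression over the hidden activations. A minimal sketch, where all weight values are arbitrary illustration numbers:

```python
import math

def sigmoid(z):
    """f(): the non-linear 'S' curve."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, a):
    # Y = f(a0 + a1*X1 + a2*X2 + ... )  -- a[0] is the bias term
    return sigmoid(a[0] + sum(ai * xi for ai, xi in zip(a[1:], x)))

def mlp_forward(x, hidden_w, output_w):
    """Backprop net cast in the same form: every hidden node Hk is a
    logistic regression over the inputs; the output O1 is a logistic
    regression over the hidden-node activations."""
    h = [logistic_regression(x, w) for w in hidden_w]
    return logistic_regression(h, output_w)

x = [0.5, -1.0, 2.0]
y_lin = logistic_regression(x, [0.1, 0.4, -0.3, 0.2])
y_mlp = mlp_forward(x,
                    [[0.0, 1.0, 0.0, 0.0],    # H1 weights
                     [0.0, 0.0, 1.0, 0.0]],   # H2 weights
                    [0.0, 0.5, -0.5])         # output weights
```

With zero hidden nodes and a direct connection, the net collapses back to exactly the logistic regression formula, which is why finding the linear solution first is a useful baseline.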
10. Neural Net
• Network Architecture can be linear (MLP) or circular (many RBF)
• Network Direct Connection allows inputs to connect to the output (to find the simple, linear solution first)
• Network Hidden Units can go up to 64 (much better than 8)
• Profit/Loss uses the settings in the Data Source
11. Tree, Depth = 2
What does a Decision Tree look like?
(Scatter-plot diagram over Age and Income, with "$" and "c" class markers, partitioned by Split 1, Split 2, and Split 3 into Leaf 1 through Leaf 4; shown both as a partitioned plane and as a tree.)
If (Age < Split1) then
:…If (Income > Split2) then Leaf1 with dollar_avg1
:…If (Income < Split2) then Leaf2 with dollar_avg2
If (Age > Split1) then
:…If (Income > Split3) then Leaf3 with dollar_avg3
:…If (Income < Split3) then Leaf4 with dollar_avg4
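The if-then rules above translate directly into code. The split points and leaf dollar averages below are hypothetical illustration values, not numbers from the slide:

```python
def tree_forecast(age, income):
    """The depth-2 tree as nested if/else rules.
    Splits and leaf dollar averages are made-up illustration values."""
    SPLIT1, SPLIT2, SPLIT3 = 40, 30_000, 50_000
    leaf_avg = {"Leaf1": 120.0, "Leaf2": 45.0, "Leaf3": 210.0, "Leaf4": 80.0}
    if age < SPLIT1:
        leaf = "Leaf1" if income > SPLIT2 else "Leaf2"
    else:
        leaf = "Leaf3" if income > SPLIT3 else "Leaf4"
    return leaf, leaf_avg[leaf]

print(tree_forecast(30, 50_000))
```

Every record lands in exactly one leaf, and the leaf's average becomes its forecast — which is also why a tree's description is easy to hand to a business audience.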
12. Decision Tree
• Primary Parameters to vary
– Criterion
• Probchisq (Default)
• Entropy
• Gini
– Assessment (Decision vs. Lift)
– Tree size (depth, leaf size, Xvalid)
13. Gradient Boosting (Tree Based)
Based on "Greedy Function Approximation: A Gradient Boosting Machine" by Jerome Friedman
Each new CART tree:
• is fit on a 60% random sample
• is a small, general tree
• forecasts the error remaining from the summed forecast of all previous trees
• may be one of 50 to 2,000 trees in a sequence
• Evaluate how far "back" in the sequence to prune
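A hedged sketch of the idea, substituting one-split "stumps" for the small CART trees: each new learner is fit on a 60% random sample, to the residual of the summed forecast of all previous learners (squared-error loss). The data, learning rate, and stump learner are all invented for illustration:

```python
import random

def fit_stump(xs, ys):
    """Tiny regression 'tree': one split on a numeric feature,
    predicting the mean target on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def predict(x, base, stumps, lr=0.1):
    """Summed forecast: base value plus all shrunken stump corrections."""
    return base + lr * sum(s(x) for s in stumps)

def gradient_boost(xs, ys, n_trees=50, sample=0.6, lr=0.1, seed=0):
    """Each new stump is fit on a 60% random sample, to the residual
    of the summed forecast of all previous stumps."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), int(sample * len(xs)))
        resid = [ys[i] - predict(xs[i], base, stumps, lr) for i in idx]
        stumps.append(fit_stump([xs[i] for i in idx], resid))
    return base, stumps

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
base, stumps = gradient_boost(xs, ys)
```

Choosing how far "back" in the sequence to prune corresponds to evaluating `predict` with only the first k stumps on a hold-out set.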
14. DM Algorithms Available in Packages
# Modules per Forecasting Family in DM Software (rows are software packages; the package names appeared in the slide graphic)
Regression | Lasso Reg | Decision Tree | Neural Net | Support Vector Mach | Other | TOT
2 | 1 | 0 | 0 | 0 | 1 | 4
0 | 0 | 1 | 0 | 0 | 0 | 1
3 | 0 | 3 | 3 | 0 | 3 | 12
1 | 0 | 1 | 0 | 1 | 1 | 4
0 | 0 | 4 | 0 | 0 | 0 | 4
3 | 2 | 5 | 3 | 2 | 3 | 18
0 | 0 | 0 | 0 | 0 | 5 | 5
15. Feel Overwhelmed by Lots of Complex Algorithm Parameters? GOOD!
• A deep understanding of the algorithms, math and assumptions helps significantly with heuristics
– i.e. typically, regression has a problem with correlated inputs, because the solution calculation uses matrix inversion (if you are worried about weight sign inversion)
– SVMs or Bayesian Nets do not have this problem, because they are solved differently.
• Without a problem from correlated inputs, input selection becomes more random, but you still get a decent solution
• How can you manage the details?
– I am glad you asked… moving on to the next section
16. Outline
Model Training Parameters in SAS Enterprise Miner
Tracking Conservative Results in a “Model Notebook”
How to Measure Progress
Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt
How to Describe Any Complex System – Sensitivity
17. Model Exploration Process
• Scientific Method of Hypothesis Testing
– If you change ONE thing, then any change in the results is because of that one change
– Design of Experiments (DOE), test plan
– Best to compare model settings on the same data version
• New data versions add new preprocessed fields, or new months (records)
– Key design objective: all experiments are reproducible
• SAME random split between Learning – Test – Validation, with a consistent random seed
– L-T-V split before loading data into a tool, so the partitioning is the same for all tools/libraries/algorithms
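One way to guarantee the same L-T-V partition across every tool is to derive the split from a hash of a stable record key, rather than from row order or a per-tool random generator. A sketch, where the 60/20/20 fractions and the use of MD5 are illustrative assumptions:

```python
import hashlib

def ltv_split(record_id, frac_learn=0.6, frac_tune=0.2):
    """Deterministic Learning/Tuning/Validation assignment from a hash of
    the record key, so every tool, library, and algorithm sees the SAME
    partition regardless of row order (a stand-in for a shared seed)."""
    h = int(hashlib.md5(str(record_id).encode()).hexdigest(), 16)
    u = (h % 10_000) / 10_000.0   # pseudo-uniform in [0, 1)
    if u < frac_learn:
        return "learn"
    elif u < frac_learn + frac_tune:
        return "tune"
    return "validate"

parts = [ltv_split(i) for i in range(10_000)]
```

Because the assignment depends only on the key, re-running any experiment months later, in any tool, reproduces the exact same three data sets.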
18. Model Notebook
Input Parameters → Outcomes (Lift in Top 10%)
Data Ver | Algor | Mod Num | Param 1 (vars offerd) | Param 2 (var selct / Hidn Nodes) | Param 3 (Direct Conn) | Arch | Vars Seltd | Trn Time | Train | Val | Gap = Abs(Trn-Val) | Consrv Result
1 | Regrsn | 1 | 27 | stepw | - | - | 9 | 12 | 5.77 | 5.94 | 0.17 | 5.60
1 | Neural | 1 | 27 | 3 | n | MLP | all | 77 | 6.65 | 10.89 | 4.24 | 2.41 (Bad)
1 | Neural | 2 | 27 | 10 | n | MLP | all | 40 | 6.88 | 6.73 | 0.15 | 6.58 (Good)
1 | Neural | 3 | 27 | 10 | Y | MLP | all | 36 | 6.40 | 6.93 | 0.53 | 5.87
1 | Neural | 4 | 27 | 10 | n | RBF | all | 34 | 5.67 | 5.54 | 0.13 | 5.41
1 | Neural | 5 | 27 | 10 | Y | RBF | all | 35 | 5.95 | 7.92 | 1.97 | 3.98
19. Model Notebook Outcome Details
• My Heuristic Design Objectives: (yours may be different)
– Accuracy in deployment
– Reliability and consistent behavior, a general solution
• Use one or more hold-out data sets to check consistency
• Penalize more as the forecast becomes less consistent
– No penalty for model complexity (if it validates consistently)
• Let me drive a car to work, instead of limiting me to a bike
– Message for the check writer
– Don't consider only Occam's Razor: value consistently good results
– Develop a "smooth, continuous metric" to sort and find models that will perform "best" in future deployment
20. Model Notebook Outcome Details
• Training = results on the training set
• Validation = results on the validation hold-out
• Gap = abs( Training – Validation )
– A bigger gap (volatility) is a bigger concern for deployment, a symptom
– Minimize Senior VP heart attacks! (one penalty for volatility)
– Set expectations & meet expectations
– Regularization helps significantly
• Conservative Result = worst( Training, Validation ) + Gap_penalty
– Corr / Lift / Profit, higher is better: Cons Result = min(Trn, Val) - Gap
– MAD / RMSE / Risk, lower is better: Cons Result = max(Trn, Val) + Gap
• Business Value or Pain ranking = function of (conservative result)
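The slide's metric can be written as a small function; the two example rows below reuse actual Lift-in-Top-10% numbers from the model notebook (Neural models 1 and 2):

```python
def conservative_result(train, val, higher_is_better=True):
    """Slide-20 metric: the worst of train/validation, penalized again
    by the train-validation gap (volatility)."""
    gap = abs(train - val)
    if higher_is_better:                # Corr / Lift / Profit
        return min(train, val) - gap
    return max(train, val) + gap       # MAD / RMSE / Risk

# Notebook rows: a volatile model loses to a consistent one,
# even though its best single number (10.89) looks higher.
volatile   = conservative_result(6.65, 10.89)   # gap 4.24 -> 2.41
consistent = conservative_result(6.88, 6.73)    # gap 0.15 -> 6.58
```

Sorting the notebook on this one smooth column is what lets the search treat "accuracy + generalization" as a single number to climb.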
21. Model Notebook
(Repeats the Input Parameters → Outcomes table from slide 18.)
22. Model Notebook Process: Tracking Detail, Training the Data Miner
Model Notebook, Project = Transit, Last Update 5/6/2010
Each row below lists: Data Ver | Author | Algor | Mod Num | chng from prior | vars offered | algorithm-specific parameters | Var Sel | Trn Time, followed by Train | Val | Gap = Abs(Trn-Val) | Consrv Result for each lift band (Lift in Top 5% / 10% / 20% Over File Avg).
1 GM B logistic 1 0 27 stepws 10 12.04 8.12 3.92 4.20 7.59 4.85 2.74 2.11
1 GM B logistic 2 1 19 stepws 10 12.04 8.12 3.92 4.20 7.59 4.85 2.74 2.11
1 GM B logistic 3 1 6, no dbc stepws 4 7.51 1.98 5.53 -3.55 4.90 3.96 0.94 3.02 (investigate inconsistency)
1 GM B logistic 4 1 13, only dbc stepws 7 9.58 7.33 2.25 5.08 6.59 5.25 1.34 3.91
Regression (block parameters: regr type | var selectn | 2-factor interact | polynom)
1 GM regr 1 0 27 logistic stepws n 9 12 5.77 5.94 0.17 5.60 3.35 4.46 1.11 2.24 2.25 3.02 0.77 1.48
1 GM regr 2 1 27 logistic stepws Yes 9 16 5.76 5.94 0.18 5.58 3.35 4.46 1.11 2.24 2.25 3.02 0.77 1.48
1 GM regr 3 1 27 logistic stepws n 2 10 57 5.86 6.93 1.07 4.79 3.48 5.03 1.55 1.93 2.32 2.61 0.29 2.03
1 GM regr 4 1 27 logistic stepws Yes 2 11 58 5.86 6.93 1.07 4.79 3.48 5.04 1.56 1.92 2.32 2.92 0.60 1.72
4 GM regr 5 4 3 logistic stepwise Yes 2 8 63 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43
4 GM regr 6 5 28 logistic stepwise Yes 2 (didn't finish, out of memory)
4 GM regr 7 5 3 logistic stepwise n 2 63 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43
4 GM regr 8 5 3 logistic stepwise n 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43
4 GM regr 9 5 3 logistic stepwise Yes 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43
4 GM regr 10 8 28 logistic stepwise n 1 12.88 13.40 0.52 12.36 6.65 6.89 0.24 6.41 3.53 3.64 0.11 3.43
4 GM regr 11 5 3 logistic stepwise Yes 3 6 78 15.98 16.06 0.08 15.89 8.61 8.03 0.58 7.45 4.81 4.39 0.41 3.98
4 GM regr 12 5 3 logistic stepwise Yes 4 2 78 15.98 16.06 0.08 15.89 8.61 8.03 0.58 7.45 4.81 4.39 0.41 3.98
4n GM regr 13 11 3 logistic stepwise Yes 3 6 78 18.39 18.79 0.39 18.00 9.58 9.55 0.03 9.52 4.96 4.92 0.03 4.89 (add Feb & Mar to recent*)
4n GM regr 14 11 3 6 78 12.49 12.12 0.36 11.76 7.63 7.42 0.20 7.22 4.29 4.47 0.18 4.12 (recent_serrtrn_dbc changed to recent_serrtrn_flag; does DBC on ser patt help? YES. Yippeee!)
1 GM DM Regr 1 0 27 logistic stepws 13 15 12.00 3.17 8.83 -5.66 7.21 4.16 3.05 1.11 4.28 3.07 1.21 1.86
4 GM DM Regr 2 0 28 (max v 3000, min rsq 0.005, use aov16 var YES) 6 72 16.27 15.76 0.52 15.24 8.67 8.03 0.64 7.39 4.58 4.24 0.34 3.90
1 GM PLS 1 0
1 GM PLS 2 1 27 default default default default 4 18 11.26 3.08 8.18 -5.10 7.12 4.85 2.27 2.58 4.28 3.12 1.16 1.96
1 GM PLS 3 1 Test Set Cros Val (didn't finish, don't use Xvalidation)
4 GM PLS 4 0 28 PLS NIPALS 200 28 122 16.63 15.76 0.87 14.89 8.93 8.03 0.90 7.13 4.76 4.32 0.45 3.87
AutoNeural (block parameters: hidden | Direct Conn? | arch)
1 GM AutoNrl 1 0 27 2 n MLP all 35 4.19 3.76 0.43 3.33 2.47 2.57 0.10 2.37 1.77 1.88 0.11 1.66
1 GM AutoNrl 2 1 27 6 n MLP all 189 4.37 2.77 1.60 1.17 2.82 1.78 1.04 0.74 1.98 1.93 0.05 1.88
1 GM AutoNrl 3 1 27 8 n MLP (AutoNeural trn action = search) all 532 0.83 0.56 0.27 0.29 0.83 0.56 0.27 0.29 0.83 0.56 0.27 0.29
1 GM AutoNrl 4 1 27 8 n MLP (activ = logistic) all 356 5.12 2.97 2.15 0.82 3.02 3.37 0.35 2.67 1.90 2.57 0.67 1.23
1 GM AutoNrl 5 1 27 6 n MLP (arch = block) all 130 0.89 0.97 0.08 0.81
1 GM AutoNrl 6 1 27 6 n MLP (arch = funnel) all 595 1.36 1.08 0.28 0.80
4 GM AutoNrl 7 1 28 6 n MLP all 1201 16.27 15.76 0.51 15.24 8.65 7.88 0.77 7.11 4.46 4.24 0.22 4.03
Neural (block parameters: hidden | Direct Conn? | arch | Decay | Decision Weight)
1 GM Neural 1 0 27 3 n MLP all 77 6.65 10.89 4.24 2.41 3.90 6.53 2.63 1.27 2.52 3.96 1.44 1.08
1 GM Neural 2 1 27 10 n MLP all 40 6.88 6.73 0.15 6.58 3.97 4.55 0.58 3.39 2.56 3.02 0.46 2.10
1 GM Neural 3 1 27 10 Y MLP all 36 6.40 6.93 0.53 5.87 3.49 5.45 1.96 1.53 2.32 3.22 0.90 1.42
1 GM Neural 4 1 27 10 n RBF (orbfeq) all 34 5.67 5.54 0.13 5.41 3.25 4.85 1.60 1.65 2.20 3.22 1.02 1.18
1 GM Neural 5 1 27 10 Y RBF all 35 5.95 7.92 1.97 3.98 3.48 4.85 1.37 2.11 2.31 3.17 0.86 1.45
js1 JS Neural 6 0 17 5 n MLP Softmax 10,-5,-1,0 all 6.03 6.53 0.50 5.53 3.40 4.55 1.15 2.25 2.67 3.36 0.69 1.98
js1 JS Neural 7 6 15 5 Y MLP Softmax 10,-5,-1,0 all 6.14 5.74 0.40 5.34 3.59 2.97 0.62 2.35 2.77 2.37 0.40 1.97
js1 JS Neural 8 6 15 3 Y MLP Softmax 0.5 10,-5,-1,0 all 6.27 7.13 0.86 5.41 3.54 3.56 0.02 3.52 2.74 2.57 0.17 2.40
js1 JS Neural 9 6 15 3 n MLP Softmax 0.5 10,-5,-1,0 all 6.27 6.33 0.06 6.21 3.57 4.65 1.08 2.49 2.76 2.82 0.06 2.70
2 GM Neural 10 2 35 12 Y MLP 20,0,-1,0 all
3 GM Neural 11 2 45 20 n MLP 20,0,-1,0 all 18 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.91
3 GM Neural 12 11 45 20 n MLP 0.8 20,0,-1,0 all 16 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.91
3 GM Neural 13 11 45 20 n MLP 0.6 20,0,-1,0 all 16 6.26 7.76 1.50 4.76 3.54 4.22 0.68 2.86 2.18 2.46 0.28 1.91
4 GM Neural 14 11 3 20 n MLP 0.01 20,0,-1,0 all 204 16.39 15.15 1.24 13.91 8.67 8.03 0.64 7.39 4.82 4.39 0.43 3.97
4 GM Neural 15 11 28 20 n MLP 0.01 20,0,-1,0 all 713 16.39 15.76 0.63 15.12 8.54 7.88 0.66 7.22 4.40 4.25 0.15 4.11
4 GM Neural 16 15 31 40 n MLP 0.01 20,0,-1,0 all 782 18.02 18.18 0.16 17.86 9.21 9.55 0.34 8.87 4.60 4.77 0.17 4.44
4 GM Neural 17 15 (same, max iter 20 --> 50) all 1754 18.02 18.18 0.16 17.86 9.21 9.55 0.34 8.87 4.66 4.77 0.11 4.55
4 GM Neural 18 16 29 (no twoYr) (same, max iter 20 --> 50) 40 0 0 all 18.386 18.98 18.18 0.80 17.38 9.25 9.59 0.34 8.90 4.67 4.86 0.20 4.47
4n GM DMNeural 19 0 13 3 n all 19 10.60 2.57 8.03 -5.46 6.93 4.36 2.57 1.79 4.14 2.57 1.57 1.00
More Heuristic Strategy:
1) Try a few models of many algorithm types (seed the search)
2) Opportunistically spend more effort on what is working (invest in top stocks)
3) Still try a few trials on medium successes (diversify, limited by the project time-box)
4) Try ensemble methods, combining model forecasts & top source vars with the model (the "Data Mining Battle Field")
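The "spend more effort on what is working" step can be sketched as a greedy neighborhood search over the discrete parameter grid, a meta-gradient in the sense that each accepted move follows the direction of improving conservative result. The grid, the starting point, and the scoring function below are all invented for illustration:

```python
def meta_gradient_search(score, grid, start, rounds=20):
    """Greedy 'meta-gradient' over discrete training parameters: from the
    current best setting, step one parameter at a time to a neighboring
    grid value and keep any move that improves the score."""
    best = dict(start)
    best_score = score(best)
    for _ in range(rounds):
        improved = False
        for param, values in grid.items():
            i = values.index(best[param])
            for j in (i - 1, i + 1):          # try both neighbors
                if 0 <= j < len(values):
                    trial = dict(best, **{param: values[j]})
                    s = score(trial)
                    if s > best_score:
                        best, best_score, improved = trial, s, True
        if not improved:                       # local optimum reached
            break
    return best, best_score

# Hypothetical scorer with its peak at hidden=10, depth=6
# (in practice, score() would train a model and return the
# conservative result from the model notebook).
grid = {"hidden": [3, 6, 10, 20, 40], "depth": [2, 4, 6, 8]}
score = lambda p: -abs(p["hidden"] - 10) - abs(p["depth"] - 6)
best, s = meta_gradient_search(score, grid, {"hidden": 3, "depth": 2})
```

Each `score()` call corresponds to one row of the model notebook, so the notebook doubles as the search's evaluation log.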
23. Model Notebook Process: Tracking Detail, Training the Data Miner
Decision Tree (block parameters: criterion | max depth | leaf size | asses = 5% Lift | Decision Weight). Each row: M cnt | Data Ver | Author | Algor | Mod Num | chng from prior | vars offered | parameters | Var Sel | Trn Time, then Train | Val | Gap | Consrv Result for each lift band.
47 1 GM Dec Tree 1 0 27 default 6 5 20,0,-5,0 7 13 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.27
48 1 GM Dec Tree 2 1 27 probchisq 6 5 20,0,-5,0 7 16 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.27
49 1 GM Dec Tree 3 1 27 entropy 6 5 20,0,-5,0 6 16 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
50 1 GM Dec Tree 4 1 27 gini 6 5 20,0,-5,0 10 22 13.76 11.28 2.48 8.80 7.70 6.10 1.60 4.50 4.32 3.71 0.61 3.10
51 1 GM Dec Tree 5 3 27 entropy 12 5 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
52 1 GM Dec Tree 6 3 27 entropy 6 10 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
53 1 GM Dec Tree 7 3 27 entropy 6 100 20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
54 1 GM Dec Tree 8 3 27 entropy 6 100 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.54
55 1 GM Dec Tree 9 3 27 entropy 6 5 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.54
56 1 GM Dec Tree 10 3 27 entropy 6 5 (obs import = Y) 20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91 (Decision Tree, Data Version 1)
57 1 GM Dec Tree 11 3 27 entropy 6 5 (asses = 5% Lift) 20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
58 1 GM Dec Tree 12 3 27 entropy 10 2 20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
46 2 GM Dec Tree 13 3 33 entropy 6 5 a=5% lift 20,0,-5,0 7 16 15.92 14.96 0.96 14.00 8.29 7.84 0.45 7.39 4.40 4.17 0.23 3.94
47 2 GM Dec Tree 14 13 33 entropy 6 5 a=5% lift 10,-2.5,-1,0 13 15 16.32 15.05 1.27 13.78 9.07 8.00 1.07 6.93 4.63 4.08 0.55 3.53
48 2 GM Dec Tree 15 13 33 entropy 6 5 a=5% lift 1,-1,1,-1 8 15 15.30 14.34 0.96 13.38 7.98 7.53 0.45 7.08 4.25 4.05 0.20 3.85
49 2 GM Dec Tree 16 13 33 entropy 6 5 a=5% lift 10,-1,1,-1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.84
50 2 GM Dec Tree 17 13 33 entropy 6 5 a=5% lift 20,-5,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.95
51 2 GM Dec Tree 18 13 33 entropy 6 5 a=5% lift 20,-1,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.95
52 2 GM Dec Tree 19 13 33 entropy 6 5 a=5% lift xval = no 20,0,-1,0 6 15 15.87 15.52 0.35 15.17 8.26 8.12 0.14 7.98 4.40 4.32 0.08 4.24
53 2 GM Dec Tree 20 13 33 entropy 6 5 a=5% lift 20,-5,-1,1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.84
54 2 GM Dec Tree 21 13 33 entropy 6 5 a=5% lift xval = no 20,0,0,1 9 16 16.17 15.57 0.60 14.97 8.74 8.25 0.49 7.76 4.44 4.21 0.23 3.98
55 2 GM Dec Tree 22 19 33 gini 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.12
56 2 GM Dec Tree 23 19 33 probchisq 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.12
57 2 GM Dec Tree 24 19 33 entropy 20 5 a=5% lift 20,0,-1,0 19 26 18.94 15.42 3.52 11.90 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.22 (Data Version 2)
58 2 GM Dec Tree 25 19 33 entropy 20 20 a=5% lift 20,0,-1,0 19 26 18.94 13.80 5.14 8.66 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.22
59 2 GM Dec Tree 26 19 33 entropy 20 40 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.05
60 2 GM Dec Tree 27 19 33 entropy 20 60 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.05
61 2 GM Dec Tree 28 19 33 entropy 7 5 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.52
62 2 GM Dec Tree 29 19 33 entropy 7 10 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.52
63 2 GM Dec Tree 30 19 33 entropy 7 20 a=5% lift 20,0,-1,0 7 37 16.04 14.66 1.38 13.28 8.35 7.69 0.66 7.03 4.41 4.07 0.34 3.73
64 2 GM Dec Tree 31 19 35 entropy 7 40 a=5% lift (itmledratio, itm_to_led) 20,0,-1,0 7 36 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
65 2 GM Dec Tree 32 19 35 entropy 7 60 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
66 2 GM Dec Tree 33 19 35 entropy 7 80 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
67 2 GM Dec Tree 34 19 35 entropy 7 100 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
68 2 GM Dec Tree 35 19 35 entropy 7 150 a=5% lift 20,0,-1,0 5 37 14.53 13.08 1.45 11.63 7.75 7.19 0.56 6.63 4.36 4.29 0.07 4.22
64 2 GM Dec Tree 36 19 35 entropy 6 5 a=5% lift 20,0,-1,0 7 29 15.91 14.95 0.96 13.99 8.29 7.83 0.46 7.37 4.40 4.17 0.23 3.94
(ex = 20k, node smp = 30k)
65 2 GM Dec Tree 37 19 14 (raw only) entropy 6 5 a=5% lift 0 20,0,-1,0 7 16 13.92 11.81 2.11 9.69 7.46 6.54 0.93 5.61 4.24 3.91 0.33 3.57
(5.28 / 2.15 / 0.41 improvement gain in Conservative Lift from the new variables, vs. DecTree-d2-m19)
66 3 GM Dec Tree 38 19 45 entropy 8 5 a=5% lift xval = no 20,0,-5,1 3 39 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
67 3 GM Dec Tree 39 38 45 gini 8 5 a=5% lift xval = no 20,0,-5,1 3 71 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
68 3 GM Dec Tree 40 38 45 propchi 8 5 a=5% lift xval = no 20,0,-5,1 3 42 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
69 3 GM Dec Tree 41 38 45 entropy 20 5 a=5% lift subtr= 20,0,-5,1 33 91 20.00 14.81 5.19 9.61 10.00 7.54 2.46 5.08 5.00 3.90 1.10 2.80
70 3 GM Dec Tree 42 38 45 entropy 20 100 a=5% lift sub=lrg 20,0,-5,1 25 70 19.09 16.25 2.84 13.42 10.00 8.17 1.83 6.35 5.00 4.19 0.81 3.38
71 3 GM Dec Tree 43 38 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 23 64 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
72 3 GM Dec Tree 44 38 45 entropy 20 400 a=5% lift sub=lrg 20,0,-5,1 21 59 15.87 17.08 1.21 14.67 9.02 8.96 0.06 8.89 4.97 4.69 0.28 4.41
73 3 GM Dec Tree 45 38 45 entropy 20 800 a=5% lift sub=lrg 20,0,-5,1 16 52 14.35 16.16 1.81 12.53 8.46 8.96 0.50 7.96 4.78 4.79 0.01 4.78 (Data Version 3)
74 3 GM Dec Tree 46 38 45 entropy 20 1600 a=5% lift sub=lrg 20,0,-5,1 16 47 14.25 16.02 1.78 12.47 8.26 8.59 0.34 7.92 4.58 4.42 0.17 4.25
75 3 GM Dec Tree 47 38 45 entropy 20 3200 a=5% lift sub=lrg 20,0,-5,1 10 39 12.45 14.35 1.91 10.54 7.49 8.31 0.82 6.67 4.36 4.48 0.12 4.24
76 3 GM Dec Tree 48 43 45 entropy 20 150 a=5% lift sub=lrg 20,0,-5,1 23 68 18.57 16.25 2.32 13.93 10.00 8.14 1.86 6.27 5.00 4.17 0.83 3.34
77 3 GM Dec Tree 49 43 45 entropy 20 300 a=5% lift sub=lrg 20,0,-5,1 23 62 16.45 17.86 1.41 15.03 9.31 8.96 0.35 8.61 5.00 4.60 0.40 4.20
78 3 GM Dec Tree 50 43 45 entropy 20 250 a=5% lift sub=lrg 20,0,-5,1 24 65 16.64 17.71 1.07 15.57 9.56 8.96 0.60 8.36 5.00 4.61 0.39 4.21
79 3 GM Dec Tree 51 43 45 entropy 20 350 a=5% lift sub=lrg 20,0,-5,1 24 67 16.07 17.50 1.43 14.64 9.19 8.96 0.23 8.73 5.00 4.59 0.41 4.18
80 3 GM Dec Tree 52 43 45 entropy 20 225 a=5% lift sub=lrg 20,0,-5,1 23 63 17.85 16.67 1.18 15.49 9.83 8.96 0.87 8.09 5.00 4.53 0.48 4.05
81 3 GM Dec Tree 53 43 45 entropy 20 175 a=5% lift sub=lrg 20,0,-5,1 26 68 18.15 16.25 1.90 14.35 9.97 8.13 1.84 6.28 5.00 4.16 0.84 3.32
82 3 GM Dec Tree 54 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5.0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
83 3 GM Dec Tree 55 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-1,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
84 3 GM Dec Tree 56 43 45 entropy 20 200 a=5% lift sub=lrg 20,-5,0,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
85 4 GM Dec Tree 57 43 146 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 9 149 20.00 14.09 5.91 8.19 10.00 7.20 2.80 4.40 5.00 3.76 1.24 2.51
86 4 GM Dec Tree 58 57 107 (tree settings the same, dropped INT* categorical vars, not DBC) 18 115 20.00 16.09 3.91 12.18 10.00 8.15 1.85 6.29 5.00 4.18 0.82 3.35
87 4 GM Dec Tree 59 57 107 entropy 20 500 a=5% lift sub=lrg 20,0,-5,1 13 110 19.46 14.79 4.68 10.11 10.00 7.64 2.36 5.29 5.00 3.95 1.05 2.91
88 4 GM Dec Tree 60 57 107 entropy 20 1000 a=5% lift sub=lrg 20,0,-5,1 10 89 18.94 14.47 4.47 10.00 10.00 7.44 2.56 4.88 5.00 3.86 1.14 2.73
89 4 GM Dec Tree 61 57 107 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 81 14.41 13.91 0.50 13.41 9.54 8.02 1.51 6.51 6.61 4.25 2.36 1.90
90 4 GM Dec Tree 62 57 107 entropy 20 3000 a=5% lift sub=lrg 20,0,-5,1 5 71 9.89 7.91 1.98 5.94 8.74 6.39 2.35 4.04 5.00 3.70 1.30 2.40
91 4 GM Dec Tree 63 57 107 entropy 20 1500 a=5% lift sub=lrg 20,0,-5,1 9 60 16.17 14.66 1.50 13.16 9.89 8.18 1.71 6.47 5.00 3.38 1.62 1.76 (Data Version 4)
92 4 GM Dec Tree 64 57 107 entropy 20 1750 a=5% lift sub=lrg 20,0,-5,1 7 60 15.23 14.32 0.92 13.40 9.68 8.07 1.61 6.46 5.00 4.26 0.75 3.51
93 4 GM Dec Tree 65 57 107 entropy 20 2250 a=5% lift sub=lrg 20,0,-5,1 5 60 15.43 11.00 4.43 6.56 9.55 6.30 3.25 3.05 5.00 3.70 1.30 2.40
94 4 GM Dec Tree 66 61 58 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 105 14.07 13.92 0.15 13.77 8.45 7.88 0.57 7.30 4.74 4.02 0.73 3.29
95 4 GM Dec Tree 67 61 80 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 97 14.25 13.94 0.30 13.64 9.25 7.88 1.37 6.51 5.00 4.25 0.75 3.49
96 4 GM Dec Tree 68 61 103 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 103 14.41 13.72 0.69 13.03 9.54 8.02 1.52 6.50 5.00 4.25 0.75 3.50
(Interactions are getting selected; they improve Trn results but decrease Val results. Perhaps I should regen the INT*dbc with a larger number of min records.)
97 4n GM Dec Tree 69 61 3 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 7 14.61 15.54 0.93 13.68 8.83 8.99 0.16 8.67 4.88 4.73 0.15 4.58
98 4n GM Dec Tree 70 0 20 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 10 11.50 11.12 0.38 10.74 7.08 7.29 0.21 6.87 4.24 3.94 0.30 3.64 (use RAW vars ONLY, to test the value of my preprocessing)
Rule Induction (block parameters: binary model | cleanup model | max num rips)
94 1 GM Rule Ind 1 0 tree neural 16 32 10.77 9.92 0.85 9.07 6.28 5.60 0.68 4.92 3.35 3.09 0.26 2.83
95 1 GM Rule Ind 2 1 regr neural 16 36 5.95 7.52 1.57 4.38 3.55 4.85 1.30 2.25 2.35 3.17 0.82 1.53
96 1 GM Rule Ind 3 1 neural tree 16 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.37
97 1 GM Rule Ind 4 3 neural tree 4 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.37
98 1 GM Rule Ind 5 3 neural tree 32 121 5.95 7.92 1.97 3.98 3.53 5.64 2.11 1.42 2.34 3.32 0.98 1.36
99 1 GM Rule Ind 6 1 tree neural 32 32 7.25 5.26 1.99 3.27 6.45 5.17 1.28 3.89 3.43 3.09 0.34 2.75
“Agile Software Design”
• Get something simple, fully working and tested early on (Data Version 1)
• Data Version 2…4: working, incremental improvements
– Incremental complexity
– Different preprocessing
– Add more fields, records
– Add & test more complexity
24. Model Notebook Process
Tracking Detail: Training the Data Miner
Columns: M cnt | Data Ver | Author | Algor | Mod Num | chng from prior | vars offered | criterion | max depth | leaf size | assess = 5% Lift | Decision Weight | Var Sel | Trn Time | then three metric groups, each reported as: Train | Val | Gap | Consrv Result
47 1 GM Dec Tree 1 0 27 default 6 5 20,0,-5,0 7 13 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.27
48 1 GM Dec Tree 2 1 27 probchisq 6 5 20,0,-5,0 7 16 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.27
49 1 GM Dec Tree 3 1 27 entropy 6 5 20,0,-5,0 6 16 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
50 1 GM Dec Tree 4 1 27 gini 6 5 20,0,-5,0 10 22 13.76 11.28 2.48 8.80 7.70 6.10 1.60 4.50 4.32 3.71 0.61 3.10
51 1 GM Dec Tree 5 3 27 entropy 12 5 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
52 1 GM Dec Tree 6 3 27 entropy 6 10 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
53 1 GM Dec Tree 7 3 27 entropy 6 100 20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
54 1 GM Dec Tree 8 3 27 entropy 6 100 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.54
55 1 GM Dec Tree 9 3 27 entropy 6 5 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.54
56 1 GM Dec Tree 10 3 27 entropy 6 5 obs import = Y 20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
57 1 GM Dec Tree 11 3 27 entropy 6 5 assess = 5% Lift 20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
58 1 GM Dec Tree 12 3 27 entropy 10 2 20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
46 2 GM Dec Tree 13 3 33 entropy 6 5 a=5% lift 20,0,-5,0 7 16 15.92 14.96 0.96 14.00 8.29 7.84 0.45 7.39 4.40 4.17 0.23 3.94
47 2 GM Dec Tree 14 13 33 entropy 6 5 a=5% lift 10,-2.5,-1,0 13 15 16.32 15.05 1.27 13.78 9.07 8.00 1.07 6.93 4.63 4.08 0.55 3.53
48 2 GM Dec Tree 15 13 33 entropy 6 5 a=5% lift 1,-1,1,-1 8 15 15.30 14.34 0.96 13.38 7.98 7.53 0.45 7.08 4.25 4.05 0.20 3.85
49 2 GM Dec Tree 16 13 33 entropy 6 5 a=5% lift 10,-1,1,-1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.84
50 2 GM Dec Tree 17 13 33 entropy 6 5 a=5% lift 20,-5,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.95
51 2 GM Dec Tree 18 13 33 entropy 6 5 a=5% lift 20,-1,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.95
52 2 GM Dec Tree 19 13 33 entropy 6 5 a=5% lift xval = no 20,0,-1,0 6 15 15.87 15.52 0.35 15.17 8.26 8.12 0.14 7.98 4.40 4.32 0.08 4.24
53 2 GM Dec Tree 20 13 33 entropy 6 5 a=5% lift 20,-5,-1,1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.84
54 2 GM Dec Tree 21 13 33 entropy 6 5 a=5% lift xval = no 20,0,0,1 9 16 16.17 15.57 0.60 14.97 8.74 8.25 0.49 7.76 4.44 4.21 0.23 3.98
55 2 GM Dec Tree 22 19 33 gini 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.12
56 2 GM Dec Tree 23 19 33 probchisq 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.12
57 2 GM Dec Tree 24 19 33 entropy 20 5 a=5% lift 20,0,-1,0 19 26 18.94 15.42 3.52 11.90 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.22
58 2 GM Dec Tree 25 19 33 entropy 20 20 a=5% lift 20,0,-1,0 19 26 18.94 13.80 5.14 8.66 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.22
59 2 GM Dec Tree 26 19 33 entropy 20 40 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.05
60 2 GM Dec Tree 27 19 33 entropy 20 60 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.05
61 2 GM Dec Tree 28 19 33 entropy 7 5 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.52
62 2 GM Dec Tree 29 19 33 entropy 7 10 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.52
63 2 GM Dec Tree 30 19 33 entropy 7 20 a=5% lift 20,0,-1,0 7 37 16.04 14.66 1.38 13.28 8.35 7.69 0.66 7.03 4.41 4.07 0.34 3.73
64 2 GM Dec Tree 31 19 35 entropy 7 40 a=5% lift itmledratio itm_to_led 20,0,-1,0 7 36 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
65 2 GM Dec Tree 32 19 35 entropy 7 60 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
66 2 GM Dec Tree 33 19 35 entropy 7 80 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
67 2 GM Dec Tree 34 19 35 entropy 7 100 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
68 2 GM Dec Tree 35 19 35 entropy 7 150 a=5% lift 20,0,-1,0 5 37 14.53 13.08 1.45 11.63 7.75 7.19 0.56 6.63 4.36 4.29 0.07 4.22
64 2 GM Dec Tree 36 19 35 entropy 6 5 a=5% lift 20,0,-1,0 7 29 15.91 14.95 0.96 13.99 8.29 7.83 0.46 7.37 4.40 4.17 0.23 3.94
(ex = 20k, node samp = 30k)
65 2 GM Dec Tree 37 19 14, raw only entropy 6 5 a=5% lift 0 20,0,-1,0 7 16 13.92 11.81 2.11 9.69 7.46 6.54 0.93 5.61 4.24 3.91 0.33 3.57
Improvement gain in Conservative Lift from new variables (vs. DecTree-d2-m19): 5.28, 2.15, 0.41
66 3 GM Dec Tree 38 19 45 entropy 8 5 a=5% lift xval = no 20,0,-5,1 3 39 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
67 3 GM Dec Tree 39 38 45 gini 8 5 a=5% lift xval = no 20,0,-5,1 3 71 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
68 3 GM Dec Tree 40 38 45 propchi 8 5 a=5% lift xval = no 20,0,-5,1 3 42 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
69 3 GM Dec Tree 41 38 45 entropy 20 5 a=5% lift subtr= 20,0,-5,1 33 91 20.00 14.81 5.19 9.61 10.00 7.54 2.46 5.08 5.00 3.90 1.10 2.80
70 3 GM Dec Tree 42 38 45 entropy 20 100 a=5% lift sub=lrg 20,0,-5,1 25 70 19.09 16.25 2.84 13.42 10.00 8.17 1.83 6.35 5.00 4.19 0.81 3.38
71 3 GM Dec Tree 43 38 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 23 64 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
72 3 GM Dec Tree 44 38 45 entropy 20 400 a=5% lift sub=lrg 20,0,-5,1 21 59 15.87 17.08 1.21 14.67 9.02 8.96 0.06 8.89 4.97 4.69 0.28 4.41
73 3 GM Dec Tree 45 38 45 entropy 20 800 a=5% lift sub=lrg 20,0,-5,1 16 52 14.35 16.16 1.81 12.53 8.46 8.96 0.50 7.96 4.78 4.79 0.01 4.78
74 3 GM Dec Tree 46 38 45 entropy 20 1600 a=5% lift sub=lrg 20,0,-5,1 16 47 14.25 16.02 1.78 12.47 8.26 8.59 0.34 7.92 4.58 4.42 0.17 4.25
75 3 GM Dec Tree 47 38 45 entropy 20 3200 a=5% lift sub=lrg 20,0,-5,1 10 39 12.45 14.35 1.91 10.54 7.49 8.31 0.82 6.67 4.36 4.48 0.12 4.24
76 3 GM Dec Tree 48 43 45 entropy 20 150 a=5% lift sub=lrg 20,0,-5,1 23 68 18.57 16.25 2.32 13.93 10.00 8.14 1.86 6.27 5.00 4.17 0.83 3.34
77 3 GM Dec Tree 49 43 45 entropy 20 300 a=5% lift sub=lrg 20,0,-5,1 23 62 16.45 17.86 1.41 15.03 9.31 8.96 0.35 8.61 5.00 4.60 0.40 4.20
78 3 GM Dec Tree 50 43 45 entropy 20 250 a=5% lift sub=lrg 20,0,-5,1 24 65 16.64 17.71 1.07 15.57 9.56 8.96 0.60 8.36 5.00 4.61 0.39 4.21
79 3 GM Dec Tree 51 43 45 entropy 20 350 a=5% lift sub=lrg 20,0,-5,1 24 67 16.07 17.50 1.43 14.64 9.19 8.96 0.23 8.73 5.00 4.59 0.41 4.18
80 3 GM Dec Tree 52 43 45 entropy 20 225 a=5% lift sub=lrg 20,0,-5,1 23 63 17.85 16.67 1.18 15.49 9.83 8.96 0.87 8.09 5.00 4.53 0.48 4.05
81 3 GM Dec Tree 53 43 45 entropy 20 175 a=5% lift sub=lrg 20,0,-5,1 26 68 18.15 16.25 1.90 14.35 9.97 8.13 1.84 6.28 5.00 4.16 0.84 3.32
82 3 GM Dec Tree 54 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5.0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
83 3 GM Dec Tree 55 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-1,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
84 3 GM Dec Tree 56 43 45 entropy 20 200 a=5% lift sub=lrg 20,-5,0,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
85 4 GM Dec Tree 57 43 146 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 9 149 20.00 14.09 5.91 8.19 10.00 7.20 2.80 4.40 5.00 3.76 1.24 2.51
86 4 GM Dec Tree 58 57 107 (tree settings the same, dropped INT* categorical vars, not DBC) 18 115 20.00 16.09 3.91 12.18 10.00 8.15 1.85 6.29 5.00 4.18 0.82 3.35
87 4 GM Dec Tree 59 57 107 entropy 20 500 a=5% lift sub=lrg 20,0,-5,1 13 110 19.46 14.79 4.68 10.11 10.00 7.64 2.36 5.29 5.00 3.95 1.05 2.91
88 4 GM Dec Tree 60 57 107 entropy 20 1000 a=5% lift sub=lrg 20,0,-5,1 10 89 18.94 14.47 4.47 10.00 10.00 7.44 2.56 4.88 5.00 3.86 1.14 2.73
89 4 GM Dec Tree 61 57 107 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 81 14.41 13.91 0.50 13.41 9.54 8.02 1.51 6.51 6.61 4.25 2.36 1.90
90 4 GM Dec Tree 62 57 107 entropy 20 3000 a=5% lift sub=lrg 20,0,-5,1 5 71 9.89 7.91 1.98 5.94 8.74 6.39 2.35 4.04 5.00 3.70 1.30 2.40
91 4 GM Dec Tree 63 57 107 entropy 20 1500 a=5% lift sub=lrg 20,0,-5,1 9 60 16.17 14.66 1.50 13.16 9.89 8.18 1.71 6.47 5.00 3.38 1.62 1.76
92 4 GM Dec Tree 64 57 107 entropy 20 1750 a=5% lift sub=lrg 20,0,-5,1 7 60 15.23 14.32 0.92 13.40 9.68 8.07 1.61 6.46 5.00 4.26 0.75 3.51
93 4 GM Dec Tree 65 57 107 entropy 20 2250 a=5% lift sub=lrg 20,0,-5,1 5 60 15.43 11.00 4.43 6.56 9.55 6.30 3.25 3.05 5.00 3.70 1.30 2.40
94 4 GM Dec Tree 66 61 58 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 105 14.07 13.92 0.15 13.77 8.45 7.88 0.57 7.30 4.74 4.02 0.73 3.29
95 4 GM Dec Tree 67 61 80 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 97 14.25 13.94 0.30 13.64 9.25 7.88 1.37 6.51 5.00 4.25 0.75 3.49
96 4 GM Dec Tree 68 61 103 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 103 14.41 13.72 0.69 13.03 9.54 8.02 1.52 6.50 5.00 4.25 0.75 3.50
Interactions are getting selected; they improve Trn results but decrease Val results. Perhaps I should regenerate the INT* dbc with a larger minimum number of records.
97 4n GM Dec Tree 69 61 3 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 7 14.61 15.54 0.93 13.68 8.83 8.99 0.16 8.67 4.88 4.73 0.15 4.58
98 4n GM Dec Tree 70 0 20 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 10 11.50 11.12 0.38 10.74 7.08 7.29 0.21 6.87 4.24 3.94 0.30 3.64
Use RAW vars ONLY, to test the value of my preprocessing
Columns: M cnt | Data Ver | Author | Algor | Mod Num | chng from prior | binary model | cleanup model | max num rips | Var Sel | Trn Time | then three metric groups, each reported as: Train | Val | Gap | Consrv Result
94 1 GM Rule Ind 1 0 tree neural 16 32 10.77 9.92 0.85 9.07 6.28 5.60 0.68 4.92 3.35 3.09 0.26 2.83
95 1 GM Rule Ind 2 1 regr neural 16 36 5.95 7.52 1.57 4.38 3.55 4.85 1.30 2.25 2.35 3.17 0.82 1.53
96 1 GM Rule Ind 3 1 neural tree 16 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.37
97 1 GM Rule Ind 4 3 neural tree 4 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.37
98 1 GM Rule Ind 5 3 neural tree 32 121 5.95 7.92 1.97 3.98 3.53 5.64 2.11 1.42 2.34 3.32 0.98 1.36
99 1 GM Rule Ind 6 1 tree neural 32 32 7.25 5.26 1.99 3.27 6.45 5.17 1.28 3.89 3.43 3.09 0.34 2.75
• Can treat the model notebook table as meta-data (i.e. 144 records, or models)
• Train models on the meta-data
– Source vars = model parameters
– Target 1 = conservative result, or Target 2 = training time
• Perform sensitivity analysis to answer questions:
– Q) Which model training parameters lead to the best results?
– Q) …to the most training time?
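The meta-data idea above can be sketched in a few lines: treat each notebook row as a record, and rank training parameters by how strongly they track the Conservative Result. This is an illustrative stand-in for training a full meta-model; the rows and the correlation-based ranking are invented, with values loosely echoing the decision-tree table.

```python
# Hypothetical subset of notebook rows: training parameters -> Conservative Result.
notebook = [
    {"max_depth": 6,  "leaf_size": 5,    "consrv": 11.30},
    {"max_depth": 6,  "leaf_size": 100,  "consrv": 11.30},
    {"max_depth": 20, "leaf_size": 5,    "consrv": 9.61},
    {"max_depth": 20, "leaf_size": 200,  "consrv": 15.66},
    {"max_depth": 20, "leaf_size": 400,  "consrv": 14.67},
    {"max_depth": 20, "leaf_size": 3200, "consrv": 10.54},
]

def correlation(xs, ys):
    # Pearson correlation between a parameter column and the target column.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

target = [row["consrv"] for row in notebook]
ranking = sorted(
    ((abs(correlation([r[p] for r in notebook], target)), p)
     for p in ("max_depth", "leaf_size")),
    reverse=True,
)
for strength, param in ranking:
    print(f"{param}: |corr with Conservative Result| = {strength:.2f}")
```

A real meta-model (e.g. a tree on all training parameters) would also catch parameter interactions that a per-parameter correlation misses.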
25. Outline
Model Training Parameters in SAS Enterprise Miner
Tracking Conservative Results in a “Model Notebook”
How to Measure Progress
Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt
How to Describe Any Complex System – Sensitivity
26. Design Of Experiments (DOE)
Parameter Search
• Ideally, vary one parameter at a time and quantify the results
– A bigger challenge in BIG DATA, given the compute per model
• Exhaustive Grid Search, O(3^P)
– for Param A = Low, Med, High (test 3 settings)
– for Param B = Low, Med, High
– for Param C = Low, Med, High
– Easy to implement, but not the most efficient
– Can use a Fractional Factorial design (i.e. 10%)
– Scales less effectively for many parameters
• Stochastic Search (Genetic Algorithms), O(100^2)
– Directed Random Search is more efficient than Grid Search, but…
– Can be overkill in complexity: (100 models / generation) * (100s of generations)
• Taguchi Analysis (works with this DOE approach)
– Efficient multivariate orthogonal search
– Used to test landing pages with Offermatica (acquired by Omniture in 2007 for DOE)
– http://en.wikipedia.org/wiki/Taguchi_methods
– Does not use domain knowledge of parameter interactions – OPPORTUNITY
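The exhaustive grid and its fractional shortcut can be sketched as follows. `train_and_score` is a hypothetical stand-in for a real train/validate run, and the grid values loosely echo the decision-tree settings used earlier in the notebook.

```python
import itertools
import random

# Exhaustive grid: P = 3 parameters x 3 settings each = 3^P = 27 runs.
grid = {
    "criterion": ["entropy", "gini", "probchisq"],
    "max_depth": [6, 7, 20],
    "leaf_size": [5, 200, 2000],
}

def train_and_score(params):
    # Hypothetical stand-in for a full training run; a real version
    # would fit a model and return its Conservative Result.
    bonus = {"entropy": 1.0, "gini": 0.5, "probchisq": 0.4}[params["criterion"]]
    return bonus + 200.0 / (abs(params["leaf_size"] - 200) + 10) - 0.05 * params["max_depth"]

names = list(grid)
full = [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]

# Fractional Factorial flavour: score only ~10% of the grid.
random.seed(0)
subset = random.sample(full, max(1, len(full) // 10))
best = max(subset, key=train_and_score)
print(f"{len(subset)} of {len(full)} grid points scored; best: {best}")
```

The full product illustrates why O(3^P) explodes: 10 parameters at 3 settings each would already be 59,049 training runs.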
27. Taguchi Design
• Not a full grid search
• Can we improve with experience and a heuristic process?
http://www.itl.nist.gov/div898/handbook/pri/section5/pri56.htm
http://www.jmp.com/support/downloads/pdf/jmp_design_of_experiments.pdf
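A minimal example of the orthogonal-array idea behind Taguchi designs, using the standard L4 array for three two-level factors; the balance check below is what lets 4 runs stand in for a full 2^3 = 8-run grid.

```python
from itertools import combinations, product

# L4 orthogonal array: 4 runs cover 3 two-level factors such that every
# PAIR of factors sees each of the 4 level combinations exactly once.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

for c1, c2 in combinations(range(3), 2):
    pairs = {(run[c1], run[c2]) for run in L4}
    assert pairs == set(product((0, 1), repeat=2))  # balanced in every pair

print(f"{len(L4)} runs instead of {2 ** 3} for a full grid")
```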
28. Model Parameters
The algorithm searches the Model Parameters; the data miner runs a meta-search, a Design of Experiments (DOE) over your choices of Model Training Parameters.
Algorithm | Model Parameters | Model Training Parameters
Regression | weights | variable select (forward, step)
Neural net | weights | step size; learning rate
Decision Tree | (spend < $1000) | max depth; (Gini, Entropy)
29. Model Parameters vs. Model Training Parameters
The algorithm searches the Model Parameters; the data miner runs a meta-search, a Design of Experiments (DOE) over your choices of Model Training Parameters.
Algorithm | Model Parameters | Model Training Parameters
Regression | weights | variable select (forward, step)
Neural net | weights | step size; learning rate
Decision Tree | (spend < $1000) | max depth; (Gini, Entropy)
30. Heuristic Planning Your
Design of Experiments (DOE)
• Assumptions about Data Mining Project
– May be on BIG DATA, with practical constraints
– May be training 4 to 400 models (not 4000+ like GA)
– Want diversity, to investigate different algorithms
– Want to generalize process to future deployments
• Heuristic Strategies
– Use knowledge of interacting parameters (parallel tests)
• (Cost+profit weights) and (boosting weights) fight each other
– Delay searching compute-intensive parameters
• First stabilize most other “computationally reasonable” params
• e.g. large decision tree depth, neural nets w/ lots of connections
– Opportunistically spend time by algorithm success
31. Gradient Descent Numerical Methods
Searching to Find Minima
[Figure: a 2-D error surface over Weight Parameter 1 × Weight Param 2, shaded from High Error (hill tops, forest) down to Low Error (fields, beach, water, deep water), with several local minima marked “Min”]
32. Gradient Descent Numerical Methods
Searching to Find Minima
“Ski Down” from
the mountains to
Lake Tahoe
Moving = adjust param
X = starting position
M = a local minimum
[Figure: the same error surface over Weight Parameter 1 × Weight Param 2, now with X = the starting position high in the “forest” and M = the local minima reached by skiing downhill]
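The “ski down” metaphor in code: a toy bowl-shaped error surface (an assumption – the real surface comes from model training) descended by plain gradient steps until a minimum M is reached.

```python
# Toy error surface over two weight parameters; its single minimum sits
# at (3, -1). A real surface would come from the model being trained.
def error(w1, w2):
    return (w1 - 3.0) ** 2 + (w2 + 1.0) ** 2

def gradient(w1, w2):
    # Analytic partial derivatives of error() above.
    return 2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0)

w1, w2 = 10.0, 10.0   # X = the starting position, high on the "mountain"
lr = 0.1              # step size: how far to "ski" per move
for _ in range(200):
    g1, g2 = gradient(w1, w2)
    w1, w2 = w1 - lr * g1, w2 - lr * g2   # move against the local slope

print(f"reached local minimum M near ({w1:.2f}, {w2:.2f})")
```

On a surface with several minima (as in the figure), where you land depends on the starting position X; that is why the next slides search over starting points too.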
33. Conservative Result with Respect to
Model Training Parameters
“Ski Down” from
the mountains to
Lake Tahoe
Moving = adjust param
X = starting position
M = a local minimum
[Figure: the same error-surface metaphor, with the axes now Model Parameter 1 × Model Param 2]
34. Heuristic Planning Your
Design of Experiments (DOE)
• Start with a reasonable default setting of parameters
– the “center of the daisy”: where the gradient is checked
• Vary one parameter at a time from the center
– “each petal of the daisy”: one gradient search trial
• Move to the next “reasonable multivariate start”
– the “stem of the daisy”: a steepest-descent step
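The daisy heuristic can be sketched as a coordinate search: evaluate the center, try one-parameter-at-a-time petals, then move the center along the best stem. Here `conservative_result` is a made-up stand-in for a full train-and-validate run, and the step sizes are illustrative.

```python
def conservative_result(params):
    # Hypothetical stand-in for train + validate; peaks near
    # max_depth = 7 and leaf_size = 60.
    return 15.0 - abs(params["max_depth"] - 7) - abs(params["leaf_size"] - 60) / 20.0

center = {"max_depth": 6, "leaf_size": 5}          # reasonable default
steps = {"max_depth": [-1, 1], "leaf_size": [-20, 20]}

for _ in range(10):                                # a few "stem" moves
    petals = []
    for param, deltas in steps.items():            # one petal per delta
        for d in deltas:
            trial = dict(center)
            trial[param] = max(1, trial[param] + d)
            petals.append(trial)
    best = max(petals, key=conservative_result)
    if conservative_result(best) <= conservative_result(center):
        break                                      # no petal improves: stop
    center = best                                  # move the daisy's center

print("settled on:", center)
```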
37. Heuristic “Meta-Gradient Search” of
Model Training Parameters
[Figure: daisy-pattern search trials over Parameter 1 × Parameter 2 converging on a minimum M, vs. a Taguchi DOE design]
Art vs. Science? No – a practical complement using existing numerical methods.
38. Heuristic “Meta-Gradient Search” of
Model Training Parameters
Mod Num | chng from prior | vars offered | criterion | max depth | leaf size
1 0 27 default 6 5
2 1 27 probchisq 6 5
3 1 27 entropy 6 5
4 1 27 gini 6 5
5 3 27 entropy 12 5
6 3 27 entropy 6 10
7 3 27 entropy 6 100
8 3 27 entropy 6 100
9 3 27 entropy 6 5
10 3 27 entropy 6 5
11 3 27 entropy 6 5
12 3 27 entropy 10 2
“Can you give a more tangible example? This sounds a bit vague.”
“Chng from prior” tracks the change from the “center of a daisy” (Model 1 or 3).
39. Heuristic “Meta-Gradient Search” of
Model Training Parameters
• After stabilizing most of the “fast” and “medium”
compute time parameters, search the “long compute
time settings”
• With the final parameter settings, if 2x or 10x more data
is available, perform a “final bake in,” long training run
• Then try Ensemble Methods
– Stacking, boosting, bagging – combining many of the best models
– Gradient Boosting over residual error
– Select models whose residual errors correlate the least
– Use a 2nd stage model to combine 1st stage models and top preprocessed fields (for context switching)
– Last year’s KDD Cup winners and the Netflix winners used Ensemble methods
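The residual-correlation selection rule can be sketched as follows. The three models’ predictions are invented toy values (`regr` is built to err much like the other two combined), and the 2nd-stage combiner is a simple average rather than a trained stacking model.

```python
from itertools import combinations

# Toy validation data and predictions from three 1st-stage models.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = {
    "tree":   [1.2, 1.8, 3.2, 3.8, 5.0],
    "neural": [1.1, 2.1, 2.9, 3.9, 5.0],
    "regr":   [1.3, 1.9, 3.1, 3.7, 5.0],  # errs like tree + neural combined
}

def residuals(p):
    return [yp - yt for yp, yt in zip(p, y_true)]

def corr(a, b):
    # Pearson correlation of two residual series.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

# Pick the pair whose residual errors correlate the LEAST...
pair = min(combinations(preds, 2),
           key=lambda ab: abs(corr(residuals(preds[ab[0]]),
                                   residuals(preds[ab[1]]))))
# ...and blend them with a trivial 2nd-stage average.
blend = [(preds[pair[0]][i] + preds[pair[1]][i]) / 2 for i in range(len(y_true))]
print("least-correlated pair:", pair)
```

Low residual correlation is the point: when two models err in different places, averaging cancels errors instead of reinforcing them.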
40. Outline
Model Training Parameters in SAS Enterprise Miner
Tracking Conservative Results in a “Model Notebook”
How to Measure Progress
Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt
How to Describe Any Complex System
Sensitivity Analysis
41. The Need to Describe the Forecast Algorithm
• Many Data Mining solutions need description
– So the check writer (SVP, owner, business unit, …) can do a business reality check before deployment
– For “what if” analysis, to fine-tune the larger system
• Feed Operations Research or Revenue Management systems
• Need a modeling “descriptive simulation” (political donations)
– When evaluating credit, the law requires offering 4 “reason codes” for each person scored – when they are declined
• Should the Data Miner cut algorithm choices?
– NO! “I understand how a bike works, but I drive a car to work”
– How much detailed understanding is needed?
– Provide enough info to “drive the car” vs. “build the car”
• The check writer does not need to understand a B-tree to buy SQL
42. Sensitivity Analysis
(OAT) One At a Time*
[Figure: (S) source fields feeding an Arbitrarily Complex Data Mining System, which produces the target field and a delta in the forecast]
• Present record N, S times, each time with one input 5% bigger (fixed input delta)
• Record the delta change in output, S times per record
• Aggregate: average(abs(delta)) – the target change per input field delta
• For source fields with binned ranges, sensitivity tells you the importance of the range, i.e. “low”, …, “high”
• Can put sensitivity values in Pivot Tables or Cluster them
• Record-level “reason codes” can be extracted from the most important bins that apply to the given record
*Some variants catch interactions
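The OAT procedure above, as a sketch: `black_box` is a hypothetical stand-in for the arbitrarily complex data mining system, and the records and field names are invented.

```python
def black_box(rec):
    # Hypothetical stand-in for an arbitrarily complex data mining system.
    return 3.0 * rec["income"] - 0.5 * rec["age"] + 0.01 * rec["zip_density"]

records = [
    {"income": 50.0, "age": 40.0, "zip_density": 1000.0},
    {"income": 80.0, "age": 25.0, "zip_density": 200.0},
]

fields = ["income", "age", "zip_density"]
sensitivity = {f: 0.0 for f in fields}
for rec in records:
    base = black_box(rec)
    for f in fields:                 # present the record S times...
        bumped = dict(rec)
        bumped[f] *= 1.05            # ...each time one input 5% bigger
        sensitivity[f] += abs(black_box(bumped) - base)
for f in fields:
    sensitivity[f] /= len(records)   # aggregate: average(abs(delta))

ranked = sorted(fields, key=sensitivity.get, reverse=True)
print("fields ranked by sensitivity:", ranked)
```

Because only the inputs and outputs are touched, the same probe works whether the system inside is a regression, a tree ensemble, or a neural net.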
43. Descriptions of Predictive Models
Reason Codes – Ranked by Sensitivity Analysis
• Reason codes are specific to the model and record
• Ranked predictive fields:
Ranked predictive field | Mr. Smith | Mr. Jones
max_late_payment_120d | 0 | 1
max_late_payment_90d | 1 | 0
bankrupt_in_last_5_yrs | 1 | 1
max_late_payment_60d | 0 | 0
• Mr. Smith’s reason codes include: max_late_payment_90d (1), bankrupt_in_last_5_yrs (1)
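Reason-code extraction, using the Mr. Smith / Mr. Jones table above: walk the sensitivity-ranked fields and report the ones that actually apply to the declined applicant. A minimal sketch, assuming binary 0/1 field values as on the slide.

```python
ranked_fields = [               # ranked by sensitivity analysis, most important first
    "max_late_payment_120d",
    "max_late_payment_90d",
    "bankrupt_in_last_5_yrs",
    "max_late_payment_60d",
]
applicants = {
    "Mr. Smith": {"max_late_payment_120d": 0, "max_late_payment_90d": 1,
                  "bankrupt_in_last_5_yrs": 1, "max_late_payment_60d": 0},
    "Mr. Jones": {"max_late_payment_120d": 1, "max_late_payment_90d": 0,
                  "bankrupt_in_last_5_yrs": 1, "max_late_payment_60d": 0},
}

def reason_codes(record, max_codes=4):
    # Keep rank order; report only fields that apply (value == 1),
    # up to the 4 codes credit law requires.
    return [f for f in ranked_fields if record[f] == 1][:max_codes]

for name, rec in applicants.items():
    print(name, "->", reason_codes(rec))
```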
44. Summary
• Conservative Result (How to Measure)
– Continuous metric to select accurate and general models
• Heuristic Meta-Gradient Search (How to Plan)
– An automated or human process to plan a Design of
Experiments (DOE)
– Searches the training parameters that a data miner adjusts
in data mining software (“meta-parameter search”)
– Heuristic DOE improvements
• Most systems can be “reasonably described”
– Focus on repeatable business benefit (accuracy) over
description or blind Occam’s Razor on a tech metric
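The Conservative Result metric appears to be computable from the notebook columns as the weaker of Train and Val minus the Train/Val gap; this formula is inferred from the tabulated rows, not stated on the slides, so treat it as a reconstruction.

```python
def conservative_result(train, val):
    # Take the weaker of Train and Val, then subtract the gap again
    # as an overfitting penalty (inferred from the notebook rows).
    gap = abs(train - val)
    return min(train, val) - gap

# Two rows from the model notebook: (Train, Val, Gap, Consrv Result).
assert round(conservative_result(13.71, 9.59), 2) == 5.47    # model 47, ver 1
assert round(conservative_result(13.41, 15.52), 2) == 11.30  # model 66, ver 3
print("a model must be accurate AND generalize (small gap) to score well")
```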
SF Bay ACM, Data Mining SIG, Feb 28, 2011
http://www.sfbayacm.org/?p=2464
Greg_Makowski@yahoo.com
www.LinkedIn.com/in/GregMakowski
Take Away: The process of going from design objectives to heuristic design