The document discusses the importance of data refinement between data collection and decision making. It emphasizes the need to transform raw data into useful insights through techniques like data summarization, categorization, and predictive modeling in order to provide accurate marketing answers and improve targeting, costs, and results. Specifically, it recommends structuring data into a model-ready environment, creating descriptive variables from transaction histories, matching data to the appropriate analytical goals and levels, and categorizing non-numeric attributes.
Data Refinement: The missing link between data collection and decisions
1. Data Refinement:
The missing link between data collection and decisions
Stephen H. Yu
Data Strategy & Analytics Consultant
2. What we will cover
• Database Marketing Landscape
• Analytics and Models
• “Model-Ready” Environment
• Data Summarization & Categorization
• Delivery: Scoring & QC
• Closing the Loop
3. Big Data, Small Data, Neat Data, Messy Data
How is the “Big Data” working out for you?
2.5 quintillion bytes are collected every day
(1 quintillion bytes = 1 exabyte = 1 billion gigabytes)
• Did all this data improve your decision-making process?
• Do you have the results to show for it?
• Information overload? You bet!
Harness insights, drop the noise.
4. Database Marketing Landscape
• No guessing game – you MUST know your target
• Vast amounts of online & offline data are collected – but are they being used properly?
• Analytics plays a huge role in prospecting & CRM
• Fast-paced marketing cycles are getting even shorter
• Huge difference between advanced marketers and those who are falling behind
Winners are the ones who know how to wield the power of all available data faster.
5. Insights, not Raw Data
• Database marketers must excel in:
o Collection – size & speed matter
o Refinement – get to answers, not just ingredients
o Delivery – for consumption by end users
• Must provide “marketing answers” via advanced analytics, not just bits and pieces of data
• Insight does not come from data; it is derived from data
6. Refined Answers
Raw Data → Marketing Answers

Raw Data:
• Demographic / Firmographic
• RFM
• Products & Services Used
• Promotion / Response History
• Lifestyle / Survey Responses
• Delinquency History
• Call / Communication Log
• Movement Data
• Sentiments

Marketing Answers:
• Likely to buy a luxury car
• Likely to take a foreign vacation
• Likely to respond to a free-shipping offer
• Likely to be a high-value customer
• Likely to be qualified for credit
• Likely to upgrade
• Likely to leave
• Likely to come back
7. Different Types of Analytics
“Analytics” means different things…
• BI (Business Intelligence) Reporting: display of success metrics, dashboard reporting
• Descriptive Analytics: profiling, segmentation, clustering
• Predictive Modeling: response models, cloning models, lifetime value, revenue models, etc.
• Optimization: channel optimization, marketing spending analysis, econometric models
Predictive Modeling for 1-to-1 Marketing
8. Why model?
• Increase targeting accuracy
• Reduce costs by contacting fewer people, more intelligently
• Stay relevant
• Consistent results
• Reveal hidden patterns in data
• Repeatable – key for automation
• Expandable
• “Supposedly” saves time and effort
Models summarize complex data into simple-to-use “scores”
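As a minimal sketch of that last point: a model score is just a formula that collapses many descriptors into one number. The descriptor names, weights, and customers below are all invented for illustration; a real model would estimate the coefficients from response history.

```python
import numpy as np

# hypothetical customer descriptors: [orders_last_12m, emails_opened, returns]
customers = np.array([
    [5, 12, 0],   # active, engaged customer
    [1,  0, 2],   # mostly inactive customer
])

# illustrative coefficients; a real model estimates these from response data
weights = np.array([0.40, 0.15, -0.60])
intercept = -2.0

# logistic score: squash the weighted sum onto a 0-1 "likelihood" scale
z = customers @ weights + intercept
scores = 1.0 / (1.0 + np.exp(-z))
```

End users never see the raw descriptors – they receive one score per customer, which is the whole appeal.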
9. Why NOT model?
• Universe is too small
• Predictive data is not available
• 1-to-1 marketing channels are not in the plan
• Tight budget
• Lack of resources
11. Any pain implementing models?
• Not easy to find the “best” customers
• Modelers are fixing data all the time
• Reliance on a few popular variables
• Always need more variables
• Takes too long to build models and score
• Inconsistencies show up when files are scored
• Disappointing results
12. What does your database support?
If you have a database…
• Order Fulfillment
• Contact Management
• Standard Reports
• Ad hoc Reports and Queries
• Name Selections
• Response Analysis
• Trend Analysis
But does it support predictive modeling and scoring?
13. For modeling, clean the data first
“Garbage in, garbage out”
• Most data sets are messy & “unstructured”
• Over 70-80% of model development time goes to data prep work
• Most databases are NOT model-ready
• Modeling & scoring
o An extension of database work
o Consistency is “the” key
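A tiny sketch of what that prep work looks like in practice, using pandas on an invented extract (the column names, values, and imputation choices are all hypothetical, not from the deck):

```python
import pandas as pd

# hypothetical raw customer extract with typical quality problems:
# a duplicated row, a missing age, and inconsistent / missing state codes
raw = pd.DataFrame({
    "cust_id": [1, 1, 2, 3],
    "age":     [34, 34, None, 51],
    "state":   ["NY", "NY", "ca", None],
})

clean = raw.drop_duplicates().copy()             # remove exact duplicate rows
clean["state"] = clean["state"].str.upper()      # standardize coding
clean["state"] = clean["state"].fillna("UNKNOWN")
clean["age"] = clean["age"].fillna(clean["age"].median())  # simple imputation
```

Even this toy version shows why prep dominates the timeline: every field needs its own rule, and the rules must be applied identically at scoring time.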
14. Predictive Modeling is all about “Ranking”
Ultimately, models must properly “rank”. Determine the level of data accordingly:
• Households
• Individuals
• Products
Relational or unstructured databases won’t cut it – you must create “descriptors” that fit the level that needs to be ranked.
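The ranking idea can be sketched with pandas: score each customer, then cut the scored file into deciles so that decile 1 holds the most likely targets. The scored file below is invented for illustration.

```python
import pandas as pd

# hypothetical scored file: one model score per customer
scored = pd.DataFrame({
    "cust_id": range(1, 101),
    "score": [i / 100 for i in range(100)],  # stand-in model scores
})

# rank first so ties can't break the equal-sized bins; qcut labels bins
# 0 (lowest) through 9 (highest), so flip to make decile 1 the best group
scored["decile"] = 10 - pd.qcut(scored["score"].rank(method="first"), 10, labels=False)
```

Campaign selection then becomes a simple filter, e.g. mail everyone in deciles 1-3.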
15. Unstructured to Structured
• Most modern databases are optimized for massive storage and rapid retrieval, not necessarily for predictive analytics
o Relational databases
o NoSQL databases
• Need an “Analytical Sandbox” (or database/data mart)
o Structured & de-normalized
o Variables as descriptors of model targets
o Common analytical language (SAS, R, SPSS)
o Must support “in-database” scoring
16. Analytical Sandbox – “Model-Ready” Environment
From “all” types of data collection to decisions. Then repeat.
17. Why is Front-end DP Important?
• Inexperienced analysts spend the majority of their time doing DP work – modeling work gets done at the last minute!
• Creative variables enhance models
• Inconsistent data creates a chain reaction of meltdowns
• Data append/match becomes ineffective
19. 3 Major Types of Data for Marketers
Descriptive Data
• Demographic data
• Firmographic data
• Geo-demographic data
• Compiled / co-op data
Transaction Data / Behavioral Data
• Transaction data
• Online behavioral data
Attitudinal Data
• Surveys
• Sentiments
• Lifestyle data
3 Dimensions in Predictive Analytics
20. Data Inventory
• “Modeling is making the best of
what we know”
• Beyond obvious RFM data
• Get Deeper
o Product/Service Level Data
o Historical Data
o Channel Data – Inbound & Outbound
o Online activities, sentiments,
unstructured data
• External Data
21. Create Data Menu
• Base it on Companywide Need-Analysis
• Ask the Analysts first:
What type of models are in the plan?
o Affinity/Look-alike Models
o Promotion/Response Models
o Time-series Models
o Attrition Models
• Consider non-analytical departments
• Maintain the ones that fit the objective
Don’t be afraid to throw out “noise”
22. Data Menu (continued)
Check the ingredients:
• What do you have today?
• What can be bought?
• What can be created?
Cost - Can you afford to maintain it?
• Storage/Platform – Consider the scoring part, too
• Programming/Processing Time
• Software
• Updates
• External Data
23. Check Your Data Inventory
You may have more than you thought, so let’s start
with what you have:
• Name & Address: Key to Geo/Demographic Data
• Order Transaction Data: “RFM”, Payment Methods
• Item/SKU Level Data: Products, Price, Units
• Promotion/Response History: Source, Channel, Offer
• Life-to-Date/Past “x” Months Summary Data
• Customer Level Status Flags: Active, Dormant, Delinquent
• Surveys/Product Registration Forms: Attitudinal/Lifestyle
• Customer Communication History Data: Call-center, Web
• Social Media, Click-through, Page views: Sentiment/Intentions
These need Conversion, Categorization, & Summarization
24. Maximize the Power of Transaction Data
Most databases describe shopping baskets
Start describing your targets
• RFM Data must be Summarized (or Denormalized)
• Turn RFM data into individual / household level
“Descriptors”
• Combine with essential categorical variables
(e.g., product, offer, channel, etc.)
25. Data Summarization – Matching the level of Data
Order Table

Cust ID   Order #   Order Date   $ Amount
000123    100011    2009-05-06    $199.99
000123    100128    2010-08-30     $50.49
000123    103082    2011-12-21    $128.60
003859    100036    2010-06-06     $43.99
003859    101658    2011-01-20     $43.99
003859    102189    2011-04-15    $119.45
003859    106458    2012-02-18     $43.99
004593    104535    2012-07-30    $354.72
016899    107296    2011-07-14    $199.99
019872    102982    2010-09-07    $128.60
019872    103826    2011-04-30    $499.99
019872    109056    2012-03-12     $59.99

Order Summary Table

Cust ID   # Orders   $ Total   First Order Date   Last Order Date
000123    3          $379.08   2009-05-06         2011-12-21
003859    4          $251.42   2010-06-06         2012-02-18
004593    1          $354.72   2012-07-30         2012-07-30
016899    1          $199.99   2011-07-14         2011-07-14
019872    3          $688.58   2010-09-07         2012-03-12
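The roll-up from the Order Table to the Order Summary Table can be sketched in a few lines of Python. This is a minimal illustration using a subset of the slide's rows, not the deck's actual tooling (which would typically be SQL, SAS, or similar):

```python
from collections import defaultdict

# Transaction-level rows mirroring the slide's Order Table:
# (Cust ID, Order #, Order Date, $ Amount)
orders = [
    ("000123", "100011", "2009-05-06", 199.99),
    ("000123", "100128", "2010-08-30", 50.49),
    ("000123", "103082", "2011-12-21", 128.60),
    ("019872", "102982", "2010-09-07", 128.60),
    ("019872", "103826", "2011-04-30", 499.99),
    ("019872", "109056", "2012-03-12", 59.99),
]

def summarize(order_rows):
    """Roll transaction-level rows up to one summary row per customer."""
    by_cust = defaultdict(list)
    for cust, _order_no, order_date, amount in order_rows:
        by_cust[cust].append((order_date, amount))
    summary = {}
    for cust, rows in by_cust.items():
        dates = sorted(d for d, _ in rows)
        summary[cust] = {
            "num_orders": len(rows),
            "total": round(sum(a for _, a in rows), 2),
            "first_order": dates[0],
            "last_order": dates[-1],
        }
    return summary
```

The output has exactly one row per Cust ID: the level that gets ranked by the model.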
26. Sample Variables after Summarization
Before
Recency
Frequency
Monetary
After Summarization
• Weeks since last online purchase
• Years since member sign up
• Days since last delinquent date
• Months since last response date
• Orders by offer type
• Orders by product/service type
• Payments by pay method
• Average days between transactions
• Total $ past 24 months
• Life-to-date spending
• Average dollars by channel
• Average dollars by product type
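Variables like "weeks since last online purchase" or "months since last response date" are simple date arithmetic against a scoring date. A minimal sketch; the dates here are hypothetical:

```python
from datetime import date

def weeks_since(event_date, as_of):
    """'Weeks since last purchase'-style recency descriptor."""
    return (as_of - event_date).days // 7

def months_since(event_date, as_of):
    """Calendar-month version, e.g. 'months since last response date'."""
    return (as_of.year - event_date.year) * 12 + (as_of.month - event_date.month)

as_of = date(2012, 7, 30)          # hypothetical scoring date
last_purchase = date(2012, 3, 12)  # hypothetical last-order date
```

The key point from the slide: the raw date stays in the database, but the model consumes the derived descriptor.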
27. RFM Data Summary – Timeline
Life-to-date summary provides the historical view,
but may create bias towards tenured customers.
Put time limits on variables (e.g., 12-month, 24-month, etc.);
this may require a higher number of variables and complicate the process.
For Lifetime Value & Time-Series models,
must create historical arrays (daily, weekly, monthly counts of events).
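A "historical array" of monthly event counts can be built by bucketing event dates into a fixed window of months. A minimal sketch with made-up dates:

```python
from collections import Counter

def monthly_event_counts(event_dates, months):
    """Turn ISO event dates into a fixed historical array of monthly
    counts, one slot per 'YYYY-MM' label in the requested window."""
    by_month = Counter(d[:7] for d in event_dates)
    return [by_month.get(m, 0) for m in months]

window = ["2012-01", "2012-02", "2012-03", "2012-04"]
events = ["2012-01-05", "2012-01-28", "2012-03-12"]
```

Fixed-length arrays like this are what time-series and lifetime-value models expect, one array per customer.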
28. Who does the summary work?
Answer: Not the statistician!
Key Takeaway
The data variables must be consistent
everywhere in the “Analytical Sandbox”
• Main analytical database
• Model development sample
• Pool of records to be scored
Pre-built summary variables in the
database
29. Data Categorization
Free-form data comes to life through categorization
Don’t Give Up!
• Hidden data in:
o Product, service, offer, channel, source, status, titles, surveys, etc.
• Have categorization guideline?
• Who will do it?
o Consider text mining techniques
• What to throw out?
o Keep data that matters in predictive modeling
30. Categorical Data
Any Non-numeric Data
• Product
• Service
• Offer
• Channel
• Source
• Market
• Region
• Business Title
• Member Status
• Payment Status
• etc…
Offer Code Example:
• Flat Dollar Discount
• % Discount
• Buy 1, Get 1 Free
• Free Shipping
• No Payment Until…
• Free Gift
• etc…
Categorize as much as possible at the data collection stage
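Using the slide's offer-code example, categorization can be sketched as a rule-based mapper from free-form descriptions to a controlled code set. The rules and code names below are illustrative assumptions, not a guideline from the deck:

```python
def categorize_offer(description):
    """Map a free-form offer description to a controlled offer code.
    Rule order matters: check the most specific patterns first."""
    text = description.lower()
    if "%" in text or "percent" in text:
        return "PCT_DISCOUNT"
    if "buy 1" in text or "bogo" in text:
        return "BOGO"
    if "free shipping" in text:
        return "FREE_SHIP"
    if "free gift" in text:
        return "FREE_GIFT"
    if "no payment" in text:
        return "DEFERRED_PAY"
    if "$" in text and ("off" in text or "discount" in text):
        return "FLAT_DISCOUNT"
    return "OTHER"  # review periodically; don't let "OTHER" grow unchecked
```

In practice these rules would live at the data collection stage, per the slide, so free-form answers never enter the database.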
31. Categorization Guidelines
Be consistent throughout:
• Survey Form
• Data Entry
• Inventory Database
• Data Collection & Compilation
• Summarization
• Modeling and Scoring
Create a “Code” structure
• NEVER allow free-form answers
32. Categorization Guidelines (continued)
Create Rules and DON’T Deviate from them
• Training & Automation
• The more specific, the better; combine them later
• But don’t allow too many variations (over 20) in one code;
  break into multiple codes if necessary
• Don’t forget the end goals and don’t over-do it; must be “relevant”
33. Data Hygiene & Data Append
Do you know how many customers you
have in your database?
• Data conversion
  o Create consistency
  o Standardization
  o Edit
  o Purge
• Cover all bases – PII & RFM Data
• Create rules and be consistent
34. PII – Gateway to External Data
What is hidden behind simple name &
address?
• Standardize Name & Address first
o Maintain PII (Personally Identifiable Information)
o Hygiene via periodic NCOA and standardization
• First & Last Name – Ethnic, Gender
• Name, Address, Email – Demographic Data
• Address – Geo-demographic, Census Data
• Zip – County, Market Region, DMA
35. External Data
• Always consider buying data
before collecting and building
• Compiled Demographic /
Firmographic Data
• Behavioral / Transaction / Co-op
Data
• Lifestyle / Attitudinal / Survey
• Census / Geo-Demographic Data
36. External Data Check List
• Test multiple data sources
  1. Depth of information / Uniqueness
  2. Coverage / Match Rate
  3. Consistency / Update Cycle
  4. Price
  5. User-friendliness
  6. Delivery Options – Real-time?
• Learn about the data sources
  o What’s real and what’s imputed?
  o Don’t stop at Demographic: always consider “Behavioral” data
37. Missing Values
“Missing Data can be meaningful.”
• For Numeric Data (e.g., $, Counters, Dates, etc.)
o Incalculable vs. Data-append Non-matches
o Missing is missing: DO NOT fill in with 0’s
• For Categorical Data (e.g., Codes, Text, etc.)
o Leave room for “N/A” (e.g., blank, “N/A”, “0”, “.”, etc.)
o Code “Non-matches” to external files differently
And yet, unstructured databases only store what’s available.
38. More on missing data
• Agree on Imputation Rules
  o Do it upfront
  o Must be part of scoring codes
• Educate non-analysts
  o Hard to undo when combined with other values
  o Train vendors
• Always check % missing
  o Development Sample vs. Live Databases
39. Scoring – Sample vs. Database
• Development Sample vs. Live Database
o Database Structure
o Variable List/Name
o Variable Value
o Imputation Assumption
If “anything” is different, it leads to disasters
• Do NOT play with model groups that are set in the
development sample
40. Scoring QC
Most troubles happen after the models are built…
Check:
• Model Group Distributions
• Variable distributions (values and indices)
• Missing Values
• Match rate for appended data
• Scoring codes, including score breaks
• Compare to previous runs – Check Deterioration
Set parameters for acceptable differences and enforce them
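One of the checks above, comparing model-group distributions between the development sample and the live scored database, can be sketched as a maximum-share-shift test. The tolerance and the group data are hypothetical assumptions:

```python
from collections import Counter

def group_shares(assignments):
    """Fraction of records falling into each model group."""
    counts = Counter(assignments)
    total = len(assignments)
    return {g: n / total for g, n in counts.items()}

def max_group_shift(dev_groups, live_groups):
    """Largest absolute share difference between the development
    sample and the live scored database, across all model groups."""
    dev, live = group_shares(dev_groups), group_shares(live_groups)
    return max(abs(dev.get(g, 0.0) - live.get(g, 0.0))
               for g in set(dev) | set(live))

dev = ["A"] * 50 + ["B"] * 30 + ["C"] * 20   # development sample groups
live = ["A"] * 45 + ["B"] * 35 + ["C"] * 20  # live scored database groups
TOLERANCE = 0.10  # hypothetical acceptable difference
```

A run exceeding the agreed tolerance should block score release until the cause (missing data, match rates, code drift) is found.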
41. Score installation in the database
• Plan ahead to store model scores in the database and data-marts
  o Sync with the main database
• Store raw scores, not just model groups
• Match the levels of scores
  o Household
  o Individual
  o Email
  o Product
42. Back-end Analysis
Close the Loop Properly!
• 1-to-1 MKT 101:
“Learn from past campaigns”
• Must Plan ahead
o No excuse for not doing it
o Schedule ahead & budget properly
o Keep historical promo/response data (even
without marketing databases)
• Set Metrics Upfront
o List/Data Source
o By Offer, Creative, Season, Product, etc.
43. Response Reports & BI Analytics
• Start with “Canned” Reports and BI Dashboards
from vendors
• Don’t insist on real-time for every report
• Get ready to create “Custom” reports and
dashboard views
o Prioritize what you want
• Format and Delivery
• Timing and Interval
• Timeline to be covered (YTD, 12-mo, etc.)
44. Key ROI Metrics
Set ROI Metrics, such as:
• Open, Click-through, Conversion Rates
o “Denominator” in each?
• Revenue
o Per 1,000 mailed/calls
o Per Order
o Per Display, Email, Click-through, Conversion
• By
o Source, Campaign, Time Period, Model
Group, Offer, Creative, Targeting
Criteria, Channel (in & outbound), Ad
server, Publisher, Key word, Script, Daypart, etc.
• Key variables must reside in the database
o Keep them in “ready-to-use” format
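The slide's "'Denominator' in each?" question is worth making concrete: "click rate" means different things depending on what you divide by. A minimal sketch with hypothetical campaign numbers, naming each denominator explicitly:

```python
def campaign_metrics(delivered, opened, clicked, converted, revenue):
    """ROI metrics with each rate's denominator spelled out in its name."""
    return {
        "open_rate_per_delivered": opened / delivered,
        "click_rate_per_delivered": clicked / delivered,
        "click_to_open_rate": clicked / opened,
        "conversion_rate_per_click": converted / clicked,
        "revenue_per_1000_delivered": revenue / delivered * 1000,
        "revenue_per_order": revenue / converted,
    }

m = campaign_metrics(delivered=10_000, opened=2_000,
                     clicked=500, converted=50, revenue=5_000.0)
```

Storing the inputs (delivered, opened, clicked, converted, revenue) in ready-to-use format, per the slide, lets any of these be recomputed by source, offer, or model group.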
45. Where to Begin with Analytical Sandbox
Spec it out:
• Project Goal
• Data Source List (as detailed as possible)
• Final Variable List
• Project Flow:
o Collection
o Conversion & Categorization
o Matching / Data Append
o Summarization
o Storage
o Development Sample
o Scoring
o Backend Analysis
46. Who will build the sandbox?
• In-house vs. Out-sourcing;
Must consider
o Platform
o Software
o Programming
o Staffing
• Cost it out
o Don’t forget the update cost
• Back to the analysts for variable list
review
• Don’t be shy; ask consultants for help
47. 10 essential items to consider when outsourcing analytics
It is not just about modeling, but all surrounding services as well
1. Consulting capability: Translate marketing goals into mathematics
2. Data processing: Conversion, edit, summarization, data-append, etc.
3. Pricing structure: Model development is only one part; hidden fees?
4. Track record in the industry: Not in rocket science, but in marketing
5. Types of models supported: Watch out for one-trick ponies
6. Speed of execution: Turnaround time measured in days, not weeks
7. Documentation: Full disclosure of algorithms, charts and reports
8. Scoring validation: Job not done until fully scored and validated
9. Back-end analysis: For true “Closed-loop” marketing
10. Ongoing Support: Periodic review and update
48. Scope it out
• Know what you need, but don’t overdo it
  “Modeling is making the best of what’s available”
• Take a phased approach
  o If budget is tight, start with low-hanging fruit
  o Run proofs of concept without full database commitment
    in the beginning
  o Maintain consistency
  o Keep Historical Data
49. Key Takeaways
• Invest in analytics: It is about the insights, not just the data
• Set business & analytical goals first
• Advanced analytics requires its own sandbox
• Ensure consistent data every step of the way: from sampling to scoring
• Check every data source, but don’t wait for a perfect data set
• Match the levels of data (Data Summary)
• Don’t over-do it – employ a phased approach
• Ask for help
50. Stephen H. Yu · Data Strategy & Analytics Consultant· stephenyu1210@gmail.com· 201.218.2068