Time Series Trend Classification:
Classifying Google Search Queries
According to Their Search Volume
Trend Over Time
Esteban Ribero
Head of Strategy - Planning & Insights, at Performics (Publicis Groupe)
(eribero@gmail.com)
May 5, 2021
Abstract
People’s queries to Google are powerful behavioral traces of their intentions
and information needs. Google classifies the trends in search volume over time
of the queries submitted to its system into four distinct categories: Declining,
Sustained Growth, Fast Rising, and Emerging. This report describes three modeling
approaches explored to develop a classifier that replicates Google’s categorization
of time series: 1) conventional tree-based models such as random forest and
gradient boosting, 2) time-series-specific models such as time series forest and
k-nearest neighbors with dynamic time warping (DTW), and 3) deep learning models
such as recurrent neural networks and 1-D convolutional neural networks. We found
conventional tree-based models, especially gradient boosting (XGBoost), to be best
suited for this task, achieving F1 scores of about 92% with high precision and
decent recall.
Keywords: Google’s search queries, Time Series Classification, Search Queries
Trends, Deep Learning, Random Forest, Gradient Boosting, Time Series Trends,
RISE, BOSS, MiniRocket, RNNs, CNNs.
Google’s search data are a powerful source of information about people’s intentions and
interests in content. Google makes some of these data available to different users (the public,
advertising agencies, marketers, etc.) via different tools. A common use of the data is to look at
the trends in search volume over time to learn about the changing intentions and needs of
consumers of different product categories. Google classifies these trends into four distinct
categories: Declining, Sustained Growth, Fast Rising, and Emerging. Figure 1 shows examples
of Declining and Emerging search queries. This typification of time-series trends is a simple yet
powerful way to group search queries with similar trends over time making the analysis and
identification of important trends faster and more meaningful.
Figure 1. Examples of Declining and Emerging search queries according to their search
volume trend over time.
The challenge is that these labels are available only in a few tools and only for certain
product categories, while other relevant data, including the search volume over time for a broader
set of search queries, are available in other tools. A classifier that labels search queries
according to their trends over time would be a powerful insights tool that can be combined with
other tools that have been developed to analyze search queries for topic extraction and intent
identification using Natural Language Processing (NLP). This report describes the method and
results of the development of such a classifier.
Time Series Classification Overview
Time Series Classification (TSC) is a special case of the classification problems in
supervised machine learning. The important difference between TSC and traditional
classification problems is that the attributes are ordered (Bagnall, et al., 2017). This ‘temporal’
ordering characteristic is not limited to time but to any situation where there might be
discriminatory features depending on the ordering. In fact, ‘time series’ data are present in
almost every task requiring some sort of human cognition making it an important and
challenging problem in data mining (Fawaz, Forestier, Weber, et al., 2019). Since this is an
active area of research in Machine Learning, there are thousands of algorithms and approaches
developed over the years (Bagnall et al., 2017; Fawaz et al., 2019). The following is a short
description of a few of the most popular approaches.
Conventional tree-based models. TSC problems can be cast as traditional classification
problems when the order of the features is not considered. In this approach feature extraction and
feature engineering are an important part of the process where features are extracted or
constructed, and conventional classification algorithms such as Random Forest or Gradient
Boosting classifiers are used.
Time-series-specific non-deep learning models. To take advantage of the ordered
nature of the values in time series, time-series-specific classifiers have been developed. In this
section, a few non-deep learning time-series-specific models are described. Deep-learning
models are described in a separate section.
1-Nearest Neighbors with Dynamic Time Warping (1-NN-DTW) is one of the most
popular and long-time benchmarks for time-series classification (Bagnall, et al., 2017). It is the
conventional KNN algorithm with Dynamic Time Warping (DTW) as the distance measure
instead of the Euclidean distance. DTW measures similarity between two sequences that may not
align exactly in time, speed, or length.
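To make the 1-NN-DTW idea concrete, here is a minimal sketch in plain Python. It is not the implementation used in this study: the function names are hypothetical, the local cost is assumed to be the absolute difference between values, and no warping-window constraint is applied.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference between values.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances while b stays
                cost[i][j - 1],      # b advances while a stays
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to `query` under DTW."""
    distances = [dtw_distance(query, series) for series in train_series]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[nearest]
```

Because the alignment may repeat points, two series with the same shape but different lengths or speeds can still have a distance of zero, which the Euclidean distance cannot express.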
Time Series Forest Classifier (TSF) is an adaptation of the Random Forest classifier to
time series data. It splits the series into random intervals, extracts summary features such as the
mean, standard deviation, and slope from each interval, and trains a decision tree on the extracted
features. It repeats this process for a set number of trees and classifies the series according to
majority vote (Deng et al., 2013).
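The interval feature extraction behind TSF can be sketched as follows. This is an illustration only: the helper names, the minimum interval length, and the way intervals are sampled are assumptions, and the step that trains one decision tree per feature set is omitted.

```python
import numpy as np


def interval_features(series, start, end):
    """Mean, standard deviation, and slope of series[start:end]."""
    segment = np.asarray(series, dtype=float)[start:end]
    x = np.arange(segment.size)
    slope = np.polyfit(x, segment, 1)[0]  # least-squares line through the interval
    return segment.mean(), segment.std(), slope


def random_intervals(length, n_intervals, rng, min_len=3):
    """Sample (start, end) pairs with end - start >= min_len (assumed minimum)."""
    intervals = []
    for _ in range(n_intervals):
        start = int(rng.integers(0, length - min_len + 1))
        end = int(rng.integers(start + min_len, length + 1))
        intervals.append((start, end))
    return intervals


def tsf_style_features(series, intervals):
    """Concatenate the three summary features of every interval."""
    return np.array(
        [value for start, end in intervals
         for value in interval_features(series, start, end)]
    )
```

In a full classifier, each tree would draw its own set of intervals, and the forest would combine the trees by majority vote as described above.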
Random Interval Spectral Ensemble (RISE) is a popular variant of TSF. It uses only a
single time interval per tree, rather than multiple time intervals as TSF does, and it is trained using
spectral features extracted from the series instead of summary statistics. RISE uses several
series-to-series feature extraction transformers such as fitted auto-regressive coefficients,
estimated autocorrelation coefficients, and power spectrum coefficients.
BOSS Ensemble (Schäfer, 2015) is an ensemble of dictionary-based classifiers that
transform time series values into a sequence of discrete ‘words’ using a truncated Discrete
Fourier Transform. The distribution of the extracted words is then the basis of the classification.
Dictionary-based algorithms are useful when the frequency of repetition of subseries (words) is
more important than their presence or absence (Bagnall, et al., 2017).
Contractable BOSS (cBOSS) is a variation of the original BOSS Ensemble algorithm
with several improvements in terms of memory requirements and speed.
Deep-learning models for time series data. Popular Deep Neural Networks (DNN) such
as Recurrent Neural Networks (RNNs) with the different variations of recurrent units
(SimpleRNN; Gated Recurrent Unit, GRU; Long Short-Term Memory, LSTM) and
1-Dimensional Convolutional Neural Networks (1D-CNNs) can be used for time series
classification (Chollet, 2018). Sophisticated architectures such as ResNet or hybrid approaches
such as ROCKET (RandOm Convolutional KErnel Transform) are now reaching state-of-the-art
performance for TSC tasks with fewer computational requirements (Dempster, et al., 2020a).
MiniRocket (Dempster, et al., 2020b) is the most current and optimized version of
ROCKET. The algorithm first transforms the time series using random convolutional kernels,
such as those used in a CNN, and then trains a linear classifier (usually a Ridge Classifier) with
these features. Unlike typical CNNs, ROCKET uses a variety of kernels and many of them
(10,000 is the default). The random lengths, dilations, paddings, weights, and biases of these
kernels allow ROCKET to capture a wide range of information, making it a formidable TSC
algorithm that is fast and achieves state-of-the-art accuracy.
Data
A total of 5642 search queries and their monthly search volume for 2 full years (24 months each)
were collected from Google’s Insights Finder tool. The data set was assembled by selecting
search queries from 12 different product categories (shoes, cars, electronics, bicycles, pet
products, etc.) and sampling queries from different calendar months, so that highly seasonal months
such as December or November would not always be months 12 and 24 or 11 and 23. The data
are already scaled and indexed to the maximum, so for each period the search volume takes
values from 0 to 100. The original data came with 36 observations for each time series and the
corresponding label, as well as the average monthly search volume in absolute terms. Since the
data sets on which the classifier would be used in production often have 24 observations
per time series (the last two years of monthly data), we discarded the first 12 months of data
for each time series and kept the last 24. Google’s rules for classifying the trends use daily
search volume (in absolute terms, not indexed), so once the search volume is
aggregated to a monthly basis and indexed, several search queries that were originally in different
classes may look as if they belong to the same class. Some noise in
the data is therefore expected. There is a lot of variability within each of the categories, as can be
observed in Figure 2, but their general trend can be easily identified. There are several outliers,
particularly for the Emerging category, and it is likely that those might be better represented by
the Fast Rising category.
Figure 2. Box and whisker plot for time series in each category. Although there is great
variability at each time period within each category and between time periods, the general
pattern represented by the median and the interquartile range is distinct for each category.
Feature Engineering. To identify potential features for the conventional Random Forest
and Gradient Boosting models a thorough exploratory data analysis was performed. The
following three groups of features were extracted and used for modeling:
Group 1. Features exploiting differences across time periods (Figure 3):
• Last 3 months of most recent year vs prior year (Q4Y2vsQ4Y1).
• Last 3 months of most recent year vs prior 3 months (Q4Y2vsQ3Y2).
• Last 6 months of most recent year vs prior year (H2Y2vsH2Y1).
• Last 6 months of most recent year vs first 6 months (H2Y2vsH1Y2).
• Year 2 vs year 1 (Y1vsY2).
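A possible way to compute the Group 1 features is sketched below. The text does not state whether "vs" means a difference or a ratio; this sketch assumes a difference of period means, with the later period first. The function name is hypothetical, and the series is assumed to hold 24 monthly values ordered from oldest to newest.

```python
import numpy as np


def period_difference_features(series):
    """Differences between period means for a 24-month indexed series.

    Assumption: each feature is mean(later period) - mean(earlier period).
    """
    y = np.asarray(series, dtype=float)
    if y.shape != (24,):
        raise ValueError("expected exactly 24 monthly values")
    year1, year2 = y[:12], y[12:]
    return {
        "Q4Y2vsQ4Y1": year2[9:].mean() - year1[9:].mean(),
        "Q4Y2vsQ3Y2": year2[9:].mean() - year2[6:9].mean(),
        "H2Y2vsH2Y1": year2[6:].mean() - year1[6:].mean(),
        "H2Y2vsH1Y2": year2[6:].mean() - year2[:6].mean(),
        "Y1vsY2": year2.mean() - year1.mean(),
    }
```

Since queries were sampled from different calendar months, "Q4" and "H2" here refer to the last 3 and last 6 months of each 12-month block, not to calendar quarters.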
Figure 3. Box and whisker plot for features taking advantage of indexed-search-volume
differences across different time periods. Declining search queries are the most distinctive,
followed by Sustained Growth. Fast Rising queries often overlap with Emerging and Sustained
Growth queries.
Group 2. Descriptive features (Figure 4):
• Mean across 24 months
• Median across 24 months
• Standard Deviation across 24 months
• Min across 24 months
• Average monthly searches in absolute numbers (Volume)
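The descriptive features are simple summaries of the 24 indexed values. A minimal sketch using only the standard library follows; the function name is hypothetical, the population standard deviation is an assumption (the text does not say which variant was used), and the absolute volume is assumed to be supplied separately because it cannot be derived from the indexed series.

```python
from statistics import mean, median, pstdev


def descriptive_features(series, volume):
    """Summary statistics of an indexed series plus its absolute volume."""
    return {
        "mean": mean(series),
        "median": median(series),
        "std": pstdev(series),  # population standard deviation (assumed)
        "min": min(series),
        "volume": volume,       # average monthly searches in absolute numbers
    }
```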
Figure 4. Box and whisker plot for descriptive features. Emerging search queries are the most
distinctive across these features followed by Fast Rising. Sustained Growth and Declining
queries overlap often.
Group 3. Trend shape over time (Figure 5). To capture the shapes of the trends over time
with a few attributes, a 3rd order polynomial trend line was fitted to each of the 5642 time series,
and each of the coefficients of the resulting formula was added as a
feature. Figure 6 shows some examples of the original time series and their corresponding
polynomial trend line.
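A minimal sketch of this trend-shape extraction with NumPy is shown here. It assumes the months are indexed 0 to 23 on the x-axis and that a least-squares fit was used; the function name is hypothetical.

```python
import numpy as np


def trend_coefficients(series, degree=3):
    """Least-squares polynomial coefficients, highest power first."""
    y = np.asarray(series, dtype=float)
    x = np.arange(y.size)  # assumed x-axis: month index 0, 1, 2, ...
    return np.polyfit(x, y, degree)
```

For a 3rd order fit this returns four values (cubic, quadratic, linear, and intercept terms), each of which can be used as one feature.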
Figure 5. Box and whisker plot for trendline-shape features. The coefficients of the 3rd
order polynomial trend appear to be powerful for differentiating Emerging search queries, while they
seem to overlap often for the other categories.
Figure 6. Sample of time series and their 3rd order polynomial trendline by category. The
trendlines make it easier to visually identify the trend over time. For instance, the inflection
points appear more dramatic for Emerging and Fast Rising search queries than for the other ones.
The downward trend is also evident for Declining queries. However, the shape of the trends is
often similar, and only the rotation and relative position on the y-axis appear to be the key
differentiating characteristics.
Methods
Models. Different versions of the models described in the TSC overview section above
(except for ResNet) were trained and tested with different data inputs.
For the conventional tree-based models, 4 data inputs were used: 1) The original data set
with monthly data plus average monthly search volume (no feature engineering). 2) The same
data set but aggregated by quarters to reduce variability. 3) The set of engineered features
described above. 4) The combination of quarterly data and feature engineering.
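The quarterly aggregation can be sketched as a reshape followed by a mean. Whether the original aggregation used the mean or the sum of each quarter is not stated; the mean is assumed here, and the function name is hypothetical.

```python
import numpy as np


def to_quarters(monthly):
    """Collapse consecutive blocks of 3 monthly values into their mean."""
    y = np.asarray(monthly, dtype=float)
    if y.size % 3 != 0:
        raise ValueError("series length must be a multiple of 3")
    return y.reshape(-1, 3).mean(axis=1)
```

A 24-month series becomes 8 quarterly values, which reduces month-to-month variability at the cost of temporal resolution.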
For the time-series-specific non-deep learning models, 3 data inputs were used: 1) The
original monthly data (excluding the average search volume feature). 2) The monthly data
smoothed with a 3-month rolling average. 3) The monthly data smoothed with a 6-month rolling
average.
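The smoothing step can be sketched with a convolution. How the first months of each series were handled is not described; this sketch assumes that only complete windows are kept, so the output is shorter than the input. The function name is hypothetical.

```python
import numpy as np


def rolling_average(series, window):
    """Rolling mean over complete windows only (output length n - window + 1)."""
    y = np.asarray(series, dtype=float)
    if not 1 <= window <= y.size:
        raise ValueError("window must be between 1 and the series length")
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="valid")
```

With 24 monthly values, a 3-month window yields 22 smoothed points and a 6-month window yields 19.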
For the deep-learning models, 3 data inputs were used: 1) The original monthly data
standardized to mean 0 and standard deviation 1. 2) The data smoothed with a 3-month rolling
average, also standardized. 3) Standardized quarterly data. Additionally, two
conventional Neural Networks (Baseline NNs), one with 2 layers and another one with 3 layers
of 100 units each, were trained to serve as a baseline for the deep-learning models. These models
were trained with the original monthly data, standardized, as well as the engineered features, also
standardized.
Train and test data sets for each of the data inputs described above were created using
75/25% splits. The train data set was further divided into train and validation sets for the deep
learning models using an 80/20% split.
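The two-stage split can be sketched with the standard library alone. The text does not say whether the splits were stratified by class or which random seed was used; this sketch assumes a plain random shuffle, and the function name and seeds are hypothetical.

```python
import random


def split_indices(n_samples, holdout_fraction, seed=0):
    """Shuffle indices and return (kept, held_out) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n_samples * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]


# 75/25 split of the 5642 queries into train and test sets.
train_idx, test_idx = split_indices(5642, 0.25)

# 80/20 split of the train set into fit and validation sets (deep learning models only).
inner_fit, inner_val = split_indices(len(train_idx), 0.20, seed=1)
fit_idx = [train_idx[i] for i in inner_fit]
val_idx = [train_idx[i] for i in inner_val]
```

Under these assumptions the 5642 queries divide into 4232 for training and 1410 for testing, and the training portion divides further into 3386 for fitting and 846 for validation.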
The best models from each of the 3 modeling approaches were combined in an ensemble
model to see if higher levels of performance were possible. These models were the Xgboost with
feature engineering + quarterly data, the Random Forest with original monthly data, the Time
Series Forest with original monthly data, and the MiniRocket_Ridge with smoothed data.
Finally, the best model of all, the Xgboost with feature engineering + quarterly data, was
further fine-tuned and trained by calibrating its hyperparameters using Grid Search with 5-fold
cross-validation with the entire data set.
Performance metric. To compare the models’ performance, a Precision and Recall
framework was used with the weighted F1-Score as the measure of accuracy. The models were
also compared using Receiver Operating Characteristic (ROC) curves and their corresponding
area under the curve (AUC) measure for each class.
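For reference, the weighted F1 score averages the per-class F1 scores, weighting each class by its number of true instances (its support). The sketch below computes it from scratch to make the definition explicit; the function name is hypothetical, and in practice a library routine would normally be used.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = true_pos / n_pred if n_pred else 0.0
        recall = true_pos / n_true
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += f1 * n_true / total  # weight each class by its support
    return score
```

Because larger classes carry more weight, the score is dominated by the classes with the most queries.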
Results and Discussion
Table 1 summarizes the results for each of the models trained. In comparison with a
dummy classifier that would always predict the most frequent class, the performance of all
models is quite an improvement. Most models reach a weighted F1 score above 80% on the test
set; the exceptions are the 1-NN-DTW and RISE models and a few of the LSTM and 1D-CNN
variants. The best performer reaches 88%. The winning models are the
conventional tree-based models. Regardless of the data input, the conventional Random Forest
and Xgboost models achieve a weighted F1 score above 85% on the test set. The only other model
to reach 85% was the MiniRocket_Ridge model, both with and without smoothed data.
F1 Score (w)
Model                                                       Train set   Test set
Baseline Reference
Dummy Classifier                                            0.2432      0.2332
Conventional Tree-based Models
Random Forest (with monthly data)                           1.0000      0.8716
Random Forest (quarterly data)                              0.9962      0.8579
Random Forest (with feature engineering)                    0.9915      0.8554
Random Forest (with feature engineering + quarterly data)   0.9998      0.8690
Xgboost (with monthly data)                                 0.9055      0.8691
Xgboost (with quarterly data)                               0.8846      0.8574
Xgboost (with feature engineering)                          0.8975      0.8518
Xgboost (with feature engineering + quarterly data)         0.9115      0.8829
Time-Series-Specific Non-DL Models
1-NN-DTW                                                    0.9993      0.7498
1-NN-DTW (with smoothed data 3)                             0.9995      0.7676
1-NN-DTW (with smoothed data 6)                             0.9991      0.7775
Time Series Forest                                          0.9976      0.8497
Time Series Forest (with smoothed data 3)                   0.9995      0.8438
Time Series Forest (with smoothed data 6)                   0.9995      0.8382
RISE                                                        0.9969      0.7564
RISE (with smoothed data 3)                                 0.9962      0.7533
RISE (with smoothed data 6)                                 0.9986      0.7160
BOSSEnsemble                                                0.9960      0.8048
cBOSS                                                       0.9969      0.8054
Deep Learning Models
Baseline NN_100_100                                         0.8531      0.8228
Baseline NN_100_100_100_100                                 0.8756      0.8199
Baseline NN_100_100_100_100 (with feature engineering)      0.8361      0.8154
Simple_RNN_24_12_6_3                                        0.8294      0.8193
GRU_32_32_32_32                                             0.8184      0.8105
GRU_32_32_32_32 (with smoothed data 3)                      0.7991      0.8031
GRU_32_32_32_32_Q                                           0.8288      0.8261
LSTM_24_24_24_24                                            0.8049      0.8000
LSTM_24_24_24_24_ (with smoothed data 3)                    0.8009      0.7969
LSTM_24_24_24_24_Q                                          0.7865      0.7807
1D-CNN_32_32_32_3_GlobalMax_Pooling                         0.8510      0.7906
1D-CNN_32_32_32_3_GlobalMax_Pooling (with smoothed 3)       0.8493      0.7932
1D-CNN_32_32_32_3_GlobalMax_Pooling_Q                       0.8561      0.8316
MiniRocket_Ridge                                            0.8998      0.8578
MiniRocket_Ridge (with smoothed data 3)                     0.8957      0.8581
Table 1. Weighted F1 Scores for all models trained. The models in each section with the
best scores on the test set are highlighted.
At first glance, it is surprising that the conventional machine learning models, not
designed specifically for time series, perform the best across the board. Even more surprising is
the fact that it is possible to achieve almost the highest performance by simply using the raw
monthly data with these conventional tree-based models. These models perform better even with
quarterly raw data vs features that have been engineered to take advantage of volume differences
between periods. Although the differences are minor, this suggests that the temporal ordering of the
values is less important for this challenge. Only when combining quarterly data with the
engineered features for the Xgboost can we achieve slightly higher performance.
The lack of strong importance of the temporal ordering of the values is also suggested by
the fact that the time-series-specific models are the ones performing the worst among these
models except for the Time Series Forest that barely missed the 85% mark. This is even true for
the deep learning models where a basic 2-layer Neural Net (Baseline NN_100_100) slightly
outperforms most of the more sophisticated sequence-based models such as the traditional RNN
(Simple_RNN), the Gated Recurrent Unit (GRU), and the Long Short Term Memory (LSTM)
models. The 1D-CNN did not perform better either. It is important to note that only light
hyperparameter tuning was performed for the deep learning models, so it may be possible to
improve their performance with more fine-tuning, but the gains are probably going to be small.
As mentioned before, only the MiniRocket_Ridge achieved performance similar to that of the
conventional tree-based models. MiniRocket is a promising model that combines the
sophistication of CNNs with the speed of the conventional tree-based models; in this
study, however, it trailed slightly.
The lack of strong importance of temporal patterns could be explained by the large
diversity of the time series within each category, as evidenced by the boxplots in Figure 2 and
the few samples of queries in Figure 6. These time series are similar to one another mostly in
their general long-term trend and not in specific short-term or cyclical patterns. For this precise
reason, we were not expecting the BOSSEnsemble or the cBOSS models to perform particularly
well since they are best suited to pick up patterns that repeat frequently in time series. This may
also be the reason for the relatively poor performance of the other models.
Smoothing the data did not help much. Only the 1-NN-DTW model improved
significantly with smoothed data, and the more pronounced the smoothing the better. This makes
sense since reducing the temporal noise by smoothing makes the queries more similar to one
another, the basic mechanism by which K-Nearest Neighbors models work. The
MiniRocket_Ridge model also performs better with smoothed data, but the difference is so small
that it may be a random coincidence.
Figure 7. Feature importance for two versions of the Random Forest and the Xgboost models.
Panels: Random Forest (raw monthly data), Xgboost (raw monthly data), Random Forest
(feature engineering), and Xgboost (feature engineering).
Looking at the feature importance of the best models in Figure 7, it is possible to see that
each model accomplishes its task differently, which suggested that combining the predictions
from these models could also improve performance, as they rely on different aspects of the
data.
Figure 8. ROC curves and Confusion Matrices for top 2 models.
However, when looking at the confusion matrices and ROC curves in Figure 8, we can
observe that the models perform very similarly and struggle to differentiate Fast Rising
from Sustained Growth. There are still differences in the mistakes they make: the Random
Forest makes more mistakes misclassifying Emerging as Fast Rising but fewer mistakes
misclassifying Fast Rising as Sustained Growth. We hypothesized that combining these two
models would balance the mistakes and give us a boost in performance, but as seen in
Table 2, the gains are minimal.
F1 Score (w)
Model                                                             Train set   Test set
Best individual models
Random Forest (with monthly data)                                 1.0000      0.8716
Xgboost (with feature engineering + quarterly data)               0.9115      0.8829
Time Series Forest                                                0.9976      0.8497
MiniRocket_Ridge (with smoothed data 3)                           0.8957      0.8581
Ensemble models
Random Forest + Xgboost                                           0.9794      0.8841
Random Forest + Xgboost + MiniRocket_Ridge                        0.9794      0.8841
Random Forest + Xgboost + MiniRocket_Ridge + Time Series Forest   0.9943      0.8790
Final model
Xgboost (with feature engineering + quarterly data, fine-tuned)   0.9270      0.9182
Table 2. Weighted F1 Scores for best individual models, ensemble models, and final
model after fine-tuning hyperparameters. The models in each section with the best scores
on the test set are highlighted.
It is worth noting that most of the best individual models here are already ensembles of
models picking up signals from different aspects of the data, so it may not be that surprising
after all that combining them adds little to an already varied set of models. What is
surprising is that the MiniRocket_Ridge model did not add anything to the mix, despite being the most
different of all the models. The models were ensembled by averaging their predicted probabilities
for each class and then assigning the final prediction to the class with the highest probability. The
MiniRocket_Ridge model was either predicting the same labels or being overridden by the other 2
models in the ensemble. Adding the Time Series Forest did the opposite: it switched some
correct predictions from the other three models into incorrect predictions, worsening the
performance.
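The averaging scheme described above is often called soft voting. A minimal sketch follows, assuming each model exposes an array of class probabilities with one row per query and columns in the same class order; the function name and the class ordering are hypothetical.

```python
import numpy as np

# Assumed column order of every probability array.
CLASSES = ["Declining", "Emerging", "Fast Rising", "Sustained Growth"]


def soft_vote(probabilities, classes=CLASSES):
    """Average per-model class probabilities and pick the most probable class."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in probabilities])
    mean_probs = stacked.mean(axis=0)  # shape: (n_queries, n_classes)
    labels = [classes[i] for i in mean_probs.argmax(axis=1)]
    return labels, mean_probs
```

With this scheme, a model that disagrees with the others changes the final label only when its probabilities are confident enough to shift the average, which is consistent with a third model being overridden by two models that agree.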
Given that combining the predictions from the Random Forest and the
Xgboost did not increase performance significantly, it was decided to further fine-tune the
Xgboost with grid search as described in the Methods section. The calibrated model was then
trained and evaluated with the same train and test data as the others for a fair comparison. In this case, we did
break the 90% performance barrier and achieved a weighted F1 score of 92%. To further assess
the performance of the final model and to identify potential drawbacks, the model trained via
grid search with cross-validation was used to predict the label of all the observations in the data
set. Table 3 shows the detailed precision/recall classification report for the final model using the
entire data set.
Classification Report
Class               precision   recall   f1-score   support
Declining           1.00        1.00     1.00       672
Emerging            0.95        0.98     0.96       674
Fast Rising         0.90        0.88     0.89       1946
Sustained Growth    0.92        0.92     0.92       2350
accuracy                                 0.92       5642
macro avg           0.94        0.95     0.94       5642
weighted avg        0.92        0.92     0.92       5642
Table 3. Precision/recall classification report for the fine-tuned Xgboost model across
the entire data set. All classes have precision scores equal to or higher than 90%.
Only the recall for Fast Rising queries is below the 90% mark.
The performance of the final model is satisfactory. Precision is 90% or higher for
every class. The model identifies all the
Declining search queries without any error. Its recall for Emerging queries is 98%, and
it makes precision mistakes for that class only about 1 out of 20 times. The performance for search queries with
Sustained Growth, the biggest class, is 92% in terms of both recall and precision.
The model is less accurate for Fast Rising search queries: its predictions tend to be correct 9
out of 10 times, but the model may miss about 12% of the search queries in this
category. This is acceptable because precision is the most important measure in this case: the
model will be used to identify search queries with clearly identified trends for insights purposes,
so making sure the predictions are accurate to a high degree is the main goal, even if the model
fails to identify a few queries that should have been recalled. The user should keep an
eye on the predictions involving Fast Rising, knowing that some Fast Rising queries will tend to be
misclassified as Sustained Growth and a few as Emerging, as evidenced in the confusion matrix
shown in Figure 9. Also, some queries predicted as Fast Rising will actually be Sustained Growth.
Figure 9. Confusion Matrix for the fine-tuned Xgboost model across the entire data set.
Most mistakes are Fast Rising queries misclassified as Sustained Growth and
Sustained Growth queries misclassified as Fast Rising.
Conclusion. The results obtained above are satisfactory. The final model is predicting the
general trend of the search volume over time of queries submitted to search engines such as
Google with a high degree of precision. The model is missing the mark on a few of the
predictions for Fast Rising search queries, but this is an acceptable trade-off given the usefulness
of the model in automating the identification of the trends.
In terms of learnings from the modeling exercise, the lack of strong importance of the
temporal ordering of the values in these time series is surprising but not concerning. The long-
term pattern in each of the classes is not very complex and so it may be sufficient, even
preferable, for some of these models to rely on the raw values at each time. This may also be due
to the large variability of the values at each period for the queries in each of the classes. The
general long-term trend is not related to the micro patterns one may find when looking into
shorter intervals of time or for particular cyclical patterns that are frequently present in search
queries for seasonal products or high-demand periods such as holidays.
A final note regarding the best-performing models for this task: although we refer to
Random Forest and Xgboost as ‘conventional’ tree-based models, they
are sophisticated algorithms that tend to perform extremely well in general. Xgboost, in fact, has
become the winner in many competitions (Chollet, 2018) and is suggested as a good alternative
to deep learning models when structured data is available by the very same people who are at the
front of the deep learning revolution, such as François Chollet (2018), the developer of Keras.
References
Bagnall, Anthony, Jason Lines, Aaron Bostrom, James Large, and Eamonn Keogh. “The Great
Time Series Classification Bake Off: A Review and Experimental Evaluation of Recent
Algorithmic Advances.” Data Mining and Knowledge Discovery 31, no. 3 (2017): 606–60.
https://doi.org/10.1007/s10618-016-0483-9.
Chollet, François. Deep Learning with Python. Shelter Island, NY: Manning Publications Co.,
2018.
Dempster, Angus, François Petitjean, and Geoffrey I. Webb. “ROCKET: Exceptionally Fast and
Accurate Time Series Classification Using Random Convolutional Kernels.” Data Mining
and Knowledge Discovery 34, no. 5 (2020a): 1454–95.
https://doi.org/10.1007/s10618-020-00701-z.
Dempster, Angus, Daniel F. Schmidt, and Geoffrey I. Webb. “MINIROCKET: A Very Fast
(Almost) Deterministic Transform for Time Series Classification.” arXiv.org, December
16, 2020b. https://arxiv.org/abs/2012.08791v1.
Deng, Houtao, George Runger, Eugene Tuv, and Vladimir Martyanov. “A Time Series Forest for
Classification and Feature Extraction.” Information Sciences 239 (2013): 142–53.
https://doi.org/10.1016/j.ins.2013.02.030.
Fawaz, Hassan Ismail, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and
Pierre-Alain Muller. “Deep Learning for Time Series Classification: A Review.” Data Mining and
Knowledge Discovery 33, no. 4 (2019): 917–63.
https://doi.org/10.1007/s10618-019-00619-1.
Schäfer, Patrick. “The BOSS Is Concerned with Time Series Classification in the Presence of
Noise.” Data Mining and Knowledge Discovery 29 (2015): 1505–30.
https://doi.org/10.1007/s10618-014-0377-7.