Learn our standard approach to modelling topics in Explorer. By adjusting your topics for the domain you're working in, you can get clearer results and more actionable insights.
2. August 2019
This deck contains a few starting tips for modelling a project in Explorer.
Each domain is different - we recommend adapting the procedure to
find a model that works for you!
For detailed documentation visit docs.gavagai.io
Prepare Your Data
Explorer is built to analyze large numbers of texts.
More texts typically mean better results, because a
larger sample provides more terms and topics.
● In addition to texts, include metadata (such as
Ratings/Grade, Date, Revenue etc.) if available.
For the Grade column, check that the first 10
values are not identical, or this column will be
ignored. Re-order your rows if necessary.
● Label your headings. You will need to find the
heading with text data, and headings for other
metadata will be displayed in the analysis.
● Organize your data into either a CSV or XLSX file
with each row representing 1 review/text.
● Explorer reads 1 row as 1 review,
regardless of number of columns.
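The Grade check above can be run on a file before uploading it. The following is a minimal sketch, assuming a CSV with a column literally named "Grade"; the sample data and the helper names `head_is_uniform` and `vary_head` are hypothetical and not part of Explorer.

```python
import csv
import io

# Hypothetical sample file: one review per row, with a "Grade" metadata column.
SAMPLE_CSV = "Text,Grade\n" + "".join(
    f"Review number {i},{grade}\n"
    for i, grade in enumerate(["5"] * 10 + ["3", "4"], start=1)
)

def head_is_uniform(rows, column, n=10):
    """Return True if the first n values in `column` are all identical."""
    head = [row[column] for row in rows[:n]]
    return len(head) > 0 and len(set(head)) == 1

def vary_head(rows, column, n=10):
    """Return a copy of `rows` whose first n values in `column` are not all
    identical (for n >= 2), by swapping the first differing later row into
    position n - 1. Rows come back unchanged if no later row differs."""
    rows = list(rows)
    if not head_is_uniform(rows, column, n):
        return rows
    first_value = rows[0][column]
    for i in range(n, len(rows)):
        if rows[i][column] != first_value:
            rows[n - 1], rows[i] = rows[i], rows[n - 1]
            break
    return rows

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
fixed = vary_head(rows, "Grade")
print(head_is_uniform(rows, "Grade"), head_is_uniform(fixed, "Grade"))
```

For a real file, replace the in-memory sample with `open("reviews.csv", newline="")` (a placeholder filename) and write the re-ordered rows back out with `csv.DictWriter`.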
Configure Your Project
Name your Project - it’s visible when shared
⚙️Configure Project Settings:
● Use either algorithm:
New - More accurate, but more sparse results
Classic - more results
● Number of Topics: 30 to start, reduce as needed
● Analyze Coverage = ✔
● Coverage analysis: ALL
● Sentiment: Negative & Positive, Neutral=On
● Binary Sentiment calculation
Save Settings, then Update and Save your
Project to apply the settings. You may
need to pin and unpin a topic first.
[Screenshot labels: Start Merge/Group, Merge, Group, Ignore, Pin Topic, Edit]
1st Pass of Modelling
First, we will consolidate Topics. If in doubt, use Show
Associations and View Examples to look into the
nature of a Topic.
● Merge topics that are highly synonymous in
your domain (such as rooms and suites in
hotels). Sort Alphabetically to make this easier.
● Group topics that you suspect might be
synonymous. Name your groups and merged
topics.
● Ignore topics that are:
○ Only showing one sentiment and not descriptive
(Ignore Good, Terrible - Keep Comfortable, Ugly).
○ Referring to the object itself (Hotel, Hilton).
○ Irrelevant, causing problems or
throwing off the results
- use discretion here!
2nd Pass of Modelling
Update and Save to apply your changes. The
Unclassified topic tells you how much coverage you
have - we want to reduce the unclassified %.
● Edit all major topics to see if you need to
expand or correct them.
○ Remove words in the topic that should not be
included (i.e. not related closely enough)
○ Add Words you want to include, and use Get
words to see Explorer’s suggestions.
● Continue to Merge, Group and Ignore topics.
● Uncheck Sort Alphabetically when you want to
sort by how often the topic is mentioned
instead.
3rd Pass of Modelling
Update and Save to apply your changes. Use your
understanding of the subject matter and the data, as
well as associations and examples, to continue to
consolidate and refine your topics as in the previous
steps.
● Look up the topics that show up in associations,
and the associations of those associations.
● Groups should now be growing in %.
● Visualize Results to check how it looks in a
Dashboard. Keep consolidating and refining.
● Consider shrinking the number of topics
to the number you want to display in the final
report (via Project Settings).
4th and Further Passes
Update and Save to apply your changes. Keep
modelling, checking in on the Dashboard, Unclassified,
Examples and Associations to gauge your progress.
● When you reach the point where further
changes are either oversimplifying the results or
require a lot of consideration - move on.
● In Settings, uncheck Analyze Coverage ❌
to hide the Unclassified Topic.
● When ready, Publish your Dashboard to get a
shareable link.
Troubleshooting
My Project is slow to explore:
Turn off “Should sentiment be calculated” for faster modelling, and switch it back on when you are ready.
No sentiment showing for large projects:
For big files, the setting “On demand sentiment and suggestions” will prevent sentiment from being calculated. Raise
the threshold for this setting to have sentiment calculated again; note that this may slow things down when modelling.
Sentiment insights are being shown instead of Grade, but I have Grade data in a column:
Check that the first 10 grade values are not all identical. If they are, change the order of your rows so they are not all
identical. You may also need to check that the column only contains numeric values in a set range (e.g. 1-5 or 1-10).
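The numeric-range check can be sketched as a small script. This is an illustration only: the function name `invalid_grades` and the default 1-5 scale are assumptions, and Explorer's own validation rules may differ.

```python
def invalid_grades(values, low=1, high=5):
    """Return (row index, raw value) pairs for grades that are not numeric
    or fall outside the inclusive range low..high."""
    problems = []
    for index, raw in enumerate(values):
        try:
            grade = float(raw)
        except (TypeError, ValueError):
            # Empty cells, None and text such as "n/a" end up here.
            problems.append((index, raw))
            continue
        if not low <= grade <= high:
            problems.append((index, raw))
    return problems

# Hypothetical Grade column read from a file, still as strings.
grades = ["5", "3", "n/a", "7", ""]
print(invalid_grades(grades))
```

Rows reported by the function can then be corrected or removed before the file is uploaded again.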
My time-based graphs show very few values:
If your timestamp data spans more than 2 years, Explorer will show time-based graphs aggregated by year. To show
finer resolution, reduce your sample to less than 24 (whole) months, and Explorer will use months instead of years.
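To check how many whole months a sample spans, and to trim it to the most recent period, something like the sketch below could be used. The helper names are hypothetical, and the way whole months are counted here is an assumption; Explorer's exact cutoff may be computed differently.

```python
from datetime import date

def whole_months_between(start, end):
    """Count the whole calendar months from `start` to `end`."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months

def keep_recent(dates, max_months=24):
    """Keep only dates less than `max_months` whole months before the latest date."""
    latest = max(dates)
    return [d for d in dates if whole_months_between(d, latest) < max_months]

# Hypothetical timestamps taken from a Date column.
dates = [date(2017, 1, 15), date(2018, 3, 1), date(2019, 8, 2)]
span = whole_months_between(min(dates), max(dates))
trimmed = keep_recent(dates)
print(span, trimmed)
```

In this example the full sample spans 30 whole months, so the oldest entry is dropped and the remaining sample spans 17.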