Building a market and competitive intelligence platform presents unique challenges:
1. Sourcing information from thousands of websites continuously in a way that software can integrate while accounting for the dynamic nature of websites.
2. Removing irrelevant information requires fine-tuning what is relevant to one's business from the vast amount of web data.
3. Removing duplicate information requires sophisticated techniques to compare new data to existing data while accounting for ways websites make information appear unique.
4. Identifying companies and people mentioned requires understanding context and named entity recognition, which is challenging for common words and company names.
5. The platform must analyze aggregated information by accurately specifying industries and topics while accounting for complexity and lack of clear divisions between them.
Behind the Scenes of a Market and Competitive Intelligence Platform
1. Behind the Scenes of a Market and Competitive Intelligence Platform: Key Challenges
2. Table of Contents
→ Introduction
→ Challenges faced while building a Market & Competitive
Intelligence platform
→ Sourcing of information
→ Removing irrelevant information
→ Removing duplicate or similar information
→ Identifying companies and persons
→ Confusions about companies and the mentions
→ Specifying Industry and topics of the article
→ Perspective of the social media
→ Conclusions and takeaways
4. Introduction
It took us a long time to build a market and competitive intelligence platform
→ The platform is designed to continuously monitor thousands of websites for new information on competitors, customers, industries and other signals such as sales opportunities.
→ That description fits in a single line, but it is the product of constant monitoring, testing and implementation carried out over a decade.
→ Today everyone knows the importance of such a competitive intelligence platform, and some CTOs are even confident that one can be built in a month with five engineers.
→ The points ahead in this deck show that while it might be easy to start this project, it is painfully difficult to finish it.
5. Key challenges faced while building an MI platform
—Sourcing of information
—Removing irrelevant information
—Removing duplicate
—Identifying companies and persons
—Confusions about the mentions
—Specifying Industry and topics
—Perspective of the social media
6. Taking up the task of building a market and competitive intelligence platform comes with unique challenges
Sourcing of information
Integrating thousands of websites that are continuously monitored for new information.
Removing irrelevant information
Removing information that is not relevant to one’s business, which describes most web data.
Removing duplicate information
Comparing the new information with everything else in our database.
Identifying companies and persons
Building the capability for technology to identify relevant companies and people.
Confusions about companies and the mentions
Managing the complexity of the problem of aboutness in the information collected.
Specifying Industry and topics of the article
Analyzing the aggregated information by industry and topic.
Perspective of the social media
Integrating different platforms and assessing the accuracy of the information there.
7. Challenge 1: Sourcing of information
→ Most websites post information for humans to read, not for software to integrate.
→ Interpreting information from a website correctly is hard.
→ Each unique website needs its own integration.
→ There are no universal standards for website development.
→ Analysts spend time analyzing each insight.
→ Scraping ‘intelligent web pages’ is not easy because they are responsive, dynamic and personalized. They use cookies, JavaScript and AJAX calls to generate a unique web page for each user.
→ Web page names change dynamically, with no warning before the whole setup changes.
8. Challenge 2: Removing irrelevancy
→ Defining principles for data relevance is difficult because information on the web is dynamic and unique.
→ Contify fine-tunes this with learnings from the data, which comes at a very high technical and operational cost.
→ Remove non-business information, such as crime, politics, entertainment and sports, right at the source.
→ E.g. we could remove stories with the word “kill” in the title on the assumption that they are crime related, but we cannot afford to drop stories like “Google aims to kill passwords”.
→ Remove information that is related to business but not relevant to our business.
→ E.g. information about our industry but from a different geography, or information about our competitor but for a different segment where we don’t compete.
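The “kill” example can be made concrete with a small sketch. This is not Contify’s filter; the word lists and function names below are hypothetical and exist only to show why a bare keyword rule misfires and how a second signal can rescue a business story.

```python
import re

# Hypothetical word lists, chosen only for illustration.
CRIME_WORDS = {"kill", "kills", "killed", "murder", "shooting"}
BUSINESS_WORDS = {"google", "passwords", "startup", "acquires", "launches"}


def tokens(title: str) -> set[str]:
    """Lowercase the title and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", title.lower()))


def naive_is_irrelevant(title: str) -> bool:
    """Naive rule: any crime word in the title marks the story as irrelevant."""
    return bool(tokens(title) & CRIME_WORDS)


def is_irrelevant(title: str) -> bool:
    """Same rule, but a business word in the title overrides the crime word."""
    words = tokens(title)
    return bool(words & CRIME_WORDS) and not (words & BUSINESS_WORDS)


title = "Google aims to kill passwords"
print(naive_is_irrelevant(title))  # the naive rule wrongly drops the story
print(is_irrelevant(title))        # the override keeps it
```

Even this override is fragile, since it depends on hand-maintained word lists, which is one reason relevance rules end up being tuned from data.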
9. Challenge 3: Removing duplicates
→ Every new piece of information has to be compared with our database. But websites do not copy each other verbatim: they alter the text so that it does not trigger copyright claims or Google’s algorithms, and so that it appears unique for search optimization.
→ Standard machine learning programs group similar articles using efficient clustering algorithms with reasonable accuracy. The next challenge is that, being machines, they incorrectly group different articles or fail to group similar ones.
→ Google has spent a great deal of time defining such algorithms. We struggled to find the cracks in such techniques.
→ Identifying the original article that is being duplicated, and not the other way round, is another problem. We continued on our journey of ‘Now what?’
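One common way to compare a lightly reworded article with a stored one is word shingling with Jaccard similarity. The sketch below is a swapped-in textbook technique, not the clustering approach the deck refers to; the function names, the shingle size `k` and the `threshold` value are assumptions for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(new_text: str, known_text: str, threshold: float = 0.5) -> bool:
    """Flag new_text as a near-duplicate when shingle overlap reaches threshold."""
    return jaccard(shingles(new_text), shingles(known_text)) >= threshold


original = "Acme Corp announced on Monday that it will acquire Beta Ltd for 2 billion dollars"
reworded = "Acme Corp announced on Monday that it will acquire Beta Ltd for about 2 billion dollars"
unrelated = "Gamma Inc opens a new office in Berlin next year"

print(is_near_duplicate(reworded, original))   # one inserted word, still flagged
print(is_near_duplicate(unrelated, original))  # no shared shingles, not flagged
```

Comparing every new article against every stored one this way does not scale, and heavier rewording lowers the overlap quickly, which is where the wrong groupings and missed groupings described above come from.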
10. Challenge 4: Identifying companies
→ In text analytics this is called Named Entity Recognition.
→ Start by looking for words whose first letter is uppercase, like ICICI. We can achieve this with some elementary text processing. If the following word also starts with a capital letter, then it is part of the same name, e.g. ICICI Bank. This could be true for the third word too: ICICI Bank Ltd. So there are different patterns for different identifications.
→ Company names that are common words, such as Apple, Amazon and Gap, are difficult for algorithms to recognize as company names. For these, we need to look for other signals in the article.
→ Common English words cause a lot of confusion in ordinary articles.
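The capital-letter heuristic described above can be sketched with a single regular expression. This is a minimal illustration of that elementary text processing, not a real Named Entity Recognition system; the pattern and the function name `candidate_names` are assumptions.

```python
import re

# A capitalised word, followed by any number of further capitalised words.
CANDIDATE = re.compile(r"\b[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*")


def candidate_names(text: str) -> list[str]:
    """Return runs of consecutive capitalised words as candidate names."""
    return CANDIDATE.findall(text)


print(candidate_names("ICICI Bank Ltd reported higher profits"))
# ['ICICI Bank Ltd']
print(candidate_names("Shares of ICICI Bank rose"))
# ['Shares', 'ICICI Bank']
print(candidate_names("Apple pie recipes are popular"))
# ['Apple']
```

The last two calls show the limits: a sentence-initial word like “Shares” is picked up as a candidate, and the pattern alone cannot tell Apple the company from apple the fruit, so other signals in the article are still needed.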
11. Challenge 5: Specifying Industry
→ Industries are not separated by clear divisions.
→ A market intelligence platform needs to analyze the aggregated information by industry and by topic, such as partnerships, business expansion and new offerings.
→ There are no ready rules for fine-tuning classification algorithms to recognize the words commonly used to describe an industry.
→ Reaching accuracy is very difficult, and a competitive intelligence platform that wants to stay reliable does not get many shots at just trying things.
→ Example: a story reports which company has acquired which, and which bank’s investment is involved. The roles can easily be confused, and the story ends up classified as a banking acquisition.
12. Challenge 6: Companies & mentions
→ How do we know whether a story is about a company or just mentions it? This is the problem of the aboutness of the information.
→ Example: a story that says “Amazon, Microsoft, Google, and Oracle are also offering cloud computing solutions”. Clearly, it mentions Microsoft, but we don’t want our “intelligence” users to get this in their updates for Microsoft.
→ To address this, we gave relevance scores to all the companies in each article.
→ The score depends on many factors and on a knowledge base. For example, for the products and services signal, we need a knowledge base of all the products and services of the company.
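A per-company relevance score can be sketched from two simple signals: whether the company appears in the title, and its share of all company mentions in the body. The deck does not say which signals or weights Contify uses, so the weights (0.6 and 0.4), the function names and the fictional company “Acme” below are all assumptions.

```python
import re


def count_mentions(name: str, text: str) -> int:
    """Count whole-word, case-insensitive occurrences of name in text."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def relevance_scores(title: str, body: str, companies: list[str]) -> dict[str, float]:
    """Score each company: 0.6 for a title mention plus 0.4 times its share of body mentions."""
    counts = {c: count_mentions(c, body) for c in companies}
    total = sum(counts.values())
    scores = {}
    for company in companies:
        in_title = 1.0 if count_mentions(company, title) else 0.0
        share = counts[company] / total if total else 0.0
        scores[company] = round(0.6 * in_title + 0.4 * share, 3)
    return scores


title = "Acme expands its cloud platform"
body = (
    "Acme launched new cloud services today. Acme said demand is rising. "
    "Amazon, Microsoft, Google, and Oracle are also offering cloud computing solutions."
)
scores = relevance_scores(title, body, ["Acme", "Amazon", "Microsoft", "Google", "Oracle"])
for company, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(company, score)
```

Here the story would be routed to followers of the fictional Acme, while Microsoft, mentioned once in passing, scores low. Signals such as products and services would need the knowledge base described above and are not modelled in this sketch.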
13. Challenge 7: Social media
→ Social media is a web of information in which very little is quality information worth extracting.
→ Extracting a few relevant pieces of information from tons of mindless shares, tweets and retweets is like finding a needle in a haystack without a magnet.
→ Our intelligence engine rejects more than 95% of social updates from companies.
→ Complexity on social media keeps increasing with new marketing hacks. Companies have different accounts not only for different regions but for different departments too.
→ It is not easy to reach the right place, the right article and the authentic company profile amid the junk data on social media platforms.
15. Key takeaways
→ Data on the web is a goldmine, but extracting the gold from the trash is a task that not everyone is capable of.
→ Example: the business strategy section of Apple’s annual report had just two additional words in 2002 that were not there in 2001. These were “cellular phones.” Yet many were surprised when Apple, a computer company, launched the iPhone five years later.
→ Competitive market intelligence is not within easy reach of any team of developers; it has to be optimized with the organization’s efficiencies and the need to support better internal decision making in mind.
Put this kind of effort into building a market intelligence platform only if that is the core of your business. If not, then building one would not be wise, even if you have a great technology team.
16. Contify
Start a conversation
Thank you
Choose the industry-best
Competitive Intelligence and Market
Research system
marketing@contify.com
Read More
https://bit.ly/34zsYy7