Here's Why Text-Based Content Access and Management Plays Crucial Role in Real-Time BI
Transcript of a sponsored BrieﬁngsDirect podcast on information management for business
intelligence, one of a series on web data services with Kapow Technologies.
Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re
listening to BrieﬁngsDirect.
Today we present a sponsored podcast discussion on how text-based content and
information from across web properties and activities are growing in importance
to businesses. The need to analyze web-based text in real-time is rising to where
structured data was in importance just several years ago.
Indeed, for businesses looking to do even more commerce and community building across the
Web, text access and analytics forms a new mother lode of valuable insights to mine.
In Part 1 of our series on web data services with Kapow Technologies, we discussed how
external data has grown in both volume and importance across the Internet, social networks,
portals, and applications.
As the recession forces the need to identify and evaluate new revenue sources, businesses need to
capture such web data services for their business intelligence (BI) to work better, deeper, and faster.
In Part 2, we dug even deeper into how to make the most of web data services for BI, along with
the need to share those web data services inferences quickly and easily.
Now, in this podcast, Part 3 of the series, we discuss how an ecology of providers and a variety
of content and data types come together in several use-case scenarios. We look speciﬁcally at
how near real-time text analytics ﬁlls out a framework of web data services that can form a
whole greater than the sum of the parts, and this brings about a whole new generation of BI
beneﬁts and payoffs.
Here to help explain the beneﬁts of text analytics and their context in web data services, is Seth
Grimes, principal consultant at Alta Plana Corp. Thanks for joining, Seth.
Seth Grimes: Thank you, Dana.
Gardner: We're also joined by Stefan Andreasen, co-founder and chief technology ofﬁcer at
Kapow Technologies. Welcome, Stefan.
Stefan Andreasen: Thank you, Dana.
Gardner: We have heard about text analytics for some time, but for many people it's been a bit
complex, unwieldy, and difﬁcult to manage in terms of volume and getting to this level of a
"noise-free" text-based analytic form. Something is emerging that you can actually work with,
and has now become quite important.
Let's go to you ﬁrst, Seth. Tell us about this concept of noise free. What do we need to do to
make text that's coming across the Web in sort of a fire hose something we can actually work with?
Grimes: Dana, noise free is an interesting concept and a difﬁcult concept, when you're dealing
with text, because text is just a form of human communication. Whether it's
written materials or spoken materials that have been transcribed into text, human
communications are incredibly chaotic.
We have all kinds of irregularities in the way that we speak -- grammar, spelling,
syntax. Putting aside any kind of irregularities, we have slang, sarcasm,
abbreviations, and misspellings. Human communications are chaotic and they are
full of "noise." So really getting to something that's noise-free is very ambitious.
I'm going to tell you straightforwardly, it's not possible with text analytics, if you are dealing
with anything resembling the normal kinds of communications that you have with people. That's
not to say that you can't aspire to a very high level of accuracy to getting the most out of the
textual information that's available to you in your enterprise.
It's become an imperative to try to deal with the great volume of text -- the ﬁre hose, as you said
-- of information that's coming out. And, it's coming out in many, many different languages, not
just in English, but in other languages. It's coming out 24 hours a day, 7 days a week -- not only
when your business analysts are working during your business day. People are posting stuff on
the web at all hours. They are sending email at all hours.
Then, the volume of information that's coming out is huge. There are hundreds of millions of
people worldwide who are on the Internet, using email, and so on. There are probably even more
people who are using cell phones, text messaging, and other forms of communication.
If you want to keep up, if you want to do what business analysts have been referring to as a 360-
degree analysis of information, you've got to have automated technologies to do it. You simply
can't cope with the ﬂood of information without them.
That's an experience that we went through in the last decades with transactional information
from businesses. In order to apply BI or to get BI out of them, you have to apply automated
methods with specialized software.
Fortunately, the software is now up to the job in the text analytics world. It's up to the job of
making sense of the huge ﬂood of information from all kinds of diverse sources, high volume, 24
hours a day. We're in a good place nowadays to try to make something of it with these technologies.
Gardner: Of course, we're seeing the mainstream media start behaving more like bloggers and
social media producers. We're starting to see that when events happen around the world, the ﬁrst
real credible information about them isn't necessarily from news organizations, but from
witnesses. They might be texting. They might be using Twitter. It seems that if you want to get
real-time information about what's going on, you need to be able to access those sorts of sources.
Grimes: That's a great point, Dana, and it helps introduce the idea of the many different use-cases for text analytics. This is not only on the Web, but within the enterprise as
well, and crossing the boundary between the Web and the inside of the enterprise.
Those use cases can be the early warning of a swine flu epidemic or other
medical issues. You can be sure that there is text analytics going on with
Twitter and other instant messaging streams and forums to try to detect what's going on.
You even have Google applying this kind of technology to look at the pattern of the searches that
people are putting in. If people are searching on a particular medical issue centered in a
particular geographic location, that's a good indicator that there's something unusual going on there.
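The kind of search-pattern monitoring described here can be reduced to a simple anomaly check on query counts. A minimal sketch in Python; the counts, window, and threshold below are invented for illustration, not drawn from any real system:

```python
from statistics import mean, stdev

def unusual_days(daily_counts, window=7, threshold=3.0):
    """Flag days whose count exceeds the trailing mean by more than
    `threshold` standard deviations of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma and daily_counts[i] > mu + threshold * sigma:
            flagged.append(i)
    return flagged

# Daily searches for a medical term in one region: steady, then a spike.
counts = [12, 11, 13, 12, 14, 12, 11, 13, 12, 14, 95]
print(unusual_days(counts))  # [10] -- the spike day stands out
```

Real early-warning systems add geography and seasonality, but the core idea is the same: a sudden departure from the recent baseline is the signal.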
It's not just medical cases. You also have brand and reputation management. If someone has
started posting something very negative about your company or your products, then you want to
detect that really quickly. You want early warning, so that you can react to it really quickly.
We have a great use case in the intelligence world. That's one of the earliest adopters of text
analytics technology. The idea is that if you are going to do something to prevent a terrorist
attack, you need to detect and respond to the signals that are out there, that something is pending
really quickly, and you have to have a high degree of certainty that you're looking at the right
thing and that you're going to react appropriately.
We have some great challenges out there, but, as I said, we have some great technologies to
respond to those challenges in a whole variety of business, government, and other applications.
Gardner: Stefan, I think there are very few people who argue with the fact that there is great
information out there on the Web, across these different new channels that have become so
prominent, but making that something that you can use is a far different proposition. Seth has
been telling us about automated tools. Tell us what you see in terms of web data services and
how we can make this information available to automated systems.
Andreasen: Thank you Dana. Let's just look at something like Google. You go there and do a
search, and you think that you're searching the entire Internet. But, you're not,
because you're probably not going to access data that's hidden behind logins,
behind search forms, and so on.
There is a huge amount of what I call "deep web," very valuable information that
you have to get to in some other way. That's where we come in and allow you to
build robots that can go to the deep web and extract information.
I'd also like to talk a little bit more about the noise-free thing and go to the Google example. Let's
say you go to Google and you search for "IBM software." You think that you will be getting an
article that has something to do with IBM software.
You often actually find an article that has nothing to do with IBM software. But, because there are some advertisements from IBM on the page, IBM was a hit, and some other place on the page links to software, so software was a hit too. Basically, you end up with something completely irrelevant.
Eliminating noise is getting rid of all this stuff around the article that is really irrelevant, so you
get better results.
The other thing around noise-free is the structure. It would be great if you could say, "I want to
search an article about IBM software which was dated after Oct. 7," or whatever, but that means
you also need to have that additional structured information in it.
The key here is to get noise-free data and to get full data. It's not only to go to the deep web, but
also get access to the data in a noise-free way, and in at least a semi-structured way, so that you
can do better text analysis, because text analysis is extremely dependent on the quality of data.
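Stefan's point about semi-structured output can be made concrete: once extraction emits records with metadata fields, a query like "articles about IBM software dated after Oct. 7" becomes a precise filter instead of a noisy keyword hit. A toy Python sketch; the records and field names are invented for illustration:

```python
from datetime import date

# Semi-structured records, as an extraction robot might emit them:
# the article title plus a parsed publication date.
articles = [
    {"title": "IBM software update",    "published": date(2009, 10, 12)},
    {"title": "Server ad landing page", "published": date(2009, 10, 20)},
    {"title": "IBM software review",    "published": date(2009, 9, 1)},
]

# With structure in place, the date constraint is an exact comparison
# rather than a text match that could hit ads or navigation.
hits = [a["title"] for a in articles
        if "IBM software" in a["title"] and a["published"] > date(2009, 10, 7)]
print(hits)  # ['IBM software update']
```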
Grimes: I have to agree with you there, Stefan. It's very important to have tools that can strip
away not only the ads, but understand where the content is within a page and what's the
navigation on that page.
We might not be interested in navigation elements, the ﬂuff that's on a page. We want to focus on
the content. In addition, nowadays on the Web, there's a big problem of duplication of material
that's been hosted in multiple sites. If you're dealing with email or forums, then people typically
quote previous items in their reprise, and you want to detect and strip that kind of stuff away and
focus on the real relevant content. That is definitely part of the noise-free equation, getting to the content itself.
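The stripping described here -- dropping navigation, ads, and other page chrome so only the content survives -- can be sketched with Python's standard-library HTML parser. The tag list and the "ad" class name are assumptions about how one particular page marks its chrome, not a general-purpose solution:

```python
from html.parser import HTMLParser

# Elements treated as page "chrome" rather than content. The "ad"
# class name is an assumed convention for this illustrative page.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class ContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a noise subtree
        self.chunks = []

    def _is_noise(self, tag, attrs):
        return tag in SKIP_TAGS or "ad" in dict(attrs).get("class", "").split()

    def handle_starttag(self, tag, attrs):
        # Count depth inside noise subtrees so nested tags stay skipped.
        if (self.skip_depth or self._is_noise(tag, attrs)) and tag not in VOID_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

page = """<html><body>
  <nav><a href="/">Home</a> | <a href="/news">News</a></nav>
  <div class="ad">Buy our tires today!</div>
  <div class="article">Tread separation reported on several models.</div>
  <footer>Copyright 2009</footer>
</body></html>"""

parser = ContentExtractor()
parser.feed(page)
print(parser.chunks)  # only the article text survives
```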
Gardner: Stefan, you refer to the deep web. I imagine this also has a role, when it comes to
organizations trying to uncover information inside of their ﬁrewalls, perhaps among their many
employees and all the different tools that they're using. We used to call it the intranet, but is there
an intranet effect here for this ability to gather noise-free text information that we can then start to analyze?
Andreasen: Absolutely. I'd even say the extended intranet. If we're looking at a web browser,
which is the way that most business analysts or other persons today are accessing business
applications, we're accessing three different kinds of applications.
One involves applications inside the ﬁrewall. It could be the corporate intranet, etc. Then there
are applications where you have to use a login, and this can be your partners. You're logging in to
your supplier to see if some item is in stock. Or, it can be some federal reporting site or something similar.
The sites behind the login are like the extended enterprise. Then, of course, there is everything
out of the World Wide Web -- more than 150 million web pages out there -- which have all kinds
of data, and a lot of that is behind search forms, and so on.
Gardner: Seth, as a consultant and analyst, you've been focused on text analytics for some time,
but perhaps a number of our listeners aren't that familiar with it. Could you maybe give us a brief
primer on what it is that happens when you identify some information -- be it Internet, extended
web, deep web? How do you go through some basic steps to analyze, cleanse, and then put data
into a form that you can then start working with?
Grimes: Dana, I'm going to ﬁrst give you an extremely short history lesson, a little factoid for
you. Text analytics actually predates BI. The basic approaches to analyzing textual sources were
deﬁned in the late '50s. Actually, there is a paper from an IBM researcher from 1958, that deﬁnes
BI as the analysis of textual sources.
What happened is that enterprises computerized their operations, their accounting, their sales, all
of that in the 1960s. That numerical data from transactional systems is readily analyzable, where
text is much more difﬁcult to analyze. But, now we have come to the point, as I said earlier,
where there is software and great methods for analyzing text.
What do they do? The front-end of any text analysis system is going to be information retrieval.
Information retrieval is a fancy, academic type of term, meaning essentially the same thing as
search. We want to take a subset of all of the information that's out there in the so-called digital
universe and bring in only what's relevant to our business problems at hand. Having the
infrastructure in place to do that is a very important aspect here.
Once we have that information in hand, we want to analyze it. We want to do what's called
information extraction, entity extraction. We want to identify the names of people, geographical locations, companies, products, and so on. We want to look for pattern-based entities like dates, telephone numbers, and addresses. And, we want to be able to extract that information from the text.
In order to do that, people usually apply a combination of statistical and linguistic methods. They
look for language patterns in the text. They look for statistics like the co-occurrence of words in
multiple texts. When two words appear next to each other or close to each other in many different
documents -- that can be web pages or other documents -- that indicates the degree of
relationship. People apply so-called machine-learning technologies in order to improve the
accuracy of what they are doing.
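The extraction and co-occurrence steps just outlined can be sketched in a few lines of Python. The capitalized-word rule below is a crude stand-in for a trained named-entity recognizer, the date regex is a pattern-based entity, and the documents are invented:

```python
import re
from collections import Counter
from itertools import combinations

# Crude entity spotters: capitalized words stand in for a trained
# recognizer; the date regex is a pattern-based entity.
NAME = re.compile(r"\b[A-Z][a-z]+\b")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Invented documents about the same incident.
docs = [
    "Ford recalled tires made by Firestone on 2000-08-09.",
    "Firestone and Ford disputed the cause of the failures.",
    "Regulators questioned Firestone about tread separation.",
]

# Count how often two entities appear in the same document; frequent
# co-occurrence suggests a relationship between them.
cooccur = Counter()
for doc in docs:
    for pair in combinations(sorted(set(NAME.findall(doc))), 2):
        cooccur[pair] += 1

print(DATE.findall(docs[0]))   # ['2000-08-09']
print(cooccur.most_common(1))  # [(('Firestone', 'Ford'), 2)]
```

Production systems replace both regexes with statistical models trained on labeled text, but the counting step works the same way at any scale.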
All of this sounds very scientiﬁc and perhaps abstruse -- and it is. But, the good message here is
one that I have said already. There are now very good technologies that are suitable for use by
business analysts, by people who aren't wearing those white lab coats and all of that kind of stuff.
The technologies that are available now focus on usability by people who have business
problems to solve and who are not going to spend the time learning the complexities of the
algorithms that underlie them.
So, we're at the point now where you can even treat some of these technologies as black boxes.
They just work. They produce the results that you need in the form that you need them. That can
be in a form that extracts the information into databases, where you can do the same kind of BI
that you have been used to for the last 20 years or so with BI tools.
It can be visualizations that allow you to see the interrelationships among the people, the
companies, and the products that are identiﬁed in the text. If you're working in law enforcement
or intelligence, that could be interrelationships among individuals, organizations, and incidents
of various types. We have visualization technologies and BI technologies that work on top of text analytics.
Then, we have one other really nice thing that's coming on the horizon, which is semantic web
technology -- the ability to use text analytics to support building a web of data that can be
queried and navigated by automated software tools. That makes it even easier for individuals to solve everyday business problems, and personal ones for that matter.
Gardner: I'd like to dig into some use-cases and understand a little bit better how this is being
used productively in the ﬁeld. Before we do that, Stefan, maybe you could explain from Kapow
Technologies' perspective, how you relate to this text analytics ﬁeld that Seth so nicely just
described. Where does Kapow begin and end, and how do you play perhaps within an ecosystem
of providers that help with text analytics?
Andreasen: Text analytics, exactly as Seth was saying, is really a form of BI. In BI, you are
examining some data and drawing some conclusions, maybe even making some automated
actions on it.
Obviously, any BI or any text analysis is no better than the data source behind it. There are four
extremely important parameters for the data sources. One is that you have the right data sources.
There are so many examples of people making these kinds of BI applications, text analytics
applications, while settling for second-tier data sources, because they are the only ones they
have. This is one area where Kapow Technologies comes in. We help you get exactly the right
data sources you want.
The other thing that's very important is that you have a full picture of the data. So, if you have
data sources that are relevant from all kinds of verticals, all kinds of media, and so on, you really
have to be sure you have a full coverage of data sources. Getting a full coverage of data sources
is another thing that we help with.
We already talked about the importance of noise-free data to ensure that when you extract data
from your data source, you get rid of the advertisements and you try to get the major information
in there, because it's very valuable in your text analysis.
Of course, the last thing is the timeliness of the data. We all know that people who do stock
research get real-time quotes. They get them for a reason: the newer the quotes are, the better they can look into the crystal ball and make predictions a few seconds into the future.
The world is really changing around us. Companies need to look into the crystal ball in the
nearer and nearer future. If you are predicting what happens in two years, that doesn't really
matter. You need to know what's happening tomorrow. So, the timeliness of the data is important.
Let me get to the approach that we're taking. Business analysts work with business applications
through their web browser. They actually often cut and paste data out of business applications into other applications.
You can see our product as a web browser, where you can teach it how to interact with the
website, how to only extract the data that's relevant, and how you can structure that data, and
then repeat it. Our product can give you automated, real-time, and noise-free access to any data
you see in a web browser.
How does that apply to text analytics? Well, it gives you the 100-percent-coverage, real-time data
source, with all of those values that I just explained.
Gardner: I really was intrigued by this notion of the crystal ball, and not two years from now,
but tomorrow. It seems to me that so many people are putting up so much information about their
lives, their preferences. People in business are doing the same around their occupation. We have
this virtual focus group going on around us all the time. If we could just suck out the right
information based on our products, we could get that crystal ball polished up.
Let me go back to you, Stefan. Can you give us an example of where a market research,
customer satisfaction, or virtual focus group benefit is being derived from these text-analytics technologies?
Knowing the customer
Andreasen: Absolutely. For any company selling services or products, the most important thing
for them to know is what the customers think about their product. Are we giving our customers
the right customer service? Are we packaging our products the right way? How do we
understand the customer's buying behavior, the customer communications, and so on?
Intuit is a customer we share with a text-analysis company called Clarabridge. They use a text-analysis solution to understand their TurboTax customers.
Before they had a text-analysis system, they had people sampling about one percent of the forums on the web, their own customer-support system, and the emails into their contact center to get some rudimentary overview of what customers thought.
We went in, and with Kapow Technologies they can now get to all these data sources -- forums
online, their own customer support center, and wherever there are networks of TurboTax users --
and extract all the information in near real-time. Then, they use the text-analysis engine to make
much, much better predictions of what the customers think, and they actually have their finger on the pulse.
If a set of customers suddenly talks about a feature that doesn't work, or that is much better in the competitor's product -- thereby looking into the near future of the crystal ball -- they can react early and try to deal with this in the best possible way.
Gardner: Seth Grimes, is this an area where you have seen a lot of the text analytics work
focused on these sort of virtual focus groups?
Grimes: Deﬁnitely. That's an interesting concept. The idea behind a focus group is that it's a
traditional qualitative research tool for market research ﬁrms. They get a bunch of people into a
room and they have the facilitator lead those people through a conversation to talk about brand
names, marketing, positioning, and then get their reactions to it.
With the web, you don't have to get those people together, because they come together on their
own and participate in social media forums of various types. There are a whole slew of them.
Together they constitute a virtual focus group, as you say.
The important point here is to get at the so-called voice of the customer. In other words, what is
the customer saying in his own voice, not in some form where you're forcing that person to tick
off number one, two, three, four, or ﬁve, in order to rate your product. They can bring up the
issues that are of interest to them, whether they are good or bad issues, and they can speak about
those issues however they naturally do. That's very important.
I've actually been privileged to share a stage with the analytics manager from Intuit, Chris Jones,
a number of times to talk about what he is doing, the technologies, and so on. It's really
interesting stuff that ampliﬁes what Stefan had to say.
The idea is that you can use these technologies, both to get a broad picture of the issues, and no
longer have to bend those issues into categories that your business analysts have predeﬁned.
Now, you can generate the topics of most interest, using automated, statistical methods from
what the people are actually saying. In other words, you let them have their own voice.
You also get the effect of not only looking at the aggregate picture, at the mass of the market, but
also at the individual cases. If someone posts about a problem with one of the products to an
online forum, you can detect that there's an issue there.
You can make sure that the issue gets to the right person, and the company can personally
address each issue in order to really keep it from escalating and getting a lot of attention that you
really don't want it to get. You get the reputation of being a very responsive company. That's a
very important thing.
The goal here is not necessarily to make more money. The goal is to boost your customer
satisfaction rating, Net Promoter score, or however you choose to measure it. These
technologies, the text technologies, are a very important package and part of the overall package
of responding to customer issues and boosting customer satisfaction.
While you're doing it, those people are going to buy more. They're going to reduce your support
costs, all of that kind of stuff, and you are going to make more money. So, by doing the right
thing, you're also doing something good for your own company.
Gardner: In business, you want to reduce the guesswork to do better by your customers. Stefan,
as I understand it, Kapow Technologies has been quite successful in working with a variety of
military, government, and intelligence agencies around the world on getting this real-time
information as to what's going on, but perhaps with the stakes being a bit higher, things like
terrorism, and even insurrections and uprisings.
Tell us a little bit about a second use case scenario, where text analytics are being used by
government agencies and intelligence agencies.
Andreasen: As Seth said, the voice of the customer is a very interesting and very valuable use case for text analysis. I'll add one thing to what Seth said. He was talking about product input,
and of course, we all know that developing products -- maybe not so much a product like
TurboTax, but developing a car -- is extremely expensive. So, understanding what kind of
product your customers want in the future is an important part of the voice of the customer.
With a lot of our customers in military intelligence, it's similar. Of course, they would like to
know what people are writing from a sentiment point of view, an opinion point of view, but
another thing that's actually even more important in the intelligence community is what I will call relationship analysis.
Seth mentioned relationships earlier, and also understanding the real inﬂuencers and who are the
ones that have the most connections in these relationships. Let's say somebody writes an article
about how you mix some chemicals together to make an efﬁcient bomb. What you really want to
know is who this person knows in all kinds of social networks on the 'Net, and to try to make a
network of who are the real inﬂuencers and who are the network centers.
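The influencer analysis Stefan describes comes down to building a graph of who is linked to whom and ranking nodes by how connected they are. A toy sketch with invented link data; tools like Palantir's add interactive exploration on top of this basic idea:

```python
from collections import defaultdict

# Observed links between authors -- replies, citations, or co-posting
# on the same thread. The names are invented for illustration.
links = [
    ("author_a", "author_b"),
    ("author_a", "author_c"),
    ("author_a", "author_d"),
    ("author_b", "author_c"),
    ("author_e", "author_a"),
]

# Build an undirected adjacency map.
neighbors = defaultdict(set)
for u, v in links:
    neighbors[u].add(v)
    neighbors[v].add(u)

# Degree centrality: the most-connected node is a candidate
# "network center" or influencer.
ranked = sorted(neighbors, key=lambda n: len(neighbors[n]), reverse=True)
print(ranked[0], len(neighbors[ranked[0]]))  # author_a 4
```

Degree is the simplest centrality measure; real analyses also use betweenness or eigenvector centrality to find brokers between communities.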
We see a lot of uses of our product, going out to blogs, forums, etc., in all kinds of languages,
translating it often into English, and doing this relationship analysis. A very popular product for
that, which is a partner of ours, is Palantir Technologies. It has a very cool interactive way of
ﬁnding relationships. I think this is also very relevant for normal enterprises.
Yesterday I met with one of the big record companies, which is also a customer of ours. As soon
as I explained this relationship stuff, they said, "We can really use this for anti-piracy, because it
is really just very few people who do the major work when it comes to getting copies of new films out on the 'Net. So, understanding these relationships can be very relevant for this kind of scenario."
Grimes: Dana, when you introduced our podcast today, you used the term ecology or ecosystem,
and that's a really great concept that we can apply here in a number of dimensions. We do have an
ecosystem in at least two dimensions.
Stefan mentioned one of the Kapow partners, Palantir. We earlier mentioned the text analytics
partner, Clarabridge. We have the ability now through integration technologies like Kapow to
bring together different information sources, very disparate, different information sources with
different characteristics, to provide an ecosystem of information that can be analyzed and
brought to bear to solve particular business or government problems.
We have a set of software technologies that can similarly be integrated into an ecosystem to help
you solve those problems. That might be text analysis technologies. It might be traditional BI or
data warehousing technologies. It might be visualization technologies, whatever it takes to
handle your particular business problem.
As we've been discussing, we do see applications in a whole variety of business and government
issues, whether it's customer or intelligence or many other things that we haven’t even discussed
today. So I ﬁnd that ecosystem concept to be very useful here in framing the discussions about
how the text technologies ﬁt into something that's a much larger picture.
Gardner: So, we are looking at the ecologies. We are looking at some of these use-cases. It
seems to me that we also want to be able to gather information from a variety of different
players, perhaps in some sort of a supply chain, ecosystem, business process, channel partners, or
value added partners. The ecology and ecosystem concept works not only in terms of what we do
with this information, but how we can apply that information back out to activities that are multi-
player, beyond the borders or boundaries of any one organization.
I'm thinking about product recall, health, and public-health types of issues. Seth, have you
worked with any clients or do you have any insights into how text analytics is beneﬁting an
extended supply chain of some sort, and how the ecosystem of insight into the text analytics
solves some unique problems there?
Grimes: Product recall is an interesting one. Let me give you an example there. This is, like
most examples that we are going to discuss, a multifaceted one.
People are all familiar with the problems with Firestone tires back a number of years ago, early
in this decade, where the tread was coming off tires. Well, there are a number of parties that are
going to be interested in this problem.
Put aside, for the moment, the consumers who were obviously affected by it, very badly affected by it. But, we have the manufacturers, not only of the tires, but also of the vehicles, the Ford Explorer in this case.
We have the regulatory bodies in the government, parts of the U.S. Department of
Transportation. We have the insurance industry. All of these are stakeholders who have an
interest in early detection, early addressing, and early correction of the problem.
You don't want to wait until there are just so many cases here that it's just obvious to everyone,
the issues really spill out into the press, and there are questions of negligence, and so on. So, how
can you address something like a problem with tires where the tread is coming off?
Well, one way is warranty claims. For example, someone might ﬁle a claim through the vehicle
manufacturer, Ford in this case, or through the tire manufacturer, claiming a defective product.
Sometimes, just an individual tire is defective, but sometimes that's an indication of
manufacturing or design issues. So you have warranty claims.
You also have accident reports that are ﬁled by police departments or other government agencies
and ﬁnd their way into databases in the Department of Transportation and other places. Then,
you have news reports about particular incidents.
There are multiple sources of information. There are multiple stakeholders here. And, there are
multiple ways of getting at this. But, like so many problems, you're going to get at the issue
much faster, if you combine information from all of these different sources, rather than relying
on a single source.
Again, that's where the importance of building up an ecosystem of different data sources that
come to bear on your problem is really important, and that's just a typical use case. I know of
other organizations, manufacturing organizations, that are using this technology in conjunction
with data-mining technologies for warranty claims, for example. Consumer appliances is another
area that I have heard a lot about, but really there is no limitation in where you can apply this.
Gardner: Stefan, from your perspective, for these extended supply chains, public health issues,
etc., again we get down to this critical time element -- for example, the Swine ﬂu outbreak last
spring. If folks could identify through text analytics where this was starting to crop up, they
didn't have to wait for the hospital reports necessarily. Is that an instance where some of these
technologies can really play an important role?
Andreasen: Absolutely. Before I get into some more real examples, I want to emphasize some
of the things that Seth was saying. He's talking about getting to multiple data sources. I cannot
stress enough that what I have seen out there as one of the biggest pitfalls when people are
making a text analysis solution or actually any BI solution is that they look at what data sources
they have and they settle for that.
They should have said, "What are the optimal data sources to get the best prediction and get the
best outcome out of this text analysis?" They should settle for no less than that.
The example here will actually illustrate that. I also have a tire example. We actually have two different kinds of customers using our products to look at tires, tire explosions, and tire recalls.

One is a tire company itself. They go to automotive forums and monitor whether people are doing exactly what Seth is saying, filing claims or writing on an automotive blog: "I got this tire, and it exploded." "It's just really bad." "Don't buy it." All those kinds of information come from different sources.
If you tap enough of these data sources and get that data in real time, you can actually go in and contain a potential tire-recall situation before it happens, which of course could be very valuable for your company.
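The monitoring workflow Andreasen describes can be sketched in a few lines. This is a minimal illustration, not Kapow's actual product: the forum posts, the keyword list, and the alert threshold are all invented, and a real deployment would use harvested web data and trained text-analytics models rather than substring matching.

```python
from datetime import datetime, timedelta

# Hypothetical forum posts: (timestamp, text). In practice these would be
# harvested from automotive forums and blogs by a web data service.
POSTS = [
    (datetime(2009, 9, 1, 9, 0), "I got this tire and it exploded on the highway"),
    (datetime(2009, 9, 1, 9, 30), "Great handling in the rain, very happy"),
    (datetime(2009, 9, 1, 10, 15), "Tread separated after 500 miles, filing a claim"),
    (datetime(2009, 9, 1, 11, 0), "It's just really bad. Don't buy it."),
]

# Illustrative complaint vocabulary; a real system would use a trained model.
COMPLAINT_TERMS = {"exploded", "separated", "claim", "bad", "recall", "don't buy"}

def is_complaint(text: str) -> bool:
    """Flag a post that looks like a product complaint."""
    lowered = text.lower()
    return any(term in lowered for term in COMPLAINT_TERMS)

def complaints_in_window(posts, end, window=timedelta(hours=24)):
    """Count complaint posts inside a sliding time window."""
    start = end - window
    return sum(1 for ts, text in posts if start <= ts <= end and is_complaint(text))

ALERT_THRESHOLD = 3  # tune to the normal background complaint volume

count = complaints_in_window(POSTS, datetime(2009, 9, 1, 12, 0))
if count >= ALERT_THRESHOLD:
    # Fires here: three of the four sample posts read as complaints.
    print(f"ALERT: {count} complaints in the last 24h -- investigate before a recall")
```

The point of the sliding window is the real-time element discussed above: a spike in complaint volume surfaces hours after the posts appear, rather than weeks later in warranty or hospital reports.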
The other use case is stock research. We have a lot of customers doing financial and market research with our technology. One of them is using our product, for example, to check the same forums, but their objective is to predict whether there will be a tire recall. Then, they can predict that the stock is going to drop when that happens, and project that beforehand.
Many different players here can use the same kind of information for different purposes, and that
makes this really interesting as well.
Gardner: Well, it really seems the age-old part of this is that getting information first has many, many advantages, but the new element is that more and more information is available for analytics out on the web.
I wonder if we could cap this discussion -- we are about out of time -- by looking at the future. Seth, you mentioned the semantic web earlier. How automated can this get, and what needs to happen for that vision of a semantic web to be realized?
Grimes: Well, the semantic web right now is a dream. It's a dream that was first articulated over
a decade ago by Tim Berners-Lee, the person who created the World Wide Web, but it is one that
is on the fast track to being realized. Being realized in this case means creating meaning.
What Stefan was referring to earlier, when he talked about the date of a published article, the title, and perhaps other metadata fields such as the author, is creating information that describes what's out there on the web and in databases.
Rendering that information into a form that's machine processable, not only in the sense of
analysis, but also in the sense of making interconnections among different pieces of information,
is what the semantic web is really about. It's about structuring information that's out there on the
Web. That can include what Stefan referred to as the deep web, and creating tools that allow
people to search and issue other types of queries against that web data.
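The structuring Grimes describes, metadata made machine-processable and interconnected, is commonly expressed as subject-predicate-object triples, the core idea behind the semantic web's RDF model. A minimal sketch with invented identifiers (`article:42`, `person:jsmith`, and the predicate names are illustrative, not a real vocabulary):

```python
# Article metadata expressed as subject-predicate-object triples.
TRIPLES = [
    ("article:42", "title", "Tire Recall Widens"),
    ("article:42", "author", "person:jsmith"),
    ("article:42", "published", "2009-09-01"),
    ("person:jsmith", "worksFor", "org:acme-news"),
]

def query(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# The "interconnection" Grimes mentions: chain two lookups to answer
# "who wrote article 42, and where do they work?"
author = query(TRIPLES, subject="article:42", predicate="author")[0][2]
employer = query(TRIPLES, subject=author, predicate="worksFor")[0][2]
```

Because the author field points at another subject rather than a plain string, a machine can follow the link from the article to the person to the employer, which is exactly the kind of query the semantic web aims to support at web scale.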
It's something that people are working hard on now, but I don't think it will be fully realized in terms of broadly usable business applications for a fair number of years. Not next year or the year after, but maybe three to five years out, we will really start to see very broadly useful business applications. There are going to be niche applications in the near term, but later something much broader.
It's a direction that really hits on the themes we have been talking about today: integrating applications and data from multiple sources and of multiple types in order to create a whole that is much greater than the sum of its parts.
We need software technologies that can do that, and fortunately we have them, as we have been discussing. We need a path that evolves us toward something that creates much greater value across a much larger range of applications in the future, and fortunately the technologies we have now are evolving in that direction.
Gardner: Very good. I think we have to leave it there. I want to thank both of our guests. We have been discussing the role of text analytics, how companies can take advantage of it and bring it into play with their BI, marketing, and other activities, and how the mining of this information is now being done by tools and is increasingly being automated.

I want to thank Seth Grimes, principal consultant at Alta Plana Corp., for joining us. Thanks so much, Seth.
Grimes: Again, thank you Dana, and thanks to Kapow for making this possible.
Gardner: Also, Stefan Andreasen, co-founder and CTO at Kapow Technologies. Thanks again
for sponsoring and joining us, Stefan.
Andreasen: Well, thank you. That was a great discussion. Thank you.
Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. This is Part Three of a series from Kapow Technologies on using BI and web data services in unique forms to increase business value.
You have been listening to a sponsored BriefingsDirect podcast. Thanks, and come back next time.
Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.