P: To develop a strategy for employing the Cloud for efficient content analytics, we need to understand both the characteristics of the content we want to analyze and the benefits of the Cloud.

Both Cloud and Content Analytics solutions are growing rapidly. With the amount of content being generated and the need to find that content, it’s become essential that we develop strategies for analyzing and tagging it so we can find it later. We spend a lot of time and money finding what we need, and by some accounts it costs even more to reproduce it if we can’t find it. Because information is exploding and a greater percentage of it is now richer in content, we also need a strategy that allows the most efficient use of resources to process the content.

A: During this presentation, I’d like you to consider the following:
Think about your content and its characteristics.
Think about your needs and your customers’ needs.
Think about how you can meet your customers’ needs more efficiently, enabling them to find information more efficiently and make better decisions.

Gaining a better understanding of your content, your needs and the different cloud models will help drive your cloud strategy and bring you the best value when employing a cloud solution for efficient content analytics.

B: I believe the cloud holds immense power, flexibility and economies of scale to meet our content analytics needs: solving problems, gaining better insight and making better decisions that were previously out of reach when faced with massive amounts of data.
I think of information growth in three dimensions: volume, richness and dispersion.

How much content do I need to analyze today, and what is its rate of growth? How much content will I potentially be analyzing several years from now?
What type of content is it? Is it unstructured? Is it just text? Or is it video, images, or audio?
Where is it? Is it already in the cloud? If not, can I put it on the Cloud?

Finally, let’s not forget: what type of analytics do I need? Categorization, Named Entity Extraction, Pattern Detection, Sentiment Analysis, Facial Recognition (in video)? Something else?
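To make the volume question concrete, here is a minimal sketch of a growth projection. It assumes simple compound annual growth, which is my simplification; the function name and the figures are hypothetical and only illustrate the arithmetic.

```python
def projected_volume(current_tb: float, annual_growth_rate: float, years: int) -> float:
    """Project content volume assuming compound annual growth.

    annual_growth_rate is a fraction: 1.0 means the volume doubles every year.
    """
    return current_tb * (1 + annual_growth_rate) ** years


# Hypothetical figures: 10 TB today, doubling every year, looking 3 years out.
future_tb = projected_volume(current_tb=10, annual_growth_rate=1.0, years=3)
```

Running the same projection with a few different growth rates gives a rough range for how much content the analytics solution may eventually need to handle.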
Consumers of content have different needs. To determine the type of Content Analytics you’ll leverage to understand and tag your content, you have to understand these needs.

Who is the end-user consuming the content and the output of content analysis? Are they librarians or taxonomists? Perhaps the primary use case in this instance is to provide a solution for your subject matter experts to discover new information that helps them build and maintain taxonomies and knowledge bases, which eventually better serve your customers.

Is the end-user your customer? How will your end-users want to find the content, and how do you want them to look for it? Do you want them to search for content by what it is “about” within a certain context? Is your user an analyst who’s required to comply with new policies on how to handle content with sensitive information?

Always ask:
Who will consume the content?
For what purpose?
Do they know what they are looking for?

The answers will help you choose how to analyze your content and therefore help drive your strategy for employing the cloud.
Let’s refresh our memories with some common types of content analysis. I’m going to mostly focus on analyzing text on this slide.

Each of these has its benefits and disadvantages:
Categorization’s intent is to determine whether the content is about a concept you have described in a taxonomy.
Named Entity Extraction uses Natural Language Processing algorithms to discover new information like People, Places and Organizations.
Pattern Detection uses rules expressed in some language to find patterns in the content; it is most effective when the domain is known.

It’s important to note that real-life tests have shown Categorization to be three orders of magnitude faster than Named Entity Extraction. Pattern Detection falls somewhere in the large middle. This is important because when we talk about economies of scale with the cloud, Named Entity Extraction, which is compute intensive, seems like a very nice fit for a Cloud-based solution.

There are many other types of analysis that I didn’t get into, such as Sentiment Analysis and even more complex analysis like facial recognition in images and videos, but I think the point is made. I think it’s safe to say that these types of analysis are compute intensive and would also be good candidates for leveraging the Cloud.

------

Now, you might look at the last couple of slides and think, well… no kidding. Sure, I need to establish the personas I’m serving, their needs, then design a solution...
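As a small illustration of the Pattern Detection idea mentioned above, here is a minimal sketch that uses regular expressions as the rule language. This is only one possible choice of rule language, not the one any particular product uses; the rule names and patterns are hypothetical and deliberately simplified.

```python
import re

# Hypothetical rules: each name maps to a pattern describing what to find.
# Rules like these work best when the domain (and so the patterns) is known.
RULES = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_patterns(text: str) -> dict[str, list[str]]:
    """Return every rule that matched, with the matched strings."""
    hits = {}
    for name, pattern in RULES.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
tags = detect_patterns(sample)
```

The matched rule names could then be attached to the document as tags, for example to flag content that contains sensitive information.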
Performance in the cloud has been touted, but one thing to keep in mind with regard to performance is the overall cost of content analytics. Don’t just evaluate cloud performance; also consider network performance between the cloud edge and your enterprise or customer-facing application.

No cloud solution is immune to latency. The extent varies; however, there are architectures and solutions (such as WAN optimization) that can reduce the latency to an acceptable level. How you package your content for the cloud may also increase or decrease latency. For instance, if you are sending entire documents to the cloud (where text extraction occurs, then analysis), this may cost more to transfer than if you were transferring just the text (having done the extraction on-premise).

In some cases, you may not have a choice. If you need to send video and images for complex analysis in the cloud, you will have to make do with the latency, knowing you are benefitting from analyzing those assets in the cloud.

Content Analytics in general is a very compute-intensive process. Some analyses are much faster than others, but we do know this: content must be extracted and analyzed in some form or fashion. You must ask: does executing content analytics in the cloud give me a performance benefit that justifies the cost associated with latency? I believe it does in most cases.

Here’s an interesting analogy of sorts:
I was speaking to a colleague of mine in France about the benefits and disadvantages of content analytics in the cloud, and he gave an interesting analogy. He said that in Europe, shrimp are fished in the North, then shipped to Morocco to be prepared, then shipped back to the North to be consumed.
He finished by saying “Content is like Shrimp!” (Hardly possessing any culinary skills, I have no clue what is involved in the preparation, but for some reason it’s worth it.)

-----

The point here is that if the cost of shipping content to the cloud for analysis (and getting the results back) is smaller than the cost of analyzing it on-premise, then it’s worth having the content analyzed in the Cloud. But keep in mind that this only applies if your content is on-premise. If the content is already in the Cloud, then the shipping cost is kept to a minimum (limited to items such as results or taxonomies). The size of the data also counts, as does network latency from the Cloud edge to your enterprise.
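The shipping comparison can be sketched as a simple break-even check. This is a rough sketch under stated assumptions: all names and figures are hypothetical, cost is modeled as a flat price per gigabyte transferred, and I have added the cloud compute cost to the cloud side of the comparison, which the shrimp analogy leaves implicit.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    upload_gb: float           # content shipped to the cloud
    results_gb: float          # analysis results shipped back
    cloud_compute_cost: float  # assumed cost of running the analysis in the cloud
    on_premise_cost: float     # assumed cost of running the same analysis on-premise


def shipping_cost(w: Workload, cost_per_gb: float) -> float:
    """Cost of moving content to the cloud and the results back."""
    return (w.upload_gb + w.results_gb) * cost_per_gb


def worth_shipping(w: Workload, cost_per_gb: float) -> bool:
    """True when shipping plus cloud analysis is cheaper than on-premise analysis."""
    return shipping_cost(w, cost_per_gb) + w.cloud_compute_cost < w.on_premise_cost


# Hypothetical figures: shipping whole documents vs. shipping only extracted text.
full_docs = Workload(upload_gb=500, results_gb=1, cloud_compute_cost=40, on_premise_cost=100)
text_only = Workload(upload_gb=50, results_gb=1, cloud_compute_cost=40, on_premise_cost=100)

ship_full_docs = worth_shipping(full_docs, cost_per_gb=0.25)  # False: 125.25 + 40 > 100
ship_text_only = worth_shipping(text_only, cost_per_gb=0.25)  # True: 12.75 + 40 < 100
```

With these made-up numbers, shipping entire documents does not pay off, while extracting the text on-premise first and shipping only that does. If the content is already in the Cloud, upload_gb drops toward zero and the check is dominated by the compute costs.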
There are many applications and services available in the cloud today. Organizations are moving their IT operations, data and applications to the cloud and are reporting immediate benefits in terms of cost, performance and customer satisfaction.

I believe the Cloud will continue to grow, and as security concerns are mitigated, we’ll see greater adoption rates.

For content analytics, I believe we all agree that unstructured content is growing explosively. We can also agree that in order to find it, we need to efficiently analyze it and intelligently tag it. Knowing the potential of the cloud today, it makes sense to consider a cloud model for efficient content analytics.

Facebook is producing summaries over large amounts of data to drive business decisions. With around half a billion users and billions of page views every day, you could say Facebook accumulates massive amounts of data. In order to drive innovation, developers needed tools to mine and manipulate data, roughly 15 terabytes per day. Before the cloud, this analysis was nearly impossible to perform. See full description here: http://www.boozallen.com/media/file/MassiveData.pdf

Big Data trends and statistics, via Hadoop & MapReduce, are helping companies determine their next moves. Why not Content Analytics?

Examples:
Log Processing
Event Detection
Fraud Analysis
Trend Analysis
Today, the private cloud offers the best balance of cloud benefits. The private cloud takes the economies of scale, low cost and flexibility the public cloud offers and keeps the infrastructure in an internal closed network, where knowledge can remain secure and under better control of your organization. It’s also easier for your organization to consider migrating your content to data management services in the private cloud, knowing that it’s secure. And if you choose to leave the content in your existing managed repositories, you can be confident that latency will be a smaller factor, because the content moves within the private network.

Any computationally intensive process is an ideal candidate for leveraging the cloud, and content analytics falls into this category. If security, control and latency issues are mitigated, there is little argument against using a cloud-based solution for content analytics.
Employ the Cloud for Efficient Content Analytics - 10 November 2011