Employ the Cloud for Efficient Content Analytics
Enabling Optimal Decision Making with the Cloud and Content Analytics
Employ the Cloud for Efficient Content Analytics - 10 November 2011
  • P: To develop a strategy for employing the Cloud for efficient content analytics, we need to understand the characteristics of the content we want to analyze and the benefits of the Cloud. Both Cloud and Content Analytics solutions are growing rapidly. With the amount of content being generated and the need to find that content, it has become essential that we develop strategies for analyzing and tagging it so we can find it later. We spend a lot of time and money finding what we need, and by some accounts it costs even more to reproduce it if we can’t find it. Because information is exploding and a greater percentage of it is now richer in content, we also need a strategy that allows the most efficient use of resources to process the content.
    A: During this presentation, I’d like you to consider the following. Think about your content and its characteristics. Think about your needs and your customers’ needs. Think about how you can meet your customers’ needs more efficiently, enabling them to find information more efficiently and make better decisions. Gaining a better understanding of your content, your needs and the different cloud models will help drive your cloud strategy and bring you the best value when employing a cloud solution for efficient content analytics.
    B: I believe the cloud holds immense power, flexibility and economies of scale to meet our content analytics needs: solving problems, gaining better insight and making better decisions that were previously out of reach when faced with massive amounts of data.
  • I think of information growth in three dimensions: volume, richness and dispersion.
    Volume: How much content do I need to analyze today, and what is its rate of growth? How much content will I potentially be analyzing several years from now?
    Richness: What type of content is it? Is it unstructured? Is it just text? Or is it video, images, or audio?
    Dispersion: Where is it? Is it already in the cloud? If not, can I put it in the Cloud?
    Finally, let’s not forget: what type of analytics do I need? Categorization, Named Entity Extraction, Pattern Detection, Sentiment Analysis, Facial Recognition (in video)? Something else?
  • Consumers of content have different needs. To determine the type of Content Analytics you’ll use to understand and tag your content, you have to understand these needs.
    Who is the end-user consuming the content and the output of content analysis? Are they librarians or taxonomists? Perhaps the primary use case in this instance is to provide a solution for your subject matter experts to discover new information that helps them build and maintain taxonomies and knowledge bases that eventually better serve your customers.
    Is the end-user your customer? How will your end-users want to find the content, and how do you want them to look for it? Do you want them to search for content by what it is “about” within a certain context? Is your user an analyst who is required to comply with new policies on how to handle content with sensitive information?
    Always ask: Who will consume the content? For what purpose? Do they know what they are looking for? The answers will help you choose how to analyze your content and therefore help drive your strategy for employing the cloud.
  • Let’s refresh our memories with some common types of content analysis. I’m going to focus mostly on analyzing text on this slide. Each of these has its benefits and disadvantages:
    Categorization’s intent is to determine whether the content is about a concept you have described in a taxonomy.
    Named Entity Extraction uses natural language processing algorithms to discover new information such as People, Places and Organizations.
    Pattern Detection uses rules expressed in some language to find patterns in the content. It is most effective when the domain is known.
    It’s important to note that real-life tests have shown Categorization to be three orders of magnitude faster than Named Entity Extraction. Pattern Detection falls somewhere in the large middle. This matters because when we talk about economies of scale with the cloud, Named Entity Extraction, which is compute intensive, seems like a very nice fit for a Cloud-based solution. There are many other types of analysis, such as Sentiment Analysis and even more complex analysis like facial recognition in images and videos, that I didn’t get into; but I think the point is made. I think it’s safe to say that these types of analysis are compute intensive and would also be good candidates for leveraging the Cloud.
    Now, you might look at the last couple of slides and think, well…no kidding. Sure, I need to establish the personas I’m serving, their needs, then design a solution...
  • Performance in the cloud has been touted; but one thing to keep in mind with regard to performance is the overall cost of content analytics. Don’t just evaluate cloud performance; also consider network performance between the cloud edge and your enterprise or customer-facing application.
    No cloud solution is immune to latency, though the extent varies. There are architectures and solutions (such as WAN optimization) that can reduce the latency to an acceptable level. How you package your content for the cloud may also increase or decrease latency. For instance, if you send entire documents to the cloud (where text extraction occurs, then analysis), the transfer may cost more than if you sent only the text (having done the extraction on-premise).
    In some cases, you may not have a choice. If you need to send video and images for complex analysis in the cloud, you will have to make do with the latency, knowing you are benefitting from analyzing those assets in the cloud.
    Content Analytics in general is a very compute-intensive process. Some analyses are much faster than others, but we do know this: content must be extracted and analyzed in some form or fashion. You must ask: does executing content analytics in the cloud give me enough of a performance benefit to justify the cost associated with latency? I believe it does in most cases.
    Here’s an interesting analogy of sorts. I was speaking to a colleague of mine in France about the benefits and disadvantages of content analytics in the cloud. He said that in Europe, shrimp are fished in the North, then shipped to Morocco to be prepared, then shipped back to the North to be consumed. He finished by saying “Content is like Shrimp!” (Hardly possessing any culinary skills, I have no clue what is involved in the preparation; but for some reason it’s worth it.)
    The point here is that if the cost of shipping content to the cloud for analysis (and getting the results back) is smaller than the cost of analyzing it on premise, then it’s worth having the content analyzed in the Cloud. But keep in mind, this only applies if your content is on-premise. If the content is already in the Cloud, then the shipping cost is kept to a minimum (only items such as results or taxonomies need to move). The size of the data also counts, as does network latency from the Cloud edge to your enterprise.
  • There are many applications and services available in the cloud today. Organizations are moving their IT operations, data and applications to the cloud and are reporting immediate benefits in terms of cost, performance and customer satisfaction. I believe the Cloud will continue to grow, and as security concerns are mitigated, we’ll see greater adoption rates.
    For content analytics, I believe we all agree unstructured content is growing explosively. We can also agree that in order to find it, we need to efficiently analyze it and intelligently tag it. Knowing the potential of the cloud today, it makes sense to consider a cloud model for efficient content analytics.
    Facebook is producing summaries over large amounts of data to drive business decisions. With around half a billion users and billions of page views every day, you could say Facebook accumulates massive amounts of data. In order to drive innovation, developers needed tools to mine and manipulate data, roughly 15 terabytes per day. Before the cloud, this analysis was nearly impossible to perform. See full description here: http://www.boozallen.com/media/file/MassiveData.pdf
    Big Data trends and statistics are helping companies determine their next moves via Hadoop & MapReduce, so why not Content Analytics? Examples: log processing, event detection, fraud analysis, trend analysis.
  • Today, the private cloud offers the best balance of cloud benefits. The private cloud takes the benefits the public cloud offers (economies of scale, low cost and flexibility) and keeps the infrastructure in an internal, closed network, where knowledge can remain secure and under better control of your organization. It’s also easier for your organization to consider migrating your content to data management services in the private cloud, knowing that it’s secure. And if you choose to leave the content in your existing managed repositories, you can be confident that latency will be a smaller factor when moving content within the private network.
    Any computationally intensive process is an ideal candidate for leveraging the cloud, and content analytics falls into this category. If security, control and latency issues are mitigated, there is little argument against using a cloud-based solution for content analytics.
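
The shipping-cost argument in the latency notes can be made concrete with a rough sketch. Everything here is an assumption for illustration: elapsed time stands in for cost, the function names are hypothetical, and the figures (bandwidth, latency, payload sizes, analysis times) are made up rather than taken from the presentation. The sketch also counts the cloud's own analysis time alongside the transfer, which the notes leave implicit.

```python
def transfer_seconds(payload_bytes, bandwidth_bytes_per_s, latency_s):
    """Time to move one payload: fixed latency plus size divided by bandwidth."""
    return latency_s + payload_bytes / bandwidth_bytes_per_s


def cloud_is_worth_it(payload_bytes, result_bytes, bandwidth_bytes_per_s,
                      latency_s, cloud_analysis_s, on_prem_analysis_s):
    """True if shipping out, analyzing in the cloud, and shipping results back
    takes less time than analyzing on premise (time used as a cost proxy)."""
    ship_out = transfer_seconds(payload_bytes, bandwidth_bytes_per_s, latency_s)
    ship_back = transfer_seconds(result_bytes, bandwidth_bytes_per_s, latency_s)
    return ship_out + cloud_analysis_s + ship_back < on_prem_analysis_s


# Illustrative, made-up figures: a 10 MB/s link with 100 ms latency.
BANDWIDTH = 10_000_000
LATENCY = 0.1

# Sending the whole 50 MB document versus only 0.5 MB of extracted text.
full_document = cloud_is_worth_it(50_000_000, 10_000, BANDWIDTH, LATENCY,
                                  cloud_analysis_s=2.0, on_prem_analysis_s=6.0)
text_only = cloud_is_worth_it(500_000, 10_000, BANDWIDTH, LATENCY,
                              cloud_analysis_s=2.0, on_prem_analysis_s=6.0)
```

With these assumed numbers, shipping the full document loses to on-premise analysis while shipping only the extracted text wins, which mirrors the point about packaging content before it leaves the enterprise.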

    1. Employ the Cloud for Efficient Content Analytics: Enabling Optimal Decision Making with the Cloud and Content Analytics. Samir A. Batla, Principal Product Manager, EMC IIG, samir.batla@emc.com. © Copyright 2011 EMC Corporation. All rights reserved.
    2. Information Growth – Three Dimensional: VOLUME, DISPERSION, RICHNESS. INFORMATION. CONTENT ANALYTICS.
    3. Big Data Size: The Volume Of Content Continues To Explode. The Digital Universe 2010 - 2020. 90% Unstructured¹. Video: It’s everywhere, 91% (2014)². Data Volume Growing 44x. 2010: 1.2 Zettabytes. 2020: 35.2 Zettabytes. Source: IDC Digital Universe Study, sponsored by EMC, May 2010.
    4. Types of Content Analysis: Driven by Need. “I need to find documents about some concept.” “I need to discover new knowledge to manage my taxonomies.” Business Analyst: “I’m composing a workflow to protect documents containing sensitive information. I need to find content that contains employee id patterns in order to apply rights management policies.”
    5. Types of Content Analysis: To list a few… • Categorization – Taxonomy Driven – Indicates what the content is about in a certain context • Named Entity Extraction – NLP-based – Finds what’s mentioned in the content • Pattern Detection – Rules-based e.g. Regex – Finds patterns in the content • Sentiment Analysis, Topic/Theme analysis, etc.
    6. The Journey To Your Cloud: Private Cloud is a logical first step. Enterprise IT: Complex, Expensive, Inflexible, Siloed. Private Cloud: Trusted, Controlled, Reliable, Secure. Public Cloud: Simple, Low Cost, Flexible, Dynamic. Infrastructure. “70% Will Spend More On Private Cloud through 2012” GARTNER DATA CENTER CONFERENCE 2009
    7. The Hybrid Cloud: Best of Private and Public Clouds. Hybrid Cloud: Information, Private Cloud, Public Cloud. Hybrid Cloud use will triple within the next three years. Sand Hill Group 2010
    8. Latency: The Cloud’s Achilles Heel. Shipping Costs.
    9. Applications In The Cloud: ECM, CUSTOMER COMMUNICATION, CONTENT ANALYTICS, GOVERNANCE, DATA ANALYTICS, CAPTURE/INGEST, CONTENT DELIVERY.
    10. Private Cloud for Efficient Content Analytics: Ideal First Step. Trusted, Controlled, Reliable, Secure. Simple, Low Cost, Flexible, Dynamic.
    11. THANK YOU
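
The Business Analyst scenario on slide 4 and the "Rules-based e.g. Regex" entry on slide 5 can be sketched in a few lines. This is only an illustration: the employee ID format (`EMP-` followed by six digits), the function names and the sample documents are all hypothetical, since the presentation does not say what an employee ID looks like or how rights management would be applied.

```python
import re

# Assumed ID format for illustration only: "EMP-" followed by exactly six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")


def find_employee_ids(text):
    """Return every substring of text that matches the assumed ID pattern."""
    return EMPLOYEE_ID.findall(text)


def needs_rights_management(text):
    """Flag a document as sensitive if it contains at least one employee ID."""
    return EMPLOYEE_ID.search(text) is not None


# Made-up sample documents keyed by file name.
docs = {
    "memo.txt": "Badge EMP-004217 was reissued on Monday.",
    "menu.txt": "Lunch today: shrimp salad.",
}

# Names of the documents a rights management policy would apply to.
flagged = [name for name, body in docs.items() if needs_rights_management(body)]
```

A rule like this works because the domain is known in advance, which is the condition the speaker notes give for pattern detection being most effective.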