Tim Estes - Information Systems in an Entity Centric World

Tim Estes, CEO of Digital Reasoning, talks about the use of Hadoop and other scalable technologies along with Digital Reasoning's analytics for automated understanding of cloud-scale text challenges.
This presentation was delivered at Hadoop World in New York in October 2010.

    Presentation Transcript

    • Information Systems in an Entity-Centric World Digital Reasoning™ ❘ Copyright © 2010 ❘ All Rights Reserved
    • What Are We Covering?
      ‣ The New Entity-Oriented Mission in an Era of Information Overload
      ‣ Demonstration
      ‣ Entity-Oriented Analytics
      ‣ Processing
      ‣ Schema/Semantics
      ‣ The Takeaways
    • Mission Evolution from Forces to Entities
      [Figure: timeline with time axis marked 1980, 1990, 2000, Current; stage labels: Armies, Groups, Individuals, Multiple Cyber Presences for Individuals]
    • Information Explosion: “A Wealth of Information creates a Poverty of Attention”
      80% of all data is unstructured. Tenfold growth in data in five years.
      • 1.73 billion internet users as of September 2009
      • 247 billion emails on average sent every day in 2009
      • 126 million blogs on the internet
      • 400+ million Facebook users generate more than 5 billion pieces of content every week
      • 50 million Twitter messages were sent every day in January 2010 (~17-18 billion tweets so far…)
      [Chart: global information created, shown in exabytes (Source: IDC)]
      ‣ Most of the time, what we don’t know poses the greatest risk. Missing critical information can have tragic consequences.
        • President Obama, when referencing the Christmas Day bomb attempt, recognized that the intelligence community failed to “connect the dots”
      ‣ Hiring more analysts to manually review is an expensive and insufficient approach
        • Costs not reasonable, lack of coverage intolerable for intelligence, defense and law enforcement agencies
      ‣ Most automated solutions do not understand human language sufficiently to be useful
        • Variations in grammar, spelling and context obscure meaning to software
      ‣ Today's search solutions link you to matching keywords and don't deliver understanding of hidden relationships buried in unstructured text
    • The Last 10 Years Have Been About Search. The Next 10 Years Are Going To Be About Summarization.
    • Community Response
      [Figure: innovation plotted against time, in four stages]
      ‣ Keyword Search (1992)
        • Documents retrieved by matching keywords
        • Limited as matching keywords doesn’t provide understanding
      ‣ Improved Tools (2000)
        • Faceted search and related filters make discovery easier
        • Actual “meaning” of data remains hidden
      ‣ Enriched Data (2006)
        • Extracts known entities, adding metadata to allow filtering
        • Predetermined taxonomies of limited value for unknowns
      ‣ Understand Data (Current)
        • Entity Resolution
        • Horizontal Scalability
        • Statistical Filtering
        • No preconceptions
        • Context discovered automatically
    • Transform Enterprise Knowledge: From Docs to Things
    • Demonstration
    • What Are We Using?
      ‣ Hadoop (Cloudera Distribution of Hadoop V3 – CDH3) for horizontally scalable ingestion and analytics processing
        • CDH3 has beneficial tooling over vanilla Hadoop
        • It's nice to have supported/assured capabilities by really smart guys
      ‣ Cassandra for persistence of entity data
        • No Single Point of Failure – true peer data architecture
        • Leveraging its very fast write performance
        • Pushing and extending its distributed query capabilities
        • Multi-point ingest and eventual consistency are useful for downstream multi-datacenter/multi-cloud deployment scenarios
      ‣ Complex and novel analytics algorithms working on Hadoop
        • NLP and extraction technology is fresh and state of the art
        • Associative Network algorithms are unique and patented
        • Concept resolution from structured and unstructured data using an unsupervised learning approach
        • Can also incorporate human guidance (augmented intelligence), but current capabilities give us a great way to control the fire hose of information before an analyst has to deal with it
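
    The split described on this slide (a horizontally scalable map/reduce pass that turns documents into entity-level records, which are then written to an entity store) can be illustrated with a minimal sketch. It is not Digital Reasoning's implementation and does not call Hadoop or Cassandra: the corpus, the pre-extracted entity lists, and the function names `map_phase`, `shuffle`, `reduce_phase` and `run` are all hypothetical, and the map/reduce stages are simulated in plain Python.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus. The entity lists stand in for the output of an
# NLP extraction step, which this sketch does not implement.
DOCS = {
    "doc1": ["Alice Smith", "Acme Corp"],
    "doc2": ["Alice Smith", "Bob Jones"],
    "doc3": ["Acme Corp", "Alice Smith"],
}

def map_phase(doc_id, entities):
    """Emit ((entity_a, entity_b), doc_id) for every pair co-occurring in a document."""
    for a, b in combinations(sorted(set(entities)), 2):
        yield (a, b), doc_id

def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, doc_ids):
    """Collapse all mentions of an entity pair into one entity-level link record."""
    return key, {"count": len(doc_ids), "docs": sorted(doc_ids)}

def run(docs):
    emitted = [kv for doc_id, ents in docs.items() for kv in map_phase(doc_id, ents)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())

links = run(DOCS)
for pair, record in sorted(links.items()):
    print(pair, record)
```

    Each resulting record is keyed by entities rather than by document, which is the shape of data one would then persist in a write-optimized store.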
    • Synthesys™ in Entity Analytics
    • Synthesys™ in Entity Analytics
    • Entity-Centric Clouds - Takeaways
      ‣ Analysts (and people) want an entity-centric system vs. a document-centric one (but it has to work).
      ‣ Dealing with entities vs. documents or records makes the IT complexity jump 100X
      ‣ But it gives our analysts a 10X boost in productivity if done right (maybe 100X when we get around to writing Agents on the Entity-Centric Cloud)
      ‣ That’s where the cloud comes in
      ‣ We as a community have the processing and storage capacity to handle this now at the 100M to 10B record level
      ‣ We should take a “conservative” approach to analysis where analytics is a continuous enrichment process and our current models are not dramatically privileged over potential future models
      ‣ Nearly all web-scale search / analytics systems use the following recipe:
        • Distributed, columnar type data store
        • Horizontally scalable analytics framework (like Hadoop)
      ‣ Future systems are all distributed and learning all the time. Our investments today can be ready for this future with the right architectural and technology considerations.
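
    One way to read the “conservative” takeaway is that every assertion about an entity is stored alongside the model that produced it, so a newer model adds columns instead of overwriting older ones. The sketch below assumes that reading. `EntityStore`, its methods, and the model names are hypothetical, and an in-memory dict stands in for a distributed columnar store.

```python
from collections import defaultdict

class EntityStore:
    """Toy wide-row store: one row per entity, one column per (attribute, model)."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def enrich(self, entity, attribute, value, model):
        # Each model writes to its own column, so outputs of earlier models survive.
        self._rows[entity][(attribute, model)] = value

    def view(self, entity, prefer):
        """Resolve attributes at read time; `prefer` lists models, most trusted first."""
        columns = self._rows.get(entity, {})
        result = {}
        # Apply least-preferred models first so more-preferred ones override them.
        for model in reversed(prefer):
            for (attribute, source), value in columns.items():
                if source == model:
                    result[attribute] = value
        return result

store = EntityStore()
store.enrich("Alice Smith", "employer", "Acme Corp", model="model_v1")
store.enrich("Alice Smith", "role", "engineer", model="model_v1")
store.enrich("Alice Smith", "employer", "Acme Corporation", model="model_v2")

print(store.view("Alice Smith", prefer=["model_v2", "model_v1"]))
print(store.view("Alice Smith", prefer=["model_v1"]))
```

    Because the preference order is supplied at read time, a future model can be adopted (or rolled back) without rewriting stored data.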
    • Tim Estes | CEO | Digital Reasoning Systems | 730 Cool Springs Blvd, Suite 110, Franklin, TN 37067
      office: 615.370.1860 | fax: 615.370.1865 | email: info@digitalreasoning.com
      website: http://www.digitalreasoning.com
      twitter: http://twitter.com/spooksandgeeks